Re: [htdig] PDFs, numbers, and percent signs

2001-01-10 Thread Philip E. Varner
Yes, "25%" shows up in the output of the parser. I searched for a word near an instance of it in a document, and the long results print out the "25%" too. Any other ideas? Phil On Tue, 9 Jan 2001, Geoff Hutchison wrote: : At 1:52 PM -0500 1/9/01, Philip E. Varner wrote: : So, I'm guessing

Re: [htdig] PDFs, numbers, and percent signs

2001-01-10 Thread David Adams
At this stage it is not so much you being given ideas as you supplying enough information. What parser are you using? Are you using it directly or via a script such as parsedoc or doc2html? You say "25%" occurs in the parser output. Is that the output direct from pdftotext, or from doc2html or

Re: [htdig] PDFs, numbers, and percent signs

2001-01-10 Thread Philip E. Varner
I figured out the problem and solution. In case anyone else has this problem in the future, here are a few of the gotchas. A Description of My Original Problem We have a bunch of PDF files that are the minutes from committee meetings. The members want to be able to search them, so htdig was

Re: [htdig] htdig

2001-01-10 Thread Douglas S. Davis
I would say abort the process and start again with the -v option to see what it is doing. It sounds like you need to set some max_hops to keep it from crawling out onto the web itself. HTH, Doug -- | Information Systems Coordinator | Monical Pizza Corporation | http://www.monicals.com | - -
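
For reference, the hop limit Doug mentions would go in htdig.conf. A minimal sketch, assuming the attribute he means is max_hop_count; the depth of 4 and the host name are purely illustrative and do not come from the thread:

```
# htdig.conf fragment (illustrative values, not from the thread)
# Stop following links more than 4 hops away from start_url.
max_hop_count:  4
# Keep the dig on the local site; www.example.com is a placeholder.
limit_urls_to:  http://www.example.com/
```

Running the dig with -v, as suggested in the message, should then show which URLs are still being fetched.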

[htdig] htdig

2001-01-10 Thread Chuck Umeh
Hi, I'm using htdig in my company's website and the htdig search engine does not work because of a corrupted database, I started the rundig script to re-index the database and the process has been running for more than one week. Any clue on how to resolve this problem or what

[htdig] indexing htdig.org

2001-01-10 Thread Tracey Guzouskas
Hello, I am using version htdig-3.1.5. I have already run rundig once on my site successfully as a test. I am now trying to re-index the site because of updates to the site and other configuration changes made to the htdig.conf file. However, when I run htdig, "./htdig -c /my/conf/file/path/

Re: [htdig] indexing htdig.org

2001-01-10 Thread Peterman, Timothy P
Maybe you have a link to htdig.org somewhere in your web content? Check your "limit_urls_to" attribute to see if you are restricting the digging to the domains you want to index. Tracey Guzouskas wrote: Hello, I am using version htdig-3.1.5. I have already run rundig once on my site

Re: [htdig] indexing htdig.org

2001-01-10 Thread Tracey Guzouskas
Tim, thanks for the quick response. I have "limit_urls_to:" set to ${start_url}, which is pointing to my start.url file, and there is no reference to htdig.org anywhere on the site. When I run htdig or rundig they both start right off on the htdig.org site. Is there anywhere else besides the conf
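
A sketch of the two attributes under discussion, assuming start_url is read from a file with the backquote include syntax. The path below is a hypothetical stand-in, not Tracey's actual configuration:

```
# htdig.conf fragment (hypothetical path)
# Backquotes substitute the contents of the named file.
start_url:      `/opt/htdig/conf/start.url`
# Restrict the dig to URLs that begin with the start URL(s).
limit_urls_to:  ${start_url}
```

If start_url ended up unset or empty, the built-in default would presumably apply, which may be worth ruling out when the dig starts on an unexpected site.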

Re: [htdig] PDFs, numbers, and percent signs

2001-01-10 Thread Gilles Detillieux
According to Philip E. Varner: 1) The directive minimum_word_length defaults to 3, but when dealing with two-digit numbers, this should be set to two. The default would catch "25%", but not other numbers. This needs to be set in htdig.conf, AND in parse_doc.pl, if using it. parse_doc.pl
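
The htdig.conf side of Philip's first point can be sketched as follows. The value 2 comes from his message; the comments are illustrative:

```
# htdig.conf fragment
# The default is 3; lower it so two-digit numbers such as "25" are indexed.
minimum_word_length:  2
```

Per the message, an external parser script such as parse_doc.pl applies its own minimum and needs the matching change, and the documents have to be re-indexed before the new length has any effect.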

Re: [htdig] keep temp files while running indexer? How to...

2001-01-10 Thread Gilles Detillieux
According to Geoff Hutchison: At 10:43 AM -0800 1/9/01, [EMAIL PROTECTED] wrote: 1) Are we right in the assumptions we're making above (the temp files are being destroyed and are thus not available during indexing) and If you are not specifying the -a flag to htdig/htmerge then it will

Re: [htdig] keep temp files while running indexer? How to...

2001-01-10 Thread smurray3
A couple questions: 1) The file you're referring to is rundig.sh on http://www.htdig.org/contrib/ (right?) 2) Does the file have to be modified for my system or can I use it as is? (I know, dumb question) 3) Does the file go in bin/rundig or in cgi-bin/rundig? Thanks in advance! Steve

Re: [htdig] keep temp files while running indexer? How to...

2001-01-10 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: A couple questions: 1) The file you're referring to is rundig.sh on http://www.htdig.org/contrib/ (right?) Yes, it's the Scripts sub-section of that part of the web site, which actually takes you to the http://www.htdig.org/files/contrib/scripts/

[htdig] installing as a user..... possible?

2001-01-10 Thread Clint Gilders
Hi, I've been asked to set up a search engine for a company that I am doing CGI programming for. I would normally just use a perl based search for a site this size, but they want lots of functionality (and I really like ht://Dig). I think ht://Dig would be perfect for what they want,

Re: [htdig] keep temp files while running indexer? How to...

2001-01-10 Thread Stephen Murray
Hi Gilles, When you wrote: "Only the one on the contrib section of the FTP site and web site is current." You were referring to rundig.sh at http://www.htdig.org/contrib/ -- right? That's the one I should use? (As Geoff suggested?) Thank you! Steve * On 10 Jan 2001, at

Re: [htdig] htdig

2001-01-10 Thread Geoff Hutchison
At 2:18 PM -0500 1/10/01, Chuck Umeh wrote: I'm using htdig in my company's website and the htdig search engine does not work because of a corrupted database, I started the rundig script to re-index the database and the process has been running for more than one week. Any clue on how to

Re: [htdig] installing as a user..... possible?

2001-01-10 Thread Geoff Hutchison
At 8:38 PM -0800 1/10/01, Carlos Ramirez wrote: Just as long as you have access to a C++ compiler you should be able to build it. And you have some way of running the htsearch CGI as a "normal user." On many web servers, CGIs must be installed by root. YMMV. -- -Geoff Hutchison Williams Students