Yes, "25%" shows up in the output of the parser. I searched for a word
near an instance of it in a document, and the long results print out
the "25%" too. Any other ideas?
Phil
On Tue, 9 Jan 2001, Geoff Hutchison wrote:
: At 1:52 PM -0500 1/9/01, Philip E. Varner wrote:
: So, I'm guessing
At this stage it is not so much you being given ideas as you supplying
enough information.
What parser are you using? Are you using it directly or via a script such
as parsedoc or doc2html?
You say "25%" occurs in the parser output. Is that the output directly from
pdftotext, or from doc2html or
I figured out the problem and solution. In case anyone else has this
problem in the future, here are a few of the gotchas.
A Description of My Original Problem
We have a bunch of PDF files that are the minutes from committee meetings.
The members want to be able to search them, so htdig was
I would abort the process and start again with the -v option to
see what it is doing. It sounds like you need to set a hop limit
(max_hop_count) to keep it from crawling out onto the web itself.
HTH,
Doug
--
| Information Systems Coordinator
| Monical Pizza Corporation
| http://www.monicals.com
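Doug's hop-limit suggestion can be pictured with a minimal Python sketch of a breadth-first crawl that stops following links once a page is a given number of hops from the start URL. This is not ht://Dig's code: the in-memory link graph, the example.com URLs and the function name `crawl` are hypothetical. In htdig.conf the corresponding attribute is `max_hop_count`.

```python
from collections import deque


def crawl(links, start, max_hop_count):
    """Return the set of URLs reachable from start within max_hop_count hops.

    links is a hypothetical in-memory link graph (URL -> list of URLs);
    a real digger would fetch and parse each page instead.
    """
    seen = {start}
    queue = deque([(start, 0)])  # (url, hops away from the start URL)
    while queue:
        url, hops = queue.popleft()
        if hops >= max_hop_count:
            continue  # hop limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return seen


# Hypothetical site whose "about" page links out to htdig.org.
links = {
    "http://www.example.com/": ["http://www.example.com/about.html"],
    "http://www.example.com/about.html": ["http://www.htdig.org/"],
    "http://www.htdig.org/": ["http://www.htdig.org/files/"],
}

print(sorted(crawl(links, "http://www.example.com/", 1)))
# ['http://www.example.com/', 'http://www.example.com/about.html']
```

Raising the limit to 2 in this toy graph lets the crawl reach http://www.htdig.org/, which is the kind of wandering a hop limit is meant to prevent.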
Title: htdig
Hi,
I'm using htdig on my company's website, and the htdig search engine does not work because of a corrupted database. I started the rundig script to re-index the database, and the process has been running for more than one week. Any clue on how to resolve this problem or what
Hello,
I am using version htdig-3.1.5. I have already
run rundig once on my site successfully as a test. I am now trying to
re-index the site because of updates to the site and other configuration changes
made to the htdig.conf file. However, when I run htdig as "./htdig -c
/my/conf/file/path/
Maybe you have a link to htdig.org somewhere in your web
content? Check your "limit_urls_to" attribute to see if you are
restricting the digging to the domains you want to index.
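As a rough illustration of what `limit_urls_to` does, here is a minimal Python sketch that keeps a candidate URL only if it matches one of the configured patterns. Treating each pattern as a plain substring match is an assumption of this sketch, and the function name `allowed` and the example.com URLs are hypothetical; the ht://Dig attribute documentation is the authority on the exact matching rules.

```python
def allowed(url, limit_urls_to):
    """Keep a URL only if it contains at least one of the limit patterns.

    Plain substring matching is an assumption of this sketch, not a
    statement of how ht://Dig implements limit_urls_to.
    """
    return any(pattern in url for pattern in limit_urls_to)


# Hypothetical configuration restricting digging to one site.
limit_urls_to = ["http://www.example.com/"]

candidates = [
    "http://www.example.com/minutes/2000-12.pdf",
    "http://www.htdig.org/",
]
kept = [url for url in candidates if allowed(url, limit_urls_to)]
print(kept)  # ['http://www.example.com/minutes/2000-12.pdf']
```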
Tracey Guzouskas wrote:
Hello,
I am using version htdig-3.1.5. I have already run rundig once on my
site
Tim,
Thanks for the quick response. I have limit_urls_to set to ${start_url},
which points to my start.url file, and there is no reference to
htdig.org anywhere on the site. When I run htdig or rundig, they both start
right off on the htdig.org site. Is there anywhere else besides the conf
According to Philip E. Varner:
1) The directive minimum_word_length defaults to 3, but when dealing with
two-digit numbers, this should be set to two. The default would catch
"25%", but not other numbers. This needs to be set in htdig.conf, AND in
parse_doc.pl, if using it. parse_doc.pl
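Philip's first point can be illustrated with a small Python sketch of a minimum-word-length filter: with the default of 3, a bare "25" is discarded, while a setting of 2 keeps it. The tokenizer and the function name `index_words` are hypothetical stand-ins, not ht://Dig's parser.

```python
import re


def index_words(text, minimum_word_length=3):
    """Split text into alphanumeric words and drop the short ones.

    A hypothetical tokenizer for illustration only; ht://Dig's real
    word parsing differs in detail.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [w.lower() for w in words if len(w) >= minimum_word_length]


sample = "Attendance was 25% of the 120 members"
print(index_words(sample))
# ['attendance', 'was', 'the', '120', 'members']
print(index_words(sample, minimum_word_length=2))
# ['attendance', 'was', '25', 'of', 'the', '120', 'members']
```

Because a filter like this runs wherever words are extracted, the setting has to agree in every place that does the extraction, which is why Philip notes it must be changed in both htdig.conf and parse_doc.pl.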
According to Geoff Hutchison:
At 10:43 AM -0800 1/9/01, [EMAIL PROTECTED] wrote:
1) Are we right in the assumptions we're making above (the temp
files are being destroyed and are thus not available during
indexing) and
If you are not specifying the -a flag to htdig/htmerge then it will
A couple questions:
1) The file you're referring to is rundig.sh on
http://www.htdig.org/contrib/ (right?)
2) Does the file have to be modified for my system or can I use it
as is? (I know, dumb question)
3) Does the file go in bin/rundig or in cgi-bin/rundig?
Thanks in advance!
Steve
According to [EMAIL PROTECTED]:
A couple questions:
1) The file you're referring to is rundig.sh on
http://www.htdig.org/contrib/ (right?)
Yes, it's the Scripts sub-section of that part of the web site, which
actually takes you to the http://www.htdig.org/files/contrib/scripts/
Hi
I've been asked to set up a search engine for a company that I am doing
CGI programming for. I would normally just use a Perl-based search for
a site this size, but they want lots of functionality (and I really
like ht://Dig). I think ht://Dig would be perfect for what they want,
Hi Gilles,
When you wrote:
"Only
the one on the contrib section of the FTP site and web site is
current."
You were referring to rundig.sh at http://www.htdig.org/contrib/,
right? That's the one I should use (as Geoff suggested)?
Thank you!
Steve
On 10 Jan 2001, at
At 2:18 PM -0500 1/10/01, Chuck Umeh wrote:
I'm using htdig on my company's website, and the htdig search engine
does not work because of a corrupted database. I started the rundig
script to re-index the database, and the process has been running for
more than one week. Any clue on how to
At 8:38 PM -0800 1/10/01, Carlos Ramirez wrote:
As long as you have access to a C++ compiler, you should be able to build it.
And you have some way of running the htsearch CGI as a "normal user."
On many web servers, CGIs must be installed by root. YMMV.
--
-Geoff Hutchison
Williams Students