In fact, it seems that htsearch results are directories and files where the
searched word is inside the directory or file name.
Ex :
/foo/foo.html
Searched word : foo
Result :
/foo
/foo/foo.html
Is there a way to prevent htsearch from finding those directories and files?
Thanks.
Loys.
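One way to keep such entries out of the results is to keep them out of the index
in the first place: the `exclude_urls` attribute in htdig.conf makes the digger
skip any URL containing one of the listed patterns. A sketch only; the patterns
below are illustrative, not a drop-in fix for this site:

```
# htdig.conf fragment -- skip URLs containing these substrings,
# e.g. server-generated directory listings
exclude_urls:  /cgi-bin/ .cgi
```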
Gilles
Hello,
I want to check whether it is possible to index a list of changed files
without reindexing all the data.
The situation is that I know that a particular list of files needs to be
reindexed, and I want to do that as fast as possible.
Thanks in advance.
Loys.
--
Geoff Hutchison wrote:
On Tue, 9 Jan 2001, Peterman, Timothy P wrote:
I have a related question. Can I merge more than two
databases at a time?
Not at the moment.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
I am trying to do what I can to help those with spelling difficulties perform
searches on our web pages.
This was triggered by seeing in the htsearch log that attempts to find
"accomodation" were finding some pages, but not the important ones (where it
is spelt correctly)!
Also this University
Thank you
Yours
Kamel.
- Original Message -
From: "Geoff Hutchison" [EMAIL PROTECTED]
To: "K" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, January 18, 2001 2:28 PM
Subject: Re: Htdig ?
At 3:14 PM + 1/18/01, K wrote:
Hi,
I've found your e-mail on the htdig site.
I hope you've got the solution to my problem.
I've got a database of URLs.
I would like to use htdig for a quick search of my database. How can
I index all my URLs? Could I index an access file, for example?
Thank you
At 1:34 PM + 1/18/01, David Adams wrote:
1) What have other sites done to address this problem? (Spell checking
and correcting our own
Use good fuzzy methods, including the synonym file. We are working on
additional fuzzy matching code, but of course if anyone can come up
with sample
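The fuzzy methods mentioned above are controlled by the `search_algorithm`
attribute in htdig.conf; a sketch, with illustrative weights (each method
scores its matches by the given weight):

```
# htdig.conf fragment -- weights are examples, not recommendations
search_algorithm:  exact:1 synonyms:0.5 endings:0.3 soundex:0.2
```

Soundex in particular should let "accomodation" match pages that spell it
correctly, since the two spellings share a soundex code.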
At 11:49 AM -0600 1/17/01, htdighelp wrote:
Is there some way to restart htdig so that it re-reads both the conf
and the url list and just continues on?
Not really. If you use the -l flag, you can kill htdig and it will
write out its progress to a file and re-read it the next time it's
called.
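A sketch of that workflow (paths are illustrative):

```sh
# Start the dig with -l so an interrupted run logs its progress
htdig -v -l -c /etc/htdig/htdig.conf &
# ... interrupt the dig; with -l it writes out its URL log
kill -INT %1
# The next -l run re-reads the saved log and carries on
htdig -v -l -c /etc/htdig/htdig.conf
```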
At 1:28 PM +0100 1/18/01, Berthold Cogel wrote:
Is it possible to do the merging step twice? Is it possible to 'cascade'
this step?
Oh sure. It's probably most effective to work out some sort of "tree"
of merges if you want to do it efficiently. You just can't merge more
than two in one
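With 3.1.x htmerge, the second database set is named by the -m option, so a
cascade is just repeated pairwise merges. A sketch; the config names are
hypothetical:

```sh
# Merge siteB into siteA's databases, then siteC into the result
htmerge -c siteA.conf -m siteB.conf
htmerge -c siteA.conf -m siteC.conf
```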
According to Loys Masquelier:
In fact, it seems that htsearch results are directories and files where the
searched word is inside the directory or file name.
Ex :
/foo/foo.html
Searched word : foo
Result :
/foo
/foo/foo.html
Is there a way to prevent htsearch from finding those directories
According to Geoff Hutchison:
At 1:34 PM + 1/18/01, David Adams wrote:
1) What have other sites done to address this problem? (Spell checking
and correcting our own
Use good fuzzy methods, including the synonym file. We are working on
additional fuzzy matching code, but of course
According to Loys Masquelier:
I want to check that it is not possible to index a list of changed files
without reindexing all the data.
In fact the situation is that I know that that list of files needs to be
reindexed and I want to do that as fast as possible.
You may be out of luck with
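If the 3.2 feature isn't available, one 3.1.x-style workaround is to dig only
the changed files into a side database and merge it back. A sketch; the config
names are hypothetical, and note that stale copies of those pages may linger in
the main database:

```sh
# changed.conf: start_url and limit_urls_to list only the changed files,
# with database_dir pointing at a scratch directory
htdig -i -c changed.conf
htmerge -c changed.conf               # build the side indexes
htmerge -c main.conf -m changed.conf  # fold them into the main database
```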
According to Pat Lennon:
I have a Linux box with approx 1 gig of HTML and PDF books. I want to
use htdig for the search engine. I don't want to assume too much,
but will 1 additional gig of hard disk cover the size of the index
database? I figure double may be a safe starting point. Also what
I have a Linux box with approx 1 gig of HTML and PDF books. I want to
use htdig for the search engine. I don't want to assume too much,
but will 1 additional gig of hard disk cover the size of the index
database? I figure double may be a safe starting point. Also what type
of memory
On Thu, 18 Jan 2001, Gilles Detillieux wrote:
There was talk of adding to the 3.2 code a feature whereby you can tell
htdig not to recheck all the indexed documents, but only check a given
list of URLs. I don't remember if this feature is already in the current
development snapshots.
Yes.
Is there a way
to configure htdig to be used to just spider and collect pages and documents
without doing any of the index/search related stuff?
Thanks in
advance.
-Mark
Does someone have a step-by-step HOWTO for ht://Dig ? I have downloaded
and installed (configure;make;make install) everything and it is
installed right. I just can't seem to get the bloody thing to WORK!
Or, is there an issue using it on a webserver that uses the Microsoft
FrontPage
It's version 3.1.5, on SuSE Pro 7.0 kernel 2.2.16. Apache version
1.3.14, PHP 4.0.4, FrontPage extensions 4.0.4.3
Geordon
Original Message
On 1/18/01, 2:29:14 PM, "Ing. Noel Vargas Baltodano"
[EMAIL PROTECTED] wrote regarding Re: [htdig] HOWTO? step-by-step?:
Hi Gordon:
First of
Geordon
I suppose that apache is running fine, right?
The next question would be, what exactly do you want to do with htdig? Just
have a search engine for your site or what?
Geordon VanTassle wrote:
It's version 3.1.5, on SuSE Pro 7.0 kernel 2.2.16. Apache version
1.3.14, PHP 4.0.4
I've been able to successfully install (and execute, giving valid results),
the 1/14/01 snapshot of this.
To get this to work, however, I had to invoke the "--without-zlib" option.
I obtained zlib113, uploaded it to the server, decompressed, and attempted to
compile. (I do NOT have authority
That is correct: Apache works JUST fine. :) And yes, I just want to have
ht://Dig index my server and use it as a basic search.
I have it installed, like I said, and I can get to the SEARCH page, as
well as enter something in the box and have the CGI run. However, it
doesn't seem to FIND
Just wondering, are there any other tools that will search the style of
output that htdig results are in?
Mike
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:
On Thu, 18 Jan 2001, Mike Paradis wrote:
Just wondering, are there any other tools that will search the style of
output that htdig results are in?
I'm not sure what you mean. You make it sound like you want to parse the
output. So yes, there are a variety of PHP, Perl, Java, etc. wrappers to
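For inspecting that output by hand, htsearch can also be run outside the web
server; with no CGI environment it prompts for its input parameters. A sketch
(the path is illustrative):

```sh
cd /opt/www/cgi-bin
./htsearch
# it prompts: Enter value for words:
```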
On Thu, 18 Jan 2001, Geordon VanTassle wrote:
/opt/www which is fine for me. In /opt/www/db there are the files, and
they have "size," so I know that there is something IN them. Who should
When you say it doesn't "find anything," does it come up with an error?
Have you tried running
I'm working on a site where some pages will be dynamically generated
using java servlets and many static pages will be linked to only through
dynamically generated content, or through javascript.
I'm wondering what the best strategy for implementing htdig on a site
like this would be.
I was
In a site I'm working on, we have some links that are standard HTML
(click on it, current page is replaced in the browser). We also have
some links that run a javascript function which opens a popup window,
leaving the current browser window open as well.
Of course, since the javascript links
okay, here's one for the gurus.
I'd like to be able to preserve user state, which is held in the query string.
So my idea is to return just the urls from a search that match the state
of the user. Basically, we have a ?lang=en or ?lang=fr, and since many
of our pages are not translated yet,
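One possible angle, as a sketch only and assuming the query strings are part of
the indexed URLs: htsearch's `restrict` input parameter drops results whose
URLs don't contain the given string, and it can be carried in a hidden form
field:

```html
<form method="get" action="/cgi-bin/htsearch">
  <input type="text" name="words">
  <!-- illustrative: only return URLs containing lang=en -->
  <input type="hidden" name="restrict" value="lang=en">
</form>
```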
Ok, I took a look at it again, and everything can be read by the
Webserver. When I run the htsearch from either the command line OR the
HTML interface, it comes back and says that there were no matches found.
Now, granted, there's not a lot on my site at this point, but when I
searched
Geordon
You have to edit the htdig.conf file according to your directory structure
(start url, etc.) and your needs.
Then you run htdig and htmerge. Try running htdig with the -i option to
re-do the database from scratch and the -vvv option to see exactly what it
is indexing.
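As a command sketch of those steps (paths are illustrative):

```sh
# after editing htdig.conf for your start_url etc.:
htdig -i -vvv -c /opt/www/conf/htdig.conf   # rebuild from scratch, very verbose
htmerge -c /opt/www/conf/htdig.conf         # build the word/document indexes
```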
--
Noel Vargas
I have an existing, working htdig database, and I wanted to add more data to it
that's indexed from another URL, using a different configuration file. The
reason I used a different configuration file is because I want to run the two
indexing runs at different times, using a cron job. Anyway, what
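A sketch of running the two digs and the merge from cron; the paths, config
names, and times are all hypothetical:

```
# crontab: dig each source with its own config, then merge B into A
0 1 * * * /usr/local/bin/htdig -c /etc/htdig/siteA.conf
0 2 * * * /usr/local/bin/htdig -c /etc/htdig/siteB.conf
0 3 * * * /usr/local/bin/htmerge -c /etc/htdig/siteA.conf -m /etc/htdig/siteB.conf
```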