Re: [htdig] 3.1.5 -- Limiting Duration / Resources for HTDIG

2000-05-24 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > I am running into some situations where HTDIG appears to be going into an > infinite loop. The pattern is that it either does, or does not, occur when > searching a specific website; re-attempting the HTDIG--against the same > site--usually produces the same r

Re: [htdig] Problem with content-type text/html; charset=SOMETHING

2000-05-24 Thread Gilles Detillieux
According to Gordon Harty: > I have some pages that I want to index that have something like: > > > > or > > > > In both cases these files do not get indexed into the search database. > I would like to have all pages that have a content type that begins > with "text/html" to be indexed by th

Re: [htdig] Replacing db files?

2000-05-24 Thread Gilles Detillieux
According to David Sklar: > I will definitely transfer the database files to the search machines using > different filenames (but on the same partition, so moving them will be quick), > I was just wondering if I should do anything special to htsearch to tell it > not to do searches for the few sec

Re: [htdig] the mysterious "Deleted, no excerpt" problem

2000-05-23 Thread Gilles Detillieux
According to Patrick Robinson: > I don't know why BBEdit might have strewn text files with nulls, but I'm > also not sure why htdig can't read those files. But I might suspect that > there's a null-terminated string that contains the document. In my case, > there was typically a null as the firs

Re: [htdig] htsearch doesn't work

2000-05-23 Thread Gilles Detillieux
According to Sam Xie: > Hi! I just installed htdig/3.1.5 onto my machine (Solaris 2.6). The rundig works > fine. However, whenever I submitted a search, I got an error message that said, > Internal Server Error > The server encountered an internal error or misconfiguration and was unable to > c

Re: [htdig] Weird endings problem

2000-05-22 Thread Gilles Detillieux
According to Alexey Rodriguez: > On Fri, 19 May 2000, Geoff Hutchison wrote: > > At 4:02 PM -0500 5/19/00, Gilles Detillieux wrote: > > >much out. Just a simple guess, though: did you try removing any old > > >word2root.db or root2word.db files before running htfuzzy

Re: [htdig] Security and access for privat websites

2000-05-22 Thread Gilles Detillieux
According to Andreas Vogt: > Now, I set up htdig with two different confs. So public parts can be > searched by htdig, and also private parts by different databases. > > The private search.html is protected by .htaccess and "require user...". > > But as /cgi-bin/htserach is executable by any

Re: [htdig] Indexing binary files by filename

2000-05-20 Thread Gilles Detillieux
According to Geoff Hutchison: > At 4:56 PM +0100 5/19/00, Darrell Berry wrote: > >"Indexing binary files by filename (simply need to write a minimal parser > >for this)" > > > >its on the todo list---can i cast my vote for it happening soon? we have a > >site which is about 50% text documents and

Re: [htdig] Stability of beta, and a couple newbie questions

2000-05-19 Thread Gilles Detillieux
According to Joe Sanderson: > I've been using the 3.2.0b2 beta for the last two weeks, and have found > it to be quite stable. I need to make the decision about using this > beta as a site search engine, or to use the latest 3.1 released > version. I'm looking for input on the stability of this

Re: [htdig] multiple "documents" in one file?

2000-05-19 Thread Gilles Detillieux
According to David Sklar: > I am attempting to use htdig to index a large number (~100,000) files each of > which are pretty small (~500 bytes). Running htdig -vv and using strace seems > to indicate that htdig is spending most of its time opening and closing these > files, and not actually doing

Re: [htdig] maximum_pages

2000-05-19 Thread Gilles Detillieux
According to Jim Cole: > Hi - Mainly a curiosity question here. Just had a user ask why they > couldn't go past the first ten pages of results. I know this can be bumped > up via maximum_pages, and that the default is 10. But I was wondering if > there was actually a reason for the default limit o

Re: [htdig] htmerge: Deleted, no excerpt problem

2000-05-19 Thread Gilles Detillieux
According to Andre Dalle: > Chunks of our web site are failing to index due to being > dropped by htmerge. ... > I have checked the mailing list archives, and am sure the usual > suggested problems are not at fault.. > > - robots.txt does not exclude the file (htdig should have never indexed > it

Re: [htdig] How-to Suppress HtSearch PageList Output in htdig-3.2.0b2.

2000-05-19 Thread Gilles Detillieux
According to Jeff Hill: > HtSearch in htdig-3.2.0b2 appears to put a pagelist at the end of its > output differently than htdig-3.1.5 > > I'm using PHP as a wrapper, and can't seem to suppress the page list. > This wouldn't be so bad, except that the page list is incorrect (due to > the PHP wrap

Re: [htdig] Weird endings problem

2000-05-19 Thread Gilles Detillieux
According to Alexey Rodriguez: > Then i run htdig, after that i run htfuzzy endings and i had > several "DB2 problem...: missing or empty key value specified" while it > was building root2word and word2root databases. > It is maybe a broken spanish.aff > I am checking the source

Re: [htdig] How do I gain control over links inside excerpts?

2000-05-18 Thread Gilles Detillieux
According to Søren Hermansen: > Wow! a personal answer. Thank you. We haven't found a program that could answer questions on this list. :-) > How do I make an answer show up in the mail thread on htdig.com? You should put [EMAIL PROTECTED] on the Cc: list. This happens automatically with a re

Re: [htdig] Error when linking on Alpha server

2000-05-18 Thread Gilles Detillieux
According to Eric Litot: > Sorry disturbing you again but my trouble remains and all I tried failed. > For example, I have modified the Makefile.config file in adding the -lc > library reference, then the -lresolv one, without any success. > What is strange is that I already compiled the 3.1.5 htd

Re: [htdig] SORT and locale

2000-05-17 Thread Gilles Detillieux
According to "NEPOTE Charles (Neuilly Gestion)": > I understand that ht://Dig is based on the Gnu sort command. Not necessarily just GNU's sort. Any sort program that treats all characters as unique, and sorts based on the character's binary encoding, should do the job. This is the default beha
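
The byte-order sorting described in this reply can be illustrated with a minimal Python sketch. It assumes the word list is ISO-8859-1 encoded, which the excerpt does not state; the sample words and the helper name `byte_order_sort` are hypothetical and are not part of ht://Dig.

```python
def byte_order_sort(words, encoding="latin-1"):
    """Sort words by the numeric value of their encoded bytes.

    Locale collation rules are ignored, so every character is unique
    and accented letters sort after all unaccented ASCII letters.
    """
    return sorted(words, key=lambda w: w.encode(encoding))


# Hypothetical sample words; the encoding is an assumption.
words = ["tué", "tue", "tuer", "Tue"]
print(byte_order_sort(words))  # ['Tue', 'tue', 'tuer', 'tué']
```

On the command line, GNU sort produces the same byte-order result when it is run with `LC_ALL=C` in its environment.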

Re: [htdig] RE: [Cooker] SORT and locale

2000-05-17 Thread Gilles Detillieux
According to "NEPOTE Charles (Neuilly Gestion)": > > On Wed, May 17, 2000 at 01:45:12PM +0200, NEPOTE Charles > > (Neuilly Gestion) wrote: > > > > > With Mandrake 7.0 or a cooker installed in french. > > > > > > when I : > > > > > > "sort db.wordlist" > > > > > I obtain : > > > > > tue i:1 > > >

Re: [htdig] htmerge - error closing file

2000-05-17 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > I am running htmerge and keep getting "word sort: failed" error. I > have gone through all FAQ's and relevant mail, done all modifications > but now I get one of two errors (1)write error (Input/output error) or (2)bin/sort: >error closing > file. When I du -k my

Re: [htdig] Endings databases of two languages

2000-05-17 Thread Gilles Detillieux
According to Andreas Hudzieczek: > Therefore, I am now looking for a possibility to sort of have two > "endings databases". > Do I need a specific english.0 file, although regular english indexing > return good endings, whenever I include a secondary language beside > english? You can certainly h

Re: [htdig] How do I gain control over links inside excerpts?

2000-05-17 Thread Gilles Detillieux
According to Søren Hermansen: > I would like to gain control over a link inside an excerpt. > > What I mean is: If the following result of a search is e.g.: > > Programming Language Support Reference Manual Chapter 9 > ... , sdb provides a mechanism for examining the source text. > Procedur

Re: [htdig] Accent problem.

2000-05-16 Thread Gilles Detillieux
According to "NEPOTE Charles (Neuilly Gestion)": > I made very serious test : with only 7 documents, always regenerating the > database from scratch to prevent corruption problems of the database ; to do > so I used : > time rundig -v -s -a -c /etc/htdig/htdig.essai.conf|tee > /var/lib/htdig/essai

Re: [htdig] A Suggestion on Accents

2000-05-16 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > > >Rather than a fuzzy accents search method, why not make the htdig database > > >accent independent? After all, it is case independent already! > > >For example: > > > > > >Garçon -> Garçon -> garçon -> garcon > > > > I would make the analogy to wor
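
The accent-independent mapping proposed in the quoted suggestion (lower-casing, then dropping accents) can be sketched in Python as follows. This is only an illustration of the idea, not how htdig or its accents fuzzy algorithm is implemented; the function name `fold_accents` is hypothetical.

```python
import unicodedata


def fold_accents(word):
    """Lower-case a word and strip its combining accent marks."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word.lower())
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Garçon"))  # garcon
```

A mapping of this kind is lossy: "tué" and "tue" fold to the same key, which is the trade-off the thread is weighing.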

Re: [htdig] date info in db

2000-05-15 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > I'm using htdig to index/search a database of .pdf files. It was all set up > recently, and all of the files were uploaded at once, so when a search is > done all of the results list a date of 29-Apr-00, although the documents were > created on different dates

Re: [htdig] 2 questions:   and bad_words

2000-05-15 Thread Gilles Detillieux
According to "NEPOTE Charles (Neuilly Gestion)": > According to Gilles Detilleux: > > > According to "NEPOTE Charles (Neuilly Gestion)": > > > I have the same problem using a french locale (fr_FR), on a Linux > > > Mandrake 7.0 box. > > > As a newbie I won't hack the code... I am interested by Gi

Re: [htdig] Accent problem.

2000-05-15 Thread Gilles Detillieux
According to "NEPOTE Charles (Neuilly Gestion)": > I am searching to solve some problems in ht://Dig 3.1.5. > > I tested and reproduce that : > > If : > -- more than one html file contains : both words "tué" and "tue" per file ; > -- or an html files contains the word "tue" and the html which

Re: [htdig] Indexing news articles ?

2000-05-15 Thread Gilles Detillieux
According to Vincent Royer: > As you can see above, there's an index.html file containing > relatives links to news articles. The index.html page is > correctly indexed but none of the articles. Moreover, apache > use the MIME type message/news when news articles > are browsed. Any idea ? Yes, if

[htdig] Re: 3.2.0b2 - problem with either no stars, or infinite loop writing out (PR#846)

2000-05-15 Thread Gilles Detillieux
star generation. Date: Wed, 03 May 2000 17:56:50 -0400 From: "Terry Luedtke" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Subject: Re: [htdig3-dev] Too many stars Gilles Detillieux <[EMAIL PROTECTED]> 03-May-00 17:24:00 >>> >Accord

Re: [htdig] Attention RPM users - quick poll (slightly off)

2000-05-12 Thread Gilles Detillieux
Thanks for responding. According to Stephen L Arnold: > I didn't use the htdig RPM, precisely because I've had problems such > as you describe with other packages (and not because I tried to install > the wrong RPM). > > I think the state of glibc, the linux kernel, support for libc5, etc, > is

Re: [htdig] htsearch won't execute. How do I....

2000-05-12 Thread Gilles Detillieux
According to James McLaughlin: > Ok here is the lay out of the CGI-BIN on my box > >>drwxr-xr-x 2 root root 4096 May 12 10:01 ./ > >>drwxr-xr-x 7 root root 4096 May 9 11:04 ../ > >>-rwxr-xr-x 1 root root 404568 Mar 22 19:19 htsearch* ... > The requested URL

Re: [htdig] Error when executing rundig (with xpdf-0.90 package)

2000-05-12 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > Receiving this when executing rundig > Error (11663): Illegal character <2f> in hex string > Error (11663): Illegal character <29> in hex string > Error: Unterminated hex string > Error: Unknown operator 'TeTeT65.' > Error: Dictionary key must be a name object > E

Re: [htdig] re: parsing stuff

2000-05-12 Thread Gilles Detillieux
According to gil cohen: > Okay, I wrote the perl program to the _exact_ specifications as it should > be, I even tested it and it seems to work perfectly. I put in the config > file just the way it should be, and it crashes. It brings down the whole > machine with it. Following the specificatio

Re: [htdig] how to exclude users' psges ?

2000-05-12 Thread Gilles Detillieux
According to Gerard GACHELIN: > I don't want to index users' pages on our web site. > Of course something like /~*/ on the exclude_urls line in htdig.conf doesn't > work. > > -- > ** > * Gerard GACHELIN e-mail : [EMAIL PR

Re: [htdig] re: parsing stuff

2000-05-11 Thread Gilles Detillieux
According to gil cohen: > Okay, here's a program I wrote: > > - > cat $1|tr -d '\12'|sed -e 's/.*//' -e 's/<\/title>.*//' >> /test > echo "Content-Type: text/html" > echo '' > cat $1 > - > > Then, I put the following in the config file: > text/html->text/html "sh /RIDOF.sh

Re: [htdig] Compiling Help in Mandrake 7.0

2000-05-11 Thread Gilles Detillieux
According to James McLaughlin: > checking for c++... c++ > checking whether the C++ compiler (c++ ) works... no > configure: error: installation or configuration problem: C++ compiler cannot > create executables. > > I sent this to some other mailing lists and was told to: "Try reinstalling > th

Re: [htdig] htdig-3.1.5 +prune_parent_dir_href patch version 0.0

2000-05-11 Thread Gilles Detillieux
According to Peter L. Peres: > On Thu, 11 May 2000, Gilles Detillieux wrote: > >indexing to /doc/javadoc, though, why not just set limit_urls_to > >to http://my.host.domain/doc/javadoc and not add anything else to ... > Their documentation part is linked to from a large page un

Re: [htdig] BAD TAG IN SERIALIZED DATA: 110 and DB2 error messages

2000-05-11 Thread Gilles Detillieux
According to Alain FORCIOLI: > We're running HTDIG 3.1.5 on a ix86 Redhat 6.1. > Htdig currently indexes about 5 documents (HTML and PDF) > (http://www.unesco.org/). > > > A Cron job make an incremental indexation every day (from Monday to > Saturday). Sunday is a special day where an initia

Re: [htdig] 2 questions:   and bad_words

2000-05-11 Thread Gilles Detillieux
According to "NEPOTE Charles (Neuilly Gestion)": > I have the same problem using a french locale (fr_FR), on a Linux > Mandrake 7.0 box. > As a newbie I won't hack the code... I am interested by Gille's > solution. Is > it possible to simply remap ascii char 160 to ascii char 20. What are > the fi

Re: [htdig] indexing a bi-lingual site

2000-05-11 Thread Gilles Detillieux
According to Gerard GACHELIN: > I'd like to index a bilingual site (french and english) with htdig 3.1.5. > > english and french data are mixed. > > What is the best way to do this ? Indexing the site should be easy, as long as your system supports locales correctly. Set your locale in htdig.c

Re: [htdig] htdig-3.1.5 +prune_parent_dir_href patch version 0.0

2000-05-11 Thread Gilles Detillieux
According to Peter L. Peres: > the htdig will reap the first URL from an Apache index, and push it first. > The next document indexed on that server, will be the parent directory. > Thus, not only did htdig index almost everything (by climbing the parent > links first), but it got to index the pla

[htdig] Attention RPM users - quick poll

2000-05-10 Thread Gilles Detillieux
ufficient. e.g. htdig-3.1.3-2glibc.i386.rpm 1a) If the command "rpm -qi htdig" does not list Gilles Detillieux as the Packager, who is listed? 1b) If the command "rpm -qi htdig" does not list a host in the scrc.umanitoba.ca domain as Build Host, what is listed?

Re: [htdig] unknown locale, but really strange, not the normal problem

2000-05-10 Thread Gilles Detillieux
According to Albert Kasper: > It is RedHat 6.1 on my side, too, sorry for the typo. > The line is exactly > "locale: de_DE" > > (no trailing spaces). > > Maybe there is some special place where the line has to be inserted (e.g. > first line, below a certain command, ...)? Based on some of the p

Re: [htdig] how does the dig the ranking ?

2000-05-10 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > Once Again: Can anyone tell me how the ranking is calculated by the > dig-algorithm? is there a formula? does it matter at which position in the > doc the search-term is found? In the 3.1.x series, position does matter. Words at the top of the document are ranke

[htdig] duplicate e-mail messages

2000-05-10 Thread Gilles Detillieux
Hi, folks. Is anyone else on the list getting duplicate messages from the mailing list being resent to them from [EMAIL PROTECTED], or is it just me? It seems that whenever my address is on the To: or Cc: list, I get 3 copies of the message - 1 directly to me, 1 from the [EMAIL PROTECTED] mailin

Re: [htdig] unknown locale, but really strange, not the normal problem

2000-05-10 Thread Gilles Detillieux
According to Albert Kasper: > It is RedHat 6.1 on my side, too, sorry for the typo. > The line is exactly > "locale: de_DE" > > (no trailing spaces). > > Maybe there is some special place where the line has to be inserted (e.g. > first line, below a certain command, ...)? Shouldn't matter. Cou

Re: [htdig] htdig-3.1.5 +prune_parent_dir_href patch version 0.0

2000-05-10 Thread Gilles Detillieux
According to Geoff Hutchison: > Host issues are a bit tricky. Someone proposed an "signature" method for > adding server aliases, but it also has not been tried. This would probably > need to be an option since it might misidentify "duplicate" servers. Yes, I think all duplicate elimination algor

Re: [htdig] htdig-3.1.5 +prune_parent_dir_href patch version 0.0

2000-05-10 Thread Gilles Detillieux
According to Peter L. Peres: > wrt test case etc: I'll try to describe what happened and how I got the > idea: > > 1 week ago I was indexing and I caught htdig looping in a directory called > /usr/doc/javadoc. The /usr/doc directory is soft-linked under > DocumentRoot, that's how htdig got there.

Re: [htdig] htdig-3.1.5 +prune_parent_dir_href patch version 0.0

2000-05-09 Thread Gilles Detillieux
According to Peter L. Peres: > + 1.1 The problem: > + > + When on an open system (ex: Linux) used on an intranet (no direct connection > + to the Internet), documentation is added to the HTML DocumentRoot tree, by > + adding symbolic links to the documentation under the DocumentRoot, and htdig > +

Re: [htdig] unknown locale, but really strange, not the normal problem

2000-05-09 Thread Gilles Detillieux
According to Albert Kasper: > Well, it worked for de_DE (same does for en_US e.g.). > > cut--- > 249 0xF9: ù -al-n--gt---? > 250 0xFA: ú -al-n--gt---? > 251 0xFB: û -al-n--gt---? > cut--- > > All other characters you mentioned are that way, too. > Config file for htdig is ok, too (the

Re: [htdig] Question about htdig

2000-05-09 Thread Gilles Detillieux
According to Celeste: > I work for the Division of Continuing Studies of the University of > Wisconsin-Madison. We are interested in adding search to our site(s). > Does your program work on a IIS server? Also, your information refers to > a database. Do we have to have a database in order to use

Re: [htdig] Q: htdig-3.1.5 & allow_in_form feature

2000-05-09 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > i have some trouble to get the feature "allow_in_form search_algorithm" working. > I would likely use this feature, so users can choose different search algorithms > from a drop-down menu within their html search form. > > Is this feature not yet implemented in h

Re: [htdig] How to manage infinite-loop conditions in Htsearch. (3.1.5)

2000-05-09 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > Appears that, in real world, htsearch 3.1.5 will from time to time loop; due > basically to configuration file not set up to deal with actual conditions at > searched web site(s). > > Does Unix have any ability to limit elapsed time (and/or disk space) used b
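
On the question of limiting elapsed time and disk space for a process, one general Unix mechanism is per-process resource limits (`setrlimit`) combined with a wall-clock timeout in the parent. The Python sketch below is a hedged illustration of that mechanism and is not taken from the truncated reply; the helper name `run_limited` and all limit values are hypothetical.

```python
import resource
import subprocess
import sys


def run_limited(cmd, cpu_seconds, max_file_bytes, wall_seconds):
    """Run cmd with CPU-time, file-size and wall-clock limits.

    Returns the exit status, or None if the wall-clock timeout expired
    (in which case the child process is killed).
    """
    def apply_limits():
        # Runs in the child just before exec: cap CPU time and the
        # size of any single file the process may write.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))

    try:
        done = subprocess.run(cmd, preexec_fn=apply_limits, timeout=wall_seconds)
        return done.returncode
    except subprocess.TimeoutExpired:
        return None


# Demonstration with a harmless stand-in command that sleeps too long.
status = run_limited(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    cpu_seconds=5, max_file_bytes=1_000_000, wall_seconds=1,
)
print(status)  # None: the wall-clock limit was reached
```

In practice the stand-in command would be replaced by the indexing or search command to be bounded, and the limits chosen to fit the site.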

Re: [htdig] unknown locale, but really strange, not the normal problem

2000-05-08 Thread Gilles Detillieux
> - Original Message - > From: "Gilles Detillieux" <[EMAIL PROTECTED]> > To: "Albert Kasper" <[EMAIL PROTECTED]> > Cc: <[EMAIL PROTECTED]> > Sent: Friday, May 05, 2000 11:28 PM > Subject: Re: [htdig] unknown locale, but really strange

Re: [htdig] Portuguese

2000-05-05 Thread Gilles Detillieux
According to Rodrigo Luiz Anami: > Look the output of your program. It means that my locale doesn't work, does > it ? > > 0 0x00: ^@ A-c-- ... > 192 0xC0: --c-- ... > 255 0xFF: ~? --c-- Yes, characters 192 through 255 (except 215 & 247) should be accented letters, but th
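
The statement that characters 192 through 255, except 215 and 247, should be letters can be checked with a short Python sketch. It assumes the character set is ISO-8859-1, which the excerpt does not name, and it consults Python's Unicode tables rather than the C library locale that htdig itself would use.

```python
def non_letters_in_upper_latin1():
    """Return the codes in 192..255 that ISO-8859-1 does not map to a letter."""
    return [
        code
        for code in range(192, 256)
        if not bytes([code]).decode("latin-1").isalpha()
    ]


print(non_letters_in_upper_latin1())  # [215, 247]: the multiplication and division signs
```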

Re: [htdig] unknown locale, but really strange, not the normal problem

2000-05-05 Thread Gilles Detillieux
According to Albert Kasper: > I'm using htdig on my site and want to enable search for umlauts. > System is Red Hat Linux 6.2 > > Directory /usr/share/locale exists, sub-Directory "de_DE", too. > - > ls /usr/share/locale/de_DE > LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_T

Re: [htdig] Portuguese

2000-05-05 Thread Gilles Detillieux
According to Rodrigo Luiz Anami: > Anyone has success in configuring htdig with other language like portuguese ? > I have installed htdig on a slackware linux machine for support portuguese > language, but when I look into db.wordlist after running htdig It don't > have some accented word. For e

Re: [htdig] Suse 6.2 + symlinks + htdig 3.1.5

2000-05-05 Thread Gilles Detillieux
According to Peter L. Peres: > I will get my act together and post complete docs next time. And the > patch. Btw, the patch only affects the Retriever class in htdig. I have > verified that the patched build htsearch and the unpatched htsearch are > identical by binary diff (only the date etc are

Re: [htdig] Suse 6.2 + symlinks + htdig 3.1.5

2000-05-05 Thread Gilles Detillieux
According to Peter L. Peres: > looked in the archives and run htsearch manually. Here is my output: > > root@plp5:/usr/local/httpd/cgi-bin$ ./htsearch -vvvmail -s htdig ^ Huh? > I thought 3.1.5 is production code. No ? > > Please

Re: [htdig] patch for boolean keywords

2000-05-05 Thread Gilles Detillieux
According to Alexey Rodriguez: > I am releasing a patch for choosing boolean keywords from > configuration file. The attribute should look like: > > boolean_keywords: and or not > > If you don't put anything it doesn't matter what is the default > value. For example i use: > >

Re: [htdig] Problems compiling / using htdig with Mandrake 6.1

2000-05-04 Thread Gilles Detillieux
According to David Robley: > This is a wild guess - but is it possible that rundig has somehow > acquired DOS end of line characters? I've known this to do weird things > in shell scripts. > > Open it with vi and see if there are ^M characters at the end of each > line - if so, delete them and tr
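
The check suggested here, looking for DOS carriage returns (shown as ^M in vi) at the ends of lines, can also be scripted. A minimal Python sketch follows; the helper names are hypothetical, and the sample bytes merely stand in for the contents of a script such as rundig.

```python
def has_dos_line_endings(data: bytes) -> bool:
    """Report whether any line ends with a carriage return plus newline."""
    return b"\r\n" in data


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert DOS line endings (CR LF) to Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


# Stand-in for a shell script that was saved with DOS line endings.
sample = b"#!/bin/sh\r\necho indexing\r\n"
print(has_dos_line_endings(sample))    # True
print(strip_carriage_returns(sample))  # b'#!/bin/sh\necho indexing\n'
```

To apply this to a real file, read it in binary mode (`open(path, "rb")`) so that Python does not translate the line endings before the check.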

Re: [htdig] Search in pdf documents

2000-05-04 Thread Gilles Detillieux
According to Andoni Ayala: > Works fine the search of accented words in html files, but, in .pdf > files not work fine. > > Example: > > in .pdf file: petición > > but when i run "rundig" it save in db.wordlist the word "petici" > > ¿where are my mistake? HTML files will generally use ISO-885

Re: [htdig] boolean keywords

2000-05-04 Thread Gilles Detillieux
According to Alexey Rodriguez: > Hi guys, i need to use spanish boolean keywords. I browsed the faq > and the source code and found that there is no customization for this > through htdig.conf. I think that changes for this are trivial but i would > like to know first if there is already a p

Re: [htdig] Problems compiling / using htdig with Mandrake 6.1

2000-05-03 Thread Gilles Detillieux
According to Flos: > I'm a French Linux newbie so please excuse the English mistakes. > > I am trying to install htdig (from htdig-3.1.5.tar.gz) on a server with > Mandrake 6.1. The ./configure, make and make install steps were executed > without errors. > > However, when I try to run rundig (usi

Re: [htdig] Suse 6.2 + htdig 3.1.5: looping again

2000-05-02 Thread Gilles Detillieux
According to Peter L. Peres: > On Tue, 2 May 2000, Gilles Detillieux wrote: > >What do you consider to be the "normal HTML tree"? Are you referring to > >a certain subsetof your whole web site, which is all you want to index? ^ (dropped space) &

Re: [htdig] Suse 6.2 + htdig 3.1.5

2000-05-02 Thread Gilles Detillieux
According to Peter L. Peres: > wrt: looping, and indexing only the offending directory. > > It seems that I have made a logical mistake, but I think that there is a > missing feature in htdig. Apache obviously generates an absolute link to > the parent directory in any index. Since some directori

Re: [htdig] patch for Accents fuzzy algorithm for 3.2.0b2

2000-05-02 Thread Gilles Detillieux
This is an adaptation for 3.2.0b2 of Robert Marchand's latest fix to his accents fuzzy match algorithm. You should be able to apply this patch in the main source directory of the htdig-3.2.0b2 source tree with "patch -p1 < this_file". Robert's fix changed the algorithm to avoid putting the key a

Re: [htdig] patch for Accents fuzzy algorithm for 3.1.5

2000-05-02 Thread Gilles Detillieux
he htfuzzy directory to apply it. > It is to be applied after the last patch posted by Gilles Detillieux. Your patch seems to have been mangled a bit by your mailer, plus it seems to contain tabs where the source files you sent previously had spaces, so I'm guessing the earlier files go

Re: [htdig] Suse 6.2 + htdig 3.1.5: looping again

2000-05-02 Thread Gilles Detillieux
According to Peter L. Peres: > On Mon, 1 May 2000, Gilles Detillieux wrote: > >It's been said time and time again on this list, but I'll repeat it. > >htdig DOES keep track ofvisited URLs, and does NOT re-index any page > >with a unique URL more than once per in
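
The point that a crawler which tracks visited URLs never fetches the same URL twice can be illustrated with a generic breadth-first sketch in Python. This is not htdig's Retriever code; the function name `crawl` and the in-memory link graph are hypothetical stand-ins for real pages and their links.

```python
from collections import deque


def crawl(start, links):
    """Visit every URL reachable from start exactly once, breadth first."""
    visited = set()
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # already indexed: links back to it are ignored
        visited.add(url)
        order.append(url)
        queue.extend(links.get(url, []))
    return order


# Hypothetical link graph in which pages link back to each other.
links = {"/": ["/a/", "/b/"], "/a/": ["/", "/b/"], "/b/": ["/a/"]}
print(crawl("/", links))  # ['/', '/a/', '/b/']
```

Note that the check is by URL only: if the same directory is reachable under two different URLs, for example through a symbolic link, a scheme like this treats them as two distinct documents.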

Re: [htdig] Suse 6.2 + htdig 3.1.5: looping again

2000-05-01 Thread Gilles Detillieux
According to Peter L. Peres: > I have run htdig with various options and have settled down to a > compression of 3 -l and a few other things. Performance and database size > are much better than before, but it still tries to re-index the whole site > after a while, i.e. it loops. > The site i

Re: [htdig] Meta Tags

2000-04-27 Thread Gilles Detillieux
According to Vishal Shah: > its ignoring the meta tags, but the confusion now is that the results are > still the same. > what could I be missing ? If you're using 3.1.x, you need to reindex from scratch. > > -Original Message----- > > From: Gilles Detillieu

Re: [htdig] Meta Tags

2000-04-27 Thread Gilles Detillieux
According to Vishal Shah: > I am indexing a site with meta tags which are repeated on all the pages. Is > there a way to tell htdig to index those pages and ignore the meta tags? keywords_factor: 0 keywords_meta_tag_names: meta_description_factor: 0 max_meta_description_leng

htdig@htdig.org

2000-04-27 Thread Gilles Detillieux
According to Andoni Ayala: > In my htdig.conf I have > > locale: es_ES > > but when appears generación > > split in: generaci It would seem that either this locale is not defined on your system, or it's broken. Have a look at the /usr/share/locale directory to see if an es_ES

Re: [htdig] accented words

2000-04-27 Thread Gilles Detillieux
According to Alexey Rodriguez: > i already indexed the spanish documents, thanks. > i have another problem now: a vast majority of people using the > search service that i'll provide, do not have a good orthography. So i need > a way to provide mixed search to accented and not accented

Re: [htdig] htdig / Suse 6.2: very long run ?

2000-04-26 Thread Gilles Detillieux
According to Geoff Hutchison: > > Is there some support for parsing dvi and ps files ? dvi can be turned > >into (ugly) text using dvi2ascii and there is a corresponding converter > >for ps. > > I would check the conv_doc.pl script and plug in a dvi->txt > converter. I believe it already handl

Re: [htdig] Fwd: Help with "Missing Pages"

2000-04-26 Thread Gilles Detillieux
According to Danny Summers: > >I am new to htdig and having problems with it dropping or losing > >indexed pages on second or subsequent digs. I can trash the URL's > >htdig databases, do a new run and everything is there, fully > >searchable. The next scheduled run, it drops most of the ind

Re: [htdig] Problem with Javascript

2000-04-25 Thread Gilles Detillieux
According to Keith Vance: > The java script works on all the rest of the site, > just not the htdig result pages. I get this error: "A runtime error has > ocurred. Do you wish to Debug? Line:60 Error: Expected ')' > > Here is my nomatch.html file. [snip] >onMouseOver="on('h00'), > showHid

Re: [htdig] xpdf source, patch?

2000-04-24 Thread Gilles Detillieux
According to Jane Dudley: > I am trying to wire pdf search capability into htdig on our site. I was > getting the "encryption" error msgs, so I went to the xpdf site and > downloaded the patch. I went to patch and recompile, and realized there > are no "source code" files in the xpdf source code d

Re: [htdig] Wrappers and htdig 1.3.5

2000-04-24 Thread Gilles Detillieux
According to Brandon Bell: > I think the only solution is to hack the htdig code as you suggest below and > revert the separator back to the '&'. > > If I use CGI.pm Version 2.64, which apparently does allow for the semicolon > separator, I still have the problem that when HTML forms are submitte

Re: [htdig] Indexing through ezmlm-cgi

2000-04-20 Thread Gilles Detillieux
According to James Moore: > excludes_urls is set to empty! > > Anyother suggestions? Yes. Check your spelling. If you misspelled the attribute name in your htdig.conf as you did above, you're not hitting the right attribute. Also, if you run htdig -ivv (at least 2 v's), you'll get a better ex

Re: [htdig] keywords and quotes

2000-04-19 Thread Gilles Detillieux
According to Kostas Kavoussanakis: > On Tue, 18 Apr 2000, Gilles Detillieux wrote: > > Sorry, Kostas, but there is no other solution. There have been a number > > of bug fixes between 3.1.2 and 3.1.5, the two most important for you being: > > Thank you very muc

Re: [htdig] keywords and quotes

2000-04-18 Thread Gilles Detillieux
According to Kostas Kavoussanakis: > Apologies if this is a FAQ, if it is please point me to the correct > place to search. Do the quotes in the META tag below have to be double > in order for htsearch to rank them high? > > > > An "All" search for "skills" or "staff skills" returns the page wi

Re: [htdig] Indexing scope

2000-04-17 Thread Gilles Detillieux
According to Geoff Hutchison: > On Sun, 16 Apr 2000, Dave Lers wrote: > > So the second dig is always adding one hop to the local database_one, that > > works (I assume the local hops to local files/dirs that were already indexed > > pose no problems*). Do I have to mess with htdig-dbgen? That fil

Re: [htdig] Using H1 instead of TITLE

2000-04-13 Thread Gilles Detillieux
According to Rutger Wessels: > For a large webproject, I used HTDIG to index the site. But the site > uses file locations in the tag in order to locate files in the > directory structure. I know that's not the best thing to do, but the > people who maintain the whole site like it that way and sin

Re: [htdig] Preventing htdig reading a $ in a template...

2000-04-13 Thread Gilles Detillieux
According to James Rigg: > Is there a way of escaping a $ in a template to prevent htdig thinking > it is a command for it to process? > I have a JavaScript in a page which refers to a URL and passes a $ as > part of a parameter. htdig strips out the $ and then the page barfs a > JavaScript error

Re: [htdig] htm extension

2000-04-13 Thread Gilles Detillieux
According to Patrick Baker: > I have a mix of .html and .htm extensions for pages. I can search and find > the information in .html page extensions but not the .htm pages. Is there a > switch I need to set to search these as well. Check the settings of attributes like bad_extensions, valid_exte

Re: [htdig] htdig FAQ

2000-04-13 Thread Gilles Detillieux
According to Malki Cymbalista: > I'm not sure this is the correct address for this, but I don't know where > to go. > A while back I sent in an answer to one of the FAQ's and it does indeed > appear in the FAQ (question 4.2. How can I change the output format of > htsearch?). It says there that I

Re: [htdig] Sort by Date from Meta Tags [patch]

2000-04-11 Thread Gilles Detillieux
According to Geoff Hutchison: > On Tue, 11 Apr 2000, Gilles Detillieux wrote: > > Adding these capabilities to the 3.1.5 release would take a fair bit more > > effort, so if you want to try the bleeding edge, and don't mind hacking the > > code a bit, just wait a little

Re: [htdig] Sort by Date from Meta Tags ???

2000-04-11 Thread Gilles Detillieux
According to Michael Pfennich: > Is there a possibility, to sort the result by the Metatag "Created" > and/or > "Changed", and not the modification time of the File Not quite. In 3.2, there is a use_doc_date option, that causes htdig to use the "date" meta tag as the document's modificatio

Re: [htdig] htdig and alt img tags

2000-04-11 Thread Gilles Detillieux
According to David Robley: > On 11 Apr, Byron Jones wrote: > >> > also, while the database is being built, if i do a search, the results > >> come > >> > back with the correct number of results, however no results are shown. > >> > >>You need to use the -a flag when running htdig and htmerge. > >

Re: [htdig] Problems with GET URLS

2000-04-10 Thread Gilles Detillieux
n standard, then it might make good sense to use them. I don't expect they'd do the job for everyone who requested duplicate suppression, though. There's been talk of using MD5 checksums for this purpose. It's on the to-do list, but I don't know of anyone actively working on

Re: [htdig] Problems with GET URLS

2000-04-10 Thread Gilles Detillieux
Hi, Paul. I didn't see any response to this in the archives. I can't imagine anything within htdig which would explain the behaviour you're describing. Rather, it seems to me that perhaps your PHP scripts aren't putting out correct DC meta tags for all pages. They should look like A

Re: [htdig] Request for new htdig META property: htdig-description

2000-04-07 Thread Gilles Detillieux
According to Patrick Jennings: > Well, that was a nuance of my explicit setup and, given that I was > rolling my own solution, it was the simplest way of getting the result > I needed. (I love modifying one line of code to get significant new > functionality.) A more generalised solution would b

Re: [htdig] 3.1.5 update drama

2000-04-07 Thread Gilles Detillieux
According to N Irons: > Installing 3.1.2 was not difficult. Installing 3.1.5 on top of 3.1.2 is > giving me real problems. > > I'm using FreeBSD 2.7, running configure like this: > > ./configure --prefix=/usr/home/irons/apps/htdig-3.1.5 \ > --with-bin_dir=/usr/home/irons/bin \ > --with-config_d

Re: [htdig] htdig configuration?

2000-04-07 Thread Gilles Detillieux
According to Sam Xie: > I am not able to search the anchored(HREF=...) URLs even though I > removed the "limit_urls_to:". I just want to build a database_index > including those web sites which are anchored in a web page. I also > noticed that htdig is not able to search the subdir, is it possibl

Re: [htdig] Request for new htdig META property: htdig-description

2000-04-07 Thread Gilles Detillieux
According to Geoff Hutchison: > On Wed, 5 Apr 2000, Patrick Jennings wrote: > > The question is: will this have any negative side affects (other than > > making "description" function as a second "keywords" META if > > "htdig-description" is present)? I've scanned through the code, and > > it all

Re: [htdig] Help with FreeBSD & htdig

2000-04-07 Thread Gilles Detillieux
According to Sean: > Sorry if this was posted a minute ago.. I don't think it made it. It sometimes takes a while for posted messages to get back to you. > I am trying to install htdig on a freebsd box 2.2.5 and keep getting the > following error in the make.. ... > gcc -o db_archive db_archive

Re: [htdig] Searching for "All" versus "Any"]

2000-04-06 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > On Thu, Apr 06, 2000 at 12:36:33PM -0500, Gilles Detillieux wrote: > > OK, try without the -c again. Did it look like this both times? > > > > littérature i:4 l:6 w:105469c:5 > > > before merge: > > littér

Re: [htdig] Searching for "All" versus "Any"]

2000-04-06 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > On Thu, Apr 06, 2000 at 11:59:35AM -0500, Gilles Detillieux wrote: > > Oops. Sorry, that should have been > > > > grep -c 'littérature.*i:4[^0-9]' db.wordlist > > > > okay - > > before htmerge >
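
The two grep patterns quoted in this thread differ only in the trailing [^0-9]. The sketch below is an illustration in Python rather than anything from the original messages. It suggests why that suffix matters: without it, a pattern for document 4 would also match document ids such as 40 or 41. The sample lines are made up, and they only loosely follow the "word i:... l:... w:..." layout shown in the excerpt, so treat the format as an assumption.

```python
import re

# Hypothetical wordlist-style lines; the values are invented for illustration
# and are not taken from a real db.wordlist.
SAMPLE_LINES = [
    "littérature i:4 l:6 w:105",
    "littérature i:40 l:12 w:98",
    "littérature i:41 l:3 w:77",
    "roman i:4 l:9 w:50",
]

# Pattern from the earlier message: also matches i:40, i:41, and so on.
LOOSE = re.compile(r"littérature.*i:4")
# Corrected pattern: a non-digit must follow the 4, so only document 4 matches.
STRICT = re.compile(r"littérature.*i:4[^0-9]")


def count_matches(pattern, lines):
    """Count the lines that contain a match, similar to `grep -c`."""
    return sum(1 for line in lines if pattern.search(line))


loose_count = count_matches(LOOSE, SAMPLE_LINES)
strict_count = count_matches(STRICT, SAMPLE_LINES)
print(loose_count, strict_count)
```

Note that [^0-9] needs some character after the 4, so a line that ended exactly at "i:4" would not match the stricter pattern.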

Re: [htdig] Searching for "All" versus "Any"]

2000-04-06 Thread Gilles Detillieux
According to [EMAIL PROTECTED]: > On Wed, Apr 05, 2000 at 03:48:23PM -0500, Gilles Detillieux wrote: > > > > grep 'littérature.*i:4' db.wordlist > > > > before and after htmerge, to see if the word is there before and after > > you run htmerge. I

Re: [htdig] Searching for "All" versus "Any"]

2000-04-05 Thread Gilles Detillieux
Coincidentally enough, Geoff answered a question this morning that dealt with the numbers in -v output... Date: Wed, 5 Apr 2000 08:32:41 -0500 To: "NEPOTE Charles (Neuilly Gestion)" <[EMAIL PROTECTED]> Cc: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]> Subject: Re: [htdig] What are the numbers
