[htdig] combined words

2000-06-19 Thread Reich, Stefan
Hello HTDIGs ;-) Here's another problem on my way to metatag-name aware search. Is there an option to tell htsearch's suffix algorithm that some words are part of combined words? One example: there is a German word "arzt" (doctor) which can be combined with the word "helfer" (assistant). So

[htdig] German Synonyms

2000-06-28 Thread Reich, Stefan
Hi all, some time ago someone asked for a German synonym dictionary. I couldn't find a positive answer to this on the board. Has anyone got one in the meantime, or does anyone have a hint where to get (buy if necessary) one in an htdig-usable format? Thanks, Stefan

AW: [htdig] Fw: amp in Titles

2000-06-28 Thread Reich, Stefan
I'm not quite sure, but I had a similar problem, where a document containing &amp; comes out of htdig as &amp;amp; in the source, so it looks like &amp; in the browser. First you should check the HTML source of the result file. If this is your problem too, you can switch this behavior off using

[htdig] Bug or Feature

2000-07-07 Thread Reich, Stefan
Hi, I just found out that 3.1.5 behaves differently from previous versions when it comes to showing the "next" button on the search results page. On one of our sites, using an older htdig, I see a next button even if I'm on search results page number maximum_pages. E.g. if I configure

AW: [htdig] using htdig in two separate sections of a website

2000-07-26 Thread Reich, Stefan
You can also get around the "hiding" problem by setting the restrict_url via an htsearch2.cgi, which appends the restrict and calls the original htsearch.cgi. -Original Message- From: Geoff Hutchison [mailto:[EMAIL PROTECTED]] Sent: Tuesday, 25 July 2000 18:53 To: alan Cc:
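
The wrapper idea could be sketched roughly as follows. This is a minimal Python sketch, not the poster's actual htsearch2.cgi: the install path and the restrict value are hypothetical, and it assumes the search form is submitted via GET, so the parameters arrive in the QUERY_STRING environment variable.

```python
import os
from urllib.parse import parse_qsl, urlencode

# Hypothetical values; adjust to the local installation.
HTSEARCH = "/path/to/cgi-bin/htsearch"
RESTRICT = "http://www.example.com/section-a/"


def with_restrict(query_string, restrict):
    """Return the query string with any client-supplied restrict
    parameter dropped and the fixed restrict value appended."""
    params = [
        (key, value)
        for key, value in parse_qsl(query_string, keep_blank_values=True)
        if key != "restrict"
    ]
    params.append(("restrict", restrict))
    return urlencode(params)


def main():
    # Rewrite QUERY_STRING, then replace this process with the real htsearch.
    env = dict(os.environ)
    env["QUERY_STRING"] = with_restrict(env.get("QUERY_STRING", ""), RESTRICT)
    os.execve(HTSEARCH, [HTSEARCH], env)


# main()  # uncomment when the script is installed as the CGI wrapper
```

Because the restrict value is set on the server side, it no longer has to appear as a hidden field in the search form.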

AW: [htdig] Indexing intranet and internet site.

2000-08-03 Thread Reich, Stefan
Hi, we do it the following way: We have put a shell script into each web server's cgi-bin directory. The script does nothing more than call "/htdigbin/htsearch -c configfile". So you don't need to allow the config to be set from the search form. Bye Stefan -Original
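
A per-server wrapper of this kind might look like the sketch below. The post describes a shell script; this is a Python equivalent for illustration only, and the config file path is a hypothetical placeholder.

```python
import os

HTSEARCH = "/htdigbin/htsearch"           # path as given in the post
CONFIG = "/path/to/this-site/htdig.conf"  # hypothetical per-server config


def htsearch_argv(config=CONFIG):
    """Build the argument list for 'htsearch -c configfile'."""
    return [HTSEARCH, "-c", config]


def main():
    # The CGI environment (QUERY_STRING etc.) is inherited unchanged.
    os.execv(HTSEARCH, htsearch_argv())


# main()  # uncomment when the script is installed in cgi-bin
```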

[htdig] Sorting Option

2000-08-15 Thread Reich, Stefan
Hi Folks, I'm afraid there is no built-in solution for the following problem, but maybe someone has an idea of how to get close. OK, so here is what I want to do: On our server, we have a sort of address list. Each address document contains something like this (simplified): titletom

AW: [htdig] AND OR in restrict

2000-08-15 Thread Reich, Stefan
Hello, I don't think logical operators are supported for restrict (at least for HTDIG = 3.1.5). You can send more than one restrict, which acts like an OR (restrict=g21&restrict=g22). An AND wouldn't make sense, because a single document has only one URL. Bye Stefan -Original
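
As an illustration of sending several restrict values, here is a minimal Python sketch that builds such a query string and models the OR behaviour described in the post. The helper names are hypothetical, and the substring test is an assumption about how a restrict pattern is matched against a URL.

```python
from urllib.parse import urlencode


def search_query(words, restricts):
    """Build an htsearch query string with one restrict parameter per pattern."""
    params = [("words", words)] + [("restrict", r) for r in restricts]
    return urlencode(params)


def matches_any(url, restricts):
    """Model of the OR semantics: a URL passes if any pattern occurs in it
    (assumption: patterns are matched as plain substrings)."""
    return any(pattern in url for pattern in restricts)
```

For example, `search_query("doctor", ["g21", "g22"])` yields `words=doctor&restrict=g21&restrict=g22`.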

AW: [htdig] Sorting Option

2000-08-15 Thread Reich, Stefan
Hi Geoff, I expected this answer :-( But thanks a lot for your response. Bye Stefan -Original Message- From: Geoff Hutchison [mailto:[EMAIL PROTECTED]] Sent: Tuesday, 15 August 2000 15:29 To: Reich, Stefan Cc: [EMAIL PROTECTED] Subject: Re: [htdig] Sorting Option At 10

[htdig] Not all search hits shown

2000-09-05 Thread Reich, Stefan
Does anyone have an idea what could be the cause of the following problem? A search returns e.g. 90 hits, but the search result list is empty after page 2. Page 2 itself only contains 4 entries. I'm using HTDIG 3.1.5, indexing several sites. I suspect all missing search results are from one site. I do

AW: [htdig] Not all search hits shown

2000-09-05 Thread Reich, Stefan
. September 2000 11:23 To: 'Reich, Stefan'; [EMAIL PROTECTED] Subject: RE: [htdig] Not all search hits shown We faced this problem too, and it was caused by the database. Try re-indexing your database from scratch and you should get correct results, at least as far as my experience with htdig is concerned

[htdig] Still not all search hits shown

2000-09-06 Thread Reich, Stefan
: Tuesday, 5 September 2000 14:53 To: Reich, Stefan Cc: [EMAIL PROTECTED] Subject: Re: [htdig] Not all search hits shown At 10:53 AM +0200 9/5/00, Reich, Stefan wrote: I do a URL replacement (only for this site!), replacing the IP address with replace#1 in the htdig config and replace#1 with the FQDN in the htsearch config

AW: [htdig] Start_url attribute in HTDIG.CONF

2000-09-06 Thread Reich, Stefan
And one addition: If you already split the lines with \, make sure there is no blank after the \, because otherwise htdig will stop indexing at this URL. Bye Stefan -Original Message- From: Torsten Neuer [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 6 September 2000 11:23 To: Huby,
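
A config file can be checked for this pitfall before digging. The following is a small Python sketch, not part of htdig; the function name is hypothetical. It reports the lines where a continuation backslash is followed by trailing blanks or tabs.

```python
def find_bad_continuations(config_text):
    """Return 1-based line numbers where a trailing backslash
    is followed by spaces or tabs."""
    bad = []
    for number, line in enumerate(config_text.splitlines(), 1):
        stripped = line.rstrip(" \t")
        # A backslash is the last visible character, but blanks follow it.
        if stripped.endswith("\\") and stripped != line:
            bad.append(number)
    return bad
```

Running it over the text of the config file and fixing every reported line keeps a multi-line start_url list intact.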

[htdig] HTMERGE doesn't remove URL

2000-10-20 Thread Reich, Stefan
Good afternoon all, I'm just setting up a multiple database scenario for htdig 3.1.5. Each site gets its own database. In addition I want to merge all the databases into one collection database. So far everything works. Now I encountered the following problem: If pages are removed from a

AW: [htdig] HTMERGE doesn't remove URL

2000-10-20 Thread Reich, Stefan
. October 2000 16:59 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [htdig] HTMERGE doesn't remove URL According to Reich, Stefan: I'm just setting up a multiple database scenario for htdig 3.1.5. Each site gets its own database. In addition I want to merge all the databases into one

[htdig] Valid Punctiation Question

2000-10-25 Thread Reich, Stefan
I have a strange problem with htdig, which I think may be caused by the valid punctuation mechanism. In my document I have a date string like 1998-10-05. I set valid punctuation to "-". Now searching for 1998, 1998-10, 1998-10-05, 199810 or 19981005 all lead to the same result. But I only want 19981005

AW: [htdig] Valid Punctiation Question

2000-10-26 Thread Reich, Stefan
, but it would be good to know if there is another option. -Original Message- From: Geoff Hutchison [mailto:[EMAIL PROTECTED]] Sent: Thursday, 26 October 2000 05:20 To: Reich, Stefan Cc: '[EMAIL PROTECTED]' Subject: Re: [htdig] Valid Punctiation Question At 4:02 PM +0200 10/25/00

[htdig] Htdig as external Link Checker? (Maybe off-topic)

2000-12-13 Thread Reich, Stefan
Hi community, I need to generate a list for my boss which contains all external links of our web site (which already gets indexed by htdig), including the status (that is, whether the target of each link exists or not). Can HTDIG help me with this by: 1. Create a list of external URLs (all URLs, which

[htdig] PDF Problem

2000-12-14 Thread Reich, Stefan
Hi, this is not really an HTDIG problem, but at least it's HTDIG related ;-) When indexing PDF documents, xpdf-0.90 and xpdf-0.92 generate: Error: Uknown Type 0 character set: Adobe-Identity. Acrobat Reader shows the document OK: (http://www.gesundheitsgespraech.de/common/koch.pdf) Any help hint

[htdig] METATAGS in Search Results

2000-12-15 Thread Reich, Stefan
Hello community, does anyone have a good idea of how to get metatag content into the search result list? E.g. we have metatags like dc.creator, dc.type, dc.format and so on. Now I want a result list which contains description, date, size and the metatags. I'm afraid this is not possible with htdig,

AW: [htdig] Going for the big dig

2000-12-20 Thread Reich, Stefan
Not quite sure if this helps, but maybe ;-) If I'm right, Lotus.com is running on Notes Domino something servers. We've experienced lots of problems with these Notes servers because of their meshed link structure. As Notes links are generated on the fly by the server, you get something like:

[htdig] External Converter Prob

2001-01-11 Thread Reich, Stefan
Hi all, all my descriptions start with "content-type: text/html". Is this normal behavior, or is it because I'm using an external converter to do some modifications on the spidered HTML files? I registered my converter for text/html->text/myhtml conversion. I've patched the html parser

AW: [htdig] prepare a search index for a different URL

2001-01-12 Thread Reich, Stefan
You can use the url_part_aliases feature to get this done. You need two config files, one for digging, one for searching. In the dig config, you set url_part_aliases: http://original_url.com/ replace#1 In the search config you set url_part_aliases: http://new_url.com/ replace#1 replace#1 may be
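
The effect of the two-config setup can be modelled with a short Python sketch. This is a simplified illustration of the idea, not htdig's implementation: the function names are hypothetical, and only the first matching URL part is replaced.

```python
# Alias pairs as in the post: (URL part, placeholder stored in the database).
DIG_ALIASES = [("http://original_url.com/", "replace#1")]
SEARCH_ALIASES = [("http://new_url.com/", "replace#1")]


def encode_url(url, aliases):
    """What the dig side does: swap the URL part for its placeholder."""
    for part, token in aliases:
        if part in url:
            return url.replace(part, token, 1)
    return url


def decode_url(stored, aliases):
    """What the search side does: swap the placeholder back for a URL part."""
    for part, token in aliases:
        if token in stored:
            return stored.replace(token, part, 1)
    return stored


stored = encode_url("http://original_url.com/docs/a.html", DIG_ALIASES)
shown = decode_url(stored, SEARCH_ALIASES)
```

Here `stored` is `replace#1docs/a.html` and `shown` is `http://new_url.com/docs/a.html`, so pages dug from the original host appear under the new host in the search results.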