First, congratulations to the new board and thanks to those who served for the past 
year.

Second, I'd like to ask for the group's recommendations on an intranet search
(engine|tool) that runs on Linux and is suitable for a small to midsize
intranet.  I've been experimenting with htdig (distributed with Red Hat Linux)
but have run into some apparent limitations:

1)  Based on the most current information I could find, htdig cannot update an
index for only modified files.  For example, if 50 of 25000 files are modified
in the course of a day, I'd like to be able to update the index for only those
50 files.  With htdig, I would have to reparse and reindex all 25000 files just
to get the 50 updates.

2)  htdig (and/or its external parsers) seems to have a very large memory
footprint for xls, doc, and pdf files over a few MB in size.  Setting
max_doc_size to a small number (e.g. 500K) would cause most of our documents
to be omitted from indexing.
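
To illustrate the behavior I'm after in (1), here is a rough Python sketch of
the selection step only: find the files whose modification time is newer than
the last indexing run, so that only those would be handed to the indexer.  This
is not something htdig provides as far as I can tell; the function name
modified_since and the "would reindex" step are my own placeholders, and a real
tool would also need to handle deleted files.

```python
import os
import tempfile
import time


def modified_since(root, last_run):
    """Return sorted paths under root whose mtime is newer than last_run.

    last_run is a Unix timestamp (seconds since the epoch) recorded at the
    end of the previous indexing pass. Hypothetical helper, not an htdig API.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_run:
                    changed.append(path)
            except OSError:
                # File disappeared between listing and stat; skip it.
                continue
    return sorted(changed)


if __name__ == "__main__":
    # Small demonstration on a throwaway directory with one stale file
    # and one freshly written file.
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old.txt")
        new = os.path.join(root, "new.txt")
        for p in (old, new):
            with open(p, "w") as fh:
                fh.write("x")
        now = time.time()
        # Pretend old.txt was last modified a day ago.
        os.utime(old, (now - 86400, now - 86400))
        last_run = now - 3600  # pretend the last index pass was an hour ago
        for path in modified_since(root, last_run):
            print("would reindex:", path)
```

With 50 changed files out of 25000, only those 50 paths would come back, and
the rest of the index would be left alone.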

Any recommendations?  I'm especially interested in anything that allows indices
to be updated for modified files without reindexing unchanged files.  I've
looked at Google's product, but it is quite costly.

Thanks,
Geoff


_______________________________________________
TriLUG mailing list
    http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ:
    http://www.trilug.org/~lovelace/faq/TriLUG-faq.html
