I use HTDig, and it's what runs on TriLUG, but we do run a full re-index every night (I'm sure there is a way around that...). I've also run Namazu.
Namazu seemed to do a much better job with Excel and Word docs, but it was also much more complex to set up. Plus the docs are translated English, and that makes for some head-scratching while you are doing the setup. Mandrake defaults to using Medusa. I don't know how far along that is in development, but if Mandrake uses it, it must be promising.

Jon

--- Original Message: Friday 10 May 2002 03:28 pm ---
> First, congratulations to the new board and thanks to those who served for
> the past year.
>
> Second, I'd like to ask for the group's recommendations on an intranet
> search (engine|tool) which runs on Linux and is suitable for a small to
> midsize intranet. I've been experimenting with htdig (distributed with Red
> Hat Linux) but have run into some apparent limitations:
>
> 1) Based on the most current information I could find, htdig cannot update
> an index for only modified files. For example, if 50 of 25000 files are
> modified in the course of a day, I'd like to be able to update the index
> for only the modified files. With htdig, I would have to reparse and
> reindex all 25000 files just to get the 50 updates.
>
> 2) htdig (and/or its external parsers) seem to have a very large memory
> footprint for xls, doc, and pdf files over a few MB in size. Setting the
> max_doc_size to a small number (e.g. 500K) would cause most of our
> documents to be omitted from indexing.
>
> Any recommendations? I'm especially interested in anything that allows
> indices to be updated on modified files without reindexing unchanged
> files. I've looked at Google's product, but it is quite costly.
>
> Thanks,
> Geoff
>
> _______________________________________________
> TriLUG mailing list
> http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ:
> http://www.trilug.org/~lovelace/faq/TriLUG-faq.html
