RE: Indexing Doc, PDF, ... from filesystem (Newbie Question)

Teruhiko Kurosaka Tue, 21 Aug 2007 14:33:34 -0700

Christian,
This is interesting.  I have always thought that Solr shouldn't
be in the business of parsing; that is the responsibility of the
Solr client.  But what Peter suggested, adding a parsing capability
to Solr as a request handler, does make sense.
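
To make the client-side option concrete, here is a minimal sketch
(Python, not something from this thread) of what a client does once
it has parsed a file: it wraps the extracted text in Solr's XML
<add> update message. The function name and the field names "id" and
"text" are assumptions for illustration; real field names must match
your schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_id, text, id_field="id", text_field="text"):
    """Build a Solr XML <add> update message for one already-parsed document.

    The field names are assumptions; they must match the Solr schema in use.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # One <field> element per schema field; ElementTree escapes &, <, > for us.
    ET.SubElement(doc, "field", name=id_field).text = doc_id
    ET.SubElement(doc, "field", name=text_field).text = text
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # The text would come from whatever PDF/Doc parser the client uses.
    print(build_add_message("report.pdf", "Text extracted from the file"))
```

The client would then POST this message to the Solr update URL and
send a commit. With a parsing request handler, the extraction step
would move to the server side instead.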


One thing I noticed that this approach can't do (or won't do nicely),
however, is crawl docs. If that is your requirement, then
using Nutch as a crawler & parser for Solr may be an answer.
This is the link provided by Otis in a discussion a while ago:
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

I was going to try this but I haven't done so yet. I am also aware
that Solr's plugin architecture is different from, and in certain
aspects superior to, Nutch's.  I recall Nutch has had an issue handling
non-European languages in its parsing code, but that might not
be an issue here as Solr provides the search capability.

-kuro
