This may be more of a straight Lucene task, but I thought I'd ask anyway.
Instead of using Nutch as a crawler, I'd like to point the Nutch parser and
indexer at a directory on my server and have them detect each file's
content type from its file extension.
All of my data is local, so I'd prefer to skip the crawling step entirely.
I expect that would also make it more reliable that every document I care
about actually gets indexed. Is this possible?
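
To make the idea concrete, here is a rough sketch of the directory walk I have in mind. It is plain Java and makes no Nutch or Lucene calls; the class name LocalFileTypes, the methods contentTypeFor and scan, and the small extension table are all made up for illustration. The parsing and indexing step would have to replace the println at the end.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class LocalFileTypes {
    // Hypothetical extension-to-content-type table; extend as needed.
    private static final Map<String, String> TYPES_BY_EXTENSION = Map.of(
            "html", "text/html",
            "htm", "text/html",
            "txt", "text/plain",
            "pdf", "application/pdf",
            "doc", "application/msword");

    private static final String UNKNOWN = "application/octet-stream";

    // Guess a content type from the file name's extension only.
    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN; // no extension, or name ends with a dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES_BY_EXTENSION.getOrDefault(ext, UNKNOWN);
    }

    // Walk the directory tree and record a guessed type for every regular file.
    static Map<Path, String> scan(Path root) throws IOException {
        Map<Path, String> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> result.put(p, contentTypeFor(p.getFileName().toString())));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Placeholder: a real version would hand each file to a parser and indexer here.
        scan(root).forEach((path, type) -> System.out.println(type + "\t" + path));
    }
}
```

Running it with a directory as the only argument prints one line per file, with the guessed type followed by the path.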