Distributed Indexing with nutch

Marco Didonna Tue, 08 Feb 2011 02:07:39 -0800

Hi everyone,
I've built a small Hadoop program that builds an inverted index from a
text collection. It performs basic analysis: tokenization,
lowercasing and stopword removal. I was wondering whether I could use some
Nutch components instead, since I assume they have undergone more extensive
tuning and are therefore more efficient. I looked in the javadoc
(org.apache.nutch.indexer package) for hints but didn't find
any helpful material... I hope one of you can point me to the right
place, hopefully with some example code.
I would like to stress that I don't need anything but the indexing
capabilities of Nutch (no crawling or other stuff), and I need the
whole thing to work on Hadoop :)
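For reference, here is a rough sketch of the kind of pipeline described above: tokenization, lowercasing and stopword removal feeding an inverted index. It is an in-memory, single-process approximation of the map step (emit term -> document id) and the grouping/reduce step (collect posting lists). It does not use the Hadoop or Nutch APIs. The class name, the tokenization regex and the stopword list are hypothetical placeholders, not anything taken from Nutch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hadoop-free sketch of inverted-index construction.
 * Hypothetical: the stopword list and the tokenization rule are placeholders.
 */
public class InvertedIndexSketch {

    // Hypothetical, deliberately tiny stopword list.
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "the", "of", "to", "with");

    // Analysis: split on runs of non-alphanumeric characters, lowercase, drop stopwords.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{Alnum}]+")) {
            if (token.isEmpty()) continue; // split yields a leading "" if text starts with a separator
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(term)) terms.add(term);
        }
        return terms;
    }

    // Emulates map (term -> docId) plus the shuffle/reduce grouping into sorted posting lists.
    static Map<String, SortedSet<String>> buildIndex(Map<String, String> docs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String term : analyze(doc.getValue())) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("d1", "Distributed indexing with Nutch");
        docs.put("d2", "Indexing a text collection on Hadoop");
        System.out.println(buildIndex(docs));
    }
}
```

In a real Hadoop job, analyze() would run inside the mapper and the per-term grouping would be done by the framework's shuffle before a reducer writes each posting list.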


Thanks for your time

MD
