Hi all,
I ran a small test in which I indexed the same set of documents with
Nutch (0.9) and with Lucene.
The set comes from TREC: 134,000+ short text documents.

With Lucene, indexing took 1 hour. With Nutch using the file:/ protocol,
it took 4 hours 10 minutes.

Could anyone explain why there is such a difference, and whether there is
some way to eliminate part of this overhead?

Regards,
--
Marc

