Hello,
This is what I want to do: given a document, find all its terms and
their frequencies.
I understand that Nutch is built on top of Lucene. In Lucene, I can access
a document's terms and their frequencies via the IndexReader. However,
in Nutch, I am not sure if there's an equivalent.
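To make the goal concrete, here is a minimal sketch in plain Java of what "terms and their frequencies" means for a single document. It is not the Lucene or Nutch API: the class `TermFrequencies` and its `count` method are hypothetical names, and the tokenization (lower-casing, splitting on anything that is not a letter or digit) is an assumption that a real analyzer would replace. In Lucene releases of that period, the comparable data is, as far as I know, exposed through term vectors (`IndexReader.getTermFreqVector`), and only for fields indexed with term vectors enabled.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Lucene or Nutch.
class TermFrequencies {

    // Lower-cases the text, splits it on runs of characters that are neither
    // letters nor digits, and counts how often each resulting term occurs.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {          // split() can yield a leading empty string
                freqs.merge(term, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            count("Nutch is built on top of Lucene. Lucene indexes terms.");
        // Print one "term<TAB>frequency" pair per line, in alphabetical order.
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A sorted map is used only so the output order is stable. Splitting on non-letter characters would not work for languages whose words are not separated by spaces.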
hzhong wrote:
Hello,
This is what I want to do: given a document, find all its terms and
their frequencies.
I understand that Nutch is built on top of Lucene. In Lucene, I can access
a document's terms and their frequencies via the IndexReader. However,
in Nutch, I am not sure if there's
[ http://issues.apache.org/jira/browse/NUTCH-395?page=comments#action_12448795 ]
Sami Siren commented on NUTCH-395:
--
Have you measured what made the biggest impact on performance: the changes
to Metadata, or the changes to I/O in FetcherOutput?
did
Oh, Thai words are not space-delimited?
OK, in that case, you'd need to study how ThaiAnalyzer works and
then modify the rules in NutchAnalysis.jj (if you are going to use
the web search GUI from Nutch). This is because the search
expressions are parsed by the parser generated from