Nutch and Lucene

2006-11-10 Thread hzhong
Hello, this is what I want to do: given a document, find all its terms and their frequencies. I understand that Nutch is built on top of Lucene. In Lucene, I can access a document's terms and their frequencies via the IndexReader. However, in Nutch, I am not sure whether there is an equivalent.
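For illustration only, the computation being asked for (terms and their frequencies for one document) can be sketched in plain Java. This is a minimal sketch that assumes the document text is already available as a string; it does not use the Lucene or Nutch APIs, and the class name TermFrequencies and the simple split-on-non-alphanumerics tokenization are hypothetical choices, not how either project analyzes text.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Nutch.
class TermFrequencies {

    // Counts how often each term occurs in the given text.
    // Assumption: a "term" is a lowercased run of letters or digits.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a delimiter
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            count("Nutch is built on top of Lucene. Lucene indexes terms.");
        // Print one "term<TAB>frequency" pair per line, sorted by term.
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

An index-backed approach would read these counts from the index rather than re-tokenizing the text, so the result would depend on the analyzer used at indexing time.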

Re: Nutch and Lucene

2006-11-10 Thread Andrzej Bialecki
hzhong wrote: Hello, this is what I want to do: given a document, find all its terms and their frequencies. I understand that Nutch is built on top of Lucene. In Lucene, I can access a document's terms and their frequencies via the IndexReader. However, in Nutch, I am not sure if there's

[jira] Commented: (NUTCH-395) Increase fetching speed

2006-11-10 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-395?page=comments#action_12448795 ] Sami Siren commented on NUTCH-395: Have you measured what made the biggest impact on performance: changes to Metadata, or changes to IO in FetcherOutput? did

RE: implement Thai language analyzer in Nutch

2006-11-10 Thread Teruhiko Kurosaka
Oh, Thai words are not space delimited? OK, in that case, you'd need to study how ThaiAnalyzer works and then modify the rules in NutchAnalysis.jj (if you are going to use the web search GUI from Nutch). This is because the search expressions are parsed by the parser generated from