Hi all,

I'm interested in playing with term frequency values in a Nutch index, at both per-document and index-wide scope.
For example, something similar to this Lucene FAQ entry: http://tinyurl.com/ra3ys

So what is the 'correct' way to inspect the Nutch index for these values, particularly against the Lucene IndexReader behind the Nutch IndexSearcher? Since I don't see anything on the Searcher interface, is there some other Hadoop-ified way to do this?

Assuming there isn't, if I were to add the ability to get document and index-wide term frequencies, would this be exposed on the nutch.searcher.Searcher interface? E.g.:

  Searcher.getTermVector( Hit hit )  // returns a nutch friendly TermVec obj
  Searcher.getTermVector( Hit hit, String field )
  Searcher.getTermVector( String field )

Or is there a more relevant interface this should hang off of? Searcher doesn't seem like a fit, and neither does HitDetailer. Maybe HitTermVector and IndexTermVector?

Or is this just insane: it won't work like I think, and I should just forget trying to get corpus-relevant info from the indexes at runtime?

cheers,
ckw

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
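To make the proposed split concrete, here is a rough, stdlib-only sketch of what separate HitTermVector and IndexTermVector interfaces might look like. Everything in it is hypothetical: the interface shapes, the ToyIndex class, and the use of a plain int document id in place of a Nutch Hit are assumptions for illustration, not existing Nutch or Lucene API. A real implementation would presumably delegate to the Lucene IndexReader (for instance its docFreq and getTermFreqVector methods, if term vectors were stored at index time) rather than re-tokenizing text.

```java
import java.util.*;

/** Hypothetical: per-document term frequencies (the "HitTermVector" idea). */
interface HitTermVector {
    // An int doc id stands in for a Nutch Hit in this sketch.
    Map<String, Integer> getTermVector(int docId, String field);
}

/** Hypothetical: index-wide document frequencies (the "IndexTermVector" idea). */
interface IndexTermVector {
    Map<String, Integer> getTermVector(String field);
}

/** Toy in-memory stand-in for an index; not Nutch or Lucene code. */
class ToyIndex implements HitTermVector, IndexTermVector {
    // One map of field name -> raw text per document; list position is the doc id.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Stores a document and returns its id. */
    int add(Map<String, String> fields) {
        docs.add(fields);
        return docs.size() - 1;
    }

    /** Naive analyzer assumption: lowercase, split on anything not a-z or 0-9. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Term -> number of occurrences in one field of one document. */
    @Override
    public Map<String, Integer> getTermVector(int docId, String field) {
        Map<String, Integer> tf = new TreeMap<>();
        String text = docs.get(docId).get(field);
        if (text == null) {
            return tf; // field absent in this document
        }
        for (String t : tokenize(text)) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Term -> number of documents whose given field contains the term. */
    @Override
    public Map<String, Integer> getTermVector(String field) {
        Map<String, Integer> df = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String t : getTermVector(i, field).keySet()) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }
}
```

Keeping the two interfaces separate means a per-hit lookup and a corpus-wide lookup could later be backed by different code paths (local reader versus distributed search), which is the main reason not to hang both off Searcher.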
