how to get sets of urls and terms for tf/idf

Awei Sun, 02 Dec 2007 06:12:29 -0800

Dear All, 

I am new to Nutch. I have three questions after running an intranet
crawl:


1) How can I get all the URLs after crawling? 

2) How can I get all the terms after crawling and indexing? 

3) How can I get the top N most frequent terms for a given URL
(separately for each field)? 

I need these three results to compute tf/idf values. 
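
To make the goal concrete, here is a minimal sketch of the computation those three results would feed. It is plain Python and does not use any Nutch or Lucene API. The function names (top_terms, tf_idf), the example URLs, and the term lists are hypothetical, and the weighting shown (term count divided by document length, times the natural log of N over document frequency) is one common tf/idf variant, not necessarily the one Nutch or Lucene uses for scoring.

```python
import math
from collections import Counter


def top_terms(doc_terms, n):
    """Return the n most frequent terms of one document as (term, count) pairs."""
    return Counter(doc_terms).most_common(n)


def tf_idf(docs):
    """docs maps a URL to its list of terms; returns {url: {term: weight}}."""
    num_docs = len(docs)

    # Document frequency: the number of URLs whose text contains the term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    weights = {}
    for url, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        # tf = count / document length, idf = ln(N / df)
        weights[url] = {
            term: (count / total) * math.log(num_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy data standing in for the crawled URLs and their terms.
docs = {
    "http://example.com/a": ["nutch", "crawl", "nutch", "index"],
    "http://example.com/b": ["lucene", "index", "search"],
}
weights = tf_idf(docs)
```

With this variant, a term that occurs in every document (here "index") gets weight 0, because ln(N / df) is 0 for it. To work per field, the same functions could be run once per field with that field's terms only.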

For the first question, I managed to solve it after reading this forum. But
for the other two, I am completely lost. 

Can anybody give me some help? Thanks a lot in advance. 

-- 
View this message in context: 
http://www.nabble.com/how-to-get-sets-of-urls-and-terms-for-tf-idf-tf4931802.html#a14115859
Sent from the Nutch - User mailing list archive at Nabble.com.
