Dear All, I am new to Nutch. I have three questions after running an intranet crawl:
1) How can I get all the URLs after crawling?
2) How can I get all the terms after crawling and indexing?
3) How can I get the top N most frequent terms for a given URL (depending on the field)?

I need these three results to compute tf/idf values. I managed to solve the first question after reading this forum, but I am completely lost on the other two. Can anybody give me some help? Thanks a lot in advance.
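
P.S. To make clear what I need the three results for, here is a minimal sketch of the calculation in plain Python. It does not use any Nutch or Lucene API. The function names tf_idf and top_terms, the input layout (a dict from URL to list of terms), and the sample URLs are my own assumptions for illustration only, and the weighting shown (relative term frequency times log of inverse document frequency) is just one common variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf per URL.

    docs: dict mapping each URL to the list of terms found on that page
    (hypothetical layout; this is the data questions 1 and 2 would provide).
    """
    n_docs = len(docs)

    # Document frequency: in how many URLs does each term appear?
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for url, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        # tf = relative frequency in the page, idf = log(N / df)
        scores[url] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(docs, url, n):
    """Return the n most frequent (term, count) pairs for one URL (question 3)."""
    return Counter(docs[url]).most_common(n)


if __name__ == "__main__":
    # Made-up sample data standing in for the crawl output.
    sample = {
        "http://intranet.example/a": ["nutch", "crawl", "nutch"],
        "http://intranet.example/b": ["crawl", "index"],
    }
    print(top_terms(sample, "http://intranet.example/a", 1))
    print(tf_idf(sample)["http://intranet.example/a"])
```

A term that occurs in every URL gets a score of zero with this variant, since log(N / df) is then log(1).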
