Hello Nutch users,

I've googled for a while and still cannot find answers to the following:

1. After I crawl a web site, how can I extract tf-idf values for it?
2. How can I access the original web pages that were crawled?
3. Is it possible to get the ID that corresponds to each word?

Thanks in advance!
-Zhanibek

