Sitemap function in 2.x version?

Michael Chen Tue, 01 Aug 2017 14:44:19 -0700

Dear fellow Nutch users/developers,

I've been trying to use the Nutch 2 sitemap function to crawl and index all pages listed in the sitemap indices. It seems that integration with the CommonCrawler sitemap tools only exists in the 2.x branch. But after I got it to work with HBase 1.2.3, it didn't fetch, parse, or index the sitemap indices and sitemaps at all.

I also looked into the code a bit and everything seems to make sense, except that I couldn't trace the data flow beyond Toolrunner.run() in the FetchReducer. I'm testing it on Linux with the "crawl" script in /bin, so I'm not sure how I can debug this. Please let me know if there's any further information that I can provide to help troubleshoot this issue. Thanks in advance!


Best regards,

Michael
