Hi Ankit,
According to https://issues.apache.org/jira/browse/NUTCH-1465, sitemap
support is a 1.14 feature.
I just checked, and the command does exist in 1.14, but I have not tested
whether it works.
In general, Nutch supports crawling anything, but you might need to write
your own parser for custom protocols.
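To give an idea of what such a parser has to extract, here is a minimal standalone sketch in Python. It is not a Nutch plugin and does not use any Nutch API; it only shows pulling url, title and content out of a feed. The element names (articles, article, url, title, content) and the sample data are my assumptions, since I have not seen your actual feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout: the element names below are assumptions,
# not taken from the real feed.
SAMPLE_FEED = """<articles>
  <article>
    <url>http://example.com/a1</url>
    <title>First article</title>
    <content>Body of the first article.</content>
  </article>
  <article>
    <url>http://example.com/a2</url>
    <title>Second article</title>
    <content>Body of the second article.</content>
  </article>
</articles>
"""


def parse_feed(xml_text):
    """Return one dict per <article> with its url, title and content."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("article"):
        records.append({
            # findtext returns the default when the child element is missing
            "url": article.findtext("url", default="").strip(),
            "title": article.findtext("title", default="").strip(),
            "content": article.findtext("content", default="").strip(),
        })
    return records


if __name__ == "__main__":
    for record in parse_feed(SAMPLE_FEED):
        print(record["url"], "|", record["title"])
```

A real Nutch parser would do the same extraction inside a parse plugin and hand the fields on to the indexer.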
Yossi.
> -----Original Message-----
> From: Ankit Goel [mailto:[email protected]]
> Sent: 01 November 2017 18:55
> To: [email protected]
> Subject: sitemap and xml crawl
>
> Hi,
> I need to crawl a xml feed, which includes url, title and content of the
> articles on site.
>
> The documentation on the site says that bin/nutch sitemap exists, but on my
> nutch 1.13 sitemap is not a command in bin/nutch. So does nutch support
> crawling sitemaps? Or xml links.
>
> Regards,
> Ankit Goel