Hi Ankit,

According to https://issues.apache.org/jira/browse/NUTCH-1465, sitemap
support is a 1.14 feature.
I just checked, and the bin/nutch sitemap command does exist in 1.14,
though I have not tested that it works.

In general, Nutch supports crawling anything, but you might need to write
your own parser for custom protocols.
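
To make the "own parser" idea concrete, here is a minimal standalone sketch of
what such a parser would need to extract from a feed like the one you describe.
It is not Nutch code (in Nutch this logic would live in a Java parse plugin),
and the element names (feed, article, url, title, content) and the sample data
are assumptions, since I have not seen your actual feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout, assumed for illustration only:
# <feed><article><url/><title/><content/></article>...</feed>
SAMPLE_FEED = """<feed>
  <article>
    <url>http://example.com/a1</url>
    <title>First article</title>
    <content>Body of the first article.</content>
  </article>
  <article>
    <url>http://example.com/a2</url>
    <title>Second article</title>
  </article>
</feed>"""


def parse_feed(xml_text):
    """Return one dict per <article> with its url, title and content."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("article"):
        records.append({
            # findtext returns None for a missing element, so fall back to "".
            "url": (article.findtext("url") or "").strip(),
            "title": (article.findtext("title") or "").strip(),
            "content": (article.findtext("content") or "").strip(),
        })
    return records


if __name__ == "__main__":
    for record in parse_feed(SAMPLE_FEED):
        print(record["url"], "|", record["title"])
```

Missing fields come back as empty strings rather than raising an error, which
tends to be the more forgiving choice when feed entries are incomplete.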

        Yossi.

> -----Original Message-----
> From: Ankit Goel [mailto:[email protected]]
> Sent: 01 November 2017 18:55
> To: [email protected]
> Subject: sitemap and xml crawl
> 
> Hi,
> I need to crawl an XML feed, which includes the url, title and content of
> the articles on the site.
> 
> The documentation on the site says that bin/nutch sitemap exists, but on my
> Nutch 1.13, sitemap is not a command in bin/nutch. So does Nutch support
> crawling sitemaps? Or XML links?
> 
> Regards,
> Ankit Goel

