Hi Yossi,

So I need to make a custom parser. Where do I start? I found this link: https://wiki.apache.org/nutch/HowToMakeCustomSearch. Is this the right place, or should I be looking at the page on creating a plugin? Any advice would be helpful.
Thank you,
Ankit Goel

> On 02-Nov-2017, at 1:14 PM, Yossi Tamari <[email protected]> wrote:
>
> Hi Ankit,
>
> According to this: https://issues.apache.org/jira/browse/NUTCH-1465, sitemap
> is a 1.14 feature.
> I just checked, and the command indeed exists in 1.14. I did not test that
> it works.
>
> In general, Nutch supports crawling anything, but you might need to write
> your own parser for custom protocols.
>
> Yossi.
>
>> -----Original Message-----
>> From: Ankit Goel [mailto:[email protected]]
>> Sent: 01 November 2017 18:55
>> To: [email protected]
>> Subject: sitemap and xml crawl
>>
>> Hi,
>> I need to crawl an XML feed, which includes the URL, title and content of the
>> articles on the site.
>>
>> The documentation on the site says that bin/nutch sitemap exists, but on my
>> Nutch 1.13, sitemap is not a command in bin/nutch. So does Nutch support
>> crawling sitemaps? Or XML links?
>>
>> Regards,
>> Ankit Goel
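For the XML-feed case above, the core of a custom parser is just pulling the URL, title and content out of each feed entry. Below is a minimal, standalone sketch of that step using only the JDK's built-in DOM parser. It assumes a hypothetical feed layout of `<feed><item><url/><title/><content/></item>...</feed>`; the element names and the `FeedParser`/`FeedItem` classes are illustrative, not part of Nutch, so adjust them to the real feed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: extracts url/title/content triples from an assumed XML feed layout. */
public class FeedParser {

    /** One article entry from the feed (hypothetical type, not a Nutch class). */
    public static final class FeedItem {
        public final String url;
        public final String title;
        public final String content;

        public FeedItem(String url, String title, String content) {
            this.url = url;
            this.title = title;
            this.content = content;
        }
    }

    /** Parses XML of the assumed form <feed><item><url/><title/><content/></item>...</feed>. */
    public static List<FeedItem> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted feeds cannot trigger XXE attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<FeedItem> items = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("item");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element item = (Element) nodes.item(i);
                items.add(new FeedItem(text(item, "url"), text(item, "title"), text(item, "content")));
            }
            return items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed XML", e);
        }
    }

    /** Returns the trimmed text of the first descendant element with this tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<feed><item><url>https://example.com/a</url>"
                + "<title>First</title><content>Body A</content></item></feed>";
        for (FeedItem item : parse(xml)) {
            System.out.println(item.url + " | " + item.title);
        }
    }
}
```

In Nutch 1.x, parse plugins implement `org.apache.nutch.parse.Parser`, whose `getParse(Content)` method returns a `ParseResult`; logic like the above would run inside that method on the fetched bytes. The plugin descriptor and build wiring are not shown here; the plugin pages on the Nutch wiki cover those.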

