Hi Yossi,
So I need to make a custom parser. Where do I start? I found this link:
https://wiki.apache.org/nutch/HowToMakeCustomSearch
Is this the right place, or should I be looking at a page on creating a
plugin? Any advice would be helpful.

Thank you,
Ankit Goel

> On 02-Nov-2017, at 1:14 PM, Yossi Tamari <[email protected]> wrote:
> 
> Hi Ankit,
> 
> According to this: https://issues.apache.org/jira/browse/NUTCH-1465, sitemap
> is a 1.14 feature.
> I just checked, and the command indeed exists in 1.14. I did not test that
> it works.
> 
> In general, Nutch supports crawling anything, but you might need to write
> your own parser for custom protocols.
> 
>       Yossi.
> 
>> -----Original Message-----
>> From: Ankit Goel [mailto:[email protected]]
>> Sent: 01 November 2017 18:55
>> To: [email protected]
>> Subject: sitemap and xml crawl
>> 
>> Hi,
>> I need to crawl a xml feed, which includes url, title and content of the
>> articles on site.
>> 
>> The documentation on the site says that bin/nutch sitemap exists, but on my
>> nutch 1.13 sitemap is not a command in bin/nutch. So does nutch support
>> crawling sitemaps? Or xml links.
>> 
>> Regards,
>> Ankit Goel
> 
> 
