Hi Lewis,

I was looking at your ApacheCon presentation titled 'Building your big
data search stack with Apache Nutch 2.x' at the link below.


http://prezi.com/gkomeulfuqhh/building-your-big-data-search-stack-with-apache-nutch-2x/?utm_campaign=share&utm_medium=copy

One of the slides mentions future work on parsing sitemaps using
crawler-commons. What is the purpose of that functionality, and is anyone
working on it? It seems that fetching URLs listed in a sitemap would be
better than fetching and crawling outward from a seed URL that might just
be the home page of the website.
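To illustrate the idea I have in mind: a sitemap already enumerates the site's URLs, so each `<loc>` entry could be injected as a seed directly. The sketch below is a minimal, self-contained illustration using only the JDK's XML parser; it is not the crawler-commons API (which provides a dedicated `SiteMapParser`), and the class and method names here are my own invention for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class SitemapSeedSketch {

    // Extract every <loc> entry from a sitemap document.
    // In a real integration each URL would be injected into the
    // crawl database as a seed instead of printed.
    static List<String> extractLocs(String sitemapXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            sitemapXml.getBytes(StandardCharsets.UTF_8)));
            NodeList locs = doc.getElementsByTagName("loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            throw new RuntimeException("Failed to parse sitemap", e);
        }
    }

    public static void main(String[] args) {
        String sitemap =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
          + "<url><loc>http://example.com/page1</loc></url>"
          + "<url><loc>http://example.com/page2</loc></url>"
          + "</urlset>";
        for (String url : extractLocs(sitemap)) {
            System.out.println(url);
        }
    }
}
```

With a seed list like this, the fetcher could go straight to every listed page instead of discovering them link by link from the home page.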

Thanks.
