Hello,

I'm new to Nutch and I'm running some tests to see how it works. I want to
crawl the website of a digital newspaper. To do so, I put the URL I want to
crawl in the urls directory that holds my seed list: *http://elcorreo.com*
I don't want to crawl all the news on the site, only the articles from the
current day, so I added a filter to *crawl-urlfilter.txt* (for the moment I'm
using the *crawl* command). The filter is:

+^http://www.elcorreo.com/.*?/20110613/.*?.html

A valid URL would be, for example:
http://www.elcorreo.com/vizcaya/20110613/mas-actualidad/politica/lopez-consta-pactado-bildu-201106131023.html

So I think the regular expression is correct, but Nutch doesn't crawl
anything. It reports *No Urls to Fetch - check your seed list
and URL filters.*
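
One way to sanity-check the pattern outside Nutch is the minimal Python sketch below. It assumes the filter applies the expression as an unanchored search, the way find() in java.util.regex does; Python's re module only stands in for the Java regex engine here, and the helper name is_accepted is hypothetical. It tests the pattern itself, not Nutch's actual filtering.

```python
import re

# The filter expression from crawl-urlfilter.txt, without the leading "+".
FILTER_PATTERN = re.compile(r"^http://www.elcorreo.com/.*?/20110613/.*?.html")

# URLs taken from the message above.
SEED_URL = "http://elcorreo.com"
ARTICLE_URL = (
    "http://www.elcorreo.com/vizcaya/20110613/mas-actualidad/"
    "politica/lopez-consta-pactado-bildu-201106131023.html"
)


def is_accepted(url: str) -> bool:
    """Hypothetical helper: True if the pattern is found in the URL.

    Assumption: the filter uses search semantics (a match may end before
    the end of the URL), which re.search mirrors here.
    """
    return FILTER_PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in (SEED_URL, ARTICLE_URL):
        print(url, "->", "accepted" if is_accepted(url) else "rejected")
```

Under these assumptions the article URL is accepted, but the seed URL is rejected, because it contains neither the www. prefix nor the date segment the pattern requires.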


Am I missing something?

Thanks,
