Hi,

Background: I have several article-list URLs in seed.txt. Currently, the nutch 
crawl command crawls both the list URLs and the article URLs on every run.
I want to prevent re-crawling of the article URLs that have already been 
crawled, but I still want to re-crawl the list URLs in seed.txt each time.
Does anyone have an idea how to do this?
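
For concreteness, one setup that might achieve this (just a sketch, assuming the Injector honors per-URL nutch.fetchInterval metadata in the seed file, and using the db.fetch.interval.default property from nutch-default.xml; the URLs and values below are made up):

```
# seed.txt -- give the list pages a short re-fetch interval (1 hour),
# so every crawl cycle picks them up again. The separator between the
# URL and the metadata must be a TAB character:
http://example.com/news/list1	nutch.fetchInterval=3600
http://example.com/news/list2	nutch.fetchInterval=3600

<!-- conf/nutch-site.xml -- raise the default interval very high so that
     article pages, once fetched, are not due again for a long time -->
<property>
  <name>db.fetch.interval.default</name>
  <value>31536000</value> <!-- one year, in seconds -->
</property>
```

If I understand the fetch scheduling correctly, the seed list URLs would then become due again every hour, while already-fetched article URLs wait out the long default interval, so only the list pages (plus newly discovered article links) get fetched on each run.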

Regards,
Rui
