Hi, I am using Nutch 1.10 and we are planning to crawl only URLs that match a certain pattern. The problem is that we cannot do this with regex-urlfilter.txt, because the seeds themselves would then be rejected.
For example, the seed is apple.com <http://apple.com/> and we want to crawl only URLs that have /mac/ in the URL string. Maybe we have to filter the URLs at generate or fetch time. Any thoughts? Can we customize the Generate or Fetch phases?

Thanks,
Manish Verma
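
P.S. To make the requirement concrete, here is a rough sketch of the acceptance rule we are after. It is plain Java and does not use any Nutch classes; the class name `SeedAwareFilter`, the hard-coded seed set, and the return-null-to-reject convention are all assumptions for illustration only, not something we have working inside Nutch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SeedAwareFilter {
    // Hypothetical seed list; in a real crawl this would be read from the seed file.
    private static final Set<String> SEEDS =
            new HashSet<>(Arrays.asList("http://apple.com/"));

    // Substring a non-seed URL must contain to be kept.
    private static final String REQUIRED = "/mac/";

    // Returns the URL unchanged if it should be kept, or null to reject it.
    public static String filter(String url) {
        if (url == null) {
            return null;
        }
        // Seeds are always kept, even though they do not contain /mac/.
        if (SEEDS.contains(url) || url.contains(REQUIRED)) {
            return url;
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://apple.com/",
            "http://apple.com/mac/",
            "http://apple.com/iphone/"
        };
        for (String url : samples) {
            System.out.println(url + " : " + (filter(url) != null ? "keep" : "reject"));
        }
    }
}
```

The point is only that the seed has to pass while every other URL is checked against the pattern, which is what a single pattern rule in regex-urlfilter.txt does not seem to give us.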