Jayadeep - I have the same problem. When I run a fresh crawl, only the URLs already in the webpage table are crawled over and over; the new URLs in seed.txt are ignored.
On Thu, Aug 1, 2013 at 9:03 AM, Jayadeep Reddy <[email protected]> wrote:
> I am using Nutch 2.1 every time I run crawl from dmoz directory my existing
> crawled pages in the database are fetched again(Taking long time/). Is
> there a way to crawl only new sites.
>
> Thank you
>
> --
> Jayadeep Reddy.S,
> M.D & C.E.O
> e Health Access Pvt.Ltd
> www.ehealthaccess.com
> Hyderabad-Chennai-Banglore
> http://www.youtube.com/watch?v=0k5LX8mw6Sk

