I have already tried this, but when we restrict the depth to 1, the crawler will not even crawl http://www.abc.com/category/apple, because the link depth of that URL is 3.
Any other suggestion?

On Mon, Jul 2, 2012 at 3:12 PM, shekhar sharma <[email protected]> wrote:
> I think you need to specify the depth parameter as 1.
>
> bin/nutch crawl seedDir -dir crawl -depth 1
>
> It will crawl only the seed links given. And if you want to see the out
> links from each seed you can read the segments.
> Is this what you are looking for?
>
> Regards,
> Som
>
> On Mon, Jul 2, 2012 at 1:38 PM, Shameema Umer <[email protected]> wrote:
> >
> > Hi there,
> >
> > How to restrict nutch to crawl only seed urls and links contained in the
> > seed pages.
> >
> > For example, if seed.txt contains:
> >
> > http://www.abc.com/category/apple
> > http://www.abc.com/category/orange
> >
> > I need to parse http://www.abc.com/category/apple and
> > http://www.abc.com/category/orange and the toUrls collected from these
> > pages. Please help.
> >
> > Thanks
> > Shameema
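For reference, here is a sketch of the two steps the quoted reply describes, assuming Nutch 1.x; the segment timestamp and output directory names are placeholders, not real paths:

```shell
# Crawl only the seed URLs (a single round). Outlinks found on the seed
# pages are recorded in the segment but not fetched at depth 1.
bin/nutch crawl seedDir -dir crawl -depth 1

# Dump a segment to inspect the outlinks collected from each seed page.
# Outlinks are stored in parse_data, so suppress the other segment parts.
# "20120702000000" is a placeholder -- use an actual directory name found
# under crawl/segments/.
bin/nutch readseg -dump crawl/segments/20120702000000 segdump \
    -nocontent -nofetch -nogenerate -noparse -noparsetext

# readseg writes a plain-text file named "dump" into the output directory;
# each page's record lists its outlinks there.
less segdump/dump
```

This only makes the outlinks visible; actually fetching them as well would take a second round (depth 2), which is where the depth question in this thread comes in.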

