Can you tell me what command you are running?

On Mon, Jul 2, 2012 at 4:51 PM, Shameema Umer <[email protected]> wrote:
> I had already tried this. But when we restrict depth to 1, the crawler will
> not even crawl http://www.abc.com/category/apple because the url link
> depth is 3 for this.
>
> Any other suggestion?
>
> On Mon, Jul 2, 2012 at 3:12 PM, shekhar sharma <[email protected]> wrote:
>
> > I think you need to specify the depth parameter as 1.
> >
> > bin/nutch crawl seedDir -dir crawl -depth 1
> >
> > It will crawl only the seed links given. And if you want to see the
> > outlinks from each seed you can read the segments.
> > Is this what you are looking for?
> >
> > Regards,
> > Som
> >
> > On Mon, Jul 2, 2012 at 1:38 PM, Shameema Umer <[email protected]> wrote:
> >
> > > Hi there,
> > >
> > > How do I restrict nutch to crawl only the seed urls and the links
> > > contained in the seed pages?
> > >
> > > For example, if seed.txt contains:
> > >
> > > http://www.abc.com/category/apple
> > > http://www.abc.com/category/orange
> > >
> > > I need to parse http://www.abc.com/category/apple and
> > > http://www.abc.com/category/orange and the toUrls collected from these
> > > pages. Please help.
> > >
> > > Thanks,
> > > Shameema
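[Editor's note on the depth confusion in the thread: in Nutch 1.x, the `-depth` argument counts generate/fetch/parse rounds, not URL path segments. So `-depth 1` fetches only the seed URLs themselves regardless of how many slashes their paths contain, and `-depth 2` fetches the seeds plus the outlinks discovered on them, which matches what the original question asks for. A hedged sketch; `seedDir`, `crawl`, and the segment placeholder are illustrative names, not values from the thread:]

```shell
# Sketch, assuming a Nutch 1.x installation on the PATH.
# Round 1 fetches the seed URLs; round 2 fetches their outlinks.
# -topN caps the number of URLs fetched per round.
bin/nutch crawl seedDir -dir crawl -depth 2 -topN 1000

# To see the outlinks collected from each fetched page, dump a segment
# (replace the placeholder with an actual timestamped segment directory):
bin/nutch readseg -dump crawl/segments/<segment_timestamp> segdump
```

To stay strictly within the two seed sites, one could additionally set `db.ignore.external.links` to `true` in `nutch-site.xml`, so round 2 follows only same-host outlinks.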

