Re: How to restrict nutch to crawl only seed urls and links contained in the seed pages

shekhar sharma Mon, 02 Jul 2012 02:42:49 -0700

I think you need to specify the depth parameter as 1.

bin/nutch crawl seedDir -dir crawl -depth 1

With a depth of 1, Nutch fetches only the seed URLs given. If you want to see
the outlinks from each seed page, you can read the segments.
Is this what you are looking for?
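
Reading the segments could look roughly like the sketch below. It assumes a
Nutch 1.x layout in which "bin/nutch readseg -dump" writes a text file named
"dump" into the output directory, and that outlinks appear in it as lines of
the form "outlink: toUrl: <url> anchor: <text>". The directory name segdump
and the helper extract_outlinks are made up for illustration, so check the
dump format of your own version before relying on it.

```shell
#!/bin/sh
# Assumed locations: CRAWL_DIR matches the -dir argument of the crawl command,
# DUMP_DIR is an arbitrary output directory chosen for this example.
CRAWL_DIR=crawl
DUMP_DIR=segdump

# Hypothetical helper: print every toUrl found in a readseg text dump,
# one per line, sorted and de-duplicated.
extract_outlinks() {
  sed -n 's/^[[:space:]]*outlink: toUrl: \([^[:space:]]*\).*/\1/p' "$1" | sort -u
}

# Only run the Nutch part when executed from a Nutch installation directory.
if [ -x bin/nutch ]; then
  # Pick the most recent segment (segment names are timestamps).
  segment=$(ls -d "$CRAWL_DIR"/segments/* | tail -n 1)

  # Dump only the parse data, which is where the outlinks are stored.
  bin/nutch readseg -dump "$segment" "$DUMP_DIR" \
    -nocontent -nofetch -nogenerate -noparse -noparsetext

  extract_outlinks "$DUMP_DIR/dump"
fi
```

With -depth 1 there should be a single segment, so the list printed at the end
would be the toUrls collected from the seed pages only.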

Regards,
Som

On Mon, Jul 2, 2012 at 1:38 PM, Shameema Umer <[email protected]> wrote:

> Hi there,
>
> How to restrict nutch to crawl only seed urls and links contained in the
> seed pages.
>
> For example.
> If seed.txt contains:
>
> http://www.abc.com/category/apple
> http://www.abc.com/category/orange
>
> I need to parse http://www.abc.com/category/apple and
> http://www.abc.com/category/orange and the toUrls collected from these
> pages. Please help.
>
> Thanks
> Shameema
>
