Yes, you can do that. See fetcher.follow.outlinks.depth,
fetcher.follow.outlinks.num.links and fetcher.follow.outlinks.depth.divisor.
The outlinks are followed within the fetcher itself, so you don't need a
second crawl cycle.

https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L925
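
As a rough sketch, the overrides could go into conf/nutch-site.xml along
these lines. The values are illustrative assumptions, not recommendations,
and the fetcher.parse entry assumes that outlinks can only be followed when
the fetcher also parses the pages. Check the property descriptions in
nutch-default.xml for the exact semantics, in particular how the divisor
scales the number of links followed at each depth.

```xml
<!-- conf/nutch-site.xml: illustrative values only, adjust after reading
     the descriptions in nutch-default.xml -->
<configuration>
  <!-- Assumption: the fetcher must parse pages to see their outlinks -->
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
  </property>
  <!-- Follow outlinks one level below each fetched page -->
  <property>
    <name>fetcher.follow.outlinks.depth</name>
    <value>1</value>
  </property>
  <!-- Base number of outlinks to follow per page -->
  <property>
    <name>fetcher.follow.outlinks.num.links</name>
    <value>1</value>
  </property>
  <!-- Scales the number of followed links per depth level -->
  <property>
    <name>fetcher.follow.outlinks.depth.divisor</name>
    <value>1</value>
  </property>
</configuration>
```

To keep the followed links on the same domain as the home page, you will
probably also want db.ignore.external.links set to true.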

-----Original message-----
> From:jjmendes <[email protected]>
> Sent: Friday 21st October 2016 21:51
> To: [email protected]
> Subject: Adding a set number of inner pages to the fetch list
> 
> In order to get data for a study, I am currently using Nutch to go
> through a list of web pages and download their HTML. The list consists
> solely of main pages. However, it would be beneficial to also download
> at least one other page from the same domain that is linked to from its
> home page. Is there an easy way of achieving this?
> 
> Thanks,
> 
> JJAM
> 
>   
