Re: Tracing crawled sites

Sebastian Nagel Thu, 18 Apr 2019 08:17:50 -0700

Hi Ryan,

you may want to have a look at the scoring-depth plugin.
It tracks the depth (the number of links away from one of
the seeds) of a crawled page, and could be modified to also
write the parents (maybe only the first) into the CrawlDatum
metadata.
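
To illustrate the idea only: below is a minimal standalone sketch in plain
Java. It does not use any Nutch classes. The map stands in for the
per-URL CrawlDatum metadata, and the names ParentTrace, PARENT_KEY,
DEPTH_KEY, inject, addOutlink and traceToSeed are all hypothetical, as is
the choice to start seeds at depth 1. A real change would have to live in
the scoring filter itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParentTrace {
    // Hypothetical metadata keys, chosen for this sketch only.
    static final String PARENT_KEY = "_parent_";
    static final String DEPTH_KEY = "_depth_";

    // url -> metadata; a stand-in for the per-URL CrawlDatum metadata.
    private final Map<String, Map<String, String>> db = new HashMap<>();

    // Register a seed URL: it has a depth but no parent.
    public void inject(String seed) {
        Map<String, String> meta = new HashMap<>();
        meta.put(DEPTH_KEY, "1"); // assumption: seeds start at depth 1
        db.put(seed, meta);
    }

    // Record an outlink. Only the first parent seen for a URL is kept.
    public void addOutlink(String from, String to) {
        Map<String, String> parentMeta = db.get(from);
        if (parentMeta == null || db.containsKey(to)) {
            return; // unknown source page, or target already has a parent
        }
        int depth = Integer.parseInt(parentMeta.get(DEPTH_KEY)) + 1;
        Map<String, String> meta = new HashMap<>();
        meta.put(DEPTH_KEY, Integer.toString(depth));
        meta.put(PARENT_KEY, from);
        db.put(to, meta);
    }

    // Follow the stored parent links from a URL back to its seed.
    public List<String> traceToSeed(String url) {
        List<String> path = new ArrayList<>();
        String current = url;
        while (current != null && db.containsKey(current)) {
            path.add(current);
            current = db.get(current).get(PARENT_KEY);
        }
        return path;
    }

    public static void main(String[] args) {
        ParentTrace t = new ParentTrace();
        t.inject("http://example.org/");
        t.addOutlink("http://example.org/", "http://example.org/a");
        t.addOutlink("http://example.org/a", "http://example.org/a/b");
        System.out.println(t.traceToSeed("http://example.org/a/b"));
    }
}
```

Because a parent is stored only when a URL is first discovered, the chain
of parents cannot loop, and walking it always ends at a seed.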


Best,
Sebastian

On 4/9/19 9:08 PM, Ryan Suarez wrote:
> Greetings,
> 
> We are running nutch v1.5 with SOLR v7.3.1
> 
> I would like to determine how a specific site was crawled.  What were
> the parent links that the nutch crawler followed all the way back to
> the root?  
> 
> Could someone let me know what is the best way to accomplish this?
> 
> regards,
> Ryan
>
