Re: Tracing crawled sites

2019-04-18 Thread Sebastian Nagel
Hi Ryan,

you may want to look at the scoring-depth plugin.
It tracks the depth of each crawled page (the number of
links away from one of the seeds) and could be modified
to also write the parents (maybe only the first one) into
the CrawlDatum metadata.
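
To illustrate the idea only: the sketch below is plain Java and uses no
Nutch classes. ParentTrace and its methods are hypothetical names, and the
two maps stand in for what would be per-URL CrawlDatum metadata in a
modified plugin. It assumes that only the first parent seen for a URL is
kept, next to the depth, so the chain back to a seed can be followed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for per-URL metadata; this is not a Nutch class.
class ParentTrace {
    // url -> first parent that linked to it (seeds have no entry)
    private final Map<String, String> firstParent = new HashMap<>();
    // url -> number of links away from a seed
    private final Map<String, Integer> depth = new HashMap<>();

    // Seeds start at depth 0 and have no parent.
    void addSeed(String url) {
        depth.putIfAbsent(url, 0);
    }

    // Called when page 'from' has been parsed and 'to' is one of its outlinks.
    void recordOutlink(String from, String to) {
        Integer fromDepth = depth.get(from);
        // Skip if the source is unknown or the target was already reached.
        if (fromDepth == null || depth.containsKey(to)) {
            return;
        }
        depth.put(to, fromDepth + 1);
        firstParent.put(to, from);
    }

    // Follows the stored parents from 'url' back to the seed it came from.
    List<String> pathToSeed(String url) {
        List<String> path = new ArrayList<>();
        for (String cur = url; cur != null; cur = firstParent.get(cur)) {
            path.add(cur);
        }
        return path;
    }

    public static void main(String[] args) {
        ParentTrace trace = new ParentTrace();
        trace.addSeed("http://example.org/");
        trace.recordOutlink("http://example.org/", "http://example.org/b");
        trace.recordOutlink("http://example.org/b", "http://example.org/c");
        System.out.println(trace.pathToSeed("http://example.org/c"));
    }
}
```

Because a parent is recorded only for a URL that has not been reached
before, the stored links cannot form a cycle, and the walk in pathToSeed
always ends at a seed. The path found this way is the first one
discovered, which is not necessarily the shortest.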

Best,
Sebastian

On 4/9/19 9:08 PM, Ryan Suarez wrote:
> Greetings,
> 
> We are running Nutch v1.5 with Solr v7.3.1.
> 
> I would like to determine how a specific site was crawled.  What were
> the parent links that the nutch crawler followed all the way back to
> the root?  
> 
> Could someone let me know what is the best way to accomplish this?
> 
> regards,
> Ryan
> 



Tracing crawled sites

2019-04-09 Thread Ryan Suarez
Greetings,

We are running Nutch v1.5 with Solr v7.3.1.

I would like to determine how a specific site was crawled.  What were
the parent links that the nutch crawler followed all the way back to
the root?  

Could someone let me know what is the best way to accomplish this?

regards,
Ryan