Hi Ryan,
you may want to have a look at the scoring-depth plugin.
It tracks the depth (the number of links away from one of
the seeds) of each crawled page and could be modified to
also write the parents (maybe only the first) into the
CrawlDatum metadata.
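
To illustrate the idea, here is a rough, standalone sketch. It does not
use the Nutch API: plain maps stand in for the CrawlDatum metadata, and
the class name, method names, the metadata keys "_depth_" and "_parent_",
and the convention that seeds start at depth 1 are all assumptions made
for the example. In the plugin itself, the equivalent change would
presumably go wherever the depth is passed on to the outlinks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Standalone sketch; plain maps stand in for Nutch's CrawlDatum metadata. */
class ParentTrackingSketch {
    // Illustrative metadata keys (assumptions for this example).
    static final String DEPTH_KEY = "_depth_";
    static final String PARENT_KEY = "_parent_";

    /** url -> metadata, standing in for the CrawlDb. */
    final Map<String, Map<String, String>> db = new HashMap<>();

    /** Register a seed at depth 1 (assumed convention) with no parent. */
    void addSeed(String url) {
        Map<String, String> meta = new HashMap<>();
        meta.put(DEPTH_KEY, "1");
        db.put(url, meta);
    }

    /** Pass depth and parent from a fetched page to one of its outlinks. */
    void addOutlink(String fromUrl, String toUrl) {
        Map<String, String> from = db.get(fromUrl);
        if (from == null) {
            throw new IllegalArgumentException("unknown page: " + fromUrl);
        }
        int childDepth = Integer.parseInt(from.get(DEPTH_KEY)) + 1;
        Map<String, String> to = db.computeIfAbsent(toUrl, k -> new HashMap<>());
        String known = to.get(DEPTH_KEY);
        // Keep the parent that gives the smallest depth; on ties the first one wins.
        if (known == null || childDepth < Integer.parseInt(known)) {
            to.put(DEPTH_KEY, Integer.toString(childDepth));
            to.put(PARENT_KEY, fromUrl);
        }
    }

    /** Follow the stored parent entries from a URL back to its seed. */
    List<String> pathToSeed(String url) {
        List<String> path = new ArrayList<>();
        String current = url;
        while (current != null) {
            path.add(current);
            Map<String, String> meta = db.get(current);
            current = (meta == null) ? null : meta.get(PARENT_KEY);
        }
        return path;
    }

    public static void main(String[] args) {
        ParentTrackingSketch s = new ParentTrackingSketch();
        s.addSeed("http://example.com/");
        s.addOutlink("http://example.com/", "http://example.com/a");
        s.addOutlink("http://example.com/a", "http://example.com/b");
        // Prints the chain of URLs from the page back to the seed.
        System.out.println(s.pathToSeed("http://example.com/b"));
    }
}
```

Storing only one parent per page keeps the metadata small, and the
full path back to the seed can still be rebuilt by following the parent
entries one page at a time.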
Best,
Sebastian
On 4/9/19 9:08 PM, Ryan Suarez wrote:
> Greetings,
>
> We are running nutch v1.5 with SOLR v7.3.1
>
> I would like to determine how a specific site was crawled. What were
> the parent links that the nutch crawler followed all the way back to
> the root?
>
> Could someone let me know what is the best way to accomplish this?
>
> regards,
> Ryan
>