Hi,

could you explain in detail what is meant by "parent URL"?
- the page the PDF document is linked from
- a redirect pointing to the PDF doc
- the "directory" of the PDF URL (clip URL after last "/")
- ...

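To make the third reading concrete, here is a minimal sketch of clipping a URL after its last "/". It is plain Python rather than Nutch code, and the function name `parent_directory` and the example.com URL are made up for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directory(url: str) -> str:
    """Return the URL with everything after the last "/" of its path removed."""
    parts = urlsplit(url)
    # Keep the path up to and including its last "/"; an empty path becomes "/".
    directory = parts.path.rsplit("/", 1)[0] + "/"
    # Drop query and fragment: they belong to the document, not to the directory.
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


# Hypothetical example URL, not taken from the original question.
print(parent_directory("https://example.com/docs/reports/annual.pdf"))
```

Under this reading the "parent URL" is derived from the PDF URL alone, so it needs no link information from the crawl; the other two readings would need the inlink or redirect source.
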
Nutch indexes all successfully fetched pages but not redirects, 404s, etc.
Of course, pages not crawled cannot be indexed.

Best,
Sebastian

On 09/27/2018 11:58 AM, UMA MAHESWAR wrote:
> I am using Nutch 1.x for website crawling and indexing in Solr (5.5.0).
> I am trying to include the parent URL along with the PDF data.
> Can someone please suggest a way to do it?
>
> Thanks in advance for your comments and suggestions
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html