Please remove me from this list

-----Original Message-----
From: Sebastian Nagel [mailto:wastl.na...@googlemail.com.INVALID] 
Sent: Friday, September 28, 2018 2:25 AM
To: user@nutch.apache.org
Subject: [Non-DoD Source] Re: Include parent URL in pdf data - nutch

Hi,

could you explain in detail what is meant by "parent URL"?
- the page the PDF document is linked from
- a redirect pointing to the PDF doc
- the "directory" of the PDF URL (clip URL after last "/")
- ...
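
To illustrate the third reading above (the "directory" of the PDF URL), here is a minimal sketch in Python of clipping a URL after its last "/". It is not Nutch code; the function name parent_directory_url and the example.com URL are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directory_url(url: str) -> str:
    """Return the URL with everything after the last "/" of its path removed."""
    parts = urlsplit(url)
    # Keep the path up to and including the last "/"; use "/" if the path is empty.
    directory = parts.path[: parts.path.rfind("/") + 1] or "/"
    # Drop query and fragment: they belong to the document, not to its directory.
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


# Hypothetical PDF URL, for illustration only.
print(parent_directory_url("http://example.com/docs/reports/annual.pdf"))
# prints http://example.com/docs/reports/
```

The other readings (the linking page, or a redirect source) cannot be computed from the PDF URL alone; they depend on information gathered during the crawl.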

Nutch indexes all successfully fetched pages but not redirects, 404s, etc. Of 
course, pages not crawled cannot be indexed.

Best,
Sebastian

On 09/27/2018 11:58 AM, UMA MAHESWAR wrote:
> I am using Nutch 1.x for website crawling and indexing in Solr (5.5.0).
> I am trying to include the parent URL along with the PDF data.
> Can someone please suggest a way to do it?
> 
> Thanks in advance for your comments and suggestions
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
> 
