Hi Musshorn,

See http://nutch.apache.org/mailing_lists.html for instructions on how to unsubscribe from the mailing list: send an email to [email protected].
Best Regards,
Jorge

On Fri, Sep 28, 2018 at 1:24 PM Musshorn, Kris T CTR USARMY CECOM (US) <[email protected]> wrote:

> Please remove me from this list
>
> -----Original Message-----
> From: Sebastian Nagel [mailto:[email protected]]
> Sent: Friday, September 28, 2018 2:25 AM
> To: [email protected]
> Subject: [Non-DoD Source] Re: Include parent URL in pdf data - nutch
>
> ----
>
> Hi,
>
> could you explain in detail what is meant by "parent URL"?
> - the page the PDF document is linked from
> - a redirect pointing to the PDF doc
> - the "directory" of the PDF URL (clip URL after last "/")
> - ...
>
> Nutch indexes all successfully fetched pages but not redirects, 404s, etc.
> Of course, pages not crawled cannot be indexed.
>
> Best,
> Sebastian
>
> On 09/27/2018 11:58 AM, UMA MAHESWAR wrote:
> > I am using Nutch 1.x for website crawling and indexing in Solr (5.5.0).
> > I am trying to include the parent URL along with the PDF data.
> > Can someone please suggest a way to do it?
> >
> > Thanks in advance for your comments and suggestions
> >
> > --
> > Sent from:
> > http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html

