Hi Musshorn,

See http://nutch.apache.org/mailing_lists.html for instructions on how to
unsubscribe from the mailing list. You need to send an email to
[email protected].

Best Regards,
Jorge

On Fri, Sep 28, 2018 at 1:24 PM Musshorn, Kris T CTR USARMY CECOM (US) <
[email protected]> wrote:

> Please remove me from this list
>
> -----Original Message-----
> From: Sebastian Nagel [mailto:[email protected]]
> Sent: Friday, September 28, 2018 2:25 AM
> To: [email protected]
> Subject: [Non-DoD Source] Re: Include parent URL in pdf data - nutch
>
> ----
>
> Hi,
>
> could you explain in detail what is meant by "parent URL"?
> - the page the PDF document is linked from
> - a redirect pointing to the PDF doc
> - the "directory" of the PDF URL (clip URL after last "/")
> - ...
>
> Nutch indexes all successfully fetched pages but not redirects, 404s, etc.
> Of course, pages not crawled cannot be indexed.
>
> Best,
> Sebastian
>
> On 09/27/2018 11:58 AM, UMA MAHESWAR wrote:
> > I am using Nutch 1.x for website crawling and indexing in Solr (5.5.0).
> > I am trying to include the parent URL along with the PDF data.
> > Can someone please suggest a way to do it?
> >
> > Thanks in advance for your comments and suggestions
> >
> >
> >
> > --
> > Sent from:
> > http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html
> >
>
>
