RE: Extracting XMP metadata from PDF for indexing Nutch 1.15

2020-01-27 Thread Gilvary, Joseph
Thank you, Sebastian, for the updated patch and the pointer to the discussion. Joe -----Original Message----- From: Sebastian Nagel Sent: Wednesday, January 15, 2020 5:25 AM To: user@nutch.apache.org Subject: Re: Extracting XMP metadata from PDF for indexing Nutch 1.15 Hi Joseph, sorry

Re: Extracting XMP metadata from PDF for indexing Nutch 1.15

2020-01-15 Thread Sebastian Nagel
promising. Hope you enjoy the holiday! > > Joe > > -----Original Message----- > From: Sebastian Nagel > Sent: Thursday, January 2, 2020 7:42 AM > To: user@nutch.apache.org > Subject: Re: Extracting XMP metadata from PDF for indexing Nutch 1.15 > > Hi Joseph, >

RE: Extracting XMP metadata from PDF for indexing Nutch 1.15

2020-01-02 Thread Gilvary, Joseph
Happy New Year, Sebastian, Thank you. That looks promising. Hope you enjoy the holiday! Joe -----Original Message----- From: Sebastian Nagel Sent: Thursday, January 2, 2020 7:42 AM To: user@nutch.apache.org Subject: Re: Extracting XMP metadata from PDF for indexing Nutch 1.15 Hi Joseph

Re: Extracting XMP metadata from PDF for indexing Nutch 1.15

2020-01-02 Thread Sebastian Nagel
bout the pdf:docinfo that isn't > generalized or is somehow configurable for generalization to other > namespaces. > > Thanks, > > Joe > > -----Original Message----- > From: Markus Jelsma > Sent: Tuesday, December 31, 2019 8:30 AM > To: user@nutch.apache.org >

RE: Extracting XMP metadata from PDF for indexing Nutch 1.15

2019-12-31 Thread Gilvary, Joseph
Subject: RE: Extracting XMP metadata from PDF for indexing Nutch 1.15 Hello Joseph, > Is there more documentation on having Nutch get what Tika sees into what Solr > will see? No, but I believe you would want to check out the parsechecker and indexchecker tools. These tools display wha

RE: Extracting XMP metadata from PDF for indexing Nutch 1.15

2019-12-31 Thread Markus Jelsma
iginal message- > From: Gilvary, Joseph > Sent: Tuesday 31st December 2019 14:19 > To: user@nutch.apache.org > Subject: Extracting XMP metadata from PDF for indexing Nutch 1.15 > > Happy New Year, > > I've searched the archives and the web as best I can, tinkered with &