Yep, that's a problem.  Thank you!

https://issues.apache.org/jira/browse/TIKA-3101

On Mon, May 11, 2020 at 2:24 PM Tim Allison <talli...@apache.org> wrote:

> Thank you for letting us know about this and for sharing a file. I believe
> we should trust the XMP metadata over the PDF Info dictionary for Dublin
> Core (DC) metadata keys like TikaCoreProperties.CREATED. I'll take a look.
>
> On Mon, May 11, 2020 at 11:40 AM Tucker B <barb...@gmail.com> wrote:
>
>> I have a PDF whose XMP metadata contains two rdf:Description tags with
>> different namespaces: the first is Dublin Core, the second XMPSchemaBasic.
>> I can confirm that JempBox reads the XMP metadata and correctly identifies
>> both namespaces. However, the PDFParser in Tika does not appear to add the
>> XMPSchemaBasic metadata, specifically xmp:CreateDate, to the extracted
>> metadata. Is this expected behaviour? Ideally, the PDFParser would set
>> TikaCoreProperties.CREATED to the XMP CreateDate when the
>> PDDocumentInformation has no creation date, or at least expose it as a
>> property such as "xmp:CreateDate". I've attached the XMP packet and a PDF
>> containing that XMP metadata. I'm using Tika 1.24.1. Any help or guidance
>> would be greatly appreciated.
>>
>> Also, I noticed the XMP packet id is "W5M0MpCehiHzreSzNTczkc9d", which
>> base64-decodes to 18 bytes whose only printable ASCII characters are
>> "[42!573]". I'm curious whether anyone knows the significance of this.
>>
>
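
A minimal sketch of the lookup described in the report, written against the JDK's own DOM parser rather than Tika's or JempBox's API: scan every rdf:Description in the XMP packet, whatever namespace it declares, for xmp:CreateDate in either attribute or element form. The class and method names are hypothetical, and `resolveCreated` only illustrates the two precedence policies discussed in this thread (the reporter's Info-first fallback versus trusting XMP first); it is not how PDFParser is actually implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Tika or JempBox.
public class XmpCreateDate {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XMP_NS = "http://ns.adobe.com/xap/1.0/"; // xmp: (XMPSchemaBasic)

    /** Returns xmp:CreateDate from any rdf:Description, or null if none has it. */
    public static String findCreateDate(String xmpPacket) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for the *NS lookups below
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmpPacket.getBytes(StandardCharsets.UTF_8)));
            NodeList descs = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descs.getLength(); i++) {
                Element desc = (Element) descs.item(i);
                // Attribute form: <rdf:Description xmp:CreateDate="..."/>
                if (desc.hasAttributeNS(XMP_NS, "CreateDate")) {
                    return desc.getAttributeNS(XMP_NS, "CreateDate");
                }
                // Element form: <xmp:CreateDate>...</xmp:CreateDate>
                NodeList kids = desc.getElementsByTagNameNS(XMP_NS, "CreateDate");
                if (kids.getLength() > 0) {
                    return kids.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XMP packet", e);
        }
    }

    /** preferXmp=false: Info date first, XMP as fallback; true: the reverse. */
    public static String resolveCreated(String infoCreated, String xmpCreated, boolean preferXmp) {
        if (preferXmp) {
            return xmpCreated != null ? xmpCreated : infoCreated;
        }
        return infoCreated != null ? infoCreated : xmpCreated;
    }
}
```

Checking both forms matters because XMP writers may serialize simple properties either as attributes on rdf:Description or as child elements.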

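On the packet id question: as far as I know, the XMP specification defines this id as a fixed string that simply marks an XMP packet, so it should be identical in every file. A quick way to check the "[42!573]" observation is to base64-decode the id and keep only printable ASCII bytes; the class and method names here are illustrative only.

```java
import java.util.Base64;

public class XmpPacketId {
    /** Base64-decodes the input and keeps only printable ASCII bytes (0x20..0x7E). */
    public static String printableAscii(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x20 && v <= 0x7E) {
                sb.append((char) v);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String id = "W5M0MpCehiHzreSzNTczkc9d";
        int len = Base64.getDecoder().decode(id).length;
        System.out.println(len + " bytes, printable: " + printableAscii(id));
    }
}
```

The 24-character id decodes to 18 bytes; the other 10 bytes are all at or above 0x80, so the "[42!573]" reading comes from dropping them.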