On 09.07.2015 at 15:35, Allison, Timothy B. wrote:
 From my perspective, it would be great to have a general xmp parser that also 
allows for some variance from spec (PDFBOX-2855).  We've been using jempbox for 
pdfs as well as images over on Tika, and it has worked well for us.

I'd prefer to continue using your xmp parser, but I understand if you need to 
limit what you're willing to take on.

I'll take a look at xmlgraphics, and I'll discuss the fallback option of moving 
jempbox into Tika with the Tika devs.

I had a quick look at the xmlgraphics XMP code; it would also require extra implementation work.

I don't mind having it in xmpbox (we have some non-PDF stuff in other places too); we just need a schema definition or, failing that, the most complex possible file that uses that namespace. "All" there is to do then is to add a file in org.apache.xmpbox.schema.

Tilman


Thank you.

Cheers,

                Tim

-----Original Message-----
From: Maruan Sahyoun [mailto:[email protected]]
Sent: Thursday, July 09, 2015 4:56 AM
To: [email protected]
Subject: Re: DomXmpParser: namespace not found

Hi,

On 08.07.2015 at 22:42, Tilman Hausherr <[email protected]> wrote:

On 08.07.2015 at 17:22, Allison, Timothy B. wrote:
All,
Apologies for the idiocy I'm about to reveal (well, that won't be a revelation 
to anyone, really), but is there an obvious solution for this kind of error:

Caused by: org.apache.xmpbox.xml.XmpParsingException: Cannot find a definition for the namespace http://ns.adobe.com/lightroom/1.0/
    at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:848)
    at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:290)
    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:234)
    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:198)
    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:105)
    at org.apache.tika.parser.image.xmp.JempboxExtractor.parse(JempboxExtractor.java:59)

On a handful of image files in our test docs on Tika, I'm getting this with:

http://ns.adobe.com/lightroom/1.0/
http://ns.adobe.com/exif/1.0/aux/

These namespaces are not supported by xmpbox. We've had this problem with 
another namespace (I can't remember which one), and it wasn't possible to 
support it because we couldn't find a schema definition.
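
For anyone who wants to see which namespaces a packet declares before handing it to a strict parser, here is a minimal sketch that uses only the JDK's DOM parser. The class and method names (XmpNamespaceLister, declaredNamespaces, unknownNamespaces) are made up for illustration and are not part of xmpbox, jempbox or Tika; the set of "known" namespaces is supplied by the caller and is an assumption, not the list xmpbox actually supports.

```java
import java.io.ByteArrayInputStream;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of xmpbox: lists the namespace URIs an XMP packet declares.
public class XmpNamespaceLister {

    /** Returns every URI declared through an xmlns:prefix="..." attribute, sorted. */
    public static Set<String> declaredNamespaces(byte[] xmpPacket) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // XMP packets do not need a DOCTYPE, so refuse them as a precaution.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmpPacket));
        Set<String> result = new TreeSet<>();
        collect(doc.getDocumentElement(), result);
        return result;
    }

    /** Returns the declared namespaces that are missing from the caller's known set. */
    public static Set<String> unknownNamespaces(byte[] xmpPacket, Set<String> known)
            throws Exception {
        Set<String> unknown = declaredNamespaces(xmpPacket);
        unknown.removeAll(known);
        return unknown;
    }

    // Walks the element tree and records the value of each xmlns:* attribute.
    private static void collect(Node node, Set<String> result) {
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                if (attr.getName().startsWith("xmlns:")) {
                    result.add(attr.getValue());
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                collect(children.item(i), result);
            }
        }
    }
}
```

Calling unknownNamespaces with the packet bytes and the namespaces you expect your parser to handle gives the URIs to investigate, such as the two listed above.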

But you say these are image files. So this isn't about pdf xmp.
xmpbox is targeted at PDF/A-1. So I think we should discuss extending it to 
support other PDF standards' metadata requirements as well as generic XMP use 
cases, so that we again have a generic XMP library. OTOH there is 
org.apache.xmlgraphics.xmp.

WDYT?

BR
Maruan


Tilman



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
