Hi all,

I have a question regarding parsing PDF/A extension data with xmpbox and the DomXmpParser.
I did not find a concrete answer in the XMP specs: are the value types case sensitive?

We received a ZUGFeRD PDF and I tried to read its XMP metadata with xmpbox, but parsing fails because of an unknown value type. The problem is the casing ("closed choice of Text"), not the actual value: xmpbox expects an upper-case C in "Choice". I attached a small test case that reproduces the problem. This is the error I get:

org.apache.xmpbox.xml.XmpParsingException: Unknown property value type : closed choice of Text
    at org.apache.xmpbox.xml.PdfaExtensionHelper.populatePDFAPropertyType(PdfaExtensionHelper.java:193)

The PDF the metadata was taken from was generated by an SDK that seems to be used quite often in the industry (pdf-tools.com).

Is xmpbox too strict here, or should it be able to handle this kind of metadata too? (veraPDF said the file is compliant.)

Thanks and best regards,
Peter

------------ test case ---------------
import java.nio.charset.StandardCharsets;

import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.xml.DomXmpParser;
import org.apache.xmpbox.xml.XmpParsingException;
import org.junit.jupiter.api.Test;

class DomXmpParserTest {

    @Test
    void testDomXmpParser() throws XmpParsingException {
        // XMP packet taken from the ZUGFeRD PDF; note the lower-case
        // "closed choice of Text" value type of the ConformanceLevel property.
        String data = """
            <?xpacket begin='\uFEFF' id='W5M0MpCehiHzreSzNTczkc9d'?>
            <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3-Heights(TM) XMP Library 5.6.1.3 (http://www.pdf-tools.com)">
              <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
                  <xmp:CreatorTool>CURSOR-CRM</xmp:CreatorTool>
                  <xmp:CreateDate>2025-03-31T10:11:14+02:00</xmp:CreateDate>
                  <xmp:ModifyDate>2025-03-31T10:11:14+02:00</xmp:ModifyDate>
                  <xmp:MetadataDate>2025-03-31T10:11:14+02:00</xmp:MetadataDate>
                </rdf:Description>
                <rdf:Description xmlns:zf="urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0#" rdf:about="">
                  <zf:DocumentType>INVOICE</zf:DocumentType>
                  <zf:DocumentFileName>zugferd-invoice.xml</zf:DocumentFileName>
                  <zf:Version>2p0</zf:Version>
                  <zf:ConformanceLevel>EN 16931</zf:ConformanceLevel>
                </rdf:Description>
                <rdf:Description xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
                                 xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
                                 xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#"
                                 rdf:about="">
                  <pdfaExtension:schemas>
                    <rdf:Bag>
                      <rdf:li rdf:parseType="Resource">
                        <pdfaSchema:schema>ZUGFeRD PDFA Extension Schema</pdfaSchema:schema>
                        <pdfaSchema:namespaceURI>urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0#</pdfaSchema:namespaceURI>
                        <pdfaSchema:prefix>zf</pdfaSchema:prefix>
                        <pdfaSchema:property>
                          <rdf:Seq>
                            <rdf:li rdf:parseType="Resource">
                              <pdfaProperty:name>DocumentType</pdfaProperty:name>
                              <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                              <pdfaProperty:category>external</pdfaProperty:category>
                              <pdfaProperty:description>Der Dokumententyp, enthält bei ZUGFeRD-Rechnungen immer INVOICE</pdfaProperty:description>
                            </rdf:li>
                            <rdf:li rdf:parseType="Resource">
                              <pdfaProperty:name>DocumentFileName</pdfaProperty:name>
                              <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                              <pdfaProperty:category>external</pdfaProperty:category>
                              <pdfaProperty:description>Der Dateiname des eingebetteten Rechnungsdatendokuments; muss identisch sein mit dem Wert des /F Eintrags im File Specification Dictionary. Bei ZUGFeRD ist dieser Wert fix zugferd-invoice.xml</pdfaProperty:description>
                            </rdf:li>
                            <rdf:li rdf:parseType="Resource">
                              <pdfaProperty:name>Version</pdfaProperty:name>
                              <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                              <pdfaProperty:category>external</pdfaProperty:category>
                              <pdfaProperty:description>Die Version des XML-Schemas der Rechnungsdaten</pdfaProperty:description>
                            </rdf:li>
                            <rdf:li rdf:parseType="Resource">
                              <pdfaProperty:name>ConformanceLevel</pdfaProperty:name>
                              <pdfaProperty:valueType>closed choice of Text</pdfaProperty:valueType>
                              <pdfaProperty:category>external</pdfaProperty:category>
                              <pdfaProperty:description>Das Profil der XML-Rechnungsdaten entsprechend den Vorgaben von ZUGFeRD (erlaubte Werte: MINIMUM, BASIC WL, BASIC, EN 16931, EXTENDED)</pdfaProperty:description>
                            </rdf:li>
                          </rdf:Seq>
                        </pdfaSchema:property>
                        <pdfaSchema:valueType>
                          <rdf:Seq/>
                        </pdfaSchema:valueType>
                      </rdf:li>
                    </rdf:Bag>
                  </pdfaExtension:schemas>
                </rdf:Description>
                <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
                  <dc:format>application/pdf</dc:format>
                </rdf:Description>
                <rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="">
                  <pdf:PDFVersion>1.7</pdf:PDFVersion>
                  <pdf:Producer>3-Heights(TM) PDF to PDF-A Converter API 5.6.1.3 (http://www.pdf-tools.com)</pdf:Producer>
                </rdf:Description>
                <rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
                  <pdfaid:part>3</pdfaid:part>
                  <pdfaid:conformance>U</pdfaid:conformance>
                </rdf:Description>
              </rdf:RDF>
            </x:xmpmeta>
            <?xpacket end='w'?>
            """;

        DomXmpParser xmpParser = new DomXmpParser();
        // Lenient parsing is requested, yet parse() still throws the exception quoted above.
        xmpParser.setStrictParsing(false);
        // Encode explicitly as UTF-8 so the bytes do not depend on the platform default charset.
        XMPMetadata xmp = xmpParser.parse(data.getBytes(StandardCharsets.UTF_8));
    }
}
--------------------

--
Peter Nowak
peter.no...@ecosio.com
ecosio GmbH
Lange Gasse 30 | 1080 Vienna | Austria
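P.S. Until the casing question is settled, one possible workaround is to normalize the value types before the packet reaches DomXmpParser. The following is only a sketch using the Java standard library: ValueTypeNormalizer, normalize and normalizePacket are hypothetical names and not part of xmpbox. It assumes that xmpbox accepts the spellings "closed Choice of " and "open Choice of " (inferred from the error described above), and that the packet uses the literal prefix pdfaProperty for the value type element, as in the test case.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of xmpbox.
final class ValueTypeNormalizer {

    // Matches "open choice of " / "closed choice of " in any casing.
    private static final Pattern CHOICE_PREFIX =
            Pattern.compile("^(open|closed)\\s+choice\\s+of\\s+", Pattern.CASE_INSENSITIVE);

    // Assumption: the element is always written with the prefix "pdfaProperty".
    private static final Pattern VALUE_TYPE_ELEMENT =
            Pattern.compile("(<pdfaProperty:valueType>)([^<]*)(</pdfaProperty:valueType>)");

    private ValueTypeNormalizer() {
    }

    // Rewrites the choice prefix to "<open|closed> Choice of " and leaves other types unchanged.
    static String normalize(String valueType) {
        String trimmed = valueType.trim();
        Matcher m = CHOICE_PREFIX.matcher(trimmed);
        if (!m.find()) {
            return trimmed;
        }
        String kind = m.group(1).toLowerCase(Locale.ROOT);
        return kind + " Choice of " + trimmed.substring(m.end());
    }

    // Applies normalize() to the text of every pdfaProperty:valueType element in the packet.
    static String normalizePacket(String xmp) {
        Matcher m = VALUE_TYPE_ELEMENT.matcher(xmp);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(1) + normalize(m.group(2)) + m.group(3);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In the test case, this would mean passing ValueTypeNormalizer.normalizePacket(data) to the parser instead of data. A regex over the raw packet is fragile (it misses other namespace prefixes, for example), so it is meant as a stopgap rather than a proper fix.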