Hi all,

I have a question regarding parsing PDF/A extension schema data with xmpbox
and the DomXmpParser.

I did not find a concrete answer to it in the XMP specs.
Are the value types case sensitive? We received a ZUGFeRD PDF and I
tried to read the XMP metadata with xmpbox, but it fails because of an
unknown value type. The problem is the casing ("closed choice of
Text"), not the actual value: xmpbox expects an upper-case C in
"Choice".

I attached a small sample test case to illustrate the problem. This
is the error I got:
"org.apache.xmpbox.xml.XmpParsingException: Unknown property value type : closed choice of Text
at org.apache.xmpbox.xml.PdfaExtensionHelper.populatePDFAPropertyType(PdfaExtensionHelper.java:193)
"

The PDF (which I took the metadata from) was generated by an SDK
that seems to be used quite often in the industry (pdf-tools.com).

Is xmpbox too strict here, or should it be able to handle this kind of
metadata too? (veraPDF said the file is compliant.)


Thanks and with best regards,
Peter

------------ test case ---------------

@Test
void testDomXmpParser() throws XmpParsingException
{
  String data = """
      <?xpacket begin='\uFEFF' id='W5M0MpCehiHzreSzNTczkc9d'?>
      <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3-Heights(TM) XMP Library 5.6.1.3 (http://www.pdf-tools.com)">
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
            <xmp:CreatorTool>CURSOR-CRM</xmp:CreatorTool>
            <xmp:CreateDate>2025-03-31T10:11:14+02:00</xmp:CreateDate>
            <xmp:ModifyDate>2025-03-31T10:11:14+02:00</xmp:ModifyDate>
            <xmp:MetadataDate>2025-03-31T10:11:14+02:00</xmp:MetadataDate>
          </rdf:Description>
          <rdf:Description xmlns:zf="urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0#" rdf:about="">
            <zf:DocumentType>INVOICE</zf:DocumentType>
            <zf:DocumentFileName>zugferd-invoice.xml</zf:DocumentFileName>
            <zf:Version>2p0</zf:Version>
            <zf:ConformanceLevel>EN 16931</zf:ConformanceLevel>
          </rdf:Description>
          <rdf:Description xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
              xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
              xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#"
              rdf:about="">
            <pdfaExtension:schemas>
              <rdf:Bag>
                <rdf:li rdf:parseType="Resource">
                  <pdfaSchema:schema>ZUGFeRD PDFA Extension Schema</pdfaSchema:schema>
                  <pdfaSchema:namespaceURI>urn:zugferd:pdfa:CrossIndustryDocument:invoice:2p0#</pdfaSchema:namespaceURI>
                  <pdfaSchema:prefix>zf</pdfaSchema:prefix>
                  <pdfaSchema:property>
                    <rdf:Seq>
                      <rdf:li rdf:parseType="Resource">
                        <pdfaProperty:name>DocumentType</pdfaProperty:name>
                        <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                        <pdfaProperty:category>external</pdfaProperty:category>
                        <pdfaProperty:description>Der Dokumententyp, enthält bei ZUGFeRD-Rechnungen immer INVOICE</pdfaProperty:description>
                      </rdf:li>
                      <rdf:li rdf:parseType="Resource">
                        <pdfaProperty:name>DocumentFileName</pdfaProperty:name>
                        <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                        <pdfaProperty:category>external</pdfaProperty:category>
                        <pdfaProperty:description>Der Dateiname des eingebetteten Rechnungsdatendokuments;
      muss identisch sein mit dem Wert des /F Eintrags im File Specification Dictionary.
      Bei ZUGFeRD ist dieser Wert fix zugferd-invoice.xml</pdfaProperty:description>
                      </rdf:li>
                      <rdf:li rdf:parseType="Resource">
                        <pdfaProperty:name>Version</pdfaProperty:name>
                        <pdfaProperty:valueType>Text</pdfaProperty:valueType>
                        <pdfaProperty:category>external</pdfaProperty:category>
                        <pdfaProperty:description>Die Version des XML-Schemas der Rechnungsdaten</pdfaProperty:description>
                      </rdf:li>
                      <rdf:li rdf:parseType="Resource">
                        <pdfaProperty:name>ConformanceLevel</pdfaProperty:name>
                        <pdfaProperty:valueType>closed choice of Text</pdfaProperty:valueType>
                        <pdfaProperty:category>external</pdfaProperty:category>
                        <pdfaProperty:description>Das Profil der XML-Rechnungsdaten entsprechend den Vorgaben von ZUGFeRD
      (erlaubte Werte: MINIMUM, BASIC WL, BASIC, EN 16931, EXTENDED)</pdfaProperty:description>
                      </rdf:li>
                    </rdf:Seq>
                  </pdfaSchema:property>
                  <pdfaSchema:valueType>
                    <rdf:Seq/>
                  </pdfaSchema:valueType>
                </rdf:li>
              </rdf:Bag>
            </pdfaExtension:schemas>
          </rdf:Description>
          <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
            <dc:format>application/pdf</dc:format>
          </rdf:Description>
          <rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="">
            <pdf:PDFVersion>1.7</pdf:PDFVersion>
            <pdf:Producer>3-Heights(TM) PDF to PDF-A Converter API 5.6.1.3 (http://www.pdf-tools.com)</pdf:Producer>
          </rdf:Description>
          <rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
            <pdfaid:part>3</pdfaid:part>
            <pdfaid:conformance>U</pdfaid:conformance>
          </rdf:Description>
        </rdf:RDF>
      </x:xmpmeta>
      <?xpacket end='w'?>
      """;

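  DomXmpParser xmpParser = new DomXmpParser();
  // Lenient parsing is enabled, yet the lower-case value type is still rejected.
  xmpParser.setStrictParsing(false);
  // Encode explicitly as UTF-8 so the non-ASCII characters do not depend on the platform charset.
  XMPMetadata xmp = xmpParser.parse(data.getBytes(java.nio.charset.StandardCharsets.UTF_8));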
}
--------------------
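
A possible stopgap on the caller's side, sketched here only as an illustration, would be to normalize the casing of the value type before the data reaches DomXmpParser. The class and method names below (ValueTypeCasing, normalizeChoiceCasing) are hypothetical and not part of xmpbox. The sketch assumes that the extension schema uses the literal prefix "pdfaProperty", as in the sample above, and that "closed Choice of" / "open Choice of" is the spelling xmpbox accepts, as the error suggests.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of xmpbox.
final class ValueTypeCasing {

    // Matches the start of a valueType element followed by "closed choice of" or
    // "open choice of" in any casing. The element prefix "pdfaProperty" is an
    // assumption taken from the sample; other files may declare a different prefix.
    private static final Pattern CHOICE = Pattern.compile(
            "(<pdfaProperty:valueType>\\s*)(?i)(closed|open)\\s+choice\\s+of\\s+");

    // Rewrites e.g. "closed choice of Text" to "closed Choice of Text".
    // Text outside valueType elements (descriptions, for example) is left untouched.
    static String normalizeChoiceCasing(String xmp) {
        Matcher matcher = CHOICE.matcher(xmp);
        return matcher.replaceAll(result -> Matcher.quoteReplacement(
                result.group(1)
                        + result.group(2).toLowerCase(Locale.ROOT)
                        + " Choice of "));
    }
}
```

In the test case above this would be applied right before parsing, for example xmpParser.parse(ValueTypeCasing.normalizeChoiceCasing(data).getBytes(java.nio.charset.StandardCharsets.UTF_8)). It only works around the symptom; whether the parser itself should accept the lower-case spelling is still the open question.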


-- 
Peter Nowak
peter.no...@ecosio.com
ecosio GmbH
Lange Gasse 30 | 1080 Vienna | Austria
Have you checked our status page already?
VAT number: ATU68241501, FN 405017p, Commercial Court Vienna
Managing Directors: Christoph Ebm, Philipp Liegl, Marco Zapletal
