Fixed in
https://issues.apache.org/jira/browse/PDFBOX-5668
(hopefully)

Please try a snapshot at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.30-SNAPSHOT/

in an hour

Tilman


On 11.07.2023 11:25, Sylvere Babin wrote:

Hello,

We use PDFBox to read the XMP metadata of PDF documents in the Factur-X standard, a Franco-German e-invoicing standard.

The XML schema corresponding to this metadata is quite simple, and retrieving the values works perfectly with the org.apache.xmpbox.XMPMetadata.getSchema(String) method.

By default, the prefix is fx:

<rdf:Description xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#" rdf:about="">
      <fx:DocumentType>INVOICE</fx:DocumentType>
      <fx:DocumentFileName>factur-x.xml</fx:DocumentFileName>
      <fx:Version>1.0</fx:Version>
      <fx:ConformanceLevel>BASIC</fx:ConformanceLevel>
</rdf:Description>

In one case, there was a document with two schemas that had the same namespace URI but different prefixes (fx and zf).
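To illustrate the two-prefix case independently of xmpbox, here is a minimal sketch that uses only the JDK's namespace-aware DOM parser. The sample XML is a hypothetical reduction of such a document (it is not taken from the attached PDF), and the class and method names are made up for this example. It only shows that both prefixes are visible at the XML level; it says nothing about how xmpbox stores them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFBox or xmpbox.
final class PrefixProbe {

    static final String FX_NS = "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";

    // Hypothetical sample: two rdf:Description blocks binding the same URI to fx and zf.
    static String sample() {
        return "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
             + "<rdf:Description xmlns:fx=\"" + FX_NS + "\" rdf:about=\"\">"
             + "<fx:ConformanceLevel>BASIC</fx:ConformanceLevel>"
             + "</rdf:Description>"
             + "<rdf:Description xmlns:zf=\"" + FX_NS + "\" rdf:about=\"\">"
             + "<zf:ConformanceLevel>BASIC</zf:ConformanceLevel>"
             + "</rdf:Description>"
             + "</rdf:RDF>";
    }

    // Returns the distinct prefixes, in document order, of all elements in the given namespace.
    static List<String> prefixesFor(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required, otherwise getPrefix() is not populated
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList elements = doc.getElementsByTagNameNS(namespaceUri, "*");
            List<String> prefixes = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                String prefix = elements.item(i).getPrefix();
                if (!prefixes.contains(prefix)) {
                    prefixes.add(prefix);
                }
            }
            return prefixes;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prefixesFor(sample(), FX_NS)); // prints [fx, zf]
    }
}
```

Running the same probe on the XMP packet extracted from a real file would be one way to check whether the prefix is already missing in the XML or only lost later during schema construction.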

I tried the org.apache.xmpbox.XMPMetadata.getSchema(String, String) method, which according to the documentation seems to handle this case by filtering by prefix.

I got a NullPointerException from this method (line 268), because the prefix of the Factur-X schema in the org.apache.xmpbox.XMPMetadata.schemas collection was null.

So, I've run tests with a hundred example files provided by the Factur-X consortium, and it seems that for any file, the schema with the Factur-X URI always gets a null prefix, regardless of whether one or more schemas exist with this namespace.

This raises two points:

 1. If the prefix can be null, the getSchema(String, String) method
    should handle it.
 2. Is the Factur-X metadata specification a correct XMP standard, or
    is there a bug in the prefix parsing?
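For point 1, a null-tolerant comparison could look like the sketch below. This is not the actual xmpbox code and not the change made for PDFBOX-5668; the Schema class is a hypothetical stand-in for XMPSchema that exposes only the two getters needed here, and find is a made-up name. The only idea it demonstrates is replacing a direct equals call on a possibly null prefix with java.util.Objects.equals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not part of xmpbox.
final class NullSafeSchemaLookup {

    // Stand-in for org.apache.xmpbox.schema.XMPSchema (assumption: only these getters matter).
    static final class Schema {
        private final String prefix;    // may be null, as observed in the report
        private final String namespace;

        Schema(String prefix, String namespace) {
            this.prefix = prefix;
            this.namespace = namespace;
        }

        String getPrefix() {
            return prefix;
        }

        String getNamespace() {
            return namespace;
        }
    }

    // Returns the first schema matching both prefix and namespace, or null if none matches.
    // Objects.equals treats null as an ordinary value, so a null prefix cannot throw.
    static Schema find(List<Schema> schemas, String prefix, String namespace) {
        for (Schema schema : schemas) {
            if (Objects.equals(schema.getNamespace(), namespace)
                    && Objects.equals(schema.getPrefix(), prefix)) {
                return schema;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String ns = "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";
        List<Schema> schemas = Arrays.asList(new Schema(null, ns), new Schema("zf", ns));
        System.out.println(find(schemas, "fx", ns) == null);     // prints true, no exception
        System.out.println(find(schemas, "zf", ns).getPrefix()); // prints zf
    }
}
```

With this comparison, a schema whose prefix is null is simply skipped when a non-null prefix is requested, so the caller can fall back to the namespace-only lookup.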

Here’s the PDF document: pdfExemple.pdf <https://cegidgroup-my.sharepoint.com/:b:/g/personal/sbabin_cegid_com/EVN8vpGbR1pEvaOuoIjyvfQBuhV1ZWFlYfAIKMfuAhd6Aw?e=cahEv2>

Here’s the code I use to retrieve the Factur-X metadata values:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.XMPSchema;
import org.apache.xmpbox.xml.DomXmpParser;
import org.apache.xmpbox.xml.XmpParsingException;

public class FacturX {

    private static final String FX_NAMESPACE = "urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#";

    public static void main(String[] args) {
        File inputFile = new File(args[0]);
        // try-with-resources closes the document even if parsing fails
        try (PDDocument doc = PDDocument.load(inputFile)) {
            PDDocumentCatalog catalog = doc.getDocumentCatalog();
            PDMetadata m = catalog.getMetadata();
            if (m == null) {
                System.out.println("This PDF document has no XMP metadata");
                return;
            }

            XMPMetadata metadata;
            try (InputStream xmlInputStream = m.createInputStream()) {
                DomXmpParser p = new DomXmpParser();
                p.setStrictParsing(false);
                metadata = p.parse(xmlInputStream);
            }

            // Getting the factur-x schema with the default "fx" prefix
            // (case of two factur-x schemas with different prefixes).
            // This is the call that throws the NullPointerException.
            XMPSchema fx = metadata.getSchema("fx", FX_NAMESPACE);

            // If there is no schema with the fx prefix, search by namespace URI only
            if (fx == null) {
                fx = metadata.getSchema(FX_NAMESPACE);
            }

            if (fx == null) {
                System.out.println("This PDF document is not a valid Factur-X file");
            } else {
                String conformanceLevel = fx.getUnqualifiedTextPropertyValue("ConformanceLevel");
                String documentType = fx.getUnqualifiedTextPropertyValue("DocumentType");
                String version = fx.getUnqualifiedTextPropertyValue("Version");
                String documentFileName = fx.getUnqualifiedTextPropertyValue("DocumentFileName");
            }
        } catch (XmpParsingException | IOException e) {
            e.printStackTrace();
        }
    }
}

Thanks for your help,

*Sylvère Babin*
Developer


