[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level
[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937349#comment-17937349 ]

Tilman Hausherr edited comment on PDFBOX-5976 at 3/22/25 10:31 AM:
-------------------------------------------------------------------

I just tried the patch from PDFBOX-5292 but it doesn't work on your file.

was (Author: tilman):
2219 is something else... maybe you meant PDFBOX-5292 or PDFBOX-2913?
I just tried the patch from PDFBOX-5292 but it doesn't work on your file.

> DomXmpParser incorrectly expects namespaces on attribute level
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5976
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5976
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 3.0.4 PDFBox
>            Reporter: Jochen Stärk
>            Priority: Major
>              Labels: xml
>         Attachments: AN-10005_v28_2025-03-19-2.pdf, AN-10005_v28_2025-03-19x-1.pdf
>
>
> When trying to determine the PDF-A-Version like
> {{PDDocument document = null;}}
> {{try {}}
> {{document = Loader.loadPDF(new File("AN-10005_v28_2025-03-19.pdf"));}}
> {{PDDocumentCatalog catalog = document.getDocumentCatalog();}}
> {{PDMetadata metadata = catalog.getMetadata();}}
> {{DomXmpParser xmpParser = new DomXmpParser();}}
> {{XMPMetadata xmp = xmpParser.parse(metadata.createInputStream());}}
> {{PDFAIdentificationSchema pdfaSchema = xmp.getPDFAIdentificationSchema();}}
> {{if (pdfaSchema != null) {}}
> {{System.out.println("It's a PDF A-" + pdfaSchema.getPart());}}
> {{}}}
> {{document.close();}}
> {{} catch (XmpParsingException e) {}}
> {{e.printStackTrace();}}
> {{} catch (IOException e) {}}
> {{e.printStackTrace();}}
> {{}}}
> on the attached (and valid) PDF A-3b AN-10005_v28_2025-03-19-2.pdf, PDFBox incorrectly fails with a
>
> {{org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/id/}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:920)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:276)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:247)}}
> {{ at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)}}
> {{ at de.usegroup.Main.main(Main.java:25)}}
>
> After manipulating the metadata stream with itext RuPS from
> {{<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description rdf:about="" pdfaid:part="3" pdfaid:conformance="B" /><rdf:Description rdf:about="" pdf:Producer="WeasyPrint 64.1" />}}
> to
> {{<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{ <rdf:Description}}
> {{ xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
> {{ xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
> {{ xmlns:xmp="http://ns.adobe.com/xap/1.0/"}}
> {{ pdfaid:conformance="B"}}
> {{ pdfaid:part="3"}}
> {{ pdf:Producer="WeasyPrint 64.1; modified using iText® Core 7.2.5 (AGPL version) ©2000-2023 iText Group NV"}}
> {{ xmp:ModifyDate="2025-03-21T08:16:58+01:00"/>}}
> {{</rdf:RDF>}}
> putting the namespace definition in the rdf:Description (AN-10005_v28_2025-03-19x-1.pdf) it works.
> The issue is: it should be sufficient to put the namespace definitions in the root element, "RDF", i.e. the first example should also work.
>
> When searching for similar issues I had the impression this may be similar to PDFBOX-2913.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
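
The reporter's point, that prefixes declared on the root element are in scope for attributes of nested rdf:Description elements, can be checked with the JDK's own namespace-aware DOM parser. The following is only a sketch under assumptions: the class name NamespaceScopeDemo and the inline packet (modelled on the first example in the report) are illustrative, and nothing here is the XmpBox code or the eventual fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; it uses only JDK APIs, not PDFBox or XmpBox.
final class NamespaceScopeDemo {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // Assumed packet shaped like the failing file: every prefix is declared on rdf:RDF only.
    static final String ROOT_DECLARED =
            "<rdf:RDF xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
            + " xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\""
            + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
            + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
            + "<rdf:Description rdf:about=\"\" pdf:Producer=\"WeasyPrint 64.1\"/>"
            + "</rdf:RDF>";

    /** Parses the packet namespace-aware and returns its first rdf:Description element. */
    static Element firstDescription(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default; the *NS lookups below need it.
        factory.setNamespaceAware(true);
        // Metadata packets never need a DOCTYPE, so reject them outright.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return (Element) doc.getElementsByTagNameNS(RDF_NS, "Description").item(0);
    }

    public static void main(String[] args) throws Exception {
        Element description = firstDescription(ROOT_DECLARED);
        // The prefix is resolved through the ancestor chain up to rdf:RDF.
        System.out.println(description.lookupNamespaceURI("pdfaid"));
        // Attributes are addressed by namespace URI and local name, not by prefix.
        System.out.println(description.getAttributeNS(PDFAID_NS, "part"));
        System.out.println(description.getAttributeNS(PDFAID_NS, "conformance"));
    }
}
```

Run as a single source file, this should print the pdfaid namespace URI followed by 3 and B, although rdf:Description itself carries no xmlns declarations.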
[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level
[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937685#comment-17937685 ]

Tilman Hausherr edited comment on PDFBOX-5976 at 3/23/25 9:42 AM:
------------------------------------------------------------------

The preflight artifact is just the preflight without dependencies. preflight-app is for the preflight command line application that contains all the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache repository in your pom.xml. In the long run you should switch to VeraPDF for PDF/A checking.

If you want to access the XMP stuff then you don't need preflight at all, it's xmpbox that you need.

was (Author: tilman):
The preflight artifact is just the preflight without dependencies. preflight-app is for the preflight command line application that contains all the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache repository in your pom.xml. In the long run you should switch to VeraPDF for PDF/A checking.

> DomXmpParser incorrectly expects namespaces on attribute level
> --------------------------------------------------------------
>
>                 Key: PDFBOX-5976
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5976
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 2.0.33, 3.0.4 PDFBox
>            Reporter: Jochen Stärk
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: xml
>             Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
>         Attachments: AN-10005_v28_2025-03-19-2.pdf, AN-10005_v28_2025-03-19x-1.pdf
[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level
[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937685#comment-17937685 ]

Tilman Hausherr edited comment on PDFBOX-5976 at 3/23/25 9:41 AM:
------------------------------------------------------------------

The preflight artifact is just the preflight and its dependencies. preflight-app is for the preflight command line application that contains all the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache repository in your pom.xml. In the long run you should switch to VeraPDF for PDF/A checking.

was (Author: tilman):
The preflight artifact is just the preflight and its dependencies. preflight-app is for the preflight command line. You can also use 3.0.5-SNAPSHOT if you have the apache repository in your pom.xml.
[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level
[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937685#comment-17937685 ]

Tilman Hausherr edited comment on PDFBOX-5976 at 3/23/25 9:41 AM:
------------------------------------------------------------------

The preflight artifact is just the preflight without dependencies. preflight-app is for the preflight command line application that contains all the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache repository in your pom.xml. In the long run you should switch to VeraPDF for PDF/A checking.

was (Author: tilman):
The preflight artifact is just the preflight and its dependencies. preflight-app is for the preflight command line application that contains all the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache repository in your pom.xml. In the long run you should switch to VeraPDF for PDF/A checking.
[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level
[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937381#comment-17937381 ]

Tilman Hausherr edited comment on PDFBOX-5976 at 3/22/25 10:31 AM:
-------------------------------------------------------------------

-I'm afraid this won't be fixed anytime soon. I have made several attempts on similar problems but wasn't successful on all.-

A workaround is to use Jempbox, but that one also failed with your file (PDFBOX-5977, which I was able to fix but didn't commit)
{code:java}
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}

was (Author: tilman):
I'm afraid this won't be fixed anytime soon. I have made several attempts on similar problems but wasn't successful on all.

A workaround is to use Jempbox, but that one also failed with your file (PDFBOX-5977, which I was able to fix but didn't commit)
{code:java}
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}
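
Where neither XmpBox nor Jempbox accepts a packet, the PDF/A part and conformance can in principle be read straight from the XMP bytes with the JDK's DOM parser. The sketch below is an assumption-laden illustration, not something proposed in the issue: PdfaIdFallback and readPdfaId are hypothetical names, and the helper only handles the two common serializations (properties as attributes of rdf:Description, or as direct child elements).

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical fallback helper; plain JDK only, no PDFBox, XmpBox or Jempbox calls.
final class PdfaIdFallback {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /** Returns {part, conformance}; an entry is null when the property is absent. */
    static String[] readPdfaId(byte[] xmpPacket) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // resolve prefixes wherever they are declared
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmpPacket));

        String part = null;
        String conformance = null;
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            if (part == null) {
                part = property(description, "part");
            }
            if (conformance == null) {
                conformance = property(description, "conformance");
            }
        }
        return new String[] {part, conformance};
    }

    /** Reads a pdfaid property given either as an attribute or as a direct child element. */
    private static String property(Element description, String localName) {
        if (description.hasAttributeNS(PDFAID_NS, localName)) {
            return description.getAttributeNS(PDFAID_NS, localName).trim();
        }
        for (Node child = description.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && PDFAID_NS.equals(child.getNamespaceURI())
                    && localName.equals(child.getLocalName())) {
                return child.getTextContent().trim();
            }
        }
        return null;
    }
}
```

The byte array would come from the document's metadata stream; because lookups go by namespace URI, it makes no difference whether xmlns:pdfaid sits on rdf:RDF or on rdf:Description. This is no substitute for a real PDF/A validator such as VeraPDF.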
[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level
[ https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937381#comment-17937381 ]

Tilman Hausherr edited comment on PDFBOX-5976 at 3/22/25 4:29 AM:
------------------------------------------------------------------

I'm afraid this won't be fixed anytime soon. I have made several attempts on similar problems but wasn't successful on all.

A workaround is to use Jempbox, but that one also failed with your file (PDFBOX-5977, which I was able to fix but didn't commit)
{code:java}
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}

was (Author: tilman):
I'm afraid this won't be fixed anytime soon. I have made several attempts on similar problems but wasn't successful on all.

A workaround is to use Jempbox, but that one also failed with your file :-(
{code:java}
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}

> DomXmpParser incorrectly expects namespaces on attribute level
> --------------------------------------------------------------
>
> [...]
>
> When searching for similar issues I had the impression this may be similar to your issue #2219