[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

2025-03-24 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937349#comment-17937349
 ] 

Tilman Hausherr edited comment on PDFBOX-5976 at 3/22/25 10:31 AM:
---

I just tried the patch from PDFBOX-5292 but it doesn't work on your file.


was (Author: tilman):
2219 is something else... maybe you meant PDFBOX-5292 or PDFBOX-2913? I just 
tried the patch from PDFBOX-5292 but it doesn't work on your file.

> DomXmpParser incorrectly expects namespaces on attribute level
> --
>
> Key: PDFBOX-5976
> URL: https://issues.apache.org/jira/browse/PDFBOX-5976
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.4 PDFBox
>Reporter: Jochen Stärk
>Priority: Major
>  Labels: xml
> Attachments: AN-10005_v28_2025-03-19-2.pdf, 
> AN-10005_v28_2025-03-19x-1.pdf
>
>
> When trying to determine the PDF/A version like 
> {{PDDocument document = null;}}
> {{try {}}
> {{    document = Loader.loadPDF(new File("AN-10005_v28_2025-03-19.pdf"));}}
> {{    PDDocumentCatalog catalog = document.getDocumentCatalog();}}
> {{    PDMetadata metadata = catalog.getMetadata();}}
> {{    DomXmpParser xmpParser = new DomXmpParser();}}
> {{    XMPMetadata xmp = xmpParser.parse(metadata.createInputStream());}}
> {{    PDFAIdentificationSchema pdfaSchema = xmp.getPDFAIdentificationSchema();}}
> {{    if (pdfaSchema != null) {}}
> {{        System.out.println("It's a PDF A-" + pdfaSchema.getPart());}}
> {{    }}}
> {{    document.close();}}
> {{} catch (XmpParsingException e) {}}
> {{    e.printStackTrace();}}
> {{} catch (IOException e) {}}
> {{    e.printStackTrace();}}
> {{}}}
> on the attached (and valid) PDF/A-3b file AN-10005_v28_2025-03-19-2.pdf, PDFBox
> incorrectly fails with
>  
> {{org.apache.xmpbox.xml.XmpParsingException: Schema is not set in this document : http://www.aiim.org/pdfa/ns/id/}}
> {{    at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:920)}}
> {{    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRootAttr(DomXmpParser.java:276)}}
> {{    at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:247)}}
> {{    at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:201)}}
> {{    at de.usegroup.Main.main(Main.java:25)}}
>  
> After manipulating the metadata stream with iText RUPS, changing it from 
> {{<rdf:RDF xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
> {{    xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
> {{    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{  <rdf:Description rdf:about="" pdfaid:part="3" pdfaid:conformance="B" />}}
> {{  <rdf:Description rdf:about="" pdf:Producer="WeasyPrint 64.1" />}}
> {{</rdf:RDF>}}
> to
> {{<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">}}
> {{  <rdf:Description}}
> {{      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"}}
> {{      xmlns:pdf="http://ns.adobe.com/pdf/1.3/"}}
> {{      xmlns:xmp="http://ns.adobe.com/xap/1.0/"}}
> {{      pdfaid:conformance="B"}}
> {{      pdfaid:part="3"}}
> {{      pdf:Producer="WeasyPrint 64.1; modified using iText® Core 7.2.5 (AGPL version) ©2000-2023 iText Group NV"}}
> {{      xmp:ModifyDate="2025-03-21T08:16:58+01:00"/>}}
> {{</rdf:RDF>}}
> i.e. with the namespace declarations on the rdf:Description element 
> (AN-10005_v28_2025-03-19x-1.pdf), it works. 
> The issue is: it should be sufficient to put the namespace declarations on the 
> root element, "RDF", i.e. the first example should also work.
>  
> When searching for similar issues I had the impression this may be similar to 
> PDFBOX-2913.
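
The reporter's point about namespace scope can be checked with the JDK's own DOM parser, independent of xmpbox. The following is only a minimal sketch: the class name, the helper method and the inlined sample string are illustrative assumptions (the sample mirrors the first snippet in the description, not the attached file). It shows that a prefix declared on the rdf:RDF root still resolves on a child rdf:Description that carries no xmlns attributes of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of PDFBox or xmpbox.
final class NamespaceScopeDemo {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    // All prefixes are declared on the root element only.
    static final String XMP =
        "<rdf:RDF xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\""
        + " xmlns:pdfaid=\"" + PDFAID_NS + "\""
        + " xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
        + "<rdf:Description rdf:about=\"\" pdf:Producer=\"WeasyPrint 64.1\"/>"
        + "</rdf:RDF>";

    // Parses the XML namespace-aware and returns the first rdf:Description element.
    static Element firstDescription(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagNameNS(RDF_NS, "Description").item(0);
    }

    public static void main(String[] args) throws Exception {
        Element description = firstDescription(XMP);
        // The element itself has no xmlns:pdfaid attribute ...
        System.out.println(description.hasAttribute("xmlns:pdfaid"));       // false
        // ... yet the prefix resolves through the declaration on the ancestor.
        System.out.println(description.lookupNamespaceURI("pdfaid"));       // http://www.aiim.org/pdfa/ns/id/
        System.out.println(description.getAttributeNS(PDFAID_NS, "part"));  // 3
    }
}
```

A parser that inspects only the attributes present on rdf:Description would miss the declaration in this layout, which matches the behaviour reported here.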



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

2025-03-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937685#comment-17937685
 ] 

Tilman Hausherr edited comment on PDFBOX-5976 at 3/23/25 9:42 AM:
--

The preflight artifact is just preflight itself, without its dependencies. 
preflight-app is the preflight command-line application, which contains all 
the dependencies. You can also use 3.0.5-SNAPSHOT if you have the Apache 
repository in your pom.xml.

In the long run you should switch to veraPDF for PDF/A checking.

If you only want to access the XMP metadata, you don't need preflight at all; 
xmpbox is what you need.


was (Author: tilman):
The preflight artifact is just the preflight without dependencies. 
preflight-app is for the preflight command line application that contains all 
the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache 
repository in your pom.xml.

In the long run you should switch to VeraPDF for PDF/A checking.

> DomXmpParser incorrectly expects namespaces on attribute level
> --
>
> Key: PDFBOX-5976
> URL: https://issues.apache.org/jira/browse/PDFBOX-5976
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 2.0.33, 3.0.4 PDFBox
>Reporter: Jochen Stärk
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: xml
> Fix For: 2.0.34, 3.0.5 PDFBox, 4.0.0
>
> Attachments: AN-10005_v28_2025-03-19-2.pdf, 
> AN-10005_v28_2025-03-19x-1.pdf
>
>






[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

2025-03-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937685#comment-17937685
 ] 

Tilman Hausherr edited comment on PDFBOX-5976 at 3/23/25 9:41 AM:
--

The preflight artifact is just the preflight and its dependencies. 
preflight-app is for the preflight command line application that contains all 
the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache 
repository in your pom.xml.

In the long run you should switch to VeraPDF for PDF/A checking.


was (Author: tilman):
The preflight artifact is just the preflight and its dependencies. 
preflight-app is for the preflight command line. You can also use 
3.0.5-SNAPSHOT if you have the apache repository in your pom.xml.







[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

2025-03-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937685#comment-17937685
 ] 

Tilman Hausherr edited comment on PDFBOX-5976 at 3/23/25 9:41 AM:
--

The preflight artifact is just the preflight without dependencies. 
preflight-app is for the preflight command line application that contains all 
the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache 
repository in your pom.xml.

In the long run you should switch to VeraPDF for PDF/A checking.


was (Author: tilman):
The preflight artifact is just the preflight and its dependencies. 
preflight-app is for the preflight command line application that contains all 
the dependencies. You can also use 3.0.5-SNAPSHOT if you have the apache 
repository in your pom.xml.

In the long run you should switch to VeraPDF for PDF/A checking.







[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

2025-03-22 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937381#comment-17937381
 ] 

Tilman Hausherr edited comment on PDFBOX-5976 at 3/22/25 10:31 AM:
---

-I'm afraid this won't be fixed anytime soon. I have made several attempts on 
similar problems but wasn't successful on all.-
A workaround is to use Jempbox, but that one also failed with your file 
(PDFBOX-5977, which I was able to fix but didn't commit)
{code:java}
// s is assumed to hold the XMP packet as a String
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
// map the PDF/A identification namespace to its Jempbox schema class
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}
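
If neither xmpbox nor Jempbox can read a given packet, the PDF/A part and conformance level can in principle be read with the JDK's namespace-aware DOM parser alone. This is a rough sketch and not something proposed in this issue: the class and method names are hypothetical, it assumes the packet is well-formed XML, and it performs none of the validation that xmpbox does. It accepts pdfaid:part and pdfaid:conformance written either as attributes or as elements inside rdf:Description.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of PDFBox, xmpbox or Jempbox.
final class PdfaIdReader {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /** Returns part followed by conformance, e.g. "3B", or null if no pdfaid:part is found. */
    static String readPdfaId(byte[] xmpPacket) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmpPacket));
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            String part = property(description, "part");
            if (part != null) {
                String conformance = property(description, "conformance");
                return part + (conformance == null ? "" : conformance);
            }
        }
        return null;
    }

    // Looks for the property as an attribute first, then as a descendant element.
    private static String property(Element description, String localName) {
        if (description.hasAttributeNS(PDFAID_NS, localName)) {
            return description.getAttributeNS(PDFAID_NS, localName);
        }
        NodeList elements = description.getElementsByTagNameNS(PDFAID_NS, localName);
        return elements.getLength() > 0 ? elements.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) throws Exception {
        // Sample with the namespaces declared on the root element only.
        String sample =
            "<rdf:RDF xmlns:pdfaid=\"" + PDFAID_NS + "\" xmlns:rdf=\"" + RDF_NS + "\">"
            + "<rdf:Description rdf:about=\"\" pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
            + "</rdf:RDF>";
        System.out.println(readPdfaId(sample.getBytes(StandardCharsets.UTF_8))); // prints 3B
    }
}
```

Matching by namespace URI and local name rather than by prefix means the result does not depend on which element carries the xmlns declarations.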


was (Author: tilman):
I'm afraid this won't be fixed anytime soon. I have made several attempts on 
similar problems but wasn't successful on all.
A workaround is to use Jempbox, but that one also failed with your file 
(PDFBOX-5977, which I was able to fix but didn't commit)
{code:java}
// s is assumed to hold the XMP packet as a String
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
// map the PDF/A identification namespace to its Jempbox schema class
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}







[jira] [Comment Edited] (PDFBOX-5976) DomXmpParser incorrectly expects namespaces on attribute level

2025-03-21 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937381#comment-17937381
 ] 

Tilman Hausherr edited comment on PDFBOX-5976 at 3/22/25 4:29 AM:
--

I'm afraid this won't be fixed anytime soon. I have made several attempts on 
similar problems but wasn't successful on all.
A workaround is to use Jempbox, but that one also failed with your file 
(PDFBOX-5977, which I was able to fix but didn't commit)
{code:java}
// s is assumed to hold the XMP packet as a String
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
// map the PDF/A identification namespace to its Jempbox schema class
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}


was (Author: tilman):
I'm afraid this won't be fixed anytime soon. I have made several attempts on 
similar problems but wasn't successful on all.
A workaround is to use Jempbox, but that one also failed with your file :-(
{code:java}
// s is assumed to hold the XMP packet as a String
XMPMetadata xmp = XMPMetadata.load(new ByteArrayInputStream(s.getBytes()));
// map the PDF/A identification namespace to its Jempbox schema class
xmp.addXMLNSMapping(XMPSchemaPDFAId.NAMESPACE, XMPSchemaPDFAId.class);
XMPSchemaPDFAId schema = (XMPSchemaPDFAId) xmp.getSchemaByClass(XMPSchemaPDFAId.class);
System.out.println(schema.getConformance() + " " + schema.getPart());
{code}

> DomXmpParser incorrectly expects namespaces on attribute level
> --
>
> Key: PDFBOX-5976
> URL: https://issues.apache.org/jira/browse/PDFBOX-5976
> Project: PDFBox
>  Issue Type: Bug
>  Components: XmpBox
>Affects Versions: 3.0.4 PDFBox
>Reporter: Jochen Stärk
>Priority: Major
>  Labels: xml
> Attachments: AN-10005_v28_2025-03-19-2.pdf, 
> AN-10005_v28_2025-03-19x-1.pdf
>
>
> When searching for similar issues I had the impression this may be similar to 
> your issue #2219


