On 14.08.2025 at 16:24, Poisson, David (DGRI) wrote:
Here are the PDFs in question (I didn't want to attach 3 PDFs to the email, so 
here's a link to my Google Drive folder that has all 3 PDFs):
https://drive.google.com/drive/folders/1Tb136kzA5mMy5R2ti0Cy7UXWT2PQVS5z?usp=sharing
v3.PDF: conversion result using version 3 of our conversion library, works well 
in PDFBox 1.8.12
v4.PDF: conversion result using version 4 of our conversion library, gives 
errors in PDFBox
v4-fixedByAcrobat.pdf: v4.PDF opened and exported by Acrobat: works well in 
PDFBox 1.8.12

I had no trouble doing a text extraction with 1.8.12, 1.8.16 and 1.8.17 on v3 and v4 using pdfbox-app. That makes me wonder whether there's a problem with PDFBox when using an input stream, or whether something goes wrong when you read the file (maybe a wrong MIME type, so it's passed as text).
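
One way to test the "passed as text" theory is to look at the raw bytes just before they are handed to PDFBox. The following is only a minimal sketch using the plain JDK; the class and method names (PdfBytesCheck, startsWithPdfHeader, survivesTextRoundTrip) are made up for illustration and are not part of PDFBox. It assumes you can get the content as a byte array at the point where the input stream is created.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfBytesCheck {
    // A PDF file begins with the ASCII marker "%PDF-".
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data begins with the "%PDF-" marker. */
    static boolean startsWithPdfHeader(byte[] data) {
        return data.length >= HEADER.length
                && Arrays.equals(Arrays.copyOf(data, HEADER.length), HEADER);
    }

    /**
     * Returns true if the bytes are unchanged after being decoded as UTF-8
     * text and encoded again. Binary PDF content usually is NOT unchanged,
     * which is why treating a PDF as text tends to damage it.
     */
    static boolean survivesTextRoundTrip(byte[] data) {
        byte[] roundTripped = new String(data, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, roundTripped);
    }
}
```

If the header check fails, or if the bytes you receive differ from the bytes of the file on disk, then the data was probably altered before PDFBox ever saw it.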

Re the PDF/A problems:

Your file is a (correct) PDF/A-2a file, and you checked it against PDF/A-1b, which it isn't.

   Checking against conformance level PDF/A-2a
   True

   Checking against conformance level PDF/A-2b
   True

   Checking against conformance level PDF/A-2u
   True

That's all you need!

Tilman
