Hello Erik,
I've identified the problem and created issue
https://issues.apache.org/jira/browse/PDFBOX-3559 where it has been
fixed. The cause was a "fast path" for jpeg files that ignored the mask.
Please try again with a snapshot build once it is available:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/
I tested it myself. The output does not look very good, but this could be
because IrfanView misidentifies the ARGB file as a CMYK JPEG (which it
isn't) or because Java doesn't save it properly. It may look different
in other software.
Tilman
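One way to rule out the JPEG/CMYK confusion mentioned above is to store the ARGB image in a format that has a real alpha channel, such as PNG. The following is only a minimal sketch using the standard javax.imageio API; the class and method names are made up for illustration and are not part of PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not PDFBox code.
final class ArgbPngRoundTrip {

    /** Encodes the image as PNG in memory and decodes it again. */
    static BufferedImage roundTripPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // ImageIO.write returns false if no suitable writer is found.
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage argb = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        argb.setRGB(0, 0, 0xFFFF0000); // opaque red
        argb.setRGB(1, 0, 0x00000000); // fully transparent
        BufferedImage back = roundTripPng(argb);
        System.out.println("has alpha: " + back.getColorModel().hasAlpha());
        System.out.println("alpha of second pixel: " + (back.getRGB(1, 0) >>> 24));
    }
}
```

If the PNG looks right in several viewers, the odd appearance of the JPEG is more likely a viewer or file-format issue than a masking problem.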
On 08.11.2016 at 09:22, Zeiske, Erik (DualStudy) wrote:
I used the ExtractImages command-line tool to get the images.
Erik
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Tuesday, 8 November 2016 09:16
To: [email protected]
Subject: Re: Issues with MRC Compressed using JBIG2-image
What methods did you use to get the images?
I looked at the rendering, and it looks the same as in Adobe Reader.
I also looked at the images with PDFDebugger, which shows the images with
the mask applied. The second image is at
Root/Pages/Kids/[0]/Resources/XObject/Im002
and it shows colored text. The image is DCT encoded. The mask is black and
white text that is JBIG2 encoded.
http://imgur.com/a/2ofjD
What do you get?
Is the jbig2 decoder in your class path? For PDFDebugger, you need to do
this:
java -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider -cp "pdfbox-app-XXXX.jar;lib/*" org.apache.pdfbox.tools.PDFBox PDFReader filename
The subdirectory "lib" contains the additional jars.
Tilman
On 08.11.2016 at 08:31, Zeiske, Erik (DualStudy) wrote:
Hello Tilman,
You solved the NPE, but there is something else wrong with the output images.
The PDF contains three images and two masks, which belong to two of those
images. The PDF is compressed as shown here:
https://www.abbyy.com/en-us/ocr-sdk-embedded/pdf-mrc/
The foreground is the second image of the PDF and uses the JBIG2 image as a
mask to get the coloured text. The third image and its mask are for the
watermark of the PDF and are extracted perfectly fine. The library doesn't
apply the mask correctly to the second image. The resulting image should
contain only the text in its colour, but the result is only the colour layer
without the mask applied.
I hope this makes sense.
Erik.
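To make the expected result concrete, here is a rough sketch of what "applying the mask" means for one colour image and one black-and-white mask. It uses only java.awt.image and is not how PDFBox implements it. It assumes that black mask pixels mark the text to keep and that both images have the same size. In MRC files the mask often has a higher resolution than the foreground, so a real implementation would have to scale first. All names are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not PDFBox code.
final class MaskSketch {

    /**
     * Returns an ARGB copy of the colour image in which only the pixels that
     * are black in the mask stay opaque; all other pixels become transparent.
     */
    static BufferedImage applyMask(BufferedImage color, BufferedImage mask) {
        int w = color.getWidth();
        int h = color.getHeight();
        if (w != mask.getWidth() || h != mask.getHeight()) {
            throw new IllegalArgumentException("image and mask must have the same size");
        }
        BufferedImage result = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // For a black/white mask one channel is enough to decide.
                boolean keep = (mask.getRGB(x, y) & 0xFF) < 128;
                int rgb = color.getRGB(x, y) & 0x00FFFFFF;
                result.setRGB(x, y, keep ? (0xFF000000 | rgb) : rgb);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BufferedImage color = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        color.setRGB(0, 0, 0x0000FF); // blue
        color.setRGB(1, 0, 0x0000FF); // blue
        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_BINARY);
        mask.setRGB(0, 0, 0xFF000000); // black: text pixel
        mask.setRGB(1, 0, 0xFFFFFFFF); // white: background
        BufferedImage out = applyMask(color, mask);
        System.out.println(Integer.toHexString(out.getRGB(0, 0))); // ff0000ff
        System.out.println(out.getRGB(1, 0) >>> 24);               // 0
    }
}
```

Without the mask step, every pixel of the colour layer stays opaque, which matches the "only the colour" output described above.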
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Monday, 7 November 2016 18:27
To: [email protected]
Subject: Re: Issues with MRC Compressed using JBIG2-image
Hello Erik,
I've opened
https://issues.apache.org/jira/browse/PDFBOX-3558
and fixed the cause of the NPE in the sources. I may not have fully understood
your description, and some of it may now be moot; can you please test with a
snapshot to check that the rendering is what you want? The build will be
available within a few hours.
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/
Tilman
On 07.11.2016 at 08:06, Zeiske, Erik (DualStudy) wrote:
Here is a Dropbox link to download the PDF:
https://www.dropbox.com/s/q1t58ov6vybu3k7/scan300_1-6.pdf?dl=0
I am using version 2.0.3 of PDFBox.
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Thursday, 3 November 2016 18:07
To: [email protected]
Subject: Re: Issues with MRC Compressed using JBIG2-image
On 03.11.2016 at 09:58, Zeiske, Erik (DualStudy) wrote:
Hello everybody,
I have an issue with PDFBox and its handling of an MRC-compressed PDF.
The issue is related to the JBIG2 compression used in the PDF. If I
try to extract the images used in the attached PDF, the
library throws a NullPointerException because the bits are not
defined in the JBIG2 filter. I think this is because the JBIG2 image
in the PDF has no "Bits per Component" entry. If I
define the bits in the Java code, the program runs without an
error, but it doesn't apply the JBIG2 mask properly to the
foreground colour image of the PDF. To work around this, I tried to
extract the mask into a file, but the mask image seems to be the same as
the foreground image.
I couldn't find the reason for this, and I don't think it is related
to the PDF itself.
The PDF I was using is attached to this e-mail.
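For illustration, the missing "Bits per Component" case can be handled with a lookup that falls back to 1 bit, since JBIG2 is a bilevel format and image masks use 1 bit per sample. This is only a sketch on a simplified dictionary modelled as a plain Map; it is not the PDFBox API and not the fix made in PDFBOX-3558. The key names mirror the PDF dictionary entries; everything else is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not PDFBox code.
final class BitsPerComponentSketch {

    /**
     * Returns the BitsPerComponent entry if present; otherwise falls back to 1
     * for image masks and JBIG2Decode streams instead of failing on null.
     */
    static int bitsPerComponent(Map<String, Object> imageDict) {
        Object value = imageDict.get("BitsPerComponent");
        if (value instanceof Integer) {
            return (Integer) value;
        }
        boolean isMask = Boolean.TRUE.equals(imageDict.get("ImageMask"));
        boolean isJbig2 = "JBIG2Decode".equals(imageDict.get("Filter"));
        if (isMask || isJbig2) {
            return 1; // bilevel data: one bit per sample
        }
        throw new IllegalArgumentException("BitsPerComponent is missing");
    }

    public static void main(String[] args) {
        Map<String, Object> jbig2Image = new HashMap<>();
        jbig2Image.put("Filter", "JBIG2Decode"); // no BitsPerComponent entry
        System.out.println(bitsPerComponent(jbig2Image)); // 1

        Map<String, Object> dctImage = new HashMap<>();
        dctImage.put("Filter", "DCTDecode");
        dctImage.put("BitsPerComponent", 8);
        System.out.println(bitsPerComponent(dctImage)); // 8
    }
}
```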
Please upload the file to a file-sharing host; PDF attachments are not
allowed. Please also tell us what version you are using and what
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]