I've used the Extract Images command-line tool to get the images.

Erik
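Suppose the DCT-encoded foreground and its JBIG2 mask are available as separate images. On an MRC page, the coloured text only appears once the foreground is painted through the mask onto the background. The following is a minimal, JDK-only sketch of that compositing step. It is not PDFBox's implementation. It assumes that all three layers share the same pixel size and that the mask has already been decoded into a `boolean[][]`, where `true` means sample value 1. The class name `MrcComposite` and the `paintWhenSet` parameter are hypothetical. Which sample value paints depends on the mask's /Decode array. Under the PDF stencil-mask convention, the default [0 1] paints where the sample is 0, and [1 0] paints where it is 1.

```java
import java.awt.image.BufferedImage;

/**
 * Hypothetical helper: paints an MRC foreground layer through a 1-bit stencil mask
 * onto a background layer. Assumes all inputs have identical pixel dimensions.
 */
public class MrcComposite {

    /**
     * @param background   the background layer
     * @param foreground   the colour layer (e.g. the DCT-encoded image)
     * @param mask         decoded mask samples; mask[y][x] == true means sample value 1
     * @param paintWhenSet true if a sample value of 1 paints (/Decode [1 0]),
     *                     false if a sample value of 0 paints (default /Decode [0 1])
     * @return a new RGB image showing the foreground only where the mask paints
     */
    public static BufferedImage compose(BufferedImage background, BufferedImage foreground,
                                        boolean[][] mask, boolean paintWhenSet) {
        int width = background.getWidth();
        int height = background.getHeight();
        if (foreground.getWidth() != width || foreground.getHeight() != height
                || mask.length != height) {
            throw new IllegalArgumentException("layers and mask must have the same size");
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            if (mask[y].length != width) {
                throw new IllegalArgumentException("mask row " + y + " has the wrong width");
            }
            for (int x = 0; x < width; x++) {
                // The mask decides, pixel by pixel, which layer shows through.
                boolean paint = mask[y][x] == paintWhenSet;
                int rgb = paint ? foreground.getRGB(x, y) : background.getRGB(x, y);
                result.setRGB(x, y, rgb);
            }
        }
        return result;
    }
}
```

Calling `compose(background, foreground, mask, false)` corresponds to the default /Decode [0 1]. Pass `true` if the mask dictionary carries /Decode [1 0]. If the output shows the bare foreground colour everywhere, the mask was either not applied or decoded to a constant value.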
-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Tuesday, 8 November 2016 09:16
To: [email protected]
Subject: Re: Issues with MRC Compressed using JBIG2-image

What methods did you use to get the images? I looked at the rendering, and it looks the same as in Adobe Reader. I also looked at the images with PDFDebugger, which shows the images with the mask applied. The second image is at Root/Pages/Kids/[0]/Resources/XObject/Im002 and it shows colored text. The image is DCT encoded. The mask is black-and-white text that is JBIG2 encoded.

http://imgur.com/a/2ofjD

What do you get? Is the JBIG2 decoder in your class path? For PDFDebugger, you need to do this:

    java -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider -cp "pdfbox-app-XXXX.jar;lib/*" org.apache.pdfbox.tools.PDFBox PDFReader filename

The subdirectory "lib" contains the additional jars. (On Linux or macOS, use ":" instead of ";" as the class path separator.)

Tilman

On 08.11.2016 at 08:31, Zeiske, Erik (DualStudy) wrote:
> Hello Tilman,
>
> You solved the NPE, but there is something else wrong with the output
> images. The PDF contains 3 images and 2 masks for two of those images.
> (The PDF is compressed as shown here:
> https://www.abbyy.com/en-us/ocr-sdk-embedded/pdf-mrc/. The foreground is the
> second image of the PDF and uses the JBIG2 image as a mask to get the
> coloured text. The third image and its mask are for the watermark of the PDF
> and are extracted perfectly fine.) The library doesn't apply the mask
> correctly to the second image. The resulting image should contain only the
> text in its colour, but the result is only the colour without the mask applied.
> I hope this makes sense.
>
> Erik
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:[email protected]]
> Sent: Monday, 7 November 2016 18:27
> To: [email protected]
> Subject: Re: Issues with MRC Compressed using JBIG2-immage
>
> Hello Erik,
>
> I've opened
> https://issues.apache.org/jira/browse/PDFBOX-3558
> and fixed the cause of the NPE in the sources. I have not fully understood
> your text, or maybe misunderstood something, and maybe something is now moot;
> can you please test with a snapshot whether the rendering is as you want it?
> The build will be there within a few hours.
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.4-SNAPSHOT/
>
> Tilman
>
> On 07.11.2016 at 08:06, Zeiske, Erik (DualStudy) wrote:
>> Here is a Dropbox link to download the PDF:
>> https://www.dropbox.com/s/q1t58ov6vybu3k7/scan300_1-6.pdf?dl=0
>> I am using version 2.0.3 of PDFBox.
>>
>> -----Original Message-----
>> From: Tilman Hausherr [mailto:[email protected]]
>> Sent: Thursday, 3 November 2016 18:07
>> To: [email protected]
>> Subject: Re: Issues with MRC Compressed using JBIG2-immage
>>
>> On 03.11.2016 at 09:58, Zeiske, Erik (DualStudy) wrote:
>>> Hello everybody,
>>>
>>> I have an issue with PDFBox and the handling of an MRC-compressed PDF.
>>>
>>> The issue is related to the JBIG2 compression used in the PDF. If I
>>> try to extract the different images used in the attached PDF, the
>>> library throws a NullPointerException because the bits are not
>>> defined in the JBIG2 filter. I think this is because the PDF does not
>>> define "BitsPerComponent" for the JBIG2 image. If I define the bits
>>> in the Java code, the program runs without an error, but it doesn't
>>> apply the JBIG2 mask properly to the foreground colour image of the
>>> PDF. To fix this issue, I tried to extract the mask into a file, but
>>> the mask image seems to be the same as the foreground image.
>>> I couldn't find the reason for this, and I don't think it is related
>>> to the PDF itself.
>>>
>>> The PDF I was using is attached to this e-mail.
>>>
>> Please upload the file to a sharehoster; PDF attachments are not
>> allowed. Please also tell what version you are using and what
>>
>> Tilman
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
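Two JBIG2 questions come up in this thread: whether a JBIG2 decoder is on the class path, and the NPE caused by the missing "BitsPerComponent" entry. Both can be checked with plain JDK code. Here is a minimal sketch. `hasJbig2Reader()` asks ImageIO whether any plugin registers the format name "JBIG2". It assumes the JBIG2 ImageIO plugin used alongside PDFBox registers under that name. `effectiveBitsPerComponent()` encodes the PDF rule that /BitsPerComponent is optional for an image mask (/ImageMask true) and, if present, may only be 1. The image dictionary is modelled as a plain `Map`, a hypothetical stand-in rather than PDFBox's API. The class name `Jbig2Check` is also illustrative.

```java
import java.util.Iterator;
import java.util.Map;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

/** Hypothetical diagnostics for the JBIG2 problems discussed in this thread. */
public class Jbig2Check {

    /** Returns true if some ImageIO plugin on the class path registers the format name "JBIG2". */
    public static boolean hasJbig2Reader() {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JBIG2");
        return readers.hasNext();
    }

    /**
     * Returns the bits per component to assume for an image dictionary, modelled here
     * as a plain key-to-value map (a stand-in, not PDFBox's COSDictionary).
     */
    public static int effectiveBitsPerComponent(Map<String, Object> imageDict) {
        // For an image mask the entry is optional, and 1 is the only permitted value.
        if (Boolean.TRUE.equals(imageDict.get("ImageMask"))) {
            return 1;
        }
        Object bpc = imageDict.get("BitsPerComponent");
        if (bpc instanceof Integer) {
            return (Integer) bpc;
        }
        // Outside the image-mask case (and JPX images, not handled here) the entry is required.
        throw new IllegalArgumentException("/BitsPerComponent is missing");
    }

    public static void main(String[] args) {
        System.out.println("JBIG2 ImageIO reader available: " + hasJbig2Reader());
    }
}
```

Running `Jbig2Check` with the same class path used for PDFDebugger shows whether the decoder jar in "lib" is actually picked up.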

