Thanks Tilman! That worked! Any information on when you'll have a new release with this fix in it?
Regards,
Joe

On Mon, Dec 7, 2015 at 4:33 PM, Tilman Hausherr <[email protected]> wrote:

> Hi,
>
> The good news is that this bug was fixed last weekend in PDFBOX-3153. Get
> the latest trunk and see in ExtractImages.java or do this to get a jpeg
> stream:
>
> InputStream dctStream =
>     img.createInputStream(Arrays.asList(COSName.DCT_DECODE.getName()));
>
> Tilman
>
>
> On 07.12.2015 at 13:54, Joe Ye wrote:
>
>> Hi,
>>
>> We've been using PDFBox to extract images from PDF files and recently
>> upgraded to PDFBox version 2.0.0-RC2. I noticed that class PDXObjectImage
>> has been renamed/rewritten and that the method
>> PDXObjectImage.write2OutputStream, which we used to write images to disk,
>> no longer exists.
>>
>> Therefore, I've been trying to use the new class PDImageXObject and follow
>> your example org.apache.pdfbox.tools.ExtractImages#write2file in order to
>> extract images from PDF and write them to disk. It appears that there's a
>> code path (IOUtils.copy etc) for RGB or Gray colorspace where it just
>> copies the unmodified JPEG stream. However, I have a couple of JPEG images
>> with RGB colorspace in a PDF and used this code to extract and write them
>> to disk, and they can't be opened by any image viewer, suggesting that the
>> images may be damaged…
>>
>> If I change the code to call ImageIOUtil.writeImage instead, then the
>> extracted images can be viewed ok. But I don't know the implication here,
>> as the code suggests that the JPEG will be converted.
>>
>> Please could you suggest why IOUtils.copy for RGB or Gray did not work
>> properly and what's the recommended/correct way to process them?
>>
>> Kind regards,
>>
>> Joe
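
For anyone following the thread: a minimal sketch of a sanity check that could be wrapped around the raw stream copy discussed above. It copies the bytes unchanged but refuses to write anything that does not begin with the JPEG start-of-image marker (FF D8), which is one quick way to notice a damaged extraction before opening the file in a viewer. The class JpegStreamCopy and its methods are hypothetical helpers, not part of PDFBox, and the sketch buffers the whole image in memory for simplicity. The only PDFBox call referenced (in a comment) is the one Tilman quoted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper (not part of PDFBox): copies a raw JPEG stream after a marker check. */
final class JpegStreamCopy {

    private JpegStreamCopy() {
    }

    /** Returns true if the data begins with the JPEG start-of-image marker FF D8. */
    static boolean startsWithJpegMarker(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    /**
     * Reads the whole input, checks the leading marker, and writes the bytes
     * unchanged to the output. Returns the number of bytes written.
     *
     * Assumed usage with PDFBox 2.0 (the call is taken from the message above):
     *   InputStream dctStream =
     *       img.createInputStream(Arrays.asList(COSName.DCT_DECODE.getName()));
     *   JpegStreamCopy.copyJpeg(dctStream, new FileOutputStream("image.jpg"));
     */
    static long copyJpeg(InputStream in, OutputStream out) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        byte[] data = buffer.toByteArray();
        if (!startsWithJpegMarker(data)) {
            throw new IOException("Stream does not start with the JPEG SOI marker (FF D8)");
        }
        out.write(data);
        return data.length;
    }
}
```

If the check fails, falling back to ImageIOUtil.writeImage (as in the original question) still produces a viewable file, at the cost of re-encoding the image.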

