Difficulty compressing images in a fop generated pdf

Carl Buxbaum Mon, 30 Oct 2017 12:14:17 -0700

Hi all,

I seem to be stumped with this.


I am taking as source a FOP generated PDF, and trying to compress the images.  
I have this bit of code that compresses the images for a page:


private static void getImagesFromResources(PDResources resources,
        PDDocument document, float quality) throws IOException {
    // Copy the names first so the resources can be modified while looping
    ArrayList<COSName> objectNamesArray = new ArrayList<COSName>();
    for (COSName name : resources.getXObjectNames()) {
        objectNamesArray.add(name);
    }

    for (COSName xObjectName : objectNamesArray) {
        PDXObject xObject = resources.getXObject(xObjectName);

        if (xObject instanceof PDFormXObject) {
            // skip this, not a use case we will encounter
        } else if (xObject instanceof PDImageXObject) {
            System.out.println("replacing Image");
            PDImageXObject imageObject = (PDImageXObject) xObject;
            BufferedImage image = imageObject.getImage();

            // re-encode the image as a JPEG at the given quality
            PDImageXObject newImageObject =
                    JPEGFactory.createFromImage(document, image, quality);
            resources.put(xObjectName, newImageObject);
        }
    }
}


Here is the snippet that calls that code:


File sourceFile = tempFile;
String fileName = FilenameUtils.getBaseName(tempFile.getName());
File destFile;
try {
    destFile = File.createTempFile(fileName, ".pdf", tempdir);
} catch (IOException e2) {
    throw new ApplicationException("Could not create temporary file", e2);
}

PDDocument document = null;
try {
    document = PDDocument.load(tempFile);
} catch (InvalidPasswordException e) {
    throw new ApplicationException("Could not load input PDF file", e);
} catch (IOException e) {
    throw new ApplicationException("Could not load input PDF file", e);
}

PDStream stream = new PDStream(document);
try {
    is = stream.createInputStream();
} catch (IOException e1) {
    throw new ApplicationException("Could not create input stream", e1);
}

try {
    for (int i = 0; i < document.getNumberOfPages(); i++) {
        PDPage page = document.getPage(i);
        try {
            getImagesFromResources(page.getResources(), document, quality);
        } catch (IOException e) {
            throw new ApplicationException(
                    "Could not retrieve images from PDF file", e);
        }
    }
…


I am passing in the resources associated with each page.  The problem seems to
be that every image resource appears in every page's resources, so I end up
processing each image multiple times.  In the original document, although all
of the resources are also present on all pages, each page somehow “knows” which
ones to use and which to ignore.  So in my processed document, when I put the
new images into the resources, I end up bloating the PDF with unnecessary
images.
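
One way to avoid converting the same shared image once per page is to remember
which originals have already been converted and reuse the result. The sketch
below shows only that bookkeeping, using plain Java with no PDFBox types.
OncePerObject is a hypothetical helper name, not part of any library. It
assumes that the shared image is the same underlying object on every page (in
PDFBox 2.x that would presumably be the COSStream returned by
xObject.getCOSObject()), which I have not verified for FOP output.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: runs a conversion at most once per distinct object.
class OncePerObject<K, V> {
    // Keys are compared by identity (==), not equals(), so two pages that
    // reference the very same object share one cache entry.
    private final Map<K, V> done = new IdentityHashMap<>();
    private final Function<K, V> convert;

    OncePerObject(Function<K, V> convert) {
        this.convert = convert;
    }

    // Returns the cached result, converting only on the first request.
    V get(K original) {
        return done.computeIfAbsent(original, convert);
    }

    // Number of distinct originals converted so far.
    int size() {
        return done.size();
    }
}
```

With something like this, the per-page loop would ask the cache for the
replacement and put the returned object into the page resources, so every page
ends up pointing at one recompressed copy instead of its own.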

Is there a way to see whether a page actually uses an image, and to process it
only if it does?  I tried finding matches in the page dictionary, and parsing
the page stream and matching on a dictionary there, to no avail.  I have used
the debugger to confirm that the resources are in each page, although each page
displays only one of the images.
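
For reference, a page draws an XObject with the Do operator in its content
stream, preceded by the resource name (for example "/Im2 Do"), so the names
that precede Do are the ones the page really uses. The sketch below scans
already-decoded content stream text for that pattern in plain Java.
UsedXObjectNames is a hypothetical class name, and splitting on whitespace is a
simplification that ignores string literals, comments and inline images. With
PDFBox 2.x one would presumably take the tokens from PDFStreamParser rather
than split text by hand.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: finds XObject names drawn by a content stream.
class UsedXObjectNames {
    /**
     * Returns the names that appear directly before a "Do" operator.
     * Simplified: assumes the stream is decoded text whose tokens are
     * separated by whitespace.
     */
    static Set<String> find(String decodedContentStream) {
        Set<String> used = new LinkedHashSet<>();
        String[] tokens = decodedContentStream.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("Do") && tokens[i - 1].startsWith("/")) {
                used.add(tokens[i - 1].substring(1)); // drop the leading slash
            }
        }
        return used;
    }

    public static void main(String[] args) {
        String stream = "q 200 0 0 100 50 600 cm /Im2 Do Q\n"
                + "BT /F1 12 Tf (Hello) Tj ET";
        System.out.println(find(stream)); // prints [Im2]
    }
}
```

A page's images would then be processed only when the resource name is in the
returned set. Note that "/F1 12 Tf" in the example is a font selection, not an
image, and is correctly left out because it is not followed by Do.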

Thanks in advance for any advice/help.

Carl Buxbaum
Senior Software Architect
17 Rogers St
Gloucester, MA 01930
1-978-515-5128

