Hi all,
I seem to be stumped with this.
I am taking a FOP-generated PDF as the source and trying to compress its images.
I have this bit of code that compresses the images for a page:
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

private static void getImagesFromResources(PDResources resources, PDDocument document,
        float quality) throws IOException
{
    // Copy the names first: resources.put() below modifies the XObject dictionary
    List<COSName> objectNamesArray = new ArrayList<COSName>();
    for (COSName name : resources.getXObjectNames())
    {
        objectNamesArray.add(name);
    }
    for (int i = 0; i < objectNamesArray.size(); i++)
    {
        COSName xObjectName = objectNamesArray.get(i);
        PDXObject xObject = resources.getXObject(xObjectName);
        if (xObject instanceof PDFormXObject)
        {
            // skip this, not a use case we will encounter
        }
        else if (xObject instanceof PDImageXObject)
        {
            System.out.println("replacing Image");
            PDImageXObject imageObject = (PDImageXObject) xObject;
            BufferedImage image = imageObject.getImage();
            // re-encode as JPEG; quality runs from 0 (smallest file) to 1 (best image)
            PDImageXObject newImageObject = JPEGFactory.createFromImage(document, image,
                    quality);
            resources.put(xObjectName, newImageObject);
        }
    }
}
Here is the snippet that calls that code:
File sourceFile = tempFile;
String fileName = FilenameUtils.getBaseName(tempFile.getName());
File destFile;
try
{
    destFile = File.createTempFile(fileName, ".pdf", tempdir);
}
catch (IOException e2)
{
    throw new ApplicationException("Could not create temporary file", e2);
}
PDDocument document = null;
try
{
    document = PDDocument.load(tempFile);
}
catch (InvalidPasswordException e)
{
    throw new ApplicationException("Could not load input PDF file", e);
}
catch (IOException e)
{
    throw new ApplicationException("Could not load input PDF file", e);
}
PDStream stream = new PDStream(document);
InputStream is;
try
{
    is = stream.createInputStream();
}
catch (IOException e1)
{
    throw new ApplicationException("Could not create input stream", e1);
}
try
{
    for (int i = 0; i < document.getNumberOfPages(); i++)
    {
        PDPage page = document.getPage(i);
        try
        {
            // getImagesFromResources returns void, so there is nothing to assign
            getImagesFromResources(page.getResources(), document, quality);
        }
        catch (IOException e)
        {
            throw new ApplicationException("Could not retrieve images from PDF file", e);
        }
    }
…
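To check how often the page loop meets the same image, a small diagnostic can compare the number of image entries summed over all pages with the number of distinct image streams behind them. This is a hedged sketch, not tested against the FOP file: it assumes PDFBox 2.0.x, and it assumes that an image referenced from several pages is the same COSStream instance each time (PDFBox normally parses an indirect object once and shares it). ImageReferenceReport and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical diagnostic: image entries listed by the pages vs. images that really exist. */
public final class ImageReferenceReport {

    /** Number of image XObject entries, summed over all pages. */
    public static int countImageReferences(PDDocument document) throws IOException {
        return imageStreamsPerPage(document).size();
    }

    /** Number of distinct image streams behind those entries, compared by identity. */
    public static int countDistinctImageStreams(PDDocument document) throws IOException {
        Set<COSStream> distinct =
                Collections.newSetFromMap(new IdentityHashMap<COSStream, Boolean>());
        distinct.addAll(imageStreamsPerPage(document));
        return distinct.size();
    }

    // Collects the image stream of every image entry on every page (duplicates kept).
    private static List<COSStream> imageStreamsPerPage(PDDocument document) throws IOException {
        List<COSStream> streams = new ArrayList<COSStream>();
        for (PDPage page : document.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) {
                continue;
            }
            for (COSName name : resources.getXObjectNames()) {
                PDXObject xObject = resources.getXObject(name);
                if (xObject instanceof PDImageXObject) {
                    streams.add(xObject.getCOSObject());
                }
            }
        }
        return streams;
    }
}
```

For N pages that each list the same M images, the first count would be N * M and the second only M, which would mean the per-page loop re-encodes every image N times.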
I am passing in the resources associated with each page. The problem seems to
be that all of the image resources appear on all pages, so I end up processing
all of the images multiple times. Also, in the original document, although all
of the resources are present on all pages, each page somehow “knows” which ones
to use and which to ignore. So in my processed document, when I add the new
images to the resources, I end up bloating the PDF with unnecessary images.
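If the pages all point at the same image streams, one way to avoid encoding each image once per page (and writing a new copy per page) is to remember which streams have already been replaced. This is a minimal, untested sketch, assuming PDFBox 2.0.x and assuming that a shared image is the same COSStream instance when reached from each page. SharedImageRecompressor and recompress are hypothetical names; one instance would be created per document and called for every page in place of getImagesFromResources.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical helper: re-encodes each distinct image stream at most once per document. */
public final class SharedImageRecompressor {

    // Original image stream -> JPEG replacement, compared by identity, so the same
    // stream reached from several pages maps to one new image object.
    private final Map<COSStream, PDImageXObject> replacements =
            new IdentityHashMap<COSStream, PDImageXObject>();

    // Streams created here; if pages share one resource dictionary, later pages
    // already see these and must not re-encode them a second time.
    private final Set<COSStream> created =
            Collections.newSetFromMap(new IdentityHashMap<COSStream, Boolean>());

    public void recompress(PDResources resources, PDDocument document, float quality)
            throws IOException {
        if (resources == null) {
            return;
        }
        // Copy the names first, because put() below modifies the XObject dictionary.
        List<COSName> names = new ArrayList<COSName>();
        for (COSName name : resources.getXObjectNames()) {
            names.add(name);
        }
        for (COSName name : names) {
            PDXObject xObject = resources.getXObject(name);
            if (!(xObject instanceof PDImageXObject)) {
                continue; // form XObjects are skipped, as in the original method
            }
            COSStream original = xObject.getCOSObject();
            if (created.contains(original)) {
                continue; // already replaced through a shared dictionary
            }
            PDImageXObject replacement = replacements.get(original);
            if (replacement == null) {
                BufferedImage image = ((PDImageXObject) xObject).getImage();
                replacement = JPEGFactory.createFromImage(document, image, quality);
                replacements.put(original, replacement);
                created.add(replacement.getCOSObject());
            }
            resources.put(name, replacement);
        }
    }
}
```

The cache lives in the instance, so it only helps if the same instance is used for the whole page loop; a fresh instance per page would behave like the original method.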
Is there a way to see whether the page actually uses an image, and only process
it if it does? I tried finding matches in the page dictionary, and parsing the
page stream and matching on a dictionary there, to no avail. I have used the
debugger to confirm that the resources are in every page, although each page
displays only one of the images.
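A page does not paint everything in its resource dictionary. The dictionary is only a lookup table; the content stream selects what to draw with the Do operator, as in "/Im1 Do". So the names a page really uses can be read from its content stream. A minimal sketch, assuming PDFBox 2.0.x (PDFStreamParser.parseNextToken) and assuming only the page's own content stream matters (form XObjects are not descended into, which the post says does not occur). PaintedXObjects is a hypothetical helper name.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdmodel.PDPage;

/** Hypothetical helper: which XObject names does a page's content stream actually paint? */
public final class PaintedXObjects {

    /**
     * Returns the names used as the operand of the "Do" operator in the page's own
     * content stream. Nested form XObjects are not scanned.
     */
    public static Set<COSName> namesPaintedBy(PDPage page) throws IOException {
        Set<COSName> painted = new LinkedHashSet<COSName>();
        PDFStreamParser parser = new PDFStreamParser(page);
        COSName previousName = null;
        Object token;
        while ((token = parser.parseNextToken()) != null) {
            if (token instanceof Operator
                    && "Do".equals(((Operator) token).getName())
                    && previousName != null) {
                painted.add(previousName);
            }
            // "Do" takes exactly one operand, so only a name immediately before it counts.
            previousName = token instanceof COSName ? (COSName) token : null;
        }
        return painted;
    }
}
```

In the page loop, a name that is not in namesPaintedBy(page) could simply be skipped. If several pages paint the same image, caching the replacement per original image stream would still be needed to avoid one copy per page.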
Thanks in advance for any advice/help.
Carl Buxbaum
Senior Software Architect
17 Rogers St
Gloucester, MA 01930
1-978-515-5128
<https://www.bamboorose.com/>