Original post: I've been using PDFBox to process documents scanned with our office copy machines (Canon and Ricoh machines). Normally, the resulting PDF files contain one TIF image per page, and the page.convertToImage() function works fine to extract the image. One of the machines has a 'compact pdf' setting that produces a smaller file. When the compact feature is turned on, instead of a single TIF image being stored in the page, multiple images are stored, and they need to be reassembled into a single image through some type of merging process. I am able to extract the separate images, but I'm missing the roadmap on how to size the images and recombine them. Is there some sort of property that provides relative x,y coordinates so I can recombine them using a graphics drawImage method?
I've been trying all sorts of variations using some of the techniques in the PrintImageLocations sample. Sadly, I'm not seeing how the three separate images are laid out inside the crop box; apparently each has its own scaling and x,y coordinates. I can extract the three images, get their sizes, and get the size of the crop box. I've tried assembling the images back together by creating a new BufferedImage the size of the crop box and then using AlphaComposite to layer the three images on top of each other.

Here is the pdf file I've been testing with and my java class, in case anyone has any ideas:

The idea would be to call the getImage method of the pdfImageProcess class, passing in a single PDF page, as follows:

    PDPage page = (PDPage) pages.get(i);
    pdfImageProcess pdfIP = new pdfImageProcess();
    image = pdfIP.getImage(page);

The debug output on the three images looks like:

    Found image[Obj4] at 0.0,0.0 size=7803.0,13068.0, xScale=6.12, yScale=7.92
    Found image[Obj5] at 36.48,465.6 size=11616.0,2457.9458, xScale=5.28, yScale=2.4288
    Found image[Obj6] at 34.56,48.96 size=8995.431,8.639999, xScale=4.6464, yScale=0.144

Subject: RES: merging images from a compact-pdf file
From: José Rodolfo Carrijo de Freitas ([email protected])
Date: Nov 23, 2010 10:21:20 am
List: org.apache.pdfbox.users

Hello Glenn,

There is an example that shows how to do that: a class called PrintImageLocations. The catch is that it has to process the entire stream to find this information. Maybe you can adapt it to process the stream once and store those locations in a data structure.

http://pdfbox.apache.org/apidocs/org/apache/pdfbox/examples/util/PrintImageLocations.html

-----Original message-----
From: GlennHirshon [mailto:[email protected]]
Sent: Tuesday, November 23, 2010 16:16
To: users
Subject: merging images from a compact-pdf file
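To make the "store those locations in a data structure" suggestion concrete, here is a minimal sketch of the recombination step only. It assumes you have already collected, for each image, its lower-left position and its displayed width and height in PDF points (for example from the transformation matrix values that PrintImageLocations reports). The class CompactPageComposer and its Placement type are hypothetical names made up for this illustration, not part of PDFBox, and the sketch uses only java.awt. The main point it shows is the coordinate flip: PDF user space has its origin at the bottom-left of the page, while a BufferedImage has its origin at the top-left. Rotation and skew are ignored.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

/** Hypothetical helper (not a PDFBox class): draws positioned images onto one page raster. */
final class CompactPageComposer {

    /** One extracted image plus the rectangle it occupies on the page, in PDF points. */
    static final class Placement {
        final BufferedImage image;
        final double x;      // left edge in PDF user space
        final double y;      // bottom edge in PDF user space (origin is bottom-left)
        final double width;  // displayed width in points
        final double height; // displayed height in points

        Placement(BufferedImage image, double x, double y, double width, double height) {
            this.image = image;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Renders the placements, in list order (first entry at the bottom),
     * onto a white page of the given size at the requested resolution.
     */
    static BufferedImage compose(double pageWidthPt, double pageHeightPt,
                                 double dpi, List<Placement> placements) {
        double scale = dpi / 72.0; // 72 points per inch
        int w = (int) Math.round(pageWidthPt * scale);
        int h = (int) Math.round(pageHeightPt * scale);

        BufferedImage page = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = page.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            for (Placement p : placements) {
                int dx = (int) Math.round(p.x * scale);
                // Flip the y axis: the top edge of the image in raster coordinates
                // is the page height minus the top edge in PDF coordinates.
                int dy = (int) Math.round((pageHeightPt - p.y - p.height) * scale);
                int dw = (int) Math.round(p.width * scale);
                int dh = (int) Math.round(p.height * scale);
                // The default composite of a Graphics2D is SrcOver, so transparent
                // pixels in upper layers let the lower layers show through.
                g.drawImage(p.image, dx, dy, dw, dh, null);
            }
        } finally {
            g.dispose();
        }
        return page;
    }
}
```

A call would look like compose(cropBoxWidth, cropBoxHeight, 300, placements), with one Placement per extracted image in drawing order. As a guess only: if the xScale and yScale values in the debug output above are the matrix scale factors divided by 100, Obj4 would span 612 x 792 points (a full Letter page) and the other two images would be smaller regions placed on top of it. Note also that the upper layers will only blend correctly if the extracted images carry their transparency or mask information; opaque layers will simply cover what is beneath them.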

