Hi all,

I frequently come across PDFs for which the convertToImage() method generates blank or partly blank images. One of those PDFs is attached to this mail.

My code for processing:

    PDFParser parser;
    parser = new PDFParser(new FileInputStream(f));
    parser.parse();
    cosDoc = parser.getDocument();
    pdDoc = new PDDocument(cosDoc);
    ..
    Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
    PDPage page = it.next();
    ...
    PDRectangle cropBox = page.findCropBox();
    Dimension dimension = cropBox.createDimension();
    ...
    BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB, ImageParser.PARAM_DPI);

I am using pdfbox-app-1.8.0.jar.

So I have two questions:

1. Is there a different way to extract the page as an image, one that I am not aware of, that produces the correct image?
2. Or is it possible to detect, before or after the extraction, that a page was not extracted correctly? At the moment I simply cannot tell when I am dealing with a corrupted image.

Thanks a lot for any hints,
Alex

--
Dr. Alexander G. Klenner
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven, D-53754 Sankt Augustin
Tel.: +49 - 2241 - 14 - 2736
E-mail: [email protected]
Internet: http://www.scai.fraunhofer.de
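To make question 2 concrete, here is a minimal sketch of a check that could run after the extraction. It is not a PDFBox feature. It only assumes that a failed rendering shows up as an image that is almost entirely white, and the class name BlankImageCheck, the brightness threshold of 250, and the 0.999 cut-off are all made up for illustration.

```java
import java.awt.image.BufferedImage;

public class BlankImageCheck {

    /** Returns the fraction (0.0 to 1.0) of pixels whose R, G and B values are all >= threshold. */
    public static double whiteFraction(BufferedImage img, int threshold) {
        int width = img.getWidth();
        int height = img.getHeight();
        long white = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r >= threshold && g >= threshold && b >= threshold) {
                    white++;
                }
            }
        }
        return (double) white / ((long) width * height);
    }

    /** Heuristic only: treats the image as blank if nearly all pixels are near-white. */
    public static boolean looksBlank(BufferedImage img, double minWhiteFraction) {
        // 250 is an assumed brightness threshold, not a value taken from PDFBox.
        return whiteFraction(img, 250) >= minWhiteFraction;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a rendered page: a fully white 100x100 image.
        BufferedImage page = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 100; y++) {
            for (int x = 0; x < 100; x++) {
                page.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println("blank? " + looksBlank(page, 0.999));
    }
}
```

With the code above, the call would be looksBlank(img, 0.999) on the BufferedImage returned by convertToImage(). A check like this can only flag fully blank pages. It cannot recognise partly blank ones, and a page that is legitimately empty would be flagged as well.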

