Hi all,

I frequently come across PDFs for which the convertToImage() method generates
blank or partly blank images. One of those PDFs is attached to this mail.

My code for processing: 

// f is the input PDF file; cosDoc and pdDoc are declared elsewhere
PDFParser parser = new PDFParser(new FileInputStream(f));
parser.parse();
cosDoc = parser.getDocument();

pdDoc = new PDDocument(cosDoc);
...
// take the first page of the document
Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
PDPage page = it.next();
...
PDRectangle cropBox = page.findCropBox();
Dimension dimension = cropBox.createDimension();
...
// second argument is the rendering resolution in DPI
BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB,
        ImageParser.PARAM_DPI);


I am using pdfbox-app-1.8.0.jar.

So I have two questions: 

1. Is there a different way to extract the page as an image, one that I am 
not aware of, that produces the correct image? 
2. Or is it possible to detect, before or after the extraction, that a page 
was not extracted correctly?

At the moment I have no way of knowing when I am dealing with a corrupted image.
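
For question 2, the only idea I have so far is a crude check after the 
extraction, roughly as sketched below. This is only a sketch under my own 
assumptions: BlankImageCheck, backgroundFraction, looksBlank and the threshold 
value are names I made up, nothing here is part of PDFBox, and it treats the 
top-left pixel as the background color.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class BlankImageCheck {

    // Fraction (0.0 to 1.0) of pixels that have the same color as the
    // top-left pixel, which is assumed to be the page background.
    public static double backgroundFraction(BufferedImage img) {
        int background = img.getRGB(0, 0);
        long same = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) == background) {
                    same++;
                }
            }
        }
        return (double) same / ((long) img.getWidth() * img.getHeight());
    }

    // True if at least `threshold` of all pixels are background-colored.
    // The threshold (e.g. 0.999) is an arbitrary value chosen by the caller.
    public static boolean looksBlank(BufferedImage img, double threshold) {
        return backgroundFraction(img) >= threshold;
    }
}
```

I would call it as BlankImageCheck.looksBlank(img, 0.999) on the result of 
convertToImage(). It can only flag images that are almost entirely one color, 
so it would miss the partly blank pages, and it would also flag pages that 
really are empty in the PDF.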

Thanks a lot for any hints,

Alex

--
Dr. Alexander G. Klenner
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven, D-53754 Sankt Augustin
Tel.: +49 - 2241 - 14 - 2736
E-mail: [email protected]
Internet: http://www.scai.fraunhofer.de
