Hi,

We have a PDF document with an image on each page, and we have code in a Java program that extracts and processes those images.
With PDFBox 1.8.13 we used the following code to extract an image from each PDF page, which took 119 milliseconds for a sample page:

    PDDocument document = PDDocument.load(file);
    PDDocumentCatalog pdc = document.getDocumentCatalog();
    List pages = pdc.getAllPages();
    Iterator iter = pages.iterator();
    while (iter.hasNext()) {
        PDPage pg = (PDPage) iter.next();
        BufferedImage bi = pg.convertToImage(BufferedImage.TYPE_BYTE_GRAY, 288);
        // .... code that processes the above buffered image
    }

When we moved to PDFBox 2.0.9 we had to change the code to the following, because convertToImage is deprecated. The new code takes 4 to 10 times longer (for example, 823 milliseconds for a page that took 119 milliseconds with the previous version):

    PDDocument document = PDDocument.load(file);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    int totalPages = document.getNumberOfPages();
    int currentPageCount = -1;
    while (++currentPageCount < totalPages) {
        BufferedImage bi = pdfRenderer.renderImageWithDPI(currentPageCount, 288, ImageType.GRAY);
        // .... code that processes the above buffered image
    }

Is there a better way to extract the image at 288 DPI and get the performance of the old version of the API?

Regards