Performance issue with page to image conversion between PDFBox versions

Kalyan Donda Tue, 16 Oct 2018 11:20:59 -0700

Hi,

  We have a PDF document with an image on each page and we have code to
extract and process those images in a Java program.


  We used the following code to extract an image from a PDF page with
PDFBox 1.8.13, and it takes 119 milliseconds for a page.

    PDDocument document = PDDocument.load(file);
    PDDocumentCatalog pdc = document.getDocumentCatalog();
    List pages = pdc.getAllPages();
    Iterator iter = pages.iterator();
    while (iter.hasNext()) {
      PDPage pg = (PDPage) iter.next();
      BufferedImage bi = pg.convertToImage(BufferedImage.TYPE_BYTE_GRAY, 288);
      // ... code that processes the above buffered image
    }

  But when we moved to PDFBox 2.0.9 we had to convert the above code to the
following because convertToImage is deprecated, and the new code takes 4 to 10
times longer (for example, 823 milliseconds for a page compared to 119 with
the previous version). Is there a better way of extracting the image at
288 DPI with the performance of the old version of the API?

    PDDocument document = PDDocument.load(file);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    int totalPages = document.getNumberOfPages();
    int currentPageCount = -1;
    while (++currentPageCount < totalPages) {
      BufferedImage bi = pdfRenderer.renderImageWithDPI(currentPageCount, 288, ImageType.GRAY);
      // ... code that processes the above buffered image
    }
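
  For reference, the per-page times above can be reproduced with a small
harness along these lines. This is only a sketch using the standard library:
PageTimer, PageTask and timePages are hypothetical names made up for this
example, not part of PDFBox, and the render call from either loop above would
go inside the lambda.

```java
import java.util.Arrays;

/** Hypothetical helper for timing one unit of work per page; not part of PDFBox. */
final class PageTimer {

    /** One unit of work per page; allowed to throw, as rendering calls may. */
    interface PageTask {
        void run(int pageIndex) throws Exception;
    }

    /** Runs the task once per page and returns the elapsed milliseconds for each page. */
    static long[] timePages(int pageCount, PageTask task) {
        long[] millis = new long[pageCount];
        for (int page = 0; page < pageCount; page++) {
            long start = System.nanoTime();
            try {
                task.run(page);
            } catch (Exception e) {
                // Wrap so callers do not have to declare checked exceptions.
                throw new IllegalStateException("Page " + page + " failed", e);
            }
            millis[page] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the real program the lambda body would be the
        // rendering call from the post, e.g. the renderImageWithDPI line.
        long[] times = timePages(3, page -> Thread.sleep(10));
        System.out.println(Arrays.toString(times));
    }
}
```

  The first page timed in a fresh JVM may include one-time costs such as class
loading, so it is probably fairer to compare the later pages between the two
versions.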

Regards
