
  We have a PDF document with an image on each page and we have code to
extract and process those images in a Java program.

  We used to extract the image from a PDF page with the following code on
PDFBox 1.8.13, which took 119 milliseconds for a sample page.

    PDDocument document = PDDocument.load(file);
    PDDocumentCatalog pdc = document.getDocumentCatalog();
    List pages = pdc.getAllPages();
    Iterator iter = pages.iterator();
    while (iter.hasNext()) {
      PDPage pg = (PDPage) iter.next();
      BufferedImage bi = pg.convertToImage(BufferedImage.TYPE_BYTE_GRAY,
      .... Code that processes the above buffered image

  But when we moved to PDFBox 2.0.9 we had to convert the above code to the
following, as convertToImage is deprecated. This code takes 4 to 10 times
longer (for example, 823 milliseconds for a page compared to 119 in the
previous version). Is there a better way to extract the image at 288 DPI
with the performance of the old API?

    PDDocument document = PDDocument.load(file);
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    int totalPages = document.getNumberOfPages();
    int currentPageCount = -1;
    while (++currentPageCount < totalPages) {
      BufferedImage bi = pdfRenderer.renderImageWithDPI(currentPageCount,
          288, ImageType.GRAY);
      .... Code that processes the above buffered image
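
  For context on the 288 DPI figure, here is a rough sketch of the pixel
arithmetic involved. It assumes a US Letter page (612 x 792 points), which
may not match our actual page size, and the class and method names are
made up for illustration only. PDF user space is 72 units per inch, so
288 DPI corresponds to a scale factor of 4; the exact rounding done inside
PDFBox may differ from the Math.round used here.

```java
// Hypothetical helper, not part of PDFBox: estimates the raster size
// of a page rendered at a given DPI.
class RenderSize {

    // PDF user space is 72 units per inch, so the scale is dpi / 72.
    static float scaleFor(float dpi) {
        return dpi / 72f;
    }

    // Approximate pixel dimensions of a page given its size in points.
    // Assumption: simple rounding; the renderer's own rounding may differ.
    static int[] pixelSize(float widthPt, float heightPt, float dpi) {
        float scale = scaleFor(dpi);
        return new int[] {
            Math.round(widthPt * scale),
            Math.round(heightPt * scale)
        };
    }

    public static void main(String[] args) {
        // Assumed page size: US Letter, 612 x 792 points.
        int[] px = pixelSize(612f, 792f, 288f);
        long pixels = (long) px[0] * px[1];
        System.out.println(px[0] + " x " + px[1] + " = " + pixels + " pixels");
    }
}
```

  Under that assumption each page comes out at 2448 x 3168 pixels, 16
times the pixel count of the same page at 72 DPI.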

