Hello. I'm getting a timeout in one of my tests after upgrading to v2.0.17:
PDImageXObject.getImage() takes more than 1 min 10 s, whereas it took less
than 2 seconds with the previous release, 2.0.16.
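
The original test harness isn't shown here. Purely as a hypothetical, stdlib-only
sketch (TimeBudget and finishesWithin are made-up names, not PDFBox API), a
per-call time budget like the one that now trips can be put around a task such
as a getImage() call like this:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeBudget {
    // Runs the task on a worker thread and waits at most budgetMillis for it.
    // Returns true if the task finished in time, false if the budget was exceeded.
    static boolean finishesWithin(Callable<?> task, long budgetMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(task);
            try {
                future.get(budgetMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Interrupt the worker; a task that ignores interrupts keeps running.
                future.cancel(true);
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for getImage(): one task that returns at once, one that sleeps 5 s.
        Callable<Void> fast = () -> null;
        Callable<Void> slow = () -> { Thread.sleep(5_000); return null; };
        System.out.println("fast within 2 s: " + finishesWithin(fast, 2_000));
        System.out.println("slow within 100 ms: " + finishesWithin(slow, 100));
    }
}
```

The sketch only stops waiting for the task; it cannot force a non-interruptible
decode to stop.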

I cannot provide the sample PDF because it contains sensitive information. I
have tried to simplify it, but the issue (almost) disappears even if I just
re-save the file without changing anything.

Some more facts:

  *   The issue happens only when the document is loaded with
MemoryUsageSetting.setupTempFileOnly(); without that setting, the issue
doesn't happen.
  *   The performance is OK with v2.0.16.
  *   I have attached the stack trace from the timeout in my original test.
  *   The PDF structure is quite simple and contains a single image (the
relevant data from PDFDebugger is attached).
  *   I have created a sample program to demonstrate the issue. It loads the
PDF file with setupTempFileOnly() and calls getImage() on every image (there
is only one in the PDF). It then does the same without the flag, and finally
does the same with a second PDF, which is simply the original file re-saved
by PDFBox under another name.

Output with pdfbox 2.0.16:
java -cp "pdfbox-2.0.16.jar;commons-logging-1.2.jar;." TestExtractImage sample.pdf
With    temp file
    Before getImage: Thu Sep 26 16:32:46 ART 2019
    After  getImage: Thu Sep 26 16:32:46 ART 2019
Without temp file
    Before getImage: Thu Sep 26 16:32:46 ART 2019
    After  getImage: Thu Sep 26 16:32:46 ART 2019
With    temp file after saving with another name
    Before getImage: Thu Sep 26 16:32:46 ART 2019
    After  getImage: Thu Sep 26 16:32:47 ART 2019
(i.e., less than 2 seconds in all cases)

Output with pdfbox 2.0.17:
java -cp "pdfbox-2.0.17.jar;commons-logging-1.2.jar;." TestExtractImage sample.pdf
With    temp file
    Before getImage: Thu Sep 26 16:31:09 ART 2019
    After  getImage: Thu Sep 26 16:32:30 ART 2019   => more than 1'20"
Without temp file
    Before getImage: Thu Sep 26 16:32:30 ART 2019
    After  getImage: Thu Sep 26 16:32:30 ART 2019
With    temp file after saving with another name
    Before getImage: Thu Sep 26 16:32:30 ART 2019
    After  getImage: Thu Sep 26 16:32:34 ART 2019 => more than 2"
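
The gaps in both runs can be checked directly from the printed timestamps. The
small stdlib-only sketch below (ElapsedFromLog is a hypothetical name) does
that arithmetic. Since new Date() only prints whole seconds, each difference is
accurate only to within about one second:

```java
import java.time.Duration;
import java.time.LocalTime;

public class ElapsedFromLog {
    // Elapsed time between two "HH:mm:ss" timestamps taken on the same day.
    static Duration elapsed(String before, String after) {
        return Duration.between(LocalTime.parse(before), LocalTime.parse(after));
    }

    public static void main(String[] args) {
        // Before/after timestamps copied from the 2.0.16 and 2.0.17 runs (seconds resolution).
        String[][] runs = {
            {"2.0.16 with temp file",            "16:32:46", "16:32:46"},
            {"2.0.16 without temp file",         "16:32:46", "16:32:46"},
            {"2.0.16 with temp file, re-saved",  "16:32:46", "16:32:47"},
            {"2.0.17 with temp file",            "16:31:09", "16:32:30"},
            {"2.0.17 without temp file",         "16:32:30", "16:32:30"},
            {"2.0.17 with temp file, re-saved",  "16:32:30", "16:32:34"},
        };
        for (String[] run : runs) {
            System.out.println(run[0] + ": " + elapsed(run[1], run[2]).getSeconds() + " s");
        }
    }
}
```

For 2.0.17 with the temp file this gives 81 s (and 4 s for the re-saved copy),
versus 1 s or less for every 2.0.16 run.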

And here is the source code for TestExtractImage.java:

import java.io.File;
import java.io.IOException;
import java.util.Date;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class TestExtractImage {
    public static void main(String[] args) throws IOException {
        // 1) Original file, loaded with a temp-file-only scratch buffer
        System.out.println("With    temp file");
        try (PDDocument d = PDDocument.load(new File(args[0]),
                MemoryUsageSetting.setupTempFileOnly())) {
            getImage(d);
        }

        // 2) Original file, loaded with the default MemoryUsageSetting,
        //    then saved unchanged under another name
        System.out.println("Without temp file");
        try (PDDocument d = PDDocument.load(new File(args[0]))) {
            getImage(d);
            d.save("other.pdf");
        }

        // 3) The re-saved copy, loaded with a temp-file-only scratch buffer again
        System.out.println("With    temp file after saving with another name");
        try (PDDocument d = PDDocument.load(new File("other.pdf"),
                MemoryUsageSetting.setupTempFileOnly())) {
            getImage(d);
        }
    }

    // Decodes every image XObject on the first page whose suggested suffix is
    // "png" and prints a timestamp before and after each getImage() call
    static void getImage(PDDocument d) throws IOException {
        PDResources res = d.getPage(0).getResources();
        for (COSName n : res.getXObjectNames()) {
            PDXObject o = res.getXObject(n);
            if (o instanceof PDImageXObject) {
                PDImageXObject i = (PDImageXObject) o;
                if ("png".equals(i.getSuffix())) {
                    System.out.println("    Before getImage: " + new Date());
                    i.getImage();
                    System.out.println("    After  getImage: " + new Date());
                }
            }
        }
    }
}
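
In TestExtractImage, getImage() is timed with new Date(), whose toString()
only shows whole seconds. A finer-grained alternative, sketched below with
stdlib calls only (Timed and timed are hypothetical names, not PDFBox API),
uses the monotonic System.nanoTime():

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class Timed {
    // Runs the task, prints its elapsed time in milliseconds (even if it throws)
    // and returns its result. System.nanoTime() is monotonic and sub-second.
    static <T> T timed(String label, Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("    " + label + " took " + millis + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; in TestExtractImage this could be timed("getImage", i::getImage).
        int sum = timed("sum", () -> {
            int s = 0;
            for (int k = 1; k <= 1_000; k++) {
                s += k;
            }
            return s;
        });
        System.out.println("sum = " + sum);
    }
}
```

Using it inside getImage(PDDocument) would require widening that method's
throws clause from IOException to Exception, since Callable.call() declares
Exception.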


