On 08.09.2016 at 08:33, Markus Barbey wrote:
> I haven't tested your observation yet; the only explanation for now
> would be that your PDF was just created, i.e. not read from a file or a
> stream. (Thus no COSObject references)
>
> Tilman
Ok...

found it... while examining the caching algorithm I found this comment/code
in PDResources#getXObject(COSName name):

         // we can't cache PDImageXObject, because it holds page resources, see PDFBOX-2370
         if (cache != null && !(xobject instanceof PDImageXObject))
         {
             cache.put(indirect, xobject);
         }


So the PDImageXObject is definitely not cached right now :-(


Yeah, you're right. Ouch! And I was involved in this a year ago, see the bottom of this issue:
https://issues.apache.org/jira/browse/PDFBOX-2370

The problem is that PDImageXObject usually has a colorspace, and gets passed a resources object when created.

Last year this caused trouble for patterns, which can have their colorspace passed in when rendering. I noticed that a resources object was passed at creation time for images too, hence my comment "So I disabled caching for objects that hold a pointer to resources", and I didn't think about the performance loss.

An image could appear on several pages with the same colorspace name, but that name could resolve to a different colorspace on different pages. (Very unlikely, but possible!)

Please replace the code segment at the end of the method with this (remove the println if you want):

        if (cache != null) //  && !(xobject instanceof PDImageXObject))
        {
            if (xobject instanceof PDImageXObject)
            {
                COSBase colorSpace = xobject.getCOSObject().getDictionaryObject(COSName.COLORSPACE);
                if (colorSpace instanceof COSName)
                {
                    // don't cache if it might use page resources, see PDFBOX-2370 and XXXX
                    COSName colorSpaceName = (COSName) colorSpace;
                    if (colorSpaceName.equals(COSName.DEVICECMYK) && hasColorSpace(COSName.DEFAULT_CMYK))
                    {
                        System.out.println("Don't cache " + colorSpaceName);
                        return xobject;
                    }
                    if (colorSpaceName.equals(COSName.DEVICERGB) && hasColorSpace(COSName.DEFAULT_RGB))
                    {
                        System.out.println("Don't cache " + colorSpaceName);
                        return xobject;
                    }
                    if (colorSpaceName.equals(COSName.DEVICEGRAY) && hasColorSpace(COSName.DEFAULT_GRAY))
                    {
                        System.out.println("Don't cache " + colorSpaceName);
                        return xobject;
                    }
                    if (hasColorSpace(colorSpaceName))
                    {
                        System.out.println("Don't cache " + colorSpaceName);
                        return xobject;
                    }
                }
            }
            cache.put(indirect, xobject);
        }
        return xobject;
    }
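
To make the rule in that snippet easier to check in isolation, here is a minimal sketch that models it without PDFBox. It is only an illustration under assumptions: plain strings stand in for COSName, a Set of names stands in for the colorspace entries of the page resources, and the class and method names (ImageCacheDecision, isCacheable) are hypothetical and not part of PDFBox.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model of the caching rule; not PDFBox code.
class ImageCacheDecision
{
    // Device colorspaces and the Default* resource entries that can override them.
    private static final Map<String, String> DEFAULT_OVERRIDES = Map.of(
            "DeviceCMYK", "DefaultCMYK",
            "DeviceRGB", "DefaultRGB",
            "DeviceGray", "DefaultGray");

    /**
     * @param imageColorSpaceName the image's colorspace if it is given as a name,
     *                            or null if it is not a name (e.g. an array)
     * @param pageColorSpaceNames names defined in the page's colorspace resources
     * @return true if the image does not depend on page resources and may be cached
     */
    public static boolean isCacheable(String imageColorSpaceName, Set<String> pageColorSpaceNames)
    {
        if (imageColorSpaceName == null)
        {
            // not a name, so nothing is looked up in the page resources
            return true;
        }
        String override = DEFAULT_OVERRIDES.get(imageColorSpaceName);
        if (override != null && pageColorSpaceNames.contains(override))
        {
            // a device colorspace is remapped by a Default* entry on this page
            return false;
        }
        // a name defined in the page resources may differ from page to page
        return !pageColorSpaceNames.contains(imageColorSpaceName);
    }

    public static void main(String[] args)
    {
        System.out.println(isCacheable("DeviceRGB", Set.of()));
        System.out.println(isCacheable("DeviceRGB", Set.of("DefaultRGB")));
        System.out.println(isCacheable("CS0", Set.of("CS0")));
    }
}
```

In this model an image is only left out of the cache when its colorspace name could be resolved through the page resources, which is the "very unlikely, but possible" case described above.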


You can see the effect (the 2nd page is rendered much faster) by viewing this document (all pages have the same image) in PDFDebugger and looking at the time in the status bar at the lower left. I get a huge speed increase.
https://people.apache.org/~lehmi/apachecon/ApacheConPDFBox.pdf

With this change, only very few files will have images for which no caching takes place.

Tilman

