On 08.09.2016 at 08:33, Markus Barbey wrote:
>> I haven't tested your observation yet; the only explanation for now
>> would be that your PDF was just created, i.e. not read from a file or a
>> stream (thus no COSObject references).
>>
>> Tilman
> Ok... found it... While examining the caching algorithm I found this
> comment/code in PDResources#getXObject(COSName name):
>
>     // we can't cache PDImageXObject, because it holds page resources, see PDFBOX-2370
>     if (cache != null && !(xobject instanceof PDImageXObject))
>     {
>         cache.put(indirect, xobject);
>     }
>
> So the PDImageXObject is definitely not cached right now :-(

Yeah, you're right. Ouch! And I was involved in this a year ago; see the
bottom of this issue:
https://issues.apache.org/jira/browse/PDFBOX-2370

The problem is that a PDImageXObject usually has a colorspace and is
passed a resources object when it is created.
Last year this caused trouble for patterns, whose colorspace can be
passed in at rendering time. I noticed that a resources object was
passed at creation time for images too, hence my comment "So I disabled
caching for objects that hold a pointer to resources"; I didn't think
about the performance loss.
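
To make that performance loss concrete, here is a minimal Java sketch, assuming a simplified model in which a per-document cache is keyed by object number. The class OldRuleCache and its methods are hypothetical, not PDFBox API; the sketch only mimics the old rule, under which images are never put into the cache, so the same image drawn on three pages is loaded three times while a form XObject is loaded once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a per-document XObject cache (not PDFBox code).
class OldRuleCache {
    private final Map<Integer, String> cache = new HashMap<>();
    private int loads = 0; // counts how often the expensive load step ran

    // objectNumber stands in for the indirect reference used as the cache key.
    String getXObject(int objectNumber, boolean isImage) {
        String cached = cache.get(objectNumber);
        if (cached != null) {
            return cached;
        }
        loads++; // simulate parsing/decoding the XObject stream
        String xobject = (isImage ? "image " : "form ") + objectNumber;
        // Old rule: images are never cached because they hold page resources.
        if (!isImage) {
            cache.put(objectNumber, xobject);
        }
        return xobject;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        OldRuleCache images = new OldRuleCache();
        OldRuleCache forms = new OldRuleCache();
        for (int page = 1; page <= 3; page++) {
            images.getXObject(12, true);  // same image object on every page
            forms.getXObject(20, false);  // same form object on every page
        }
        System.out.println(images.loads() + " " + forms.loads()); // prints "3 1"
    }
}
```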

An image could appear on several pages with the same colorspace name,
but that name could refer to a different colorspace on each page (very
unlikely, but possible!).
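
A hypothetical Java sketch of that case (NamedColorSpaceDemo and its Image record are made up for illustration, not PDFBox classes): the image's /ColorSpace entry is the name CS0, which each page's resources map to something different. A cache keyed only by object number hands page 2 the object that is still bound to page 1's resources.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: an image whose /ColorSpace entry is a *name* that is
// resolved against the resources of the page it was loaded from.
class NamedColorSpaceDemo {
    record Image(int objectNumber, String colorSpaceName, Map<String, String> pageColorSpaces) {
        // Looks the name up in the resources captured at creation time;
        // names not defined there (e.g. device spaces) are returned unchanged.
        String resolvedColorSpace() {
            return pageColorSpaces.getOrDefault(colorSpaceName, colorSpaceName);
        }
    }

    private final Map<Integer, Image> cache = new HashMap<>();

    // Caches by object number only, ignoring which page's resources were used.
    Image load(int objectNumber, String colorSpaceName, Map<String, String> pageColorSpaces) {
        return cache.computeIfAbsent(objectNumber,
                n -> new Image(n, colorSpaceName, pageColorSpaces));
    }

    public static void main(String[] args) {
        Map<String, String> page1 = Map.of("CS0", "ICCBased RGB");
        Map<String, String> page2 = Map.of("CS0", "DeviceCMYK");
        NamedColorSpaceDemo demo = new NamedColorSpaceDemo();
        Image onPage1 = demo.load(12, "CS0", page1);
        Image onPage2 = demo.load(12, "CS0", page2); // cache hit, still bound to page 1
        System.out.println(onPage1.resolvedColorSpace()); // prints "ICCBased RGB"
        System.out.println(onPage2.resolvedColorSpace()); // prints "ICCBased RGB", not "DeviceCMYK"
    }
}
```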

Please replace the code segment at the end of the method with this
(remove the println calls if you want):

    if (cache != null) // && !(xobject instanceof PDImageXObject))
    {
        if (xobject instanceof PDImageXObject)
        {
            COSBase colorSpace =
                    xobject.getCOSObject().getDictionaryObject(COSName.COLORSPACE);
            if (colorSpace instanceof COSName)
            {
                // don't cache if it might use page resources, see PDFBOX-2370 and XXXX
                COSName colorSpaceName = (COSName) colorSpace;
                if (colorSpaceName.equals(COSName.DEVICECMYK) &&
                        hasColorSpace(COSName.DEFAULT_CMYK))
                {
                    System.out.println("Don't cache " + colorSpaceName);
                    return xobject;
                }
                if (colorSpaceName.equals(COSName.DEVICERGB) &&
                        hasColorSpace(COSName.DEFAULT_RGB))
                {
                    System.out.println("Don't cache " + colorSpaceName);
                    return xobject;
                }
                if (colorSpaceName.equals(COSName.DEVICEGRAY) &&
                        hasColorSpace(COSName.DEFAULT_GRAY))
                {
                    System.out.println("Don't cache " + colorSpaceName);
                    return xobject;
                }
                if (hasColorSpace(colorSpaceName))
                {
                    System.out.println("Don't cache " + colorSpaceName);
                    return xobject;
                }
            }
        }
        cache.put(indirect, xobject);
    }
    return xobject;
}
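
The decision made by this patch can be modeled without PDFBox as a pure function. This is a sketch under simplifying assumptions: colorspace names are plain strings, and the set pageColorSpaces stands in for the names that hasColorSpace() finds in the page resources. ImageCachePolicy.isCacheable is a hypothetical name, not part of PDFBox.

```java
import java.util.Set;

// Hypothetical, PDFBox-free model of the caching decision in the patch above.
// colorSpaceEntry: the image's /ColorSpace value (a String models a COSName).
// pageColorSpaces: colorspace names defined in the page resources,
// i.e. what hasColorSpace() looks up in the patch.
class ImageCachePolicy {
    static boolean isCacheable(Object colorSpaceEntry, Set<String> pageColorSpaces) {
        // Like the patch, only a plain name is checked; a missing entry or an
        // array colorspace is treated as cacheable.
        if (!(colorSpaceEntry instanceof String name)) {
            return true;
        }
        // Device spaces may be overridden by DefaultCMYK/DefaultRGB/DefaultGray.
        if (name.equals("DeviceCMYK") && pageColorSpaces.contains("DefaultCMYK")) {
            return false;
        }
        if (name.equals("DeviceRGB") && pageColorSpaces.contains("DefaultRGB")) {
            return false;
        }
        if (name.equals("DeviceGray") && pageColorSpaces.contains("DefaultGray")) {
            return false;
        }
        // Any other name defined in the page resources is page-specific.
        return !pageColorSpaces.contains(name);
    }

    public static void main(String[] args) {
        Set<String> page = Set.of("DefaultRGB", "CS0");
        System.out.println(isCacheable("DeviceRGB", page));  // false
        System.out.println(isCacheable("DeviceCMYK", page)); // true
        System.out.println(isCacheable("CS0", page));        // false
        System.out.println(isCacheable("CS1", page));        // true
        System.out.println(isCacheable(null, page));         // true
    }
}
```

Since most images use a device colorspace and most pages define no Default* entries, the function returns true in the common case, which matches the observation below that only very few files are affected.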

You can see the effect (the 2nd page is rendered much faster) by viewing
this document (all pages have the same image) in PDFDebugger and looking
at the time shown in the status bar at the lower left. I get a huge
speed increase.
https://people.apache.org/~lehmi/apachecon/ApacheConPDFBox.pdf

With this change, only very few files have images that are not cached.

Tilman