Valdis Andersons created PDFBOX-2212:
----------------------------------------

             Summary: OutOfMemoryError in GlyfCompositeDescrip
                 Key: PDFBOX-2212
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2212
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox, Preflight
    Affects Versions: 1.8.6
         Environment: Windows 7, JDK6
            Reporter: Valdis Andersons


Hi All,
 
The application I’m working on is a web service that accepts PDF documents and
combines them into a single larger PDF: a client submits a batch of PDFs and we
create one PDF out of them. In some rare cases, one of the submitted PDF
documents has a glitch in it that causes Adobe Reader to throw errors when
viewing the final document (attached).
When I tried to check the buggy PDF with the approach outlined here:
 
https://pdfbox.apache.org/cookbook/pdfavalidation.html
 
I got an OutOfMemoryError in the GlyfCompositeDescript class. Here is the
full stack trace:
 
java.lang.OutOfMemoryError: Java heap space
    at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:58)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:62)
    at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:69)
    at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
    at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
    at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
    at org.apache.pdfbox.preflight.font.descriptor.TrueTypeDescriptorHelper.processFontFile(TrueTypeDescriptorHelper.java:84)
    at org.apache.pdfbox.preflight.font.descriptor.FontDescriptorHelper.validate(FontDescriptorHelper.java:97)
    at org.apache.pdfbox.preflight.font.SimpleFontValidator.processFontDescriptorValidation(SimpleFontValidator.java:82)
    at org.apache.pdfbox.preflight.font.SimpleFontValidator.validate(SimpleFontValidator.java:55)
    at org.apache.pdfbox.preflight.process.reflect.FontValidationProcess.validate(FontValidationProcess.java:69)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:96)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:74)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
    at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:191)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:78)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
 
While I can’t send on the PDF in question due to the sensitivity of its
contents, I did a bit of digging and debugging to find out why this is
happening.
In the GlyfCompositeDescript class’s constructor there is a do … while loop
that constructs GlyfCompositeComp objects and adds them to the components list
of GlyfCompositeDescript. The GlyfCompositeComp constructor reads a signed
short from the TTFDataStream into the flags field, and the
GlyfCompositeDescript constructor then uses that field to check whether there
are more components to read. Here is the code in question:
 
public GlyfCompositeDescript(TTFDataStream bais, GlyphTable glyphTable) throws IOException
    {
…
        do
        {
            // This is where the OutOfMemoryError happens
            comp = new GlyfCompositeComp(bais);
            components.add(comp);
        } while ((comp.getFlags() & GlyfCompositeComp.MORE_COMPONENTS) != 0); // here the flags are used to check if more components are there
…
    }
 
protected GlyfCompositeComp(TTFDataStream bais) throws IOException
    {
        flags = bais.readSignedShort();
…
    }
 
In the case of the corrupted PDF that we get from time to time, the
bais.readSignedShort() call in GlyfCompositeComp returns -1. Once it hits that
value, the masked expression in the loop condition of the GlyfCompositeDescript
constructor always evaluates to 32 (!= 0), because -1 has every bit set.
Basically, it ends up in an infinite loop and keeps constructing
GlyfCompositeComp objects until the memory runs out.
 
The main question here is: has anyone ever encountered a PDF corruption that
causes this behaviour, and how would one go about checking a PDF document for
this sort of corruption without causing the application to run out of memory?
 
We’re not required to fix the document, just check if it’s valid. If it’s not
valid then we just reject the document. Ideally I’d also like to know what the
corruption could be so that I can at least give a hint to the client software
as to what is causing this document to be rejected (I do understand that
without the actual PDF that’s causing this it might be impossible to tell).




--
This message was sent by Atlassian JIRA
(v6.2#6252)
