[ 
https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-385:
--------------------------------------

    Attachment: BaseParser_385-Patch.diff

As already mentioned in an earlier comment, the PDF document isn't well-formed: 
the object reference is broken. 
I made a patch to prevent an NPE in those cases.

> ClassCastException when call parseCOSArray in BaseParser.java 
> --------------------------------------------------------------
>
>                 Key: PDFBOX-385
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-385
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 0.7.0, 0.7.1, 0.7.2, 0.7.3
>         Environment: Windows XP Professional SP2, Liferay 3.3.0 Enterprise + 
> JBoss 402
>            Reporter: Yubin Zheng
>         Attachments: BaseParser_385-Patch.diff, Test9.pdf
>
>
> When parsing a particular PDF document, PDFBox throws a ClassCastException, so 
> the Lucene instance integrated in Liferay cannot extract the text from the PDF 
> to add it to the index.
> Debugging the issue shows that it is caused by the method "parseCOSArray" in 
> BaseParser.java:
>         COSArray po = new COSArray();
>         COSBase pbo = null;
>         skipSpaces();
>         int i = 0;
>         while( ((i = pdfSource.peek()) > 0) && ((char)i != ']') )
>         {
>             pbo = parseDirObject();
>             if( pbo instanceof COSObject )
>             {
>                 COSInteger genNumber = (COSInteger)po.remove( po.size() -1 );
>                 COSInteger number = (COSInteger)po.remove( po.size() -1 );
>                 COSObjectKey key = new COSObjectKey(number.intValue(),
>                         genNumber.intValue());
>                 pbo = document.getObjectFromPool(key);
>             }
>             if( pbo != null )
>             {
>                 po.add( pbo );
>             }
>             else
>             {
>                 //it could be a bad object in the array which is just skipped
>             }
>             skipSpaces();
>         }
>         pdfSource.read(); //read ']'
>         skipSpaces();
>         return po;
>     }
> With this specific PDF document, the statement COSInteger number = 
> (COSInteger)po.remove( po.size() -1 ); raises the error: the removed object 
> is a COSObject, not a COSInteger, so the class cast fails.
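
The failure described above comes from casting the two preceding array entries without checking their types first. The following is a minimal sketch of a guarded variant. It is not the attached patch and does not use the PDFBox classes: the class name, foldReferences, and REF_MARKER are hypothetical, Integer stands in for COSInteger, and REF_MARKER stands in for the COSObject that parseDirObject() returns for "R".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in model, not PDFBox code.
public class ArrayReferenceGuard {

    /** Stand-in for the "R" token that marks an indirect reference. */
    static final Object REF_MARKER = new Object();

    /**
     * Folds well-formed "number generation R" triples into one entry and
     * skips the marker when the two preceding entries are not both integers.
     */
    static List<Object> foldReferences(List<Object> tokens) {
        List<Object> out = new ArrayList<>();
        for (Object token : tokens) {
            if (token != REF_MARKER) {
                out.add(token);
                continue;
            }
            int n = out.size();
            // Check the types before casting; the unguarded casts are
            // what raise the ClassCastException in the report.
            if (n >= 2
                    && out.get(n - 1) instanceof Integer
                    && out.get(n - 2) instanceof Integer) {
                Integer generation = (Integer) out.remove(n - 1);
                Integer number = (Integer) out.remove(n - 2);
                // Placeholder for the lookup in the document's object pool.
                out.add("ref(" + number + "," + generation + ")");
            }
            // Otherwise the reference is broken: drop the marker and
            // leave the entries already collected untouched.
        }
        return out;
    }
}
```

With this guard, a sequence such as 3 0 R 0 R folds the first reference and keeps the stray 0 instead of failing on the second cast. Whether a broken reference should be skipped, replaced by a null object, or reported is a design choice this sketch leaves open.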

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
