Michael McCandless created PDFBOX-1297:
------------------------------------------
Summary: ExtractText fails to extract text from packaged PDFs
Key: PDFBOX-1297
URL: https://issues.apache.org/jira/browse/PDFBOX-1297
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 1.6.0
Environment: Fedora 13 Linux
Reporter: Michael McCandless

Apparently a PDF can contain multiple files (like a Zip file); this is called a PDF Package, described at http://help.adobe.com/en_US/Reader/8.0/help.html?content=WSE034CA46-D08F-4fff-AA3C-FF04510DAEF0.html

I have a simple example PDF Package containing two sub-PDFs, but ExtractText fails to extract their text. It runs successfully (no exceptions), but the only text it extracts is the boilerplate saying you should upgrade to Adobe Acrobat version 8 or later to view this PDF.