Michael McCandless created PDFBOX-1297:
------------------------------------------

             Summary: ExtractText fails to extract text from packaged PDFs
                 Key: PDFBOX-1297
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1297
             Project: PDFBox
          Issue Type: Improvement
          Components: Text extraction
    Affects Versions: 1.6.0
         Environment: Fedora 13 Linux
            Reporter: Michael McCandless


Apparently a PDF can contain multiple files (like a Zip file); this is called
a PDF Package, described at
http://help.adobe.com/en_US/Reader/8.0/help.html?content=WSE034CA46-D08F-4fff-AA3C-FF04510DAEF0.html

I have a simple example PDF Package, containing two sub-PDFs, but ExtractText
fails to extract their text.

It does run successfully (no exceptions), but the text it extracts is just the
boilerplate text saying you should upgrade to Adobe Acrobat version 8 or later
to view this PDF.
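
Since extraction succeeds silently, it may help to detect such a file before
trusting the extracted text. The PDF specification marks a package with a
/Collection entry in the document catalog and lists the attached files in an
/EmbeddedFiles name tree. The following is a rough sketch in Python, not
PDFBox code: the function name find_package_markers is made up for
illustration, and because it only scans raw bytes it will miss keys stored
inside compressed object streams.

```python
import re

# Name tokens defined by the PDF specification:
#   /Collection    - catalog entry marking a portable collection (PDF Package)
#   /EmbeddedFiles - name tree listing the attached files
_MARKERS = (b"/Collection", b"/EmbeddedFiles")


def find_package_markers(pdf_bytes):
    """Return the marker names that occur as whole PDF name tokens.

    Heuristic only: this scans raw bytes, so it cannot see keys stored
    inside compressed object streams, and it may match a name that appears
    in unrelated content.
    """
    found = []
    for marker in _MARKERS:
        # A name token ends at whitespace or a delimiter, so reject longer
        # names such as /CollectionItem that merely share the prefix.
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9_.#-])"
        if re.search(pattern, pdf_bytes):
            found.append(marker.decode("ascii"))
    return found
```

To try it on a file, read the file in binary mode and pass the bytes, for
example find_package_markers(open("package.pdf", "rb").read()), where
package.pdf is a placeholder name. A non-empty result suggests the extracted
text may only be the cover-page boilerplate.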


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
