If you show us the stack trace, we might be able to point you in the right 
direction or refer you to the PDF specification.  Another option would be 
to open test.pdf and search for "IA" (case sensitive) to see if you can 
determine which object it's failing to read.  If there are too many 
instances of "IA", you can try debugging.
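
To make the search suggestion concrete, here is a minimal Java sketch that 
lists every byte offset where "IA" appears in the raw file, along with a 
little surrounding context.  The class name FindToken and the findOffsets 
helper are made up for illustration and are not part of PDFBox.  One 
caveat: content streams are often compressed, so a raw search like this 
only finds occurrences in the uncompressed parts of the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of PDFBox.
public class FindToken {

    /** Returns every byte offset at which token occurs in data (case sensitive). */
    public static List<Integer> findOffsets(byte[] data, byte[] token) {
        List<Integer> offsets = new ArrayList<>();
        if (token.length == 0) {
            return offsets;
        }
        for (int i = 0; i + token.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < token.length; j++) {
                if (data[i + j] != token[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to test.pdf, the file name used in this thread.
        String path = args.length > 0 ? args[0] : "test.pdf";
        byte[] data = Files.readAllBytes(Paths.get(path));
        byte[] token = "IA".getBytes(StandardCharsets.US_ASCII);

        for (int offset : findOffsets(data, token)) {
            // Print 20 bytes of context on each side, masking non-printable bytes.
            int start = Math.max(0, offset - 20);
            int end = Math.min(data.length, offset + token.length + 20);
            String context = new String(data, start, end - start, StandardCharsets.ISO_8859_1)
                    .replaceAll("[^\\x20-\\x7E]", ".");
            System.out.println(offset + ": " + context);
        }
    }
}
```

Run it as "java FindToken test.pdf" and look at the context around each 
hit to see which object it belongs to.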

Do other operations work with this file?  For example, can you read the 
bookmarks, copy pages, etc.?  Try using some of the programs from the 
utilities package.  They might help determine whether the issue is with the 
file or not.  If nothing works, then the problem is probably in a core 
document structure like the document outline.  If you know what program was 
used to create this PDF, that may help us duplicate the problem.  For 
example, if text cannot be extracted from any PDF created with XYZ, then we 
can check whether those files are conforming PDFs; if they are, we can 
update the library.  If they are non-conforming (i.e. they don't follow the 
PDF specification), we'll take a look and see what the best way to handle 
them would be.

---- 
Thanks,
Adam





From: "Robson Bortoleto" <[email protected]>
To: [email protected]
Date: 07/27/2010 06:29
Subject: Re: Problem with Text Extraction in pdfbox 1.2.1



Hi

Have you checked whether the file is protected (read only)?
I have never used PDFTextStripper, but I have often seen files behave 
differently because of write protection.


----- original message --------

Subject: Problem with Text Extraction in pdfbox 1.2.1
Sent: Tue, 27 Jul 2010
From: Jorge Imar Canché Álvarez<[email protected]>

> Hi, I am having problems with pdfbox 1.2.1. I want to extract text from
> a PDF file, but my program throws an exception. The exception message is:
> 
> java.io.IOException: Error: Expected operator 'ID' actual='IA'
> 
> 
> My test class is:
> 
> PDDocument doc = PDDocument.load("test.pdf");
> PDFTextStripper strip = new PDFTextStripper();
> String text = strip.getText(doc);
> 
> 
> If I replace test.pdf with another file it works, but I must extract
> the text of "test.pdf".
> 
> 
> Thanks for your help.
> 
> 

--- original message end ----



