Hi,
I am using PDFBox 1.8.9 to extract PDF contents (actually via Apache Tika,
which in turn uses PDFBox). I am encountering the exceptions below while
trying to parse Portuguese or Spanish PDF files. They are different exceptions,
but they seem to be related to the handling of Spanish or Portuguese characters.
Has anybody encountered these exceptions before? Any suggestions for fixing them?
I can attach the PDF files if that would be helpful.
Exception list:
1.) java.lang.RuntimeException: java.io.IOException: Expected='null' actual='n' at offset 4306
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
2.) java.lang.RuntimeException: java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 8544
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
3.) java.lang.RuntimeException: java.io.IOException: Error expected floating point number actual='--22.'
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
4.) java.lang.RuntimeException: java.io.IOException: Error expected floating point number actual='173.0.2'
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
5.) java.lang.RuntimeException: java.io.IOException: Value is not an integer: -1-15
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:198)
    at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:205)
Thanks,
Mouthgalya Ganapathy
Product Development Team