Yes, you are right. I had both the 0.7.3 and the 0.8.0 versions of the
jars on the classpath.
Now that I have removed the old version, I am able to parse the data properly.
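
A quick way to confirm this kind of classpath mix-up is to ask the JVM where
a given class was loaded from. The snippet below is only a minimal sketch: the
class name WhichJar and the helper locationOf are made up for illustration,
and the two class names it probes are simply the ones that appear in the
stack trace quoted further down. It uses only standard JDK calls.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {

    /**
     * Returns the jar or directory a class was loaded from,
     * or a short note when the class is missing or has no known location.
     * (Hypothetical helper, not part of PDFBox.)
     */
    public static String locationOf(String className) {
        try {
            // Load without running static initializers; we only want the origin.
            Class<?> cls = Class.forName(className, false,
                    WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            if (location == null) {
                // JDK core classes typically report no code source.
                return "(bootstrap or unknown location)";
            }
            return location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // New package prefix (org.apache.pdfbox) and old one (org.pdfbox),
        // both taken from the stack trace in this thread.
        String[] names = {
            "org.apache.pdfbox.util.PDFTextStripper",
            "org.pdfbox.util.operator.ShowTextGlyph"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If both lines print a jar location, both PDFBox generations are still on the
classpath; after removing the old jar, the org.pdfbox entry should be reported
as not on the classpath.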

The other question I had: the PDF I am trying to parse has data in a
table with many columns and rows, but all the data is being extracted as a
single string.
How do I parse the values out separately?

thanks
~pramod

2009/10/27 Andreas Lehmkühler <andr...@lehmi.de>

> Hi,
>
> Pramod Pradhan wrote:
> > Thank you for the sample code, Andreas, but I am hitting another
> > exception now.
> >
> > I get the exception below when I try the piece of code you provided.
> > Can you please help?
> >
> > Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
> >     at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:137)
> >     at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:162)
> >     at ExtractText.main(ExtractText.java:230)
> > Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph cannot be cast to org.apache.pdfbox.util.operator.OperatorProcessor
> >     at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:131)
> >     ... 2 more
> You somehow mixed up your environment. You have both pdfbox versions in
> the classpath. All pdfbox classes from the current version have the
> prefix "org.apache.pdfbox" and your stacktrace shows at least one class
> with the prefix "org.pdfbox" used in former versions.
>
> BR
> Andreas Lehmkühler
>
> > thanks,
> > ~pramod
> >
> > 2009/10/27 Andreas Lehmkühler <andr...@lehmi.de>
> >
> >> Hi,
> >>
> >> Subject: java.io.IOException: expected='startxref'
> >> Sent: Tue, 27 Oct 2009
> >> From: Pramod Pradhan
> >>
> >>> Hi All,
> >>> I am trying to write some simple code to just parse the text data from
> >>> a PDF file onto the console. I am hitting the exception below:
> >>>
> >>> java.io.IOException: expected='startxref' actual='' org.pdfbox.io.pushbackinputstr...@100ab23
> >>>     at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:355)
> >>>     at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
> >>>     at PDFTextParser.pdftoText(PDFTextParser.java:49)
> >>>     at PDFTextParser.main(PDFTextParser.java:93)
> >>> PDF to Text Conversion failed.
> >> Looking at the stack trace, you're obviously using an older version of
> >> pdfbox. I suggest updating to pdfbox 0.8.0. It is available at [1].
> >>
> >>> Can someone please help? I have attached the Java class file.
> >> Your attachment didn't make it because of the mailing list policy.
> >> If you are looking for an example of how to extract text from a PDF,
> >> have a look at ExtractText [2].
> >>
> >> BR
> >> Andreas Lehmkühler
> >>
> >> [1] http://incubator.apache.org/pdfbox/download.html
> >> [2] http://svn.apache.org/repos/asf/incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/ExtractText.java
> >>
> >
> >
> >
>
>


-- 
thanks,
Pramod Pradhan
(361)228-3989
