Yes, you are right. I had both the 0.7.3 and the 0.8.0 versions of the jars. I have now removed the old version and am able to parse the data properly.
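
For anyone else who hits the same ClassCastException, here is a rough sketch of how one might check whether classes from both package prefixes are still visible at runtime. The two class names are copied from the stack trace in this thread; the class `PdfBoxClasspathCheck` and its methods are made up for illustration and are not part of pdfbox.

```java
public class PdfBoxClasspathCheck {

    // Both names appear in the stack trace quoted in this thread.
    static final String OLD_PREFIX_CLASS = "org.pdfbox.util.operator.ShowTextGlyph";
    static final String NEW_PREFIX_CLASS = "org.apache.pdfbox.util.operator.OperatorProcessor";

    /** Returns true if the class can be located, without running its static initializers. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PdfBoxClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Turns the two lookups into a short human-readable diagnosis. */
    static String diagnose(boolean oldPresent, boolean newPresent) {
        if (oldPresent && newPresent) {
            return "both old (org.pdfbox) and new (org.apache.pdfbox) classes found: remove the old jar";
        }
        if (oldPresent) {
            return "only old (org.pdfbox) classes found";
        }
        if (newPresent) {
            return "only new (org.apache.pdfbox) classes found";
        }
        return "no pdfbox classes found";
    }

    public static void main(String[] args) {
        System.out.println(diagnose(isOnClasspath(OLD_PREFIX_CLASS),
                                    isOnClasspath(NEW_PREFIX_CLASS)));
    }
}
```

Run it with the same classpath as the failing program. If it reports both prefixes, an old jar is presumably still being picked up somewhere.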

The other question I had: the PDF I am trying to parse has data in a table with many columns and rows, but all of it is extracted as a single string. How do I parse the values out separately?

thanks
~pramod

2009/10/27 Andreas Lehmkühler <andr...@lehmi.de>

> Hi,
>
> Pramod Pradhan wrote:
> > Thank you for the sample code, Andreas, but I am hitting another
> > exception now.
> >
> > I get the exception below when I try the piece of code you provided.
> > Can you please help?
> >
> > Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
> >     at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:137)
> >     at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:162)
> >     at ExtractText.main(ExtractText.java:230)
> > Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph cannot be cast to org.apache.pdfbox.util.operator.OperatorProcessor
> >     at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:131)
> >     ... 2 more
>
> You somehow mixed up your environment. You have both pdfbox versions in
> the classpath. All pdfbox classes from the current version have the
> prefix "org.apache.pdfbox", and your stack trace shows at least one class
> with the prefix "org.pdfbox", which was used in former versions.
>
> BR
> Andreas Lehmkühler
>
> > thanks,
> > ~pramod
> >
> > 2009/10/27 Andreas Lehmkühler <andr...@lehmi.de>
> >
> >> Hi,
> >>
> >> Subject: java.io.IOException: expected='startxref'
> >> Sent: Tue, 27 Oct 2009
> >> From: Pramod Pradhan
> >>
> >>> Hi All,
> >>> I am trying to write some simple code to just parse the text data from a
> >>> pdf file onto the console. I am hitting the exception below:
> >>>
> >>> java.io.IOException: expected='startxref' actual='' org.pdfbox.io.pushbackinputstr...@100ab23
> >>>     at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:355)
> >>>     at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
> >>>     at PDFTextParser.pdftoText(PDFTextParser.java:49)
> >>>     at PDFTextParser.main(PDFTextParser.java:93)
> >>> PDF to Text Conversion failed.
> >>
> >> Looking at the stack trace, you're obviously using an older version of
> >> pdfbox. I suggest updating to pdfbox 0.8.0. It is available at [1].
> >>
> >>> Can someone please help? I have attached the Java class file.
> >>
> >> Your attachment didn't make it because of the mailing list policy.
> >> If you are looking for an example of how to extract text from a pdf, have a
> >> look at ExtractText [2].
> >>
> >> BR
> >> Andreas Lehmkühler
> >>
> >> [1] http://incubator.apache.org/pdfbox/download.html
> >> [2] http://svn.apache.org/repos/asf/incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/ExtractText.java

--
thanks,
Pramod Pradhan
(361)228-3989