Thank you for the sample code, Andreas, but I am hitting another exception now.
I get the exception below when I try the code you provided. Can you please help?

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
	at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:137)
	at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:162)
	at ExtractText.main(ExtractText.java:230)
Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph cannot be cast to org.apache.pdfbox.util.operator.OperatorProcessor
	at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:131)
	... 2 more

Thanks,
~pramod

2009/10/27 Andreas Lehmkühler <andr...@lehmi.de>
> Hi,
>
> Subject: java.io.IOException: expected='startxref'
> Sent: Tue, 27 Oct 2009
> From: Pramod Pradhan
>
> > Hi All,
> > I am trying to write some simple code that just prints the text data from a
> > pdf file to the console. I am hitting the exception below:
> > java.io.IOException: expected='startxref' actual='' org.pdfbox.io.pushbackinputstr...@100ab23
> >     at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:355)
> >     at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
> >     at PDFTextParser.pdftoText(PDFTextParser.java:49)
> >     at PDFTextParser.main(PDFTextParser.java:93)
> > PDF to Text Conversion failed.
>
> Looking at the stack trace, you're obviously using an older version of
> pdfbox. I suggest updating to pdfbox 0.8.0. It is available at [1].
>
> > Can someone please help? I have attached the Java class file.
>
> Your attachment didn't make it because of the mailing list policy.
> If you are looking for an example of how to extract text from a pdf, have a
> look at ExtractText [2].
>
> BR
> Andreas Lehmkühler
>
> [1] http://incubator.apache.org/pdfbox/download.html
> [2] http://svn.apache.org/repos/asf/incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/ExtractText.java

--
thanks,
Pramod Pradhan
(361)228-3989
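One possible reading of the second trace: it mixes a class from the old org.pdfbox package (ShowTextGlyph) with the org.apache.pdfbox classes of 0.8.0, which suggests the pre-0.8.0 jar may still be on the classpath next to the new one. The sketch below is a hypothetical, stdlib-only check (the class PdfboxClasspathCheck and its methods are made up for illustration and are not part of PDFBox). It lists every classpath location that contains one marker class from each package; the two marker class names are taken from the stack traces in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic (not part of PDFBox): shows which classpath entries
// provide the old org.pdfbox package and which provide org.apache.pdfbox.
public class PdfboxClasspathCheck {

    // Marker classes copied from the two stack traces in this thread.
    static final String OLD_MARKER = "org/pdfbox/util/operator/ShowTextGlyph.class";
    static final String NEW_MARKER = "org/apache/pdfbox/util/PDFTextStripper.class";

    // Returns every classpath location (jar or directory) containing the resource.
    static List<URL> locate(String resource) {
        try {
            ClassLoader loader = PdfboxClasspathCheck.class.getClassLoader();
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Summarises the counts; ideally only org.apache.pdfbox should be found.
    static String verdict(int oldCount, int newCount) {
        if (oldCount > 0 && newCount > 0) {
            return "Both org.pdfbox and org.apache.pdfbox are on the classpath";
        }
        if (oldCount > 0) {
            return "Only the old org.pdfbox package was found";
        }
        if (newCount > 0) {
            return "Only org.apache.pdfbox was found";
        }
        return "No PDFBox classes were found";
    }

    public static void main(String[] args) {
        List<URL> oldHits = locate(OLD_MARKER);
        List<URL> newHits = locate(NEW_MARKER);
        for (URL url : oldHits) {
            System.out.println("old: " + url);
        }
        for (URL url : newHits) {
            System.out.println("new: " + url);
        }
        System.out.println(verdict(oldHits.size(), newHits.size()));
    }
}
```

It would need to run with the same classpath as ExtractText (for example java -cp <your classpath> PdfboxClasspathCheck). If any "old:" line shows up, removing that jar might be the first thing to try, though this is only a guess from the trace.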