Thanks, Kovi, for the quick response. But why does it fail only for one particular file? A replica of the same file generated with another PDF library works perfectly fine with PDFTextStripper. Isn't that strange, and doesn't it look like a bug?
I hope you checked the shared Sample.zip; it contains both the working and the non-working files.

Regards.

On Fri, Jul 29, 2016 at 4:30 PM, Gregor Kovač <[email protected]> wrote:

> Hi!
>
> The API docs for PDFTextStripper
> (http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html)
> state that "This class will take a pdf document and strip out all of the
> text and ignore the formatting and such". Please note that you can call
> setAddMoreFormatting
> (http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setAddMoreFormatting(boolean))
> with true and it will add a bit more formatting, but in my experience this
> does not compare to using "pdftotext -layout" from the Xpdf project.
> pdftotext does a much better job of preserving layout.
>
> Best regards,
> Kovi
>
> 2016-07-29 12:44 GMT+02:00 Shyam Sundar <[email protected]>:
>
> > Hi,
> >
> > While converting a particular PDF to text, the spacing between lines and
> > paragraphs is not retained; the output is just flat text.
> >
> > Sample file: ftp://PfXxyEhxh:[email protected]/Sample.zip
> >
> > It looks like a file-specific issue. Can you please check?
> >
> > Thanks.

