Hi!

The API docs for PDFTextStripper (
http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html)
state that "This class will take a pdf document and strip out all of the
text and ignore the formatting and such". Note that you can
call setAddMoreFormatting (
http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setAddMoreFormatting(boolean))
with true to add a bit more formatting, but in my experience the result
does not compare to "pdftotext -layout" from the Xpdf project, which
does a much better job of preserving layout.
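
If you want to drive pdftotext from Java, something like the following
minimal sketch might do. It assumes the pdftotext binary is installed and
on the PATH; the class name PdfToTextLayout and the helper methods
buildCommand and convert are made up for this example and are not part of
PDFBox or Xpdf.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PdfToTextLayout {

    // Builds the command line: pdftotext -layout <input.pdf> <output.txt>
    // (hypothetical helper; assumes "pdftotext" is resolvable via PATH).
    static List<String> buildCommand(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", "-layout", pdf.toString(), txt.toString());
    }

    // Runs pdftotext and returns its exit code (0 means no error).
    static int convert(Path pdf, Path txt) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, txt))
                .inheritIO() // show pdftotext's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToTextLayout <input.pdf> <output.txt>");
            return;
        }
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("pdftotext exited with code " + exit);
    }
}
```

Run it with the input PDF and the desired output file as the two
arguments; a non-zero exit code means pdftotext reported a problem.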

Best regards,
    Kovi

2016-07-29 12:44 GMT+02:00 Shyam Sundar <[email protected]>:

> Hi,
>
> While converting a particular pdf to txt, spacing between lines and
> paragraphs is not retained, output is just a flat text.
>
> Sample file : ftp://PfXxyEhxh:[email protected]/Sample.zip
>
> Looks like a file-specific issue. Can you please check?
>
> Thanks.
>



-- 
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
|  In A World Without Fences Who Needs Gates?  |
|              Experience Linux.               |
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
