On 31.10.2018 at 22:07, Luca Loiodice wrote:
I am using 2.0.X and have to support arbitrary input PDFs (including
90, 180 and 270 degree orientations, multi-column text, etc.).

Everything works fine except for text on an angle.
I came up with a two-pass call to the PDFStripper: getting standard-oriented
text using SortByPosition=false, and getting 90, 180 and 270 degree oriented
text using SortByPosition=true.
I am not sure this is correct, but it seems to work.

Is there any override I could try on the 2.0.X PDFStripper to make it
work?

No, nothing out of the box. The reason is that PDFBox sees each glyph by itself. You, a human, are smarter than PDFBox and notice that these glyphs, seemingly on different "lines", are part of a single skewed line.

Tilman
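
To illustrate the point about glyphs being seen one by one, here is a minimal sketch of how skewed glyphs could be regrouped into lines once their positions and the text angle are known. It is not PDFBox API: the `Glyph` class, the `group` method and the `tolerance` parameter are hypothetical names made up for this example, and it assumes y grows downward, that the baseline direction is (cos angle, sin angle) in those coordinates, and that all glyphs share one known angle. Obtaining the positions and the angle from PDFBox is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: not part of PDFBox. */
final class SkewedLineGrouper {

    /** Hypothetical stand-in for one extracted glyph: its text and baseline origin. */
    static final class Glyph {
        final String text;
        final double x;
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Rotates every glyph origin by -angleDegrees so that text written at that
     * angle becomes horizontal, then groups glyphs whose rotated y lies within
     * tolerance of the first glyph of the line, and joins each group in
     * rotated-x order.
     */
    static List<String> group(List<Glyph> glyphs, double angleDegrees, double tolerance) {
        double a = Math.toRadians(-angleDegrees);
        double cos = Math.cos(a);
        double sin = Math.sin(a);

        // Each entry holds {rotatedX, rotatedY, indexOfSourceGlyph}.
        List<double[]> rotated = new ArrayList<>();
        for (int i = 0; i < glyphs.size(); i++) {
            Glyph g = glyphs.get(i);
            rotated.add(new double[] {g.x * cos - g.y * sin, g.x * sin + g.y * cos, i});
        }

        // Sort by rotated y so glyphs on the same baseline become neighbours.
        rotated.sort(Comparator.comparingDouble((double[] r) -> r[1]));

        List<String> lines = new ArrayList<>();
        List<double[]> current = new ArrayList<>();
        for (double[] r : rotated) {
            // A jump in rotated y larger than the tolerance starts a new line.
            if (!current.isEmpty() && Math.abs(r[1] - current.get(0)[1]) > tolerance) {
                lines.add(join(current, glyphs));
                current = new ArrayList<>();
            }
            current.add(r);
        }
        if (!current.isEmpty()) {
            lines.add(join(current, glyphs));
        }
        return lines;
    }

    /** Orders one line's glyphs along the rotated x axis and concatenates their text. */
    private static String join(List<double[]> line, List<Glyph> glyphs) {
        line.sort(Comparator.comparingDouble((double[] r) -> r[0]));
        StringBuilder sb = new StringBuilder();
        for (double[] r : line) {
            sb.append(glyphs.get((int) r[2]).text);
        }
        return sb.toString();
    }
}
```

Calling `SkewedLineGrouper.group(glyphs, 30, 1.0)` on glyphs laid out along two parallel 30 degree baselines would return one string per baseline. A real page can mix several angles, so the glyphs would first need to be bucketed by angle, which this sketch does not attempt.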


On Wed, Oct 31, 2018 at 3:34 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

It might work with 1.8. However, that version has other weaknesses.

Tilman

On 31.10.2018 at 21:19, Luca Loiodice wrote:
Is it possible to extract the 2 lines of text from this page?
https://www.dropbox.com/s/2uh3p464i7iwjwv/textonanangle.pdf?dl=0

These are the text lines I get using the standard PdfStripper:

Tex
t on
e o
n a
n a
ngl
e
Text two on an angle

Thanks a lot,
Luca


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org



