I have no good idea other than calling setRotation() on the page (PDPage)
and then doing text extraction.
A possible strategy would be to try all four rotations (0, 90, 180, 270)
and see which one yields the most known words.
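A minimal sketch of that "try every rotation, keep the one with the most known words" idea follows. RotationPicker, countKnownWords and bestRotation are hypothetical names, not part of PDFBox, and the word list is something you would have to supply yourself. The extraction step is passed in as a function so the scoring logic stands alone. The commented wiring assumes PDFBox 2.x, where PDPage.setRotation(int) and PDFTextStripper.getText(PDDocument) exist. Whether rotating the page actually improves the output for a given file is untested here.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.IntFunction;

// Hypothetical helper, not part of PDFBox.
final class RotationPicker {

    /** Counts tokens (runs of letters, lower-cased) that occur in knownWords. */
    static int countKnownWords(String text, Set<String> knownWords) {
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (knownWords.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Calls extractor once per candidate rotation and returns the rotation
     * whose text contains the most known words (ties go to the first tried).
     */
    static int bestRotation(IntFunction<String> extractor, Set<String> knownWords) {
        int best = 0;
        int bestHits = -1;
        for (int rotation : new int[] {0, 90, 180, 270}) {
            int hits = countKnownWords(extractor.apply(rotation), knownWords);
            if (hits > bestHits) {
                bestHits = hits;
                best = rotation;
            }
        }
        return best;
    }

    // Assumed wiring with PDFBox 2.x (both calls throw IOException,
    // so the lambda has to catch and wrap it):
    //
    //   IntFunction<String> extractor = rotation -> {
    //       page.setRotation(rotation);
    //       return new PDFTextStripper().getText(document);
    //   };
    //   int rotation = RotationPicker.bestRotation(extractor, knownWords);
}
```

The known-word set should be lower-case, since the text is lower-cased before lookup. A small domain-specific list is probably enough to tell a correct rotation from fragmented output.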
Tilman
On 25.09.2017 at 19:31, Allison, Timothy B. wrote:
Colleagues,
Any recommendations for extracting rotated text such as:
https://www.fsis.usda.gov/wps/wcm/connect/896bf55c-0d78-44a0-adfb-94f893eb0f72/GallagherEbelKause_74.pdf?MOD=AJPERES
?
Adobe DC gets reasonable text with "save as text". PDFBox's ExtractText (and
Tika) get something like this:
FS
IS
L
is
te
ria
Li
st
er
ia
R
is
k
R
is
k
As
se
ss
m
en
Thank you!