I have no good idea, except to call setRotation() on the page and then do the text extraction.

A possible strategy might be to try all four rotations (0, 90, 180 and 270 degrees) and see which one yields the most known words.
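A minimal sketch of the "most known words" idea, in plain Java so that it runs without PDFBox on the classpath. It assumes the text for each rotation has already been extracted, for example by calling PDPage.setRotation(int) and then PDFTextStripper.getText(PDDocument) once per rotation. The class name RotationPicker, the helper names, the minimum token length of 3 and the tiny word list are all hypothetical choices for illustration, not part of PDFBox.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class RotationPicker {

    // Assumption: tokens shorter than this are ignored, because fragments such as
    // "is" or "st" are easy to hit by accident in badly extracted text.
    static final int MIN_TOKEN_LENGTH = 3;

    // Counts tokens (runs of letters) that are long enough and found in the dictionary.
    // The dictionary is expected to hold lower-case words.
    static int countKnownWords(String text, Set<String> dictionary) {
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.length() >= MIN_TOKEN_LENGTH && dictionary.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    // Returns the rotation whose extracted text scores highest.
    // Ties keep the first rotation in iteration order; an empty map returns -1.
    static int bestRotation(Map<Integer, String> textByRotation, Set<String> dictionary) {
        int best = -1;
        int bestScore = -1;
        for (Map.Entry<Integer, String> entry : textByRotation.entrySet()) {
            int score = countKnownWords(entry.getValue(), dictionary);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative word list; a real run would load a proper dictionary.
        Set<String> dictionary = new HashSet<>(Arrays.asList("listeria", "risk", "assessment"));

        // Illustrative inputs: the first mimics the fragmented output quoted in this thread,
        // the second is a made-up example of what a correct rotation might yield.
        Map<Integer, String> textByRotation = new LinkedHashMap<>();
        textByRotation.put(0, "FS IS L is te ria R is k");
        textByRotation.put(90, "FSIS Listeria Risk Assessment");

        System.out.println("Best rotation: " + bestRotation(textByRotation, dictionary));
    }
}
```

Whether this works on a given file depends on the extractor actually producing whole words at the right rotation, so it is worth checking on a few sample pages before relying on it.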

Tilman


On 25.09.2017 at 19:31, Allison, Timothy B. wrote:
Colleagues,
Any recommendations for extracting rotated text such as: 
https://www.fsis.usda.gov/wps/wcm/connect/896bf55c-0d78-44a0-adfb-94f893eb0f72/GallagherEbelKause_74.pdf?MOD=AJPERES
 ?

Adobe DC gets reasonable text with "save as text".  PDFBox's ExtractText (and 
Tika) get something like this:

FS
IS
L
is
te
ria
Li
st
er
ia
R
is
k
R
is
k
As
se
ss
m
en

Thank you!



