Thank you, Tilman. I haven't looked yet, but to confirm, there's no page parameter that specifies that the text has been rotated?
Back to language modeling... 😊

Thank you, again!

-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Monday, September 25, 2017 1:39 PM
To: [email protected]
Subject: Re: Extracting rotated text

No good idea except call setRotate() on the page and then do text extraction. A possible strategy might be to do all rotations and see which one brings most known words.

Tilman

On 25.09.2017 at 19:31, Allison, Timothy B. wrote:
> Colleagues,
> Any recommendations for extracting rotated text such as:
> https://www.fsis.usda.gov/wps/wcm/connect/896bf55c-0d78-44a0-adfb-94f893eb0f72/GallagherEbelKause_74.pdf?MOD=AJPERES
> ?
>
> Adobe DC gets reasonable text with "save as text". PDFBox's ExtractText (and
> Tika) get something like this:
>
> FS
> IS
> L
> is
> te
> ria
> Li
> st
> er
> ia
> R
> is
> k
> R
> is
> k
> As
> se
> ss
> m
> en
>
> Thank you!
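
For reference, the "try all rotations and keep the one with the most known words" idea could be sketched roughly as below. This is only an illustration of the scoring step, not PDFBox code: it assumes the text for each rotation (0, 90, 180, 270) has already been extracted elsewhere and is passed in as plain strings. The names `KNOWN_WORDS`, `known_word_count`, and `best_rotation` are made up for this sketch, and the tiny vocabulary stands in for a real word list.

```python
import re

# Hypothetical, deliberately tiny vocabulary; a real run would load a
# proper word list for the document's language.
KNOWN_WORDS = {"listeria", "risk", "assessment", "fsis", "the", "of", "and"}


def known_word_count(text, vocabulary=KNOWN_WORDS):
    """Count alphabetic tokens in `text` that appear in `vocabulary`."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return sum(1 for token in tokens if token in vocabulary)


def best_rotation(text_by_rotation, vocabulary=KNOWN_WORDS):
    """Return the rotation (in degrees) whose text has the most known words.

    `text_by_rotation` maps a rotation such as 0, 90, 180, or 270 to the
    text extracted with the page set to that rotation. On a tie, the
    rotation that appears first in the mapping wins.
    """
    return max(
        text_by_rotation,
        key=lambda rotation: known_word_count(text_by_rotation[rotation], vocabulary),
    )
```

Usage would be along the lines of `best_rotation({0: text_at_0, 90: text_at_90, 180: text_at_180, 270: text_at_270})`. Fragmented output like the sample quoted above should score low because the word pieces do not match the vocabulary, while a correctly oriented extraction should score higher.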

