Re: Extracting rotated text

2017-09-25 Peter Murray-Rust
no page
> parameter that specifies that the text has been rotated?
>
> Back to language modeling... Thank you, again!
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:thaush...@t-online.de]
> Sent: Monday, September 25, 2017 1:39 PM
> To: users@pdfbox.apache

Re: Extracting rotated text

2017-09-25 Tilman Hausherr
gain!

-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Monday, September 25, 2017 1:39 PM
To: users@pdfbox.apache.org
Subject: Re: Extracting rotated text

No good idea except to call setRotate() on the page and then do text extraction. A possible strategy might be

RE: Extracting rotated text

2017-09-25 Allison, Timothy B.
1:39 PM
To: users@pdfbox.apache.org
Subject: Re: Extracting rotated text

No good idea except to call setRotate() on the page and then do text extraction. A possible strategy might be to try all rotations and see which one yields the most known words.

Tilman

On 25.09.2017 at 19:31, Allison... wrote:

Re: Extracting rotated text

2017-09-25 Tilman Hausherr
No good idea except to call setRotate() on the page and then do text extraction. A possible strategy might be to try all rotations and see which one yields the most known words.

Tilman

On 25.09.2017 at 19:31, Allison, Timothy B. wrote:
> Colleagues,
> Any recommendations for extracting rotated text
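A minimal Java sketch of the "try all rotations, keep the one with the most known words" heuristic. The extraction step is passed in as a function, so the sketch runs without PDFBox. With PDFBox 2.x that function would presumably set the page's /Rotate value via PDPage.setRotation(int) (the actual name of the setter referred to above as setRotate()) and then run PDFTextStripper.getText() limited to that page with setStartPage/setEndPage, assuming the stripper honours the page rotation for the text in question. The class name RotationGuess, the tokenizing rule and the tiny word list are hypothetical.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.IntFunction;

/**
 * Sketch of the "try every rotation, keep the one with the most known words" heuristic.
 * Hypothetical helper: the extractor argument stands in for real PDF text extraction
 * (e.g. PDPage.setRotation(r) followed by PDFTextStripper on that page).
 */
class RotationGuess {

    /** The four values a PDF page /Rotate entry may take. */
    static final int[] ROTATIONS = {0, 90, 180, 270};

    /** Counts whitespace-separated tokens found in the known-word set (case-insensitive). */
    static int knownWordCount(String text, Set<String> knownWords) {
        int count = 0;
        for (String token : text.split("\\s+")) {
            // Strip leading/trailing punctuation so "rotated," still matches "rotated".
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "").toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && knownWords.contains(word)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the rotation whose extracted text has the most known words; ties keep the earlier rotation. */
    static int bestRotation(IntFunction<String> extractAtRotation, Set<String> knownWords) {
        int best = ROTATIONS[0];
        int bestScore = -1;
        for (int rotation : ROTATIONS) {
            int score = knownWordCount(extractAtRotation.apply(rotation), knownWords);
            if (score > bestScore) {
                bestScore = score;
                best = rotation;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for PDF extraction: only the 90-degree pass yields real words.
        IntFunction<String> fakeExtractor = rotation -> rotation == 90
                ? "Extracting rotated text from the table."
                : "t x e t d e t a t o r";
        Set<String> dictionary = Set.of("extracting", "rotated", "text", "from", "the", "table");
        System.out.println(bestRotation(fakeExtractor, dictionary)); // prints 90
    }
}
```

Scoring by dictionary hits keeps the choice language-aware without any layout analysis; the cost is four extraction passes per page, and the word list has to match the document's language.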

Extracting rotated text

2017-09-25 Allison, Timothy B.
Colleagues,

Any recommendations for extracting rotated text such as:
https://www.fsis.usda.gov/wps/wcm/connect/896bf55c-0d78-44a0-adfb-94f893eb0f72/GallagherEbelKause_74.pdf?MOD=AJPERES ?

Adobe DC gets reasonable text with "save as text". PDFBox's ExtractText (and Tika) get some