RE: Extracting rotated text

Allison, Timothy B. Mon, 25 Sep 2017 10:43:49 -0700

Thank you, Tilman.  I haven't looked yet, but to confirm, there's no page 
parameter that specifies that the text has been rotated?


Back to language modeling... 😊  Thank you, again!

-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]] 
Sent: Monday, September 25, 2017 1:39 PM
To: [email protected]
Subject: Re: Extracting rotated text

I have no good idea other than calling setRotation() on the page and then doing text extraction.

A possible strategy might be to try all four rotations and see which one yields 
the most known words.
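
The "most known words" idea could be sketched roughly as below. This is only 
an illustration, not PDFBox code: the class RotationPicker, its methods and 
the MIN_LENGTH cutoff are hypothetical names, and the text for each rotation 
(0, 90, 180, 270) is assumed to have been extracted beforehand, for example by 
setting the page rotation and running the text stripper once per angle. The 
word list is assumed to be lower case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: picks the rotation whose extracted text contains
// the most dictionary words. Extraction itself is not shown here.
final class RotationPicker {

    // Assumption: tokens shorter than this are ignored, because fragments
    // such as "is" or "er" can match a dictionary by accident.
    static final int MIN_LENGTH = 3;

    // Counts whitespace-separated tokens found in knownWords, ignoring case
    // and any leading or trailing non-letter characters.
    static int knownWordCount(String text, Set<String> knownWords) {
        int count = 0;
        for (String token : text.split("\\s+")) {
            String word = token.replaceAll("^\\P{L}+|\\P{L}+$", "")
                               .toLowerCase(Locale.ROOT);
            if (word.length() >= MIN_LENGTH && knownWords.contains(word)) {
                count++;
            }
        }
        return count;
    }

    // Returns the rotation (map key) with the highest known-word count.
    // Ties go to the rotation that comes first in the map's iteration order.
    static int bestRotation(Map<Integer, String> textByRotation,
                            Set<String> knownWords) {
        if (textByRotation.isEmpty()) {
            throw new IllegalArgumentException("no candidate texts given");
        }
        int best = 0;
        int bestScore = -1;
        for (Map.Entry<Integer, String> e : textByRotation.entrySet()) {
            int score = knownWordCount(e.getValue(), knownWords);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(
                Arrays.asList("listeria", "risk", "assessment"));
        Map<Integer, String> texts = new LinkedHashMap<>();
        texts.put(0, "FS IS L is te ria Li st er ia R is k");
        texts.put(90, "FSIS Listeria Risk Assessment");
        System.out.println(bestRotation(texts, words));
    }
}
```

A real word list would be much larger, and the minimum length is a guess that 
may need tuning for other languages.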

Tilman


On 25.09.2017 at 19:31, Allison, Timothy B. wrote:
> Colleagues,
> Any recommendations for extracting rotated text such as: 
> https://www.fsis.usda.gov/wps/wcm/connect/896bf55c-0d78-44a0-adfb-94f893eb0f72/GallagherEbelKause_74.pdf?MOD=AJPERES
>  ?
>
> Adobe DC gets reasonable text with "save as text".  PDFBox's ExtractText (and 
> Tika) get something like this:
>
> FS
> IS
> L
> is
> te
> ria
> Li
> st
> er
> ia
> R
> is
> k
> R
> is
> k
> As
> se
> ss
> m
> en
>
> Thank you!
>

