Re: Issues with Rotated text in PDF files

Tilman Hausherr Tue, 08 Oct 2019 10:40:35 -0700

On 08.10.2019 at 19:19, Merrick, Scott wrote:

We are seeing issues parsing text out of a PDF that has the text rotated 90 degrees counterclockwise.
The resulting text is broken into 2-3 characters per line. The text seems to be read in the correct order, as you can (sort of) read it.
This appears to be the same as TIKA-723:
https://issues.apache.org/jira/browse/TIKA-723


That one can probably be closed.

The issue is also present in the current Tika, using tika-app-1.22.jar.

I did see TIKA-2779:
https://issues.apache.org/jira/browse/TIKA-2779
It mentions better handling of rotated text, but I am still not able to properly parse the sample PDF I have.
Are there some parameters that have to be set that I am not aware of?


Did you try the "detectAngles" setting?
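
For reference, a minimal sketch of how that setting might be switched on for tika-app through a config file. This assumes the Tika 1.x tika-config.xml parameter syntax and that "detectAngles" is exposed as a boolean parameter of the PDFParser; the file name tika-config.xml and the input name sample.pdf are placeholders, not anything taken from this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes Tika 1.x config syntax; not verified against 1.22 here. -->
<properties>
  <parsers>
    <!-- Keep the default parsers, but take the PDF parser out of the default set -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- Re-add the PDF parser with angle detection enabled -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="detectAngles" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

It would then be passed to the app with something like:
java -jar tika-app-1.22.jar --config=tika-config.xml -t sample.pdf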

Tilman

Thanks,

Scott Merrick
