On 08.10.2019 at 19:19, Merrick, Scott wrote:

We are seeing issues when parsing text out of a PDF in which the text is rotated 90 degrees counterclockwise.

The resulting text is broken into 2-3 characters per line. The characters seem to be read in the correct order, since the text is still (sort of) readable.

This appears to be the same as TIKA-723
https://issues.apache.org/jira/browse/TIKA-723


That one can probably be closed.


The problem is present in the current Tika as well, using tika-app-1.22.jar.

I did see the following: TIKA-2779
https://issues.apache.org/jira/browse/TIKA-2779

It mentions better handling of rotated text, but I am still not able to properly parse the sample PDF I have.

Are there some parameters that have to be set that I am not aware of?


Did you try the "detectAngles" setting?
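
In case it helps, here is a minimal sketch of how that setting could be switched on through a Tika config file. It assumes your version exposes the option on the PDF parser under the parameter name "detectAngles", and the file name tika-config.xml is only an example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes the PDFParser in your Tika version accepts a
     boolean "detectAngles" parameter. -->
<properties>
  <parsers>
    <!-- Keep the default parsers, but exclude the PDF parser so that
         the explicitly configured one below is used instead. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- Ask the parser to detect the angle of rotated text -->
        <param name="detectAngles" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

With tika-app the file would typically be passed with the --config option, e.g. "java -jar tika-app-1.22.jar --config=tika-config.xml -t sample.pdf", where sample.pdf stands in for your own file. Angle detection may slow down parsing, so it is probably best enabled only for documents that need it.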

Tilman


Thanks,

Scott Merrick

