Issues with Rotated text in PDF files

Merrick, Scott Tue, 08 Oct 2019 10:27:25 -0700

We are seeing issues parsing text out of a PDF in which the text is rotated 
90 degrees counterclockwise.


The resulting text is broken into 2-3 characters per line. The characters seem 
to be extracted in the correct order, since the text is still (sort of) readable.

This appears to be the same as TIKA-723
https://issues.apache.org/jira/browse/TIKA-723

The problem is still present in the current Tika; I am using tika-app-1.22.jar.


I did see TIKA-2779:
https://issues.apache.org/jira/browse/TIKA-2779
It mentions better handling of rotated text, but I am still not able to 
properly parse the sample PDF I have.

Are there some parameters that have to be set that I am not aware of?

Thanks,

Scott Merrick
