We are seeing issues parsing text out of a PDF in which the text is rotated 90 degrees counterclockwise.
The extracted text is broken into 2-3 characters per line. The characters appear to come out in the correct order, since the text is still (sort of) readable.

This appears to be the same problem as TIKA-723:
https://issues.apache.org/jira/browse/TIKA-723
It still occurs in the current Tika as well; I am using tika-app-1.22.jar.

I did see TIKA-2779:
https://issues.apache.org/jira/browse/TIKA-2779
It mentions better handling of rotated text, but I am still not able to properly parse the sample PDF I have. Are there some parameters that have to be set that I am not aware of?

Thanks,
Scott Merrick
