Hi,

From: Ganesh [mailto:[email protected]]
> I newbie with Tika. I am using latest version 0.8 version. I extracted
> text from PDF document but found spaces and new line missing. Indexing
> the data gives wrong result. Could any one in this group could help me?

That's an unfortunate regression that got included in the 0.8 release.
See TIKA-548 [1] for the details. The problem is fixed in the latest
0.9-SNAPSHOT version, and we probably should cut a new release soon
with this fix.

[1] https://issues.apache.org/jira/browse/TIKA-548

BR,

Jukka Zitting
