Exactly the same issue. The spaces and newlines are not extracted properly. When could we expect the new release?
Regards
Ganesh

----- Original Message -----
From: "Jukka Zitting" <[email protected]>
To: <[email protected]>
Sent: Sunday, December 05, 2010 5:24 PM
Subject: RE: PDF text extracted without spaces

> Hi,
>
> From: Ganesh [mailto:[email protected]]
>> I newbie with Tika. I am using latest version 0.8 version. I extracted
>> text from PDF document but found spaces and new line missing. Indexing
>> the data gives wrong result. Could any one in this group could help me?
>
> That's an unfortunate regression that got included in the 0.8 release. See
> TIKA-548 [1] for the details.
>
> The problem is fixed in the latest 0.9-SNAPSHOT version, and we probably
> should cut a new release soon with this fix.
>
> [1] https://issues.apache.org/jira/browse/TIKA-548
>
> BR,
>
> Jukka Zitting
