[ https://issues.apache.org/jira/browse/TIKA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich closed TIKA-1552.
---------------------------------
    Resolution: Not A Problem

Marking this as not a problem, since Adobe Reader also adds white space.

> Pdf document parser
> -------------------
>
>                 Key: TIKA-1552
>                 URL: https://issues.apache.org/jira/browse/TIKA-1552
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7
>            Reporter: Konstantin
>         Attachments: 2014_US_Federal_Budget.pdf, issue.jpg
>
>
> Hello,
> We found that when a PDF document has marked text inside a frame (table), Tika inserts tabs between the words after parsing.
> Original text from the attached file:
> Provides $17.7 billion in discretionary funding for the National Aeronautics and Space
> Parsed text (JIRA removed the tabs, so I have replaced them with -> symbols):
> • Provides -> $17.7 ->
> billion->in->discretionary->funding->for->the->National->Aeronautics->and->Space
> Please take a look at the attached screenshot.
> The left side shows the parsed text in a text editor.
> Thank you.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
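Because the issue was closed as "Not A Problem," consumers that need single spaces have to normalize the extracted text themselves. The following is a minimal post-processing sketch. It assumes the extra whitespace consists of tab characters, as described in the report. The helper `collapse_tabs` is hypothetical and is not part of Tika's API. It replaces each run of tabs, together with any spaces next to them, with one space, and it leaves line breaks untouched.

```python
import re

def collapse_tabs(text: str) -> str:
    """Hypothetical helper (not a Tika API): replace each run of tabs,
    plus any adjacent spaces, with a single space. Newlines are not in
    the character class, so line structure is preserved."""
    return re.sub(r"[ ]*\t[\t ]*", " ", text)

if __name__ == "__main__":
    parsed = "Provides\t$17.7\tbillion\tin\tdiscretionary\tfunding"
    print(collapse_tabs(parsed))
```

Line breaks are left alone on purpose. A cell that PDF extraction splits across lines is a separate layout question, and flattening newlines as well could merge unrelated table cells.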