[ https://issues.apache.org/jira/browse/TIKA-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-268.
--------------------------------

    Resolution: Fixed
 Fix Version/s: 0.5
      Assignee: Jukka Zitting

Fixed in revision 806887 based on Uwe's suggestion.

> HTMLParser omits necessary space characters when parsing table data
> --------------------------------------------------------------------
>
>                 Key: TIKA-268
>                 URL: https://issues.apache.org/jira/browse/TIKA-268
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.3, 0.4
>         Environment: Win, Mac, Lin; Java 5+
>            Reporter: Joachim Zittmayr
>            Assignee: Jukka Zitting
>            Priority: Critical
>             Fix For: 0.5
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When an HTML file containing a table is passed to Tika, the HTML parser
> does not output space characters between table cells.
> Example:
> Input
> ------------------------------
> <table>
>   <tr>
>     <td>Apache LUCENE</td><td>is f****** amazing!</td>
>   </tr>
>   <tr>
>     <td>Apache TIKA</td><td>freaks you out!</td>
>   </tr>
> </table>
> ------------------------------
> Output
> ------------------------------
> Apache LUCENEis f****** amazing!
> Apache TIKAfreaks you out!
> ------------------------------
> Unfortunately, I did not have time to investigate HTMLParser.
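
The output above shows cell contents glued together because nothing is emitted at cell boundaries. As an illustration only, the general remedy is to write a separator whenever a cell or row ends. The sketch below is NOT the change made in revision 806887 and does not use Tika's API. It assumes well-formed XHTML input (the JDK's SAX parser rejects sloppy real-world HTML) and uses a hypothetical helper class, CellTextExtractor, with a tab between cells and a newline after each row as assumed separators.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: extracts table text from
// well-formed XHTML, inserting separators at cell and row boundaries.
public class CellTextExtractor {

    public static String extract(String xhtml) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Copy character data verbatim, including any whitespace
                // between tags in the input.
                out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Without namespace awareness, qName carries the element name.
                String name = qName.toLowerCase(Locale.ROOT);
                if (name.equals("td") || name.equals("th")) {
                    out.append('\t'); // separator after every cell
                } else if (name.equals("tr")) {
                    int last = out.length() - 1;
                    if (last >= 0 && out.charAt(last) == '\t') {
                        out.setCharAt(last, '\n'); // turn trailing cell separator into a row break
                    } else {
                        out.append('\n');
                    }
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input as XML", e);
        }
        return out.toString();
    }
}
```

For example, `CellTextExtractor.extract("<table><tr><td>Apache TIKA</td><td>freaks you out!</td></tr></table>")` returns `"Apache TIKA\tfreaks you out!\n"`, so the two cells stay separated. Doing the separation in `endElement` keeps the choice of separator in one place, independent of how the text inside each cell is reported.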