[ https://issues.apache.org/jira/browse/TIKA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Rusin updated TIKA-195:
-------------------------------

    Description:
If a Word document contains text that is not in paragraphs, but rather in frames, that text is ignored.
The following code extracts ALL text; however, I am not sure how it fits the Paragraphs model used by Tika:

    List textPieces = doc.getTextTable().getTextPieces();
    for (Object o : textPieces) {
        // Emit each text piece as its own <p> element.
        TextPiece piece = (TextPiece) o;
        xhtml.element("p", piece.getStringBuffer().toString());
    }

  was:
If a Word document contains text that is not in paragraphs, but rather in frames, that text is ignored.
The following code extracts ALL text; however, I am not sure how it fits the Paragraphs model used by Tika:

> MSWORD: Tika ignores text from Pieces
> -------------------------------------
>
>                 Key: TIKA-195
>                 URL: https://issues.apache.org/jira/browse/TIKA-195
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.2
>            Reporter: Andrzej Rusin
>            Priority: Minor