Yahav Amsalem created TIKA-3257:
-----------------------------------

Summary: RAR files extracted content is not separated from the inner file names
Key: TIKA-3257
URL: https://issues.apache.org/jira/browse/TIKA-3257
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.23
Reporter: Yahav Amsalem
Attachments: test.rar

Attached is a RAR file containing a single PPT file ("test.ppt") whose only content is the line "Here the PPT content starts".

However, the text extracted by Tika *does not separate the file name from the file's content*. The output is:

"test.pptHere the PPT content starts"
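
For anyone writing a regression test for this, the check could look roughly like the sketch below. It is a hypothetical helper and not part of Tika: `SeparationCheck` and `isNameSeparated` are names made up for illustration, and it assumes that any whitespace after the entry name counts as a valid separator, which the report itself does not specify.

```java
class SeparationCheck {

    // Hypothetical helper: returns true only if the entry name occurs in the
    // extracted text and every occurrence is followed by whitespace or by the
    // end of the string.
    static boolean isNameSeparated(String extracted, String entryName) {
        if (entryName.isEmpty()) {
            return false; // nothing meaningful to check
        }
        boolean found = false;
        int from = 0;
        while (true) {
            int idx = extracted.indexOf(entryName, from);
            if (idx < 0) {
                break;
            }
            found = true;
            int end = idx + entryName.length();
            // A non-whitespace character directly after the name means the
            // name runs straight into the content.
            if (end < extracted.length()
                    && !Character.isWhitespace(extracted.charAt(end))) {
                return false;
            }
            from = end;
        }
        return found;
    }

    public static void main(String[] args) {
        // The string quoted in this report.
        String reported = "test.pptHere the PPT content starts";
        // Assumed desired form: a newline between name and content.
        String separated = "test.ppt\nHere the PPT content starts";

        System.out.println(isNameSeparated(reported, "test.ppt"));  // false
        System.out.println(isNameSeparated(separated, "test.ppt")); // true
    }
}
```

In a real test the first argument would be the text Tika returns for the attached test.rar instead of a hard-coded string.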