[jira] [Created] (TIKA-1795) RTFParser can double Metadata.CONTENT_TYPE entry in Metadata

2015-11-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1795: Summary: RTFParser can double Metadata.CONTENT_TYPE entry in Metadata Key: TIKA-1795 URL: https://issues.apache.org/jira/browse/TIKA-1795 Project: Tika Issue

[jira] [Commented] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006797#comment-15006797 ] Ken Krugler commented on TIKA-1794: Tika uses XHTML 1.0, which doesn't allow the form-feed character.
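
The comment above can be illustrated with a minimal sketch. XHTML 1.0 is an XML 1.0 application, and the XML 1.0 character rule permits only tab, line feed, and carriage return among the C0 control characters, so form feed (U+000C) falls outside it. This is not Tika's code; the function names `is_xml10_char` and `strip_invalid_xml10` are hypothetical and exist only for this illustration.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if the single character `ch` matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD       # private use area up to U+FFFD
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


def strip_invalid_xml10(text: str) -> str:
    """Drop every character that XML 1.0 does not allow (hypothetical helper)."""
    return "".join(ch for ch in text if is_xml10_char(ch))


if __name__ == "__main__":
    sample = "page one\x0cpage two"   # \x0c is the form feed character
    print(is_xml10_char("\x0c"))
    print(repr(strip_invalid_xml10(sample)))
```

A caller that needs to keep page boundaries would have to map the form feed to some permitted marker before serialization rather than drop it; which marker to use is a design decision the thread does not settle.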

[jira] [Comment Edited] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Olivier M (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006768#comment-15006768 ] Olivier M edited comment on TIKA-1794 at 11/16/15 3:21 PM: After some reading it

[jira] [Resolved] (TIKA-1795) RTFParser can double Metadata.CONTENT_TYPE entry in Metadata

2015-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1795. Resolution: Fixed Fix Version/s: 1.12 r1714617 > RTFParser can double Metadata.CONTENT_TYPE

[jira] [Commented] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006743#comment-15006743 ] Ken Krugler commented on TIKA-1794: The output of the Tika parse process is XHTML, and I don't believe a

[jira] [Updated] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Olivier M (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier M updated TIKA-1794: Description: Just noticed that Apache Tika removes form feed characters (0C in UTF-8) when parsing a text

[jira] [Updated] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Olivier M (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier M updated TIKA-1794: Description: Just noticed that Apache Tika removes form feed characters (0C in UTF-8) when parsing a text

[jira] [Updated] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Olivier M (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier M updated TIKA-1794: Description: Just noticed that Apache Tika removes form feed characters (0C in UTF-8) when parsing a text

[jira] [Updated] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Olivier M (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier M updated TIKA-1794: Attachment: form_feed.txt Txt file with form feed character. > TXTParser removes form feed characters >

[jira] [Created] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Olivier M (JIRA)
Olivier M created TIKA-1794: Summary: TXTParser removes form feed characters Key: TIKA-1794 URL: https://issues.apache.org/jira/browse/TIKA-1794 Project: Tika Issue Type: Bug