Tim Allison created TIKA-1795:
---
Summary: RTFParser can double Metadata.CONTENT_TYPE entry in Metadata
Key: TIKA-1795
URL: https://issues.apache.org/jira/browse/TIKA-1795
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006797#comment-15006797
]
Ken Krugler commented on TIKA-1794:
---
Tika uses XHTML 1.0, which doesn't allow the form-feed character.
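For context on the comment above: XHTML 1.0 is an XML 1.0 application, and the XML 1.0 `Char` production excludes most C0 control characters, including form feed (U+000C). The following is a minimal sketch of that rule, not Tika's actual code; the class and method names (`XmlCharCheck`, `isXml10Char`) are hypothetical and chosen only for illustration.

```java
public class XmlCharCheck {

    // XML 1.0 "Char" production:
    //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // Hypothetical helper for illustration; not part of the Tika API.
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Form feed (0x0C) falls outside the production; line feed (0x0A) is inside it.
        System.out.println("form feed allowed: " + isXml10Char(0x0C));
        System.out.println("line feed allowed: " + isXml10Char(0x0A));
    }
}
```

Under this rule a form feed cannot appear literally in well-formed XML 1.0 output, which is consistent with the explanation given in the comment.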
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006768#comment-15006768
]
Olivier M edited comment on TIKA-1794 at 11/16/15 3:21 PM:
---
After some reading it
[
https://issues.apache.org/jira/browse/TIKA-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1795.
---
Resolution: Fixed
Fix Version/s: 1.12
r1714617
> RTFParser can double Metadata.CONTENT_TYPE
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006743#comment-15006743
]
Ken Krugler commented on TIKA-1794:
---
The output of the Tika parse process is XHTML, and I don't believe a
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olivier M updated TIKA-1794:
Description:
Just noticed that Apache Tika removes form feed characters (0C in UTF-8) when
parsing a text
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olivier M updated TIKA-1794:
Attachment: form_feed.txt
Txt file with form feed character.
> TXTParser removes form feed characters
>
Olivier M created TIKA-1794:
---
Summary: TXTParser removes form feed characters
Key: TIKA-1794
URL: https://issues.apache.org/jira/browse/TIKA-1794
Project: Tika
Issue Type: Bug