[ https://issues.apache.org/jira/browse/TIKA-335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Krugler updated TIKA-335:
-----------------------------

    Attachment: TIKA-335.patch

This patch also cleans up some generics warnings (sorry about mixing the two; I was going to open a second issue, but the two changes were commingled).

In order to make this work, I had to modify the charset detection code to actually use the hint - weird that ICU never actually implemented this.

Includes a test case for an ambiguous run of text that could be UTF-8 or ISO-8859-1.

> TXTParser should use incoming charset
> -------------------------------------
>
>                 Key: TIKA-335
>                 URL: https://issues.apache.org/jira/browse/TIKA-335
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Ken Krugler
>            Priority: Minor
>         Attachments: TIKA-335.patch
>
>
> The incoming charset (if any) from metadata should be passed to
> CharsetDetector.setDeclaredEncoding().
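
To illustrate why a declared-encoding hint matters for text that is valid in more than one charset, here is a minimal sketch using only the Java standard library. It is not the code from TIKA-335.patch and not ICU's CharsetDetector; the class HintedCharsetGuess and its methods are hypothetical names invented for this example, and the candidate list is limited to UTF-8 and ISO-8859-1 for brevity. The assumed policy is that the hint only breaks ties: it is honored when the bytes decode cleanly in the declared charset, and ignored otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not the Tika or ICU implementation. */
final class HintedCharsetGuess {

    // Candidates in default order of preference when no usable hint is given.
    private static final Charset[] CANDIDATES = {
        StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1
    };

    /** Returns true if the bytes decode in the given charset without errors. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Picks a charset for the data. The declared encoding (for example, one
     * taken from incoming metadata) wins whenever the data is valid in it;
     * otherwise the first candidate that decodes cleanly is returned.
     */
    static Charset guess(byte[] data, String declaredEncoding) {
        Charset hint = null;
        if (declaredEncoding != null) {
            try {
                hint = Charset.forName(declaredEncoding);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: proceed without a hint.
            }
        }
        Charset firstValid = null;
        for (Charset candidate : CANDIDATES) {
            if (!decodesCleanly(data, candidate)) {
                continue;
            }
            if (candidate.equals(hint)) {
                return candidate; // the hint breaks the tie
            }
            if (firstValid == null) {
                firstValid = candidate;
            }
        }
        // ISO-8859-1 accepts every byte sequence, so this is never null here.
        return firstValid;
    }
}
```

For the two bytes 0xC3 0xA9, which read as "é" in UTF-8 and as "Ã©" in ISO-8859-1, this sketch returns UTF-8 when no hint is given and ISO-8859-1 when that charset is declared. A real detector would weigh statistical confidence rather than simple decodability.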