[ https://issues.apache.org/jira/browse/TIKA-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-339.
--------------------------------

    Resolution: Fixed
      Assignee: Jukka Zitting

Committed in revision 890130.

> HtmlParser & TXTParser should not use language returned by CharsetDetector if language hint has been provided
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-339
>                 URL: https://issues.apache.org/jira/browse/TIKA-339
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 0.6
>            Reporter: Ken Krugler
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: TIKA-339.patch
>
>
> Currently, the code that calls CharsetDetector in both TXTParser and
> HtmlParser replaces any incoming language in the metadata map whenever the
> detector returns a language.
> Given the low reliability of the detector's language result, it should only
> be used when no language has been provided. A provided language typically
> comes from either the HTTP response header or (for the HtmlParser) a meta tag
> or some other tag attribute. In all those cases, the incoming language is
> more accurate than the guess by the CharsetDetector.
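
The behavior requested in the description can be sketched as follows. This is only an illustration under stated assumptions, not the patch committed in revision 890130: a plain Map stands in for the parser's metadata, and the key name, class name, and method name are all hypothetical. The detected language is passed in as a string, standing in for whatever the CharsetDetector returns.

```java
import java.util.HashMap;
import java.util.Map;

class LanguageHintSketch {
    // Hypothetical key; stands in for the metadata entry that holds the language.
    static final String LANGUAGE_KEY = "Content-Language";

    /**
     * Stores the detected language only when no usable language hint is
     * already present in the metadata. An existing hint is never overwritten.
     */
    static void applyDetectedLanguage(Map<String, String> metadata, String detectedLanguage) {
        // The detector may not return a language at all; nothing to do then.
        if (detectedLanguage == null || detectedLanguage.isEmpty()) {
            return;
        }
        String hint = metadata.get(LANGUAGE_KEY);
        // Treat a missing or blank hint as "no language provided".
        if (hint == null || hint.trim().isEmpty()) {
            metadata.put(LANGUAGE_KEY, detectedLanguage);
        }
    }

    public static void main(String[] args) {
        // Case 1: a hint came in (for example from an HTTP header), so it is kept.
        Map<String, String> withHint = new HashMap<>();
        withHint.put(LANGUAGE_KEY, "fr");
        applyDetectedLanguage(withHint, "en");
        System.out.println("with hint: " + withHint.get(LANGUAGE_KEY));

        // Case 2: no hint was provided, so the detector's guess is used.
        Map<String, String> withoutHint = new HashMap<>();
        applyDetectedLanguage(withoutHint, "en");
        System.out.println("without hint: " + withoutHint.get(LANGUAGE_KEY));
    }
}
```

Treating a blank hint the same as a missing one is a design choice made for this sketch; the issue text only speaks of cases where "there is no provided language".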