[ https://issues.apache.org/jira/browse/TIKA-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-339.
--------------------------------

    Resolution: Fixed
      Assignee: Jukka Zitting

Committed in revision 890130.

> HtmlParser & TXTParser should not use language returned by CharsetDetector if 
> language hint has been provided
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-339
>                 URL: https://issues.apache.org/jira/browse/TIKA-339
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 0.6
>            Reporter: Ken Krugler
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: TIKA-339.patch
>
>
> Currently, the code that calls CharsetDetector in both TXTParser and 
> HtmlParser replaces any incoming language in the metadata map whenever 
> the detector returns a language.
> Given the low reliability of the detector's language result, it should only 
> be used when no language has been provided. A provided language typically 
> comes from either the HTTP response header or (for the HtmlParser) a meta 
> tag or some other tag attribute. In all those cases, the incoming language 
> is more accurate than the guess by the CharsetDetector.
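
The requested behavior can be sketched roughly as follows. This is an illustration only, not the committed patch: a plain Map stands in for Tika's metadata object, the class and method names (LanguageHintSketch, applyDetectedLanguage) are hypothetical, and the "Content-Language" key is an assumption about where the language hint is stored.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the parser's metadata, and
// detectedLanguage stands in for whatever the CharsetDetector reported.
final class LanguageHintSketch {

    // Assumed metadata key for the language hint; the real key may differ.
    static final String CONTENT_LANGUAGE = "Content-Language";

    // Stores the detected language only when the caller supplied no hint.
    static void applyDetectedLanguage(Map<String, String> metadata,
                                      String detectedLanguage) {
        if (detectedLanguage == null || detectedLanguage.isEmpty()) {
            return; // the detector had no opinion, so there is nothing to store
        }
        String hint = metadata.get(CONTENT_LANGUAGE);
        if (hint == null || hint.trim().isEmpty()) {
            // No usable hint: fall back to the detector's guess.
            metadata.put(CONTENT_LANGUAGE, detectedLanguage);
        }
        // Otherwise keep the hint (HTTP header, meta tag, tag attribute).
    }
}
```

With this guard, a hint such as "fr" from an HTTP response header survives even if the detector guesses "en", while a document with no hint still receives the detector's guess.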

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
