[ https://issues.apache.org/jira/browse/TIKA-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-274.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.5
         Assignee: Jukka Zitting

Hmm, good point. It looks like the feature was never implemented in the ICU4J 
code that we're using.

I modified the TXTParser code in revision 813624 so that we now always use the 
given encoding as the default in case the automatic encoding detection fails.
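
For illustration only, the fallback order described above could be sketched 
roughly as follows. This is not the code from revision 813624: the class 
EncodingFallback, the methods isUsable and chooseEncoding, and the final 
hard-coded fallback argument are all hypothetical names made up for this 
example, and the detection step is assumed to hand back null when it fails.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not the actual TXTParser code.
public class EncodingFallback {

    /** Returns true if the JVM can decode the named charset. */
    static boolean isUsable(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically invalid charset names
            return false;
        }
    }

    /**
     * Picks the encoding to decode with. A usable detection result wins;
     * otherwise the declared encoding (for example the value of
     * Metadata.CONTENT_ENCODING) is used as the default; otherwise the
     * caller-supplied fallback is returned.
     */
    static String chooseEncoding(String detected, String declared, String fallback) {
        if (isUsable(detected)) {
            return detected;
        }
        if (isUsable(declared)) {
            return declared;
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Detection failed (null), so the declared encoding is used
        System.out.println(chooseEncoding(null, "ISO-8859-1", "UTF-8"));
        // Detection succeeded, so the declared encoding is ignored
        System.out.println(chooseEncoding("UTF-16", "ISO-8859-1", "UTF-8"));
    }
}
```

Note that in this sketch the declared encoding never overrides a successful 
detection; it only replaces the default used when detection yields nothing.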

This behavior could be further improved by letting the encoding hint influence 
the detection code, for example when choosing between the highly similar 
ISO-8859-X character sets. Please file a new improvement issue if you have a 
concrete use case where this would be beneficial.

> CharsetDetector.setDeclaredEncoding has no effect
> -------------------------------------------------
>
>                 Key: TIKA-274
>                 URL: https://issues.apache.org/jira/browse/TIKA-274
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4, 0.5
>            Reporter: Piotr B.
>            Assignee: Jukka Zitting
>             Fix For: 0.5
>
>
> TXTParser.java contains the following:
>         // Use the declared character encoding, if available
>         String encoding = metadata.get(Metadata.CONTENT_ENCODING);
>         if (encoding != null) {
>             detector.setDeclaredEncoding(encoding);
>         }
> But this feature does not appear to be implemented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
