[ 
https://issues.apache.org/jira/browse/RAT-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085049#comment-13085049
 ] 

Jukka Zitting commented on RAT-96:
----------------------------------

Tika has automatic encoding detection support based on code and byte frequency 
tables from ICU4J. It's not perfect, but it could be used as a starting point here.

> Check source files for unexpected encodings
> -------------------------------------------
>
>                 Key: RAT-96
>                 URL: https://issues.apache.org/jira/browse/RAT-96
>             Project: RAT
>          Issue Type: New Feature
>            Reporter: Sebb
>
> Idea for possible enhancement:
> Source files with characters in encodings other than ASCII can easily get 
> mangled, so it might be worth offering a tool to report these.
> For example, I have come across Javadoc which uses dashes instead of hyphens, 
> and at some point the encoded dash got corrupted.
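
As a rough illustration of the kind of report the issue asks for, here is a
minimal sketch that uses only the JDK. It is not RAT or Tika code: the class
name NonAsciiReport and its two helper methods are hypothetical, and treating
"valid UTF-8" as the only acceptable non-ASCII encoding is an assumption made
for the example. It does not attempt the statistical detection that Tika offers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RAT or Tika.
public class NonAsciiReport {

    /** Returns the 0-based offsets of all bytes outside 7-bit ASCII. */
    public static List<Integer> nonAsciiOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) { // high bit set: not ASCII
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** Returns true if the bytes decode as UTF-8 without any malformed input. */
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Prints one line for every file argument that contains non-ASCII bytes. */
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            List<Integer> offsets = nonAsciiOffsets(data);
            if (!offsets.isEmpty()) {
                System.out.println(name + ": " + offsets.size()
                        + " non-ASCII byte(s), first at offset " + offsets.get(0)
                        + (isValidUtf8(data) ? " (valid UTF-8)" : " (not valid UTF-8)"));
            }
        }
    }
}
```

A file whose dash was saved in a single-byte encoding such as windows-1252
would be flagged as "not valid UTF-8" by this sketch, which is one way a
corrupted dash like the one described above could be surfaced.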

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
