[ https://issues.apache.org/jira/browse/RAT-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085049#comment-13085049 ]
Jukka Zitting commented on RAT-96:
----------------------------------
Tika has automatic encoding detection support based on code and byte frequency
tables from ICU4J. It's not perfect, but could be used as a starting point here.
> Check source files for unexpected encodings
> -------------------------------------------
>
> Key: RAT-96
> URL: https://issues.apache.org/jira/browse/RAT-96
> Project: RAT
> Issue Type: New Feature
> Reporter: Sebb
>
> Idea for possible enhancement:
> Source files with characters in encodings other than ASCII can easily get
> mangled, so it might be worth offering a tool to report these.
> For example, I have come across Javadoc which uses dashes instead of hyphens,
> and at some point the encoded dash got corrupted.
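
A much simpler baseline than statistical detection is to flag any file that contains bytes outside ASCII and to note whether those bytes at least form valid UTF-8. The sketch below uses only the JDK. The class and method names (EncodingCheck, firstNonAscii, isValidUtf8) are hypothetical and are not part of RAT or Tika, and treating UTF-8 as the expected encoding is an assumption.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not an existing RAT or Tika class.
class EncodingCheck {

    // Returns the offset of the first byte outside 7-bit ASCII, or -1 if there is none.
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    // Returns true if the bytes decode as strict UTF-8.
    // Malformed input is reported as an error instead of being silently replaced.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Reports each file named on the command line that contains non-ASCII bytes.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            int offset = firstNonAscii(data);
            if (offset >= 0) {
                System.out.println(name + ": non-ASCII byte at offset " + offset
                    + (isValidUtf8(data) ? " (valid UTF-8)" : " (not valid UTF-8)"));
            }
        }
    }
}
```

A dash that was saved as a single windows-1252 byte (0x96) would be reported as "not valid UTF-8" by this check, while the same dash stored as UTF-8 would be reported as non-ASCII but valid. It cannot tell which legacy encoding a file actually uses; that is where a detector such as Tika's would come in.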
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira