I think that Python will instead provide a factory that returns the appropriate concrete codec when given an encoding name and the standards body to which it must conform: ISO, IETF (for MIME and the IANA database, as specified in RFCs), W3C (for HTML5), ITU (for some GSM encodings and encodings used in teletext), and possibly other private standards (e.g. Microsoft, IBM, Apple and Adobe for their own code pages). Instructed with the standard type (or registry), the encoding "name" can be mapped correctly without needing reimplementations and new conformance tests and validations. Note that the default Encoding class in Java has no such indication of the registry: it assumes its own registry, which does not recognize the same set of encoding names and aliases.

Now if you go through the list of encodings supported in each OS, you'll see each one has its own flavor, so the OS type would also be indicated as one of the possible registries. Some of them distinguish capitalization forms, or differ in their use of separators. There are also registries implemented in various RDBMS engines (some of them storing the mapping in a system table where they are extensible, sometimes implemented as a simple table, sometimes as a Java class or a procedure/function written in the query language and stored in the database).

In other words, before the layer implementing the actual codecs, there's a layer that maps between the various possible registries. A factory could first look up a few entries in its own definitions, then search for aliases in another, default registry. Registries can be chained, but the IANA database should sit at the end of every chain starting from a given registry. Registries may also be "pluggable" beside what exists at the library or system level, using an EncodingProvider that implements a registry.
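A minimal sketch of what such a layered lookup could look like in Python. Everything here except codecs.lookup is hypothetical: EncodingRegistry, get_codec and the tiny alias tables are invented names for illustration, not part of any real library.

    import codecs

    class EncodingRegistry:
        """Hypothetical registry: maps registry-specific names and aliases
        to canonical encoding names, chaining to a parent registry (the
        chain should always end at the IANA registry)."""

        def __init__(self, name, aliases, parent=None):
            self.name = name
            # Normalize keys once so lookups ignore case and separator style.
            self._aliases = {self._normalize(k): v for k, v in aliases.items()}
            self.parent = parent

        @staticmethod
        def _normalize(label):
            return label.lower().replace("-", "").replace("_", "")

        def resolve(self, label):
            canonical = self._aliases.get(self._normalize(label))
            if canonical is not None:
                return canonical
            if self.parent is not None:  # fall through to the next registry
                return self.parent.resolve(label)
            return None

    def get_codec(label, registry):
        """Factory: resolve the registry-specific label, then hand the
        canonical name to the layer implementing the actual codecs."""
        canonical = registry.resolve(label)
        if canonical is None:
            raise LookupError(f"unknown encoding {label!r} in registry {registry.name!r}")
        return codecs.lookup(canonical)

    # The IANA registry sits at the end of every chain; a vendor registry
    # (here a tiny slice of Windows code-page names) is layered on top.
    iana = EncodingRegistry("IANA", {"utf-8": "utf-8", "iso-8859-1": "latin-1"})
    windows = EncodingRegistry("Windows", {"cp1252": "cp1252"}, parent=iana)

    print(get_codec("CP-1252", windows).name)     # resolved locally: "cp1252"
    print(get_codec("ISO_8859-1", windows).name)  # falls through to IANA: "iso8859-1"

A pluggable EncodingProvider would then just be anything able to hand the factory another EncodingRegistry to prepend to the chain.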
A good codec implementation should also support these three modes of operation:

* mapping unknown/invalid codes to exceptions, thrown without returning the converted sequence
* mapping them to a default valid replacement character (which should be configurable)
* ignoring the invalid codes (possibly returning a status saying that the conversion was lossy)

In addition, a codec could work in a "tolerant" mode: when several source codes map to the same target code, with one of them considered "canonical" and the others mere "aliases", the conversion is not exactly reversible if the source text contains one of the aliased codes. In strict mode, these source codes could either be signaled by an exception, or converted while still returning a lossy result status. And for some "standards" the encoding itself is ambiguous (e.g. in legacy GSM encodings, which are still widely used, you cannot tell a Latin letter A from a Cyrillic letter A or a Greek letter Alpha without first looking at the language code, which may itself still be ambiguous).
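For comparison, Python's codec machinery already exposes the first three modes through the standard errors argument, and the replacement character is configurable via codecs.register_error (the custom handler name below is made up for the example; the "tolerant"/lossy-status mode has no direct equivalent):

    import codecs

    data = b"caf\xe9 latte"  # 0xE9 is not a valid byte in UTF-8

    # 1. Exception mode: invalid codes raise without returning a result.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as e:
        print("strict :", e.reason)

    # 2. Replacement mode: substitute a default character (U+FFFD here);
    #    registering a custom handler makes the replacement configurable.
    print("replace:", data.decode("utf-8", errors="replace"))

    codecs.register_error("ascii-question", lambda e: ("?", e.end))
    print("custom :", data.decode("utf-8", errors="ascii-question"))

    # 3. Ignore mode: drop invalid codes. Note that Python returns no
    #    indication that the conversion was lossy, which is exactly the
    #    result status argued for above.
    print("ignore :", data.decode("utf-8", errors="ignore"))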

