Short comment about the "detection" tools from a previous discussion.
The tools that are supposed to detect the coding scheme all work with a simple logical rule: p ==> q <==> non q ==> non p. In short, and as a consequence, they do not detect *the* coding scheme, they only detect "a" possible coding scheme: a failed decode proves "not this scheme" (non q ==> non p), while a successful decode proves nothing more than "possibly this scheme". (A small interactive sketch at the end of this message illustrates the point.)

The Flexible String Representation conceptually has to face the same problem. It splits "unicode" into chunks, and it has to solve two problems at the same time: the coding and the handling of multiple "char sets". The problem? It fails. "This poor Flexible String Representation does not succeed in solving the problem it creates for itself." Workaround: add more flags (see PEP 3xx).

Still thinking "mathematics" (in the limit): for a given repertoire of characters, one can assume that every char gets its own flag (because of the usage of multiple coding schemes). Conceptually, one quickly realizes that, at the end, there will be as many flags as there are characters, and the only valid solution is to work with a unique set of encoded code points, where every element of this set *is* its own flag.

Curiously, that is what the utf-* schemes (and, by the way, other coding schemes in the byte-string world) are doing, with plenty of other advantages. Already said: a healthy coding scheme can only work with a unique set of encoded code points. That is why we have to live today with all these coding schemes.

jmf
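P.S. To make the detection point concrete, here is a minimal interactive sketch (the byte values are just an assumption chosen for illustration). The same two bytes decode without error under several coding schemes, so a successful decode can only ever identify "a" possible scheme; only a failure rules one out.

>>> data = b'\xe9\xe8'
>>> data.decode('latin-1')   # decodes fine: "could be" latin-1
'éè'
>>> data.decode('cp1252')    # decodes fine too: "could be" cp1252
'éè'
>>> data.decode('utf-8')     # fails: definitely *not* utf-8
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: invalid continuation byte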