Personally, I find it interesting to see which and how many characters are affected by the difference in binary ordering between UTF-8 and UTF-16.

All code points in two ranges are affected:

U+e000..U+ffff
U+10000..U+10ffff

The second range contains assignments for characters that are "rare" in the "average text".

The first range is interesting: it consists mostly of the PUA range of the BMP, some "specials", and compatibility character assignments. Aside from private use characters and the specials U+fff0..U+fffd, there are only 20 code points that "survive" an NFKD transformation:

12 CJK Unified Ideographs (U+fa__)
 1 U+fb1e HEBREW POINT JUDEO-SPANISH VARIKA
 2 ornate parentheses (U+fd3e/f)
 2 combining ligature halves (U+fe20/1)
 2 combining tilde halves (U+fe22/3)
 1 U+feff ZWNBSP

So, given normalized text (NFKD), there are only 20 assigned, non-compatibility, non-special characters that sort either before or after those "very rare" supplementary characters when one binary-sorts UTF-8/16 strings.

I leave it up to the list to consider this... ;-)

markus
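For readers who want to see the ordering difference concretely, here is a minimal Python sketch. It relies on the fact that UTF-16 encodes supplementary code points as surrogate pairs whose lead unit lies in D800..DBFF, below E000, while the UTF-8 lead byte of a supplementary code point (F0..F4) is above that of any BMP code point. The helper names `utf8_key`, `utf16_key` and `order_differs` are made up for this illustration and are not from any library.

```python
def utf8_key(s: str) -> bytes:
    # Byte-wise comparison of UTF-8 matches code point order.
    return s.encode("utf-8")


def utf16_key(s: str) -> bytes:
    # Big-endian bytes compare the same way as the 16-bit code units do.
    return s.encode("utf-16-be")


def order_differs(a: str, b: str) -> bool:
    # True if a binary sort would place a and b differently in the two forms.
    return (utf8_key(a) < utf8_key(b)) != (utf16_key(a) < utf16_key(b))


# One of the 20 survivors (U+fb1e) against the first supplementary code point.
varika = "\ufb1e"
supplementary = "\U00010000"

sample = [supplementary, varika, "a"]
by_utf8 = sorted(sample, key=utf8_key)
by_utf16 = sorted(sample, key=utf16_key)

print([f"U+{ord(c):04x}" for c in by_utf8])
print([f"U+{ord(c):04x}" for c in by_utf16])
print(order_differs(varika, supplementary))
```

Code points below U+d800 are not affected: comparing, say, U+d7ff with U+10000 gives the same result in both encoding forms.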