Quoting Philippe Verdy <[EMAIL PROTECTED]>:

> From: "Jon Hanna" <[EMAIL PROTECTED]>
> > Quoting Marco Cimarosti <[EMAIL PROTECTED]>:
> >
> > > Jon Hanna wrote:
> > > > I refuse to rename my UTF-81920!
> > >
> > > Doug, Shlomi, there's a new one out there!
> > > Jon, would you mind describing it?
> >
> > There are two different UTF-81920s (the resultant ambiguity is very much in the
> > spirit of UTF-81920).
>
> I can't find any reference document about "UTF-81920" in Google.

That's because there are no documents about UTF-81920. It barely qualifies as
the starting point of a gedankenexperiment, never mind as a spec. That's why
this thread is marked as OT. The closest thing to a spec is the email I just
sent to this list.

> All I can find is documents describing "UTF-8" which encodes 128 characters
> on 1 byte, and 1920 characters on 2 bytes.

Excellent, the inclusion of "1920" in the name is then wonderfully
serendipitous.

> Does it mean that UTF-81920 is a restriction of UTF-8 to the range
> [U+0000..U+007FF] which can be encoded with at most 2 bytes with UTF-8?

No, it is as explained in the email.

> UTF-81920 would then effectively not be a Unicode-compatible encoding scheme
> as it would be restricted to only Latin, Greek, Coptic, Cyrillic, Armenian,
> Hebrew and Arabic with their diacritics, excluding all Asian scripts,
> surrogates, and compatibility characters, Arabic/Hebrew extension, common
> ligatures like "fi" and presentation forms, as well as currency signs (such
> as the Euro symbol coded at U+20AC), technical symbols, and even the BOM
> U+FEFF? This encoding does not seem suitable to even represent successfully
> the legacy DOS/OEM codepages, or the legacy PostScript and Mac charsets.

Yes, day-dream concepts mentioned in jest do often have technical
short-comings.

-- 
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*

