I don't see the point of this argument. UTF-8S maps exactly onto UTF-16 at
the code-unit level: each UTF-16 code unit maps to one, two, or three bytes
in UTF-8S. So if you say UTF-8S is ambiguous, the same must apply to UTF-16,
which does not make sense to me.

Regards,
Jianping.

[EMAIL PROTECTED] wrote:

> On 06/07/2001 10:38:15 AM DougEwell2 wrote:
>
> >The ambiguity comes from the fact that, if I am using UTF-8s and I want
> >to represent the sequence of (invalid) scalar values <D800 DC00>, I must
> >use the UTF-8s sequence <ED A0 80 ED B0 80>, and if I want to represent
> >the (valid) scalar value <10000>, I must *also* use the UTF-8s sequence
> ><ED A0 80 ED B0 80>. Unless you have a crystal ball or are extremely
> >good with tarot cards, you have no way, upon reverse-mapping the UTF-8s
> >sequence <ED A0 80 ED B0 80>, to know whether it is supposed to be
> >mapped back to <D800 DC00> or to <10000>.
>
> This brings out a good point. We can't yet say that UTF-8s is ambiguous
> since it is not formally defined. What this does highlight, though, is a
> gap in the proposal that must be addressed before it could be considered:
> a well-formed definition for UTF-8 must (by D29) provide a *unique*
> representation for *all* USVs, and unless the proposal is amended to
> remove D800 - DFFF from the codespace, it must be amended to provide some
> unique means of representing things like U+D800. What it is *not allowed*
> to be is ambiguous. If UTF-8s considers <ED A0 80 ED B0 80> to mean
> U+10000, then it must provide some sequence other than <ED A0 80> to mean
> U+D800.
>
> >Premise: Unicode should not, and does not, define ambiguous UTFs.
> > I think we agree on this.
>
> Yes.
>
> >Premise: UTF-8s is ambiguous in its handling of surrogate code points.
> > I tried to prove this above.
> >
> >Conclusion: Unicode should not define UTF-8s.
>
> I definitely agree with the idea you're getting at, but am just looking
> from a very slightly different angle. The conclusion does not necessarily
> follow, because UTF-8s is only a proposal that can potentially be
> modified. If you say, "UTF-8s as has been currently proposed would be
> inconsistent with D29", then I agree. The proposed definition for UTF-8s
> *could* potentially be revised, though, and so the argument that a UTF-8s
> cannot be added to Unicode doesn't hold.
>
> UTF-8s definitely is not tenable as currently proposed, given the current
> definitions. I think we agree on that.
>
> - Peter
>
> ---------------------------------------------------------------------------
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <[EMAIL PROTECTED]>
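
To make the byte sequences in the quoted discussion concrete, here is a minimal Python sketch. It assumes UTF-8s behaves as described in this thread: each value is first converted to UTF-16 code units, and each code unit is then encoded with the ordinary one-, two-, or three-byte UTF-8 bit layout. The proposal is not formally defined, so this is only an illustration under that assumption, and the function names are hypothetical.

```python
from typing import List


def to_utf16_units(value: int) -> List[int]:
    """Split a value into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if value < 0x10000:
        # Includes D800..DFFF, which pass through unchanged (assumption:
        # surrogate code points are accepted as input, as in the thread).
        return [value]
    v = value - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def encode_code_unit(cu: int) -> bytes:
    """Encode one 16-bit code unit with the 1-, 2-, or 3-byte UTF-8 layout."""
    if cu < 0x80:
        return bytes([cu])
    if cu < 0x800:
        return bytes([0xC0 | (cu >> 6), 0x80 | (cu & 0x3F)])
    return bytes([0xE0 | (cu >> 12),
                  0x80 | ((cu >> 6) & 0x3F),
                  0x80 | (cu & 0x3F)])


def utf8s_encode(values: List[int]) -> bytes:
    """Hypothetical UTF-8s encoder: UTF-16 code units, each encoded separately."""
    out = bytearray()
    for value in values:
        for cu in to_utf16_units(value):
            out += encode_code_unit(cu)
    return bytes(out)


supplementary = utf8s_encode([0x10000])          # one valid scalar value
surrogate_pair = utf8s_encode([0xD800, 0xDC00])  # two surrogate code points
print(supplementary.hex(), surrogate_pair.hex())  # both print eda080edb080
```

Under these assumptions both inputs produce <ED A0 80 ED B0 80>, so a decoder that sees only the bytes cannot tell which input was intended.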

