Ken,

Thanks. Your comment could close this argument against the UTF-8S syntax, since the attack here is now groundless: there is no need to encode <ED A0 80> and <ED B0 80> as separate *paired* surrogates in UTF-8S, and they will always be converted into 0x10000 in UTF-32 or <F0 90 80 80> in UTF-8. So there is no longer any ambiguity in UTF-8S.

Regards,
Jianping.

Kenneth Whistler wrote:

> Jianping said:
>
> > The issue comes from unpaired surrogates as <ED A0 80> and <ED B0 80>
>
> These are not *unpaired* surrogates -- they are *paired* surrogates.
> Else your equating them to <F0 90 80 80> or U-00010000 would make no sense.
>
> > can be
> > in UTF-8
>
> They cannot be in well-formed UTF-8. They can only be in ill-formed
> UTF-8 of the irregular subtype.
>
> > and your search for <F0 90 80 80> (which is Unicode scalar value
> > U-00010000) cannot find it. But however, when the UTF-8 string converted into
> > UTF-16, <ED A0 80> and <ED B0 80> will become
> > <D800 DC00>, and you can find the same character by searching <D800 DC00> in
> > UTF-16.
> >
> > Unless this unpaired surrogate will be totally eliminated from UTF forms, this
> > issue could be hit.
>
> *PAIRED* surrogates.
>
> --Ken
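
For reference, the byte arithmetic under discussion can be sketched as follows. This is only an illustration of the standard bit layouts for 3-byte UTF-8 sequences, UTF-16 surrogate pairs, and 4-byte UTF-8 sequences; the function names are hypothetical, and it does not implement any particular UTF-8S converter.

```python
def decode_3byte(seq):
    """Decode one 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx."""
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


def combine_surrogates(high, low):
    """Combine a high (D800..DBFF) and low (DC00..DFFF) surrogate into a scalar value."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_4byte(scalar):
    """Encode a supplementary scalar value (10000..10FFFF) as 4-byte UTF-8."""
    return bytes([
        0xF0 | (scalar >> 18),
        0x80 | ((scalar >> 12) & 0x3F),
        0x80 | ((scalar >> 6) & 0x3F),
        0x80 | (scalar & 0x3F),
    ])


def six_byte_to_scalar(data):
    """Treat six bytes as two 3-byte surrogate encodings and return the scalar value."""
    high = decode_3byte(data[0:3])
    low = decode_3byte(data[3:6])
    return combine_surrogates(high, low)


paired = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])
high = decode_3byte(paired[0:3])       # 0xD800
low = decode_3byte(paired[3:6])        # 0xDC00
scalar = six_byte_to_scalar(paired)    # 0x10000
four_byte = encode_4byte(scalar)       # F0 90 80 80
print(hex(high), hex(low), hex(scalar), four_byte.hex(" "))
```

Read this way, the six bytes <ED A0 80> <ED B0 80> and the four bytes <F0 90 80 80> both name the same scalar value, which is why a byte-level search for one form will not match the other.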

