Ken,

Thanks. Your comment could close this argument against the UTF-8S syntax, as the
attack here is now groundless: there is no need to encode <ED A0 80> and <ED B0
80> as separate *paired* surrogates in UTF-8S, and they will always be converted
into 0x10000 in UTF-32 or <F0 90 80 80> in UTF-8. So there is no longer any
ambiguity in UTF-8S.
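
To make the conversion concrete, here is a minimal sketch in Python of the
arithmetic I have in mind. The function names are hypothetical and are not
taken from any UTF-8S proposal text; the sketch only assumes the standard
3-byte UTF-8 bit layout and the usual UTF-16 surrogate formula.

```python
def decode_3byte(seq):
    """Decode one 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx."""
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into one Unicode scalar value."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def pair_to_scalar(data):
    """Map a 6-byte surrogate-pair encoding to its scalar value (hypothetical helper)."""
    high = decode_3byte(data[0:3])
    low = decode_3byte(data[3:6])
    # Only a high surrogate followed by a low surrogate forms a valid pair.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a paired surrogate sequence")
    return combine_surrogates(high, low)


six_bytes = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])
scalar = pair_to_scalar(six_bytes)        # 0x10000
utf8 = chr(scalar).encode("utf-8")        # b'\xf0\x90\x80\x80'
```

With this, <ED A0 80> decodes to D800, <ED B0 80> decodes to DC00, and the
pair combines to 0x10000, whose UTF-8 form is <F0 90 80 80>.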

Regards,
Jianping.

Kenneth Whistler wrote:

> Jianping said:
>
> > The issue comes from unpaired surrogates as <ED A0 80> and <ED B0 80>
>
> These are not *unpaired* surrogates -- they are *paired* surrogates.
> Else your equating them to <F0 90 80 80> or U-00010000 would make no sense.
>
> > can be
> > in UTF-8
>
> They cannot be in well-formed UTF-8. They can only be in ill-formed
> UTF-8 of the irregular subtype.
>
> > and your search for <F0 90 80 80> (which is Unicode scalar value
> > U-00010000) cannot find it. However, when the UTF-8 string is converted into
> > UTF-16, <ED A0 80> and <ED B0 80> will become
> > <D800 DC00>, and you can find the same character by searching for <D800 DC00> in
> > UTF-16.
> >
> > Unless these unpaired surrogates are totally eliminated from the UTF forms, this
> > issue could be hit.
>
> *PAIRED* surrogates.
>
> --Ken
