The issue arises because unpaired surrogates such as <ED A0 80> and <ED B0 80>
can appear in UTF-8, and a search for <F0 90 80 80> (the encoding of Unicode
scalar value U-00010000) cannot find them. However, when the UTF-8 string is
converted into UTF-16, <ED A0 80> and <ED B0 80> become <D800 DC00>, and you
can find the same character by searching for <D800 DC00> in UTF-16.
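
The mismatch can be reproduced with a short Python sketch. It assumes
Python's "surrogatepass" error handler as a stand-in for a converter that
lets surrogates through; the variable names are only for illustration and
are not part of any proposal.

```python
# U-00010000 as a single Unicode scalar value.
scalar = "\U00010000"

# Well-formed UTF-8: one four-byte sequence <F0 90 80 80>.
utf8_wellformed = scalar.encode("utf-8")

# The same character stored as two surrogates, each encoded on its own
# as three bytes: <ED A0 80> <ED B0 80>. "surrogatepass" is used here
# only to imitate a lenient encoder.
utf8_surrogates = "\ud800\udc00".encode("utf-8", "surrogatepass")

# Searching the surrogate form for the four-byte sequence fails.
found_in_utf8 = utf8_wellformed in utf8_surrogates

# Convert the surrogate form to UTF-16 (big-endian, no BOM): <D800 DC00>.
utf16_from_surrogates = (
    utf8_surrogates.decode("utf-8", "surrogatepass")
    .encode("utf-16-be", "surrogatepass")
)

# The UTF-16 form of U-00010000 is the same <D800 DC00>, so the search
# now succeeds.
utf16_needle = scalar.encode("utf-16-be")
found_in_utf16 = utf16_needle in utf16_from_surrogates

print(found_in_utf8, found_in_utf16)
```

The same text therefore matches or fails to match depending on which
encoding form it is searched in.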

Unless these unpaired surrogates are completely eliminated from the UTF forms,
this issue can still occur.

Regards,
Jianping.

"Ayers, Mike" wrote:

> > From: Jianping Yang [mailto:[EMAIL PROTECTED]]
>
> > This will fix the following problem, for example:
> > A search engine looking for the character U-00010000 in a
> > UTF-8 string cannot find it. But when the UTF-8 is converted
> > into UTF-16, it can find it there,
> > because <ED A0 80> and <ED B0 80> are converted into
> > U-00010000 in UTF-16.
>
>         (scratches head)
>
>         HUH?
>
>         To find U-00010000 in UTF-8, just search for <F0 90 80 80>[1] and
> find it.  If you convert to UTF-16, you will need to search for something
> else[2], which will not be <00010000>[4], which is the UTF-32
> representation.  So I fail to see how anything gets "fixed" here.
>
>         I am getting more convinced as this goes along that there is not a
> single technical reason for UTF-8s.
>
> /|/|ike
>
> [1] - Byte conversion courtesy of Cima's UTF-8 Magic Pocket Encoder[3].
>
> [2] - I can't convert UTF-16 ... Marco?  Please?  How about a UTF-16 Magic
> Pocket Encoder?
>
> [3] - Which is NOT used to encode magic pockets.
>
> [4] - Magic Pocket Encoder not necessary for this one.
