RE: UTF-8 syntax

Ayers, Mike Fri, 08 Jun 2001 13:24:02 -0700

> From: Jianping Yang [mailto:[EMAIL PROTECTED]]

> This will fix the following problem, for example:
> A search engine searching for the character U-00010000 in a UTF-8
> string could not find it. But when the UTF-8 is converted into UTF-16,
> it can be found there, because <ED A0 80> and <ED B0 80> are converted
> into U-00010000 in UTF-16.

        (scratches head)

        HUH?

        To find U-00010000 in UTF-8, just search for <F0 90 80 80>[1] and
find it.  If you convert to UTF-16, you will need to search for something
else[2], which will not be <00010000>[4], which is the UTF-32
representation.  So I fail to see how anything gets "fixed" here.
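
        For anyone who wants to check the bytes without a pocket encoder,
here is a minimal Python sketch. It assumes Python 3 and its built-in
codecs; the names show_hex, six_byte and haystack are made up for
illustration. It prints U-00010000 in the three encoding forms, builds the
six-byte sequence from the quoted message by encoding each surrogate
separately, and shows that a plain byte search finds the four-byte form in
well-formed UTF-8 while the six-byte form is not there to be found.

```python
def show_hex(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'F0 90 80 80'."""
    return " ".join(f"{b:02X}" for b in data)

ch = "\U00010000"  # the character U-00010000

utf8 = ch.encode("utf-8")        # four-byte UTF-8 form
utf16 = ch.encode("utf-16-be")   # surrogate pair, big-endian, no BOM
utf32 = ch.encode("utf-32-be")   # single 32-bit unit, big-endian, no BOM

print("UTF-8 :", show_hex(utf8))
print("UTF-16:", show_hex(utf16))
print("UTF-32:", show_hex(utf32))

# The six-byte form: each UTF-16 surrogate encoded on its own as if it were
# a character. Python only produces this when asked via "surrogatepass".
six_byte = "\ud800\udc00".encode("utf-8", "surrogatepass")
print("6-byte:", show_hex(six_byte))

# Searching well-formed UTF-8 for the character is a plain byte search.
haystack = ("abc" + ch + "def").encode("utf-8")
print("four-byte form found at offset", haystack.find(utf8))
print("six-byte form found at offset", haystack.find(six_byte))  # -1 means absent
```

        The "surrogatepass" error handler is used here only to reproduce
the <ED A0 80> <ED B0 80> sequence; the default strict UTF-8 codec in
Python refuses to encode lone surrogates.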

        I am getting more convinced as this goes along that there is not a
single technical reason for UTF-8s.



/|/|ike


[1] - Byte conversion courtesy of Cima's UTF-8 Magic Pocket Encoder[3].

[2] - I can't convert UTF-16 ... Marco?  Please?  How about a UTF-16 Magic
Pocket Encoder?

[3] - Which is NOT used to encode magic pockets.

[4] - Magic Pocket Encoder not necessary for this one.