The issue comes from unpaired surrogates: <ED A0 80> and <ED B0 80> can
appear in UTF-8, and your search for <F0 90 80 80> (the Unicode scalar
value U-00010000) will not find them. However, when the UTF-8 string is
converted into UTF-16, <ED A0 80> and <ED B0 80> become <D800 DC00>, and
you can find the same character by searching for <D800 DC00> in UTF-16.
Unless unpaired surrogates are completely eliminated from the UTF forms,
this issue can still be hit.

Regards,
Jianping.

"Ayers, Mike" wrote:

> > From: Jianping Yang [mailto:[EMAIL PROTECTED]]
>
> > This will fix the following problem for example:
> > For a searching engine to search the character U-00010000 in
> > UTF-8 string, and it
> > could not find. But when UTF-8 is converted into UTF-16, it
> > can find it there
> > because <ED A0 80> and <ED B0 80> are converted into
> > U-00010000 in UTF-16.
>
> (scratches head)
>
> HUH?
>
> To find U-00010000 in UTF-8, just search for <F0 90 80 80>[1] and
> find it. If you convert to UTF-16, you will need to search for something
> else[2], which will not be <00010000>[4], which is the UTF-32
> representation. So I fail to see how anything gets "fixed" here.
>
> I am getting more convinced as this goes along that there is not a
> single technical reason for UTF-8s.
>
> /|/|ike
>
> [1] - Byte conversion courtesy of Cima's UTF-8 Magic Pocket Encoder[3].
>
> [2] - I can't convert UTF-16 ... Marco? Please? How about a UTF-16 Magic
> Pocket Encoder?
>
> [3] - Which is NOT used to encode magic pockets.
>
> [4] - Magic Pocket Encoder not necessary for this one.
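For anyone who wants to see the byte sequences side by side, here is a rough Python sketch of the search mismatch described at the top of this message. It is only an illustration under assumptions: the helper lenient_utf8_to_utf16 is a made-up name standing in for a converter that lets surrogate code points pass through, and Python's "surrogatepass" error handler is used to imitate that behaviour. It is not taken from any actual product.

```python
# Illustrative sketch only; lenient_utf8_to_utf16 is a hypothetical helper.

TARGET = "\U00010000"  # the character under discussion

def lenient_utf8_to_utf16(data: bytes) -> bytes:
    """Convert UTF-8 to UTF-16BE without rejecting encoded surrogates."""
    text = data.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-be", "surrogatepass")

# Standard four-byte UTF-8 form: F0 90 80 80
well_formed = TARGET.encode("utf-8")

# Each surrogate encoded separately as three bytes: ED A0 80 ED B0 80
six_byte = "\ud800\udc00".encode("utf-8", "surrogatepass")

# Searching the six-byte data for the four-byte form finds nothing.
found_in_utf8 = well_formed in six_byte

# After the lenient conversion the data is D800 DC00, so the search succeeds.
found_in_utf16 = TARGET.encode("utf-16-be") in lenient_utf8_to_utf16(six_byte)

print(well_formed.hex(" "), "|", six_byte.hex(" "))
print("found in UTF-8 data: ", found_in_utf8)
print("found in UTF-16 data:", found_in_utf16)
```

A strict decoder (plain six_byte.decode("utf-8") in Python) raises an error on the six-byte data instead of converting it, which is the "eliminated from the UTF forms" case.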

