--- In [email protected], "entropyreduction" 
<alancampbelllists+ya...@...> wrote:
>
> --- In [email protected], "Sheri" <sherip99@> wrote:
> > Yes, that works. Also using upper plane characters that I can actually see 
> > as proper looking characters on the table in Firefox, I can put them into 
> > my own html doc as utf8 and see them equally well. They look like box 
> > characters in unicode.messagebox or IE. I suppose this is a font or font 
> > script issue.
> 
> Dunno.  When I have a chance I might investigate for messagebox.
> 
> There are quite a few services in plugin that aren't right if surrogate pairs 
> and (oh dear) combining character sequences are taken into account.  length 
> is wrong, all the get/set char stuff, index, slice, etc.
> 

It may now be time to distinguish a Unicode character from a (2-byte) wide 
character, just as MBCS distinguishes a character from a byte. I'm not aware 
of any Win32 APIs that handle surrogate pairs etc. fully as expected, but .NET 
covers them pretty completely. It supports all of UTF-8/UTF-16/UTF-32, i.e. 
code pages 65001/1200/1201/12000/12001. .NET, however, may be missing on some 
systems.
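
To make the character vs. wide character distinction concrete, here is a 
minimal sketch in Python (not the plugin's language and not .NET, just a 
convenient way to show the numbers). The sample string and the helper names 
utf16_units, code_points and encoded_sizes are made up for illustration; the 
code page to codec mapping is my assumption based on the usual Windows 
identifiers (1200/1201 = UTF-16 LE/BE, 12000/12001 = UTF-32 LE/BE).

```python
# Hypothetical sample: 'a', U+1D11E (outside the BMP, so a surrogate pair
# in UTF-16), then 'e' followed by U+0301 (a combining character sequence).
sample = "a\U0001D11Ee\u0301"

# Assumed mapping of the Windows code page numbers to Python codec names.
CODEPAGES = {
    65001: "utf-8",
    1200: "utf-16-le",
    1201: "utf-16-be",
    12000: "utf-32-le",
    12001: "utf-32-be",
}


def utf16_units(text: str) -> int:
    """Count 2-byte wide characters (UTF-16 code units)."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text: str) -> int:
    """Count Unicode code points (a Python str is indexed by code point)."""
    return len(text)


def encoded_sizes(text: str) -> dict:
    """Byte length of the text in each of the code pages above."""
    return {cp: len(text.encode(codec)) for cp, codec in CODEPAGES.items()}


print(utf16_units(sample))    # 5: the surrogate pair takes two units
print(code_points(sample))    # 4: the combining accent is still counted
print(encoded_sizes(sample))
```

A reader would see three characters here, so neither count matches what 
"length" intuitively means. Counting wide characters goes wrong on the 
surrogate pair, and counting code points still goes wrong on the combining 
sequence. As far as I understand it, the second gap is what StringInfo's text 
elements are meant to cover.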

http://msdn.microsoft.com/en-us/library/system.string.aspx
http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx
http://msdn.microsoft.com/en-us/library/system.text.aspx
