--- In [email protected], "entropyreduction" <alancampbelllists+ya...@...> wrote:
>
> --- In [email protected], "Sheri" <sherip99@> wrote:
> > Yes, that works. Also using upper plane characters that I can actually see
> > as proper looking characters on the table in Firefox, I can put them into
> > my own html doc as utf8 and see them equally well. They look like box
> > characters in unicode.messagebox or IE. I suppose this is a font or font
> > script issue.
>
> Dunno. When I have a chance might investigate for messagebox.
>
> There are quite a few services in plugin that aren't right if surrogate pairs
> and (oh dear) combining character sequences are taken into account. length
> is wrong, all the get/set char stuff, index, slice, etc.
>
Now may be the time to distinguish between a Unicode character and a (2-byte) wide character, just as we already distinguish between character and byte with MBCS.

I'm not aware of any Win32 APIs that handle surrogate pairs etc. fully as expected, but .NET covers them pretty completely. It supports all of UTF-8/UTF-16/UTF-32, i.e. codepages 65001/1200/1201/12000/12001 (1200/1201 are UTF-16 little/big endian, 12000/12001 are UTF-32 little/big endian). .NET, however, may not be installed on some systems.

http://msdn.microsoft.com/en-us/library/system.string.aspx
http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx
http://msdn.microsoft.com/en-us/library/system.text.aspx
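
To make the wide character vs. Unicode character distinction concrete, here is a minimal sketch in Python. It is not the plugin's code and not how .NET does it. The function names are made up for illustration, and the "text element" count is only a rough approximation that skips combining marks, not full grapheme segmentation.

```python
import unicodedata

def utf16_units(s: str) -> int:
    """Count 16-bit wide characters (UTF-16 code units) in s."""
    return len(s.encode("utf-16-le")) // 2

def code_points(s: str) -> int:
    """Count Unicode code points; a surrogate pair counts once."""
    return len(s)

def approx_text_elements(s: str) -> int:
    """Rough count of user-visible characters: skip combining marks.

    Assumption: this ignores other grapheme cluster rules entirely.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# 'a', U+1D11E (an upper plane character, needs a surrogate pair in UTF-16),
# then 'e' followed by U+0301 COMBINING ACUTE ACCENT.
sample = "a\U0001D11Ee\u0301"

print(utf16_units(sample))           # 5
print(code_points(sample))           # 4
print(approx_text_elements(sample))  # 3
```

A length service that reports 5 here is counting wide characters, which is why index and slice can land in the middle of a surrogate pair or split a base character from its combining mark.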
