[power-pro] Re: Unicode: multibyte

entropyreduction Thu, 20 Aug 2009 21:09:36 -0700

Unfortunately I can't access the .NET layers with the compiler I have.

I'm gonna test the IBM ICU libraries and see if they work.

In the meantime, I'm gonna restrict the current version of the unicode plugin: 
no surrogate pairs (therefore no characters greater than 0xFFFF) and no 
combining character sequences.  I'll update the docs accordingly.  If ICU works 
out, I'll have to rewrite the plugin extensively: the current plugin is pure C, 
largely to make it easy to clone Bruce's code for many of his functions, and 
ICU will require a move to C++.
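
To make the surrogate half of that restriction concrete, here's a rough sketch 
in plain C of the kind of check I mean.  It's not code from the plugin: the 
function names are made up for illustration, and I'm assuming strings arrive as 
NUL-terminated arrays of 16-bit code units (what a Win32 wide string is).  It 
says nothing about combining sequences, which are a separate problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names, chosen for this sketch only. */

/* Nonzero if the code unit is a high or low surrogate (0xD800..0xDFFF). */
int is_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDFFF;
}

/* 1 if the string holds no surrogate code units at all, so every
   code unit is a complete character in the range 0x0000..0xFFFF. */
int is_bmp_only(const uint16_t *s)
{
    assert(s != NULL);
    for (; *s != 0; ++s) {
        if (is_surrogate(*s))
            return 0;
    }
    return 1;
}

/* Length in 16-bit code units: what a naive length function reports. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        ++n;
    return n;
}

/* Length in code points: a high surrogate followed by a low surrogate
   counts once; an unpaired surrogate is counted as one unit. */
size_t utf16_code_points(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != 0) {
        if (*s >= 0xD800 && *s <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;   /* well-formed pair: one code point */
        else
            s += 1;   /* BMP character or stray surrogate */
        ++n;
    }
    return n;
}
```

For a string that passes is_bmp_only, the two length functions agree, which is 
why the restriction keeps the existing length/index/slice services honest as 
far as surrogates go.  For "a" followed by U+1F600 (code units 0x0061, 0xD83D, 
0xDE00), utf16_units gives 3 while utf16_code_points gives 2.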

--- In [email protected], "swzoh" <sean...@...> wrote:

> > There are quite a few services in plugin that aren't right if surrogate 
> > pairs and (oh dear) combining character sequences are taken into account.  
> > length is wrong, all the get/set char stuff, index, slice, etc.

> 
> Now it may be time to differentiate between a Unicode character and a 
> (2-byte) wide character, just as MBCS differentiates between character and 
> byte. I'm not aware of any Win32 APIs which take care of surrogate pairs etc. 
> as fully as expected, but .NET covers them pretty completely. It supports all 
> of UTF-8/UTF-16/UTF-32, i.e. codepages 65001/1200/1201/12000/12001. .NET, 
> however, can be missing on some systems.
> 
> http://msdn.microsoft.com/en-us/library/system.string.aspx
> http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx
> http://msdn.microsoft.com/en-us/library/system.text.aspx
