[power-pro] Re: Unicode: multibyte

silvermoonwoman2001 Mon, 24 Aug 2009 21:22:25 -0700

--- In [email protected], "entropyreduction" 
<alancampbelllists+ya...@...> wrote:
>
> Sorry, don't understand. h_ustring converted to utf-8, so it's
> made up of a string of 8 bit bytes, all of which by definition
> have to be in range \x{0001}-\x{FFFF}, surely? I'd need to
> recognise the UTF-8 equivalents of 
>
> Combining Diacritical Marks (0300-036F)
> Combining Diacritical Marks Supplement (1DC0-1DFF)
> Combining Diacritical Marks for Symbols (20D0-20FF)
> Combining Half Marks (FE20-FE2F)
> http://en.wikipedia.org/wiki/Combining_character#Unicode_ranges
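For reference, here is a short sketch of what the four quoted ranges look like once encoded as UTF-8. It is written in Python, not PowerPro, purely as an illustration. Each combining mark becomes a two- or three-byte sequence. In PCRE's "utf8" mode, the pattern is matched against decoded code points rather than raw bytes, so these byte sequences never need to be recognised by hand. The names `COMBINING_BLOCKS` and `utf8_hex` are hypothetical.

```python
# Code point ranges of the four combining-mark blocks quoted above.
COMBINING_BLOCKS = {
    "Combining Diacritical Marks": (0x0300, 0x036F),
    "Combining Diacritical Marks Supplement": (0x1DC0, 0x1DFF),
    "Combining Diacritical Marks for Symbols": (0x20D0, 0x20FF),
    "Combining Half Marks": (0xFE20, 0xFE2F),
}


def utf8_hex(cp: int) -> str:
    """Return the UTF-8 encoding of code point cp as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in chr(cp).encode("utf-8"))


if __name__ == "__main__":
    # Print the first and last UTF-8 byte sequence of each block.
    for name, (lo, hi) in COMBINING_BLOCKS.items():
        print(f"{name}: U+{lo:04X}..U+{hi:04X} -> {utf8_hex(lo)} .. {utf8_hex(hi)}")
```

The first block needs two bytes per mark (CC 80 to CD AF). The other three need three bytes each, all starting with E1, E2 or EF.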


If you also wanted to be sure the string didn't have any of those marks 
in it, I think it could be done like this:

if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]|\p{M}", h_ustring, "utf8")==0) do
;unicode.services are ok
else
;avoid character based unicode.services
endif
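The same test can be sketched outside PowerPro to check its logic. The sketch below uses Python, which is an assumption and not part of the regex plugin. Python's built-in `re` module has no `\p{M}`, so the mark test uses `unicodedata.category` instead. PCRE's `\p{M}` covers the same three categories: Mn, Mc and Me. Like the pattern `[^\x{0001}-\x{FFFF}]`, it also flags U+0000 and anything above U+FFFF. The function name `needs_full_unicode_handling` is hypothetical.

```python
import unicodedata


def needs_full_unicode_handling(s: str) -> bool:
    """Return True if s would match the pattern in the PowerPro snippet.

    That is: s contains U+0000, a code point above U+FFFF, or a Unicode
    mark (general category Mn, Mc or Me).
    """
    for ch in s:
        cp = ord(ch)
        # Equivalent of [^\x{0001}-\x{FFFF}]: outside the range 1..0xFFFF.
        if cp == 0 or cp > 0xFFFF:
            return True
        # Equivalent of \p{M}: any mark category.
        if unicodedata.category(ch).startswith("M"):
            return True
    return False
```

A False result corresponds to the `==0` branch, where the character-based unicode.services would be safe to use.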

I guess we need a test string that has some marks in it.
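One easy source of test strings with marks is ordinary accented text run through Unicode canonical decomposition (NFD). NFD splits precomposed letters such as é into a base letter plus a combining mark. The sketch below uses Python's `unicodedata` purely as an illustration; `mark_count` is a hypothetical helper.

```python
import unicodedata


def mark_count(s: str) -> int:
    """Count code points whose general category is a mark (Mn, Mc, Me)."""
    return sum(1 for ch in s if unicodedata.category(ch).startswith("M"))


# "café naïve" with é (U+00E9) and ï (U+00EF) as single code points.
precomposed = "caf\u00e9 na\u00efve"
# NFD turns é into e + U+0301 and ï into i + U+0308.
decomposed = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print(len(precomposed), mark_count(precomposed))  # 10 0
    print(len(decomposed), mark_count(decomposed))    # 12 2
    # UTF-8 bytes of the decomposed string, e.g. to paste into a test file.
    print(decomposed.encode("utf-8").hex(" "))
```

Both strings look identical when displayed, but only the decomposed one should fail the marks check.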

There is also a metacharacter, "\X", that in utf8 mode matches an "extended 
character", meaning a base character together with any combining marks that follow it.
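To show what "\X" groups together, here is a rough Python approximation; Python's built-in `re` has no `\X`. It attaches each mark to the preceding character, which matches the "base character plus following marks" description. It is a simplified sketch only. It ignores the further rules of Unicode extended grapheme clusters (UAX #29), such as Hangul syllables, CR LF and emoji joiners. The name `simple_clusters` is hypothetical.

```python
import unicodedata
from typing import List


def simple_clusters(s: str) -> List[str]:
    """Split s into base characters, each with the marks that follow it.

    A simplified stand-in for \\X: a mark at the very start of the string
    becomes its own cluster, and no other grapheme rules are applied.
    """
    clusters: List[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster at a non-mark
    return clusters


if __name__ == "__main__":
    word = "cafe\u0301"  # "café" written as e + combining acute accent
    print(len(word), len(simple_clusters(word)))  # 5 4
```

Counting clusters rather than code points gives the length a user would expect for text containing combining marks.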

Regards,
Sheri
