--- In [email protected], "entropyreduction"
<alancampbelllists+ya...@...> wrote:
>
> Sorry, don't understand. h_ustring converted to utf-8, so its
> made up of a string of 8 bit bytes, all of which by definition
> have to be in range \x{0001}-\x{FFFF}, surely? I'd need to
> recognise the UTF-8 equivalents of
>
> Combining Diacritical Marks (0300-036F)
> Combining Diacritical Marks Supplement (1DC0-1DFF)
> Combining Diacritical Marks for Symbols (20D0-20FF)
> Combining Half Marks (FE20-FE2F)
> http://en.wikipedia.org/wiki/Combining_character#Unicode_ranges
If you also wanted to be sure the string didn't have any of those marks
in it, I think it could be done like this:
if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]|\p{M}", h_ustring, "utf8")==0) do
;unicode.services are ok
else
;avoid character based unicode.services
endif
I guess we need a test string that has some marks in it.
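As a rough cross-check outside the script, here is a small Python sketch of the same test, with a few candidate test strings. It is only an illustration under assumptions: Python's built-in re module does not support \p{M}, so the sketch uses unicodedata general categories starting with "M" instead, and the function name char_services_ok and the sample strings are made up for this example.

```python
import unicodedata

def char_services_ok(s):
    """Mirror of the pattern [^\\x{0001}-\\x{FFFF}]|\\p{M} (hypothetical helper).

    Returns True when no code point falls outside U+0001..U+FFFF
    and no code point is a combining mark (category Mn, Mc or Me).
    """
    for ch in s:
        if not (0x0001 <= ord(ch) <= 0xFFFF):
            return False
        if unicodedata.category(ch).startswith("M"):
            return False
    return True

# Made-up test strings, not taken from the thread
plain = "caf\u00e9"      # precomposed e-acute, no combining mark
marked = "cafe\u0301"    # "e" followed by COMBINING ACUTE ACCENT (U+0301)
astral = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, above U+FFFF

for name, text in (("plain", plain), ("marked", marked), ("astral", astral)):
    print(name, char_services_ok(text))
```

A string like "marked" above is the kind of test string I mean: it looks the same as "plain" on screen but contains a mark from the 0300-036F block.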
There is also an escape sequence, "\X", which in utf8 mode matches an
"extended character", meaning a character together with any combining
marks that follow it.
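To show what counting by "extended character" would mean, here is a minimal Python sketch that groups each character with the marks that follow it. It is an approximation under assumptions, not the regex engine's actual implementation: it treats an extended character as one code point plus any following code points in category M, and the name extended_chars is hypothetical.

```python
import unicodedata

def extended_chars(s):
    """Split s into base characters with their trailing combining marks attached."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach it to the preceding character
            clusters[-1] += ch
        else:
            # Anything else starts a new extended character
            clusters.append(ch)
    return clusters

sample = "cafe\u0301"    # made-up sample: 5 code points, 4 visible characters
print(len(sample), len(extended_chars(sample)))
```

The point is that a plain character count and an extended-character count differ as soon as a mark is present, which is why the character-based services would give surprising results on such strings.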
Regards,
Sheri