--- In [email protected], "entropyreduction"
<alancampbelllists+ya...@...> wrote:
>
> --- In [email protected], "silvermoonwoman2001" <sherip99@> wrote:
> >
> > --- In [email protected], "entropyreduction"
> > <alancampbelllists+yahoo@> wrote:
> >
> > You could include (in the docs) a script that can identify (using
> > regex) whether there are any high code points in a unicode string.
> >
> > if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]", h_ustring, "utf8")==0) do
> > ;unicode.services are ok
> > else
> > ;avoid character based unicode.services
> > endif
>
> Sorry, don't understand. h_ustring is converted to utf-8, so it's
> made up of a string of 8-bit bytes, all of which by definition
> have to be in range \x{0001}-\x{FFFF}, surely?

It works. Try it.

On my previous test string it finds, e.g., \x{10401} and the other upper-range
character in the UTF-8 string. Every character in a UTF-8 string (except
low-code ASCII) is encoded as multiple bytes, but with the "utf8" option PCRE
matches against characters rather than individual bytes, so it can still find
them by \x{code point}.
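
If it helps to see the bytes-versus-characters distinction outside PowerPro,
here is a rough sketch in Python. It is not PowerPro script and does not use
PCRE: Python's re module stands in for it, and the sample string and the
function name has_high_code_points are made up for illustration. The pattern
is the same idea as the one above, a negated class covering U+0001 to U+FFFF.

```python
import re

# Hypothetical sample data: two characters above U+FFFF mixed into ASCII.
text = "abc\U00010401def\U0001F600"
utf8_bytes = text.encode("utf-8")

# Same idea as [^\x{0001}-\x{FFFF}]: anything outside U+0001..U+FFFF.
# (Like the original pattern, this would also match a NUL character.)
HIGH = re.compile("[^\u0001-\uffff]")


def has_high_code_points(data: bytes) -> bool:
    """Return True if the UTF-8 data holds a code point outside U+0001..U+FFFF."""
    # Decode first so the regex sees characters (code points), not raw bytes.
    return HIGH.search(data.decode("utf-8")) is not None


# Byte view: every byte is 0..255, yet each high character takes four bytes.
print(len(text), "characters,", len(utf8_bytes), "bytes")
print(has_high_code_points(utf8_bytes))
```

Every individual byte is indeed a small number, but the match runs over the
decoded characters, which is why the high code points still show up.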
Regards,
Sheri