--- In [email protected], "silvermoonwoman2001" <sheri...@...> wrote:
>
> --- In [email protected], "entropyreduction"
> <alancampbelllists+yahoo@> wrote:
> >
> > --- In [email protected], "silvermoonwoman2001" <sherip99@> wrote:
> > >
> > > --- In [email protected], "entropyreduction"
> > > <alancampbelllists+yahoo@> wrote:
> > >
> > > You could include (in the docs) a script that can identify (using
> > > regex) whether there are any high code points in a unicode string.
> > >
> > > if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]", h_ustring, "utf8")==0) do
> > > ;unicode.services are ok
> > > else
> > > ;avoid character based unicode.services
> > > endif
> >
> > Sorry, don't understand. h_ustring converted to utf-8, so its
> > made up of a string of 8 bit bytes, all of which by definition
> > have to be in range \x{0001}-\x{FFFF}, surely?
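When a pattern is compiled with PCRE_UTF8, it doesn't match byte by byte; it matches characters (code points). A code point above U+FFFF is a single character to the regex even though it takes four bytes in UTF-8, so [^\x{0001}-\x{FFFF}] will match it. The only way to match an individual byte in UTF-8 mode is \C, and that isn't recommended, because it can leave the match position in the middle of a multi-byte character.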
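For anyone who wants to see the difference outside this scripting environment, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for PCRE (it is not PCRE, but a str pattern likewise matches code points, while a bytes pattern matches raw bytes). The helper names are hypothetical and are not part of the regex plugin.

```python
import re

# Python's re stands in for PCRE here (an assumption, not the plugin's API).
# A str pattern is matched against code points, like PCRE_UTF8 mode.
HIGH_CODE_POINT = re.compile(r"[^\u0001-\uFFFF]")

# A bytes pattern is matched byte by byte, which is the model the
# question assumed. Every byte is at most 0xFF, so this class can
# only ever match a NUL byte.
ANY_BYTE_OUTSIDE = re.compile(rb"[^\x01-\xFF]")


def has_high_code_point(utf8_data: bytes) -> bool:
    """Hypothetical helper: True if the UTF-8 data holds a code point above U+FFFF (or U+0000)."""
    text = utf8_data.decode("utf-8")  # bytes -> code points
    return HIGH_CODE_POINT.search(text) is not None


def byte_level_match(utf8_data: bytes) -> bool:
    """Hypothetical helper: the same kind of test applied to raw bytes."""
    return ANY_BYTE_OUTSIDE.search(utf8_data) is not None


if __name__ == "__main__":
    emoji = "\U0001F600".encode("utf-8")  # one code point, four bytes
    print(len(emoji))                  # 4
    print(has_high_code_point(emoji))  # True
    print(byte_level_match(emoji))     # False
```

The byte-level version never fires on the emoji because none of its four bytes is 0x00, which is exactly the intuition in the question. The code-point version fires because, once decoded, the emoji is a single character above U+FFFF.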
