[power-pro] Re: Unicode: multibyte

silvermoonwoman2001 Mon, 24 Aug 2009 20:23:33 -0700

--- In [email protected], "entropyreduction" 
<alancampbelllists+ya...@...> wrote:
>
> --- In [email protected], "silvermoonwoman2001" <sherip99@> wrote:
> >
> > --- In [email protected], "entropyreduction" 
> > <alancampbelllists+yahoo@> wrote:
> > 
> > You could include (in the docs) a script that can identify (using 
> > regex) whether there are any high code points in a unicode string.
> > 
> > if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]", h_ustring, "utf8")==0) do
> >   ;unicode.services are ok
> > else
> >   ;avoid character based unicode.services
> > endif
> 
> Sorry, don't understand. h_ustring converted to utf-8, so it's
> made up of a string of 8 bit bytes, all of which by definition
> have to be in range \x{0001}-\x{FFFF}, surely?


It works. Try it.

In my previous test string, it finds \x{10401} and the other upper-range 
character in the UTF-8 string. Every character in a UTF-8 string (except 
low-code ASCII) is encoded as multiple bytes, but PCRE in UTF-8 mode matches 
whole characters rather than bytes, so \x{code point} still finds them.
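
For anyone who wants to check the idea outside PowerPro, here is a minimal sketch in Python. It assumes Python's built-in re module as a stand-in for PCRE's UTF-8 mode (both match by code point, not by byte); the names HIGH_CODE_POINT and has_high_code_points are made up for this illustration and are not part of any PowerPro plugin.

```python
import re

# Same character class as in the script above: anything outside U+0001..U+FFFF.
# Python's re works on code points for str patterns, which stands in here for
# PCRE's UTF-8 mode. Like the original class, this would also match U+0000.
HIGH_CODE_POINT = re.compile(r"[^\u0001-\uFFFF]")


def has_high_code_points(text: str) -> bool:
    """Return True if text contains a character above U+FFFF (or NUL)."""
    return HIGH_CODE_POINT.search(text) is not None


# U+10401 needs four bytes in UTF-8, yet the regex sees it as one character.
sample = "abc\U00010401def"
print(len(sample), len(sample.encode("utf-8")))
print(has_high_code_points(sample))

# Multi-byte characters inside the BMP (e-acute, euro sign) are not flagged.
print(has_high_code_points("abc\u00e9\u20ac"))
```

The byte count and the character count differ, but the match is decided per code point, which is why the range test is not trivially true for every UTF-8 string.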

Regards,
Sheri
