[power-pro] Re: Unicode: multibyte

silvermoonwoman2001 Mon, 24 Aug 2009 20:41:01 -0700

--- In [email protected], "silvermoonwoman2001" <sheri...@...> wrote:
>
> --- In [email protected], "entropyreduction" 
> <alancampbelllists+yahoo@> wrote:
> >
> > --- In [email protected], "silvermoonwoman2001" <sherip99@> wrote:
> > >
> > > --- In [email protected], "entropyreduction" 
> > > <alancampbelllists+yahoo@> wrote:
> > > 
> > > You could include (in the docs) a script that can identify (using 
> > > regex) whether there are any high code points in a unicode string.
> > > 
> > > if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]", h_ustring, "utf8")==0) do
> > >   ;unicode.services are ok
> > > else
> > >   ;avoid character based unicode.services
> > > endif
> > 
> > Sorry, don't understand. h_ustring is converted to utf-8, so it's
> > made up of a string of 8 bit bytes, all of which by definition
> > have to be in range \x{0001}-\x{FFFF}, surely?


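When a pattern is compiled with PCRE_UTF8, PCRE matches by character (code 
point), not by byte, so \x{0001}-\x{FFFF} is a range of code points. A character 
above U+FFFF takes four bytes in UTF-8 but is still a single character, and it 
falls outside that range. The only way to match an individual byte in UTF-8 mode 
is \C in the pattern, but that isn't recommended.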
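
To make the character-versus-byte distinction concrete, here is a rough sketch 
in Python. This is only an analogy: Python's re module is not PCRE, and the 
helper name has_high_code_points is made up for illustration. Matching a str 
works by code point, much like PCRE_UTF8 does, while matching the encoded bytes 
works byte by byte, which is the case where everything is trivially in range.

```python
import re

# Same character class as in the thread: anything outside U+0001..U+FFFF.
# On a str, Python's re (like PCRE in UTF-8 mode) compares code points.
HIGH = re.compile(r"[^\u0001-\uFFFF]")

# Byte-level counterpart: every byte is at most 0xFF, so only NUL can match.
BYTE_LEVEL = re.compile(rb"[^\x01-\xff]")


def has_high_code_points(text: str) -> bool:
    """Hypothetical helper: True if text has a code point above U+FFFF (or NUL)."""
    return HIGH.search(text) is not None


bmp_only = "caf\u00e9 \u4e2d\u6587"   # every character is at or below U+FFFF
with_astral = "clef \U0001D11E"        # U+1D11E is four bytes in UTF-8

print(has_high_code_points(bmp_only))
print(has_high_code_points(with_astral))

# The same text seen as UTF-8 bytes: more bytes than characters, and no
# single byte falls outside the class, so the byte-level test finds nothing.
encoded = with_astral.encode("utf-8")
print(len(with_astral), len(encoded))
print(BYTE_LEVEL.search(encoded) is None)
```

The byte-level pattern never fires on ordinary text, which is what the question 
above assumed would happen; the code-point pattern does distinguish the two 
strings.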
