On Tue, Apr 27, 2010 at 7:53 PM, bill lam <[email protected]> wrote:
> regex defaults to utf-8 encoding, but those HTML files use latin-1.
> Either convert text to utf-8 or set regex to non-utf8 mode.
>
>   rxutf8 0

After reading
   open'regex'
and
   http://www.pcre.org/pcre.txt

I thought that what I would want is
   rxutf8 do_jregex_ 'PCRE_UTF8 23 b. PCRE_NO_UTF8_CHECK'

Unfortunately, PCRE_NO_UTF8_CHECK is not defined, and when
I look for its value, I find
http://read.pudn.com/downloads126/sourcecode/delphi_control/536510/PCRE/pcre.h__.htm

which suggests
PCRE_UTF8=: 16b800
PCRE_NO_UTF8_CHECK=: 16b2000
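
As a check on the arithmetic, here is a minimal sketch of combining
those two values with bitwise OR. It assumes the values in that pcre.h
listing match the installed library, which I have not verified. The
names `or` and `flags` are made up for illustration, and the sketch says
nothing about whether rxutf8 accepts a combined value like this.

```j
NB. Values as listed in the pcre.h copy linked above (assumption:
NB. they match the PCRE library that the J regex addon loads).
PCRE_UTF8=: 16b800
PCRE_NO_UTF8_CHECK=: 16b2000

or=: 23 b.                  NB. bitwise OR (hypothetical helper name)
flags=: PCRE_UTF8 or PCRE_NO_UTF8_CHECK
flags                       NB. 10240, which is 16b2800
```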

So now I know that I am confused.

Can anyone suggest how I might use pcre's ability to recognize
word-forming utf-8 characters without also losing access to
latin-1 content?

Thanks,

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm