On Tue, Apr 27, 2010 at 7:53 PM, bill lam <[email protected]> wrote:
> regex default to use utf-8 encoding but those htmls use latin-1.
> Either convert text to utf-8 or set regex to non-utf8 mode.
>
> rxutf8 0

After reading open'regex' and http://www.pcre.org/pcre.txt, I thought
that what I wanted was:

   rxutf8 do_jregex_ 'PCRE_UTF8 23 b. PCRE_NO_UTF8_CHECK'

Unfortunately, PCRE_NO_UTF8_CHECK is not defined, and when I look for
its value, I find
http://read.pudn.com/downloads126/sourcecode/delphi_control/536510/PCRE/pcre.h__.htm
which suggests:

   PCRE_UTF8=: 16b800
   PCRE_NO_UTF8_CHECK=: 16b2000

So now I know that I am confused.

Can anyone suggest how I might be able to use pcre's ability to
recognize word-forming utf8 characters without also losing access to
latin1 content?

Thanks,

-- 
Raul
