On Thu, 29 Apr 2010, Raul Miller wrote:
> On Tue, Apr 27, 2010 at 7:53 PM, bill lam <[email protected]> wrote:
> > regex default to use utf-8 encoding but those htmls use latin-1.
> > Either convert text to utf-8 or set regex to non-utf8 mode.
> >
> >    rxutf8 0
>
> After reading
>    open'regex'
> and
>    http://www.pcre.org/pcre.txt
>
> What I thought I would want
>    rxutf8 do_jregex_ 'PCRE_UTF8 23 b. PCRE_NO_UTF8_CHECK'
>
> Unfortunately, PCRE_NO_UTF8_CHECK is not defined, and when
> I look for its value, I find
> http://read.pudn.com/downloads126/sourcecode/delphi_control/536510/PCRE/pcre.h__.htm
>
> which suggests
>    PCRE_UTF8=: 16b800
>    PCRE_NO_UTF8_CHECK=: 16b2000
>
> So now I know that I am confused.
>
> Can anyone suggest how I might be able to use pcre's ability to
> recognize word forming utf8 characters without also losing access
> to latin1 content?
>
> Thanks,

rxutf8 is intended to be called as either 'rxutf8 0' or 'rxutf8 1'. Do you
mean that the constant for enabling/disabling the utf8 option is incorrect
inside jregex?

--
regards,
