It would be helpful if you explained what you are trying to do.
> What do you mean by using UTF-8 verbatim?
load 'regex'
T=: 'о сколько нам открытий чудных' NB. test
V=: 'аиоуы' NB. some vowels
runs=: ;:^:_1@,@(rxmatches rxfrom]) NB. contiguous runs
('[^ ',V,']+') runs T
ск льк н м ткр т й ч дн х
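
The same verbatim idea applied to the example from the original question, as a minimal sketch: the class is spelled out literally instead of using \p{Lu}. It assumes only ASCII capitals matter, which is much narrower than the Unicode Lu class.

```j
load 'regex'
NB. explicit character class written out in place of \p{Lu}
NB. (assumption: ASCII capitals only, not the full Unicode Lu class)
'[A-Z]' rxmatch 'bAb'
```

rxmatch reports the index and length of the first match, so this should find the A at index 1 with length 1, which is the 1 1 result the question expected.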
----- Original Message ----
> From: Alexander Mikhailov <[email protected]>
>
> Hi,
>
> I don't know much about regex. I just need to match
> characters from various Unicode classes. Here -
>
> http://www.jsoftware.com/help/pcre/pcrepattern.html#SEC2
>
> under "Unicode character properties" (about 3 screens
> below) is said:
>
> "When PCRE is built with Unicode character property support,
> three additional escape sequences to match character
> properties are available when UTF-8 mode is selected. They
> are:
>
> \p{xx} a character with the xx property
> \P{xx} a character without the xx property
> \X an extended Unicode sequence"
>
> Seems like \p{xx} escape sequence would do what I need,
> but it doesn't seem to work.
>
> What do you mean by using UTF-8 verbatim?
>
> I'm not using at the moment any software other than from
> a standard J distribution. That includes jpcre.dll, and
> here -
>
> http://www.jsoftware.com/help/user/regex_expressions.htm
>
> is said "J uses the PCRE (Perl Compatible Regular Expression)
> engine through the POSIX regex interface."
>
> So, the question is: can rxmatch match a Unicode class of
> characters, and if yes, how?
>
> Alexander
>
> > Date: Fri, 13 Mar 2009 17:06:47 -0700 (PDT)
> > From: Oleg Kobchenko
> > Subject: Re: [Jprogramming] regex matching Unicode classes?
> > To: Programming forum
> > Message-ID:
> > <[email protected]>
> > Content-Type: text/plain; charset=us-ascii
> >
> >
> > I know regex very well but that escape is unfamiliar.
> >
> > Can you use UTF-8 verbatim?
> >
> > OSS has compile flags, so it could have different
> > features.
> >
> >
> > Oleg
> >
> >
> > On Mar 12, 2009, at 0:26, Alexander Mikhailov
> > wrote:
> >
> >
> >
> > Hi,
> >
> > I'm trying to construct a regular expression which
> > recognizes a Unicode
> > class of characters.
> >
> > The following command
> >
> > '\p{Lu}' rxmatch 'bAb'
> >
> > produces the error
> >
> > |pattern error at offset 1 : rxcomp
> > | (rxerror'') 13!:8[12
> >
> > I expect it should return 1 1 . The command
> > '\d[ab]' rxmatch 'qw1awer' produces 2 2
> > , as expected. Am I doing
> > something wrong?
> >
> > I've checked
> > http://www.jsoftware.com/help/pcre/pcrepattern.html ,
> > it says, "When PCRE is built with Unicode character
> > property support,
> > three additional escape sequences to match character
> > properties are
> > available..." Does it mean there are different
> > versions of ~tools/
> > regex/jpcre.dll?..
> >
> > Thank you,
> >
> > Alexander
>
>
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm