It would be helpful if you explained what you are trying to do.

> What do you mean by using UTF-8 verbatim?

   load 'regex'
   T=: 'о сколько нам открытий чудных'  NB. test
   V=: 'аиоуы'                          NB. some vowels
   runs=: ;:^:_1@,@(rxmatches rxfrom])  NB. contiguous runs
   ('[^ ',V,']+') runs T
ск льк н м ткр т й ч дн х



----- Original Message ----
> From: Alexander Mikhailov <[email protected]>
> 
> Hi,
> 
> I don't know much about regex. I just need to match
> characters from various Unicode classes. Here -
> 
> http://www.jsoftware.com/help/pcre/pcrepattern.html#SEC2
> 
> under "Unicode character properties" (about 3 screens
> below) is said:
> 
> "When PCRE is built with Unicode character property support,
> three additional escape sequences to match character
> properties are available when UTF-8 mode is selected. They
> are:
> 
>   \p{xx}   a character with the xx property
>   \P{xx}   a character without the xx property
>   \X       an extended Unicode sequence"
> 
> Seems like \p{xx} escape sequence would do what I need,
> but it doesn't seem to work.
> 
> What do you mean by using UTF-8 verbatim?
> 
> I'm not using at the moment any software other than from
> a standard J distribution. That includes jpcre.dll, and
> here -
> 
> http://www.jsoftware.com/help/user/regex_expressions.htm
> 
> is said "J uses the PCRE (Perl Compatible Regular Expression)
> engine through the POSIX regex interface."
> 
> So, the question is: can rxmatch match a Unicode class of
> characters, and if yes, how?
> 
> Alexander
> 
> > Date: Fri, 13 Mar 2009 17:06:47 -0700 (PDT)
> > From: Oleg Kobchenko 
> > Subject: Re: [Jprogramming] regex matching Unicode classes?
> > To: Programming forum 
> > Message-ID:
> > <[email protected]>
> > Content-Type: text/plain; charset=us-ascii
> > 
> > 
> > I know regex very well but that escape is unfamiliar.
> > 
> > Can you use UTF-8 verbatim?
> > 
> > OSS has compile flags, so  it could have different
> > features. 
> > 
> > 
> > Oleg
> > 
> > 
> > On Mar 12, 2009, at 0:26, Alexander Mikhailov
> > wrote:
> > 
> > 
> > 
> > Hi,
> > 
> > I'm trying to construct a regular expression which
> > recognizes a Unicode
> > class of characters.
> > 
> > The following command
> > 
> > '\p{Lu}' rxmatch 'bAb'
> > 
> > produces the error
> > 
> > |pattern error at offset 1     : rxcomp
> > |   (rxerror'')    13!:8[12
> > 
> > I expect it should return 1 1. The command
> > '\d[ab]' rxmatch 'qw1awer' produces 2 2,
> > as expected. Am I doing something wrong?
> > 
> > I've checked
> > http://www.jsoftware.com/help/pcre/pcrepattern.html ,
> > it says, "When PCRE is built with Unicode character
> > property support,
> > three additional escape sequences to match character
> > properties are
> > available..." Does it mean there are different
> > versions of ~tools/regex/jpcre.dll?..
> > 
> > Thank you,
> > 
> > Alexander
> 
> 
> 
>       
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm



      
