Yeah, you need to fix your email client. So you are parsing C#. I have yet to see a C# program with Unicode identifiers. Apparently Unicode character classes are not supported by the PCRE shipped with J. Also, the matching may need to operate on two-byte Unicode rather than UTF-8.
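
For what it's worth, here is a minimal sketch of the identifier rule from the grammar quoted below. It is written in Python rather than J, only because Python's standard `unicodedata` module carries the general-category tables that the regex here lacks. The function names are made up for illustration. It covers only the classes listed in the excerpt (Lu, Ll, Lt, Lm, Lo, Nl for the start character, plus Mn and Mc for the following ones). Whatever the "etc." stands for is left out, and so are unicode-escape-sequences, so digits, for example, end an identifier in this sketch.

```python
import unicodedata

# General categories named in the quoted grammar excerpt.
LETTER_CLASSES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
COMBINING_CLASSES = {"Mn", "Mc"}


def is_identifier_start(ch):
    """letter-character, or the underscore U+005F."""
    return ch == "_" or unicodedata.category(ch) in LETTER_CLASSES


def is_identifier_part(ch):
    """Letter or combining characters only.

    The remaining part-character classes (the "etc." in the excerpt)
    are deliberately not handled here.
    """
    return is_identifier_start(ch) or unicodedata.category(ch) in COMBINING_CLASSES


def match_identifier(text, pos=0):
    """Return the end index of an identifier starting at pos, or None."""
    if pos >= len(text) or not is_identifier_start(text[pos]):
        return None
    end = pos + 1
    while end < len(text) and is_identifier_part(text[end]):
        end += 1
    return end


if __name__ == "__main__":
    source = "имя = 1"
    end = match_identifier(source)
    print(source[:end])
```

Python strings are sequences of code points, so the UTF-8 versus two-byte question does not come up here. A J version would have to settle on a representation first and would still need its own category table.
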
----- Original Message ----
> From: Alexander Mikhailov <[email protected]>
>
> >It would be helpful, you explained what you are trying to do.
>
> There is a lexical part of a grammar I'm trying to get
> parsed. The part particularly says:
>
> identifier-or-keyword:
>     identifier-start-character identifier-part-charactersopt
>
> identifier-start-character:
>     letter-character
>     _ (the underscore character U+005F)
>
> letter-character:
>     A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
>     A unicode-escape-sequence representing a character of classes
>     Lu, Ll, Lt, Lm, Lo, or Nl
>
> combining-character:
>     A Unicode character of classes Mn or Mc
>     A unicode-escape-sequence representing a character of classes
>     Mn or Mc
>
> etc. I'm trying to build a lexer for the grammar.
>
> Sorry, I didn't get what UTF-8 verbatim means. I just got a
> bunch of question marks.
>
> Alexander
>
> -----
>
> Date: Sat, 14 Mar 2009 21:10:11 -0700 (PDT)
> From: Oleg Kobchenko
> Subject: Re: [Jprogramming] regex matching Unicode classes?
> To: Programming forum
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8
>
> > It would be helpful, you explained what you are trying to do.
> >
> > What do you mean by using UTF-8 verbatim?
>
>    load 'regex'
>    T=: '? ??????? ??? ???????? ??????' NB. test
>    V=: '?????' NB. some vowels
>    runs=: ;:^:_1@,@(rxmatches rxfrom]) NB. contigous runs
>    ('[^ ',V,']+') runs T
> ?? ??? ? ? ??? ? ? ? ?? ?
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
