Look at the verb "trace", defined in '~addons/general/misc/trace.ijs' in J7. I think it implements J's parsing. A while back I went to a lot of trouble extending J's parsing directly in J. I now think I should have taken the trouble to understand trace instead, since it would have done a much better job.
It should give you a chance to do your own name handling to support UTF. The only thing that might be a little sticky is handling copulas.

On Sat, Feb 12, 2011 at 3:04 PM, Dan Bron <[email protected]> wrote:
> I'm trying to take a first pass at a J framework/utility that will allow us
> to implement JUICE. Specifically, I want to provide a framework that:
>
> - Provides a REPL
> - Fully supports Unicode
> - Provides lexing (essentially identical to J lexing except Unicode
>   identifiers are allowed)
> - Provides parsing (identical to J by default but overridable)
>
> I have some ideas on how to approach this, and I would like some feedback
> and advice. Specifically, I'm considering extending J's rhematics to admit
> Unicode identifiers. I intend to take a J sentence as input, lex it into
> words as usual, except permitting sequences of certain Unicode characters as
> identifiers (user-assignable or -assigned names), encoding/decoding those
> names as Punycode (⍳ → xn_wkh), and then executing the resulting J as
> usual.
>
> For example:
>
>    ⍳ =: <@#: i.
>    ⍳ 2 2
> +---+---+
> |0 0|0 1|
> +---+---+
> |1 0|1 1|
> +---+---+
>
> This works by proxying all J I/O through a transformative REPL
> (replacing all non-ASCII with Punycode equivalents), so that what the J
> engine would see is:
>
>    xn_wkh =: <@#: i.
>    xn_wkh 2 2
> +---+---+
> |0 0|0 1|
> +---+---+
> |1 0|1 1|
> +---+---+
>
> I'd like
>
> - Feedback on this approach
> - Advice on how to implement it
>
> Also, I think I once saw a Unicode spec laying out all the characters the
> consortium considered appropriate to compose programming language
> identifiers. Am I misremembering? If not, can someone point me at it, and
> if so, I'd like suggestions on useful word-formation rules for Unicode
> characters.
>
> -Dan
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
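
To make the quoted proxying idea concrete, here is a rough sketch of the name translation step only. It is written in Python rather than J, purely because Python ships a punycode codec. The function names to_ascii and from_ascii are made up for illustration, and it assumes that any maximal run of non-ASCII characters counts as one identifier, which is cruder than real J word formation. The xn_ prefix is taken from the example in the quoted message.

```python
import re

# Prefix copied from the example in the quoted message (⍳ → xn_wkh).
PREFIX = "xn_"

# Assumption: an identifier to translate is any maximal run of non-ASCII
# characters. A real lexer would follow J's word-formation rules instead.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")
ENCODED_NAME = re.compile(r"\bxn_([a-z0-9]+)\b")


def to_ascii(sentence):
    """Replace each non-ASCII run with PREFIX plus its Punycode form."""
    def encode(match):
        return PREFIX + match.group().encode("punycode").decode("ascii")
    return NON_ASCII_RUN.sub(encode, sentence)


def from_ascii(text):
    """Best-effort inverse of to_ascii, for text coming back from the engine."""
    def decode(match):
        try:
            return match.group(1).encode("ascii").decode("punycode")
        except ValueError:
            # Not valid Punycode: leave the name exactly as it was.
            return match.group()
    return ENCODED_NAME.sub(decode, text)


if __name__ == "__main__":
    for line in ["⍳ =: <@#: i.", "⍳ 2 2"]:
        print(to_ascii(line))
```

A REPL wrapper along these lines would pass each input sentence through to_ascii before handing it to the J engine, and pass the engine's output through from_ascii before display. Names that mix ASCII and non-ASCII characters, and ordinary names that happen to start with xn_, are not handled sensibly by this sketch.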
