I'm trying to take a first pass at a J framework/utility that will allow us to 
implement JUICE.  Specifically, I want to provide a framework that:

        -  Provides a REPL
        -  Fully supports Unicode
        -  Provides lexing (essentially identical to J lexing, except that
           Unicode identifiers are allowed)
        -  Provides parsing (identical to J by default, but overridable)

I have some ideas on how to approach this, and I would like some feedback and 
advice.  Specifically, I'm considering extending J's rhematics to admit Unicode 
identifiers.  I intend to take a J sentence as input, lex it into words as 
usual, except permitting sequences of certain Unicode characters as identifiers 
(user-assignable or -assigned names), encoding/decoding those names as Punycode 
(⍳  → xn_wkh), and then executing the resulting J as usual.

For example:

           ⍳  =: <@#: i.
           ⍳  2 2
        +---+---+
        |0 0|0 1|
        +---+---+
        |1 0|1 1|
        +---+---+

This works by proxying all J I/O through a transforming REPL (which replaces 
all non-ASCII names with their Punycode equivalents), so that what the J engine 
would see is:

           xn_wkh  =: <@#: i.
           xn_wkh 2 2
        +---+---+
        |0 0|0 1|
        +---+---+
        |1 0|1 1|
        +---+---+
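
To make the round trip concrete, here is a minimal sketch of the two 
transformations, written in Python rather than J purely for illustration.  The 
names to_engine, from_engine, and PREFIX are hypothetical.  It assumes every 
maximal run of non-ASCII characters is a name, and that the "xn_" prefix from 
the example above is reserved for encoded names:

```python
import re

# Hypothetical prefix, taken from the "xn_wkh" example above.
PREFIX = "xn_"

# Assumption: any maximal run of non-ASCII characters is treated as a name.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")

# Assumption: an encoded name is PREFIX followed only by Punycode digits
# (a-z, 0-9) and is not embedded in a longer ASCII name.
ENCODED_NAME = re.compile(
    r"(?<![A-Za-z0-9_])" + re.escape(PREFIX) + r"([a-z0-9]+)(?![A-Za-z0-9_])"
)


def to_engine(sentence: str) -> str:
    """Replace each non-ASCII run with PREFIX + its Punycode form."""
    def encode(match):
        # A run with no ASCII characters encodes to a-z/0-9 only, with no
        # "-" delimiter, so the result is still a legal J name.
        return PREFIX + match.group(0).encode("punycode").decode("ascii")
    return NON_ASCII_RUN.sub(encode, sentence)


def from_engine(output: str) -> str:
    """Turn PREFIX + Punycode names in engine output back into Unicode."""
    def decode(match):
        try:
            return match.group(1).encode("ascii").decode("punycode")
        except ValueError:
            # Not valid Punycode: leave the text exactly as the engine sent it.
            return match.group(0)
    return ENCODED_NAME.sub(decode, output)


if __name__ == "__main__":
    print(to_engine("⍳  =: <@#: i."))
    print(from_engine("xn_wkh 2 2"))
```

This sketch does not lex the sentence first, so it would also rewrite non-ASCII 
text inside string literals and comments, and it does not handle names that mix 
ASCII and non-ASCII characters (where Punycode emits a "-" delimiter that is not 
legal in a J name).  An ordinary user name that happens to start with "xn_" may 
also be decoded by mistake on the way back.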

I'd like:

        -  Feedback on this approach
        -  Advice on how to implement it

Also, I think I once saw a Unicode spec laying out all the characters the 
consortium considered appropriate for programming-language identifiers.  
Am I misremembering?  If not, can someone point me to it?  If I am, I'd like 
suggestions for useful word-formation rules for Unicode characters.


-Dan


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
