I'm trying to take a first pass at a J framework/utility that will allow us to
implement JUICE. Specifically, I want to provide a framework that:
- Provides a REPL
- Fully supports Unicode
- Provides lexing (essentially identical to J lexing except Unicode
identifiers are allowed)
- Provides parsing (identical to J by default but overridable)
I have some ideas on how to approach this, and I would like some feedback and
advice. Specifically, I'm considering extending J's rhematics to admit Unicode
identifiers. I intend to take a J sentence as input, lex it into words as
usual, except permitting sequences of certain Unicode characters as identifiers
(user-assignable or -assigned names), encoding/decoding those names as Punycode
(⍳ → xn_wkh), and then executing the resulting J as usual.
For example:
   ⍳ =: <@#: i.
   ⍳ 2 2
+---+---+
|0 0|0 1|
+---+---+
|1 0|1 1|
+---+---+
This works by proxying all J I/O through a transformative REPL (replacing
all non-ASCII names with their Punycode equivalents on the way in, and
decoding them on the way out), so that what the J engine would see is:
   xn_wkh =: <@#: i.
   xn_wkh 2 2
+---+---+
|0 0|0 1|
+---+---+
|1 0|1 1|
+---+---+
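To make the translation step concrete, here is a rough sketch in Python
(chosen only for illustration; the real thing could equally be written in J).
It relies on Python's built-in "punycode" codec. The function names
to_engine and from_engine are hypothetical, and so is the choice of the xn_
prefix as the marker, which simply follows the example above. It also
assumes a name is a run made up entirely of non-ASCII characters; names
mixing ASCII and non-ASCII would need more thought, since Punycode separates
the two parts with a hyphen, which J names do not allow.

```python
import re

PREFIX = "xn_"  # assumed marker, following the xn_wkh example above

# Assumption: a run of one or more non-ASCII characters is one name.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")
# A prefixed run of Punycode digits, as the engine would echo it back.
ENCODED_NAME = re.compile(r"\bxn_([a-z0-9]+)")


def to_engine(sentence: str) -> str:
    """Replace each non-ASCII run with an ASCII-only J name."""
    def encode(match):
        return PREFIX + match.group(0).encode("punycode").decode("ascii")
    return NON_ASCII_RUN.sub(encode, sentence)


def from_engine(text: str) -> str:
    """Map encoded names in engine output back to Unicode."""
    def decode(match):
        try:
            return match.group(1).encode("ascii").decode("punycode")
        except UnicodeError:
            return match.group(0)  # not valid Punycode; leave untouched
    return ENCODED_NAME.sub(decode, text)
```

With this, to_engine('⍳ =: <@#: i.') should give the xn_wkh sentence shown
above, and from_engine reverses it. The sketch is deliberately naive: it
also rewrites non-ASCII text inside string literals and NB. comments, and
an ordinary ASCII name that happens to start with xn_ would be decoded by
mistake, so a real lexer would have to handle those cases.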
I'd like:
- Feedback on this approach
- Advice on how to implement it
Also, I think I once saw a Unicode spec laying out all the characters the
consortium considers appropriate for programming-language identifiers.
Am I misremembering? If not, can someone point me to it? If I am, I'd like
suggestions for useful word-formation rules for Unicode characters.
-Dan
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm