Look at the verb "trace" defined in J7's '~addons/general/misc/trace.ijs'.
I think it implements J parsing. A while back I went to a lot of trouble
using J directly to extend J's parsing. I now think I should have taken the
trouble to understand trace instead, as it would have done a much better
job.

It should give you a chance to do your own name handling to support Unicode
names. The only thing that might be a little sticky is handling the copulas
(=. and =:).
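
To make the name handling concrete, here is a minimal sketch of the rewrite
step Dan describes in the quoted message below: names containing non-ASCII
characters are swapped for ASCII stand-ins before the sentence reaches the J
engine, and swapped back in whatever comes out. It is written in Python
rather than J only because Python's standard library ships a "punycode"
codec; it is not taken from trace.ijs or from any existing JUICE code. The
helper names (encode_name, decode_name, to_engine, from_engine), the way the
xn_ prefix is handled, and the rule that every non-ASCII character counts as
a name character are all assumptions made for illustration.

```python
import re

# Hypothetical marker modelled on the xn_wkh example; '-' is not legal in a J name.
PREFIX = "xn_"

# Either a quoted J string literal (left untouched) or a run of name characters.
# Assumption: every non-ASCII character is allowed inside a name.
TOKEN = re.compile(r"'(?:[^']|'')*'|[A-Za-z0-9_\u0080-\U0010FFFF]+")


def encode_name(name: str) -> str:
    """Return an ASCII-only stand-in for a name that contains non-ASCII characters."""
    if name.isascii():
        return name
    puny = name.encode("punycode").decode("ascii")
    # Punycode puts '-' between the literal ASCII part and the encoded tail;
    # use '_' instead so the result is still a legal J name.
    return PREFIX + puny.replace("-", "_")


def decode_name(name: str) -> str:
    """Invert encode_name; anything that does not decode is returned unchanged."""
    if not name.startswith(PREFIX):
        return name
    body = name[len(PREFIX):]
    head, sep, tail = body.rpartition("_")
    # The last '_' (if any) stands for the Punycode delimiter.
    puny = f"{head}-{tail}" if sep else tail
    try:
        return puny.encode("ascii").decode("punycode")
    except UnicodeError:
        return name


def to_engine(sentence: str) -> str:
    """Rewrite a sentence typed by the user into what the J engine would see."""
    def swap(match: re.Match) -> str:
        token = match.group(0)
        return token if token.startswith("'") else encode_name(token)
    return TOKEN.sub(swap, sentence)


def from_engine(text: str) -> str:
    """Rewrite engine output so that stand-in names are shown in their original form."""
    return re.sub(r"\bxn_[A-Za-z0-9_]+", lambda m: decode_name(m.group(0)), text)


print(to_engine("⍳  =: <@#: i."))
print(from_engine("xn_wkh 2 2"))
```

The sketch deliberately ignores several things a real front end would have
to deal with: NB. comments are not skipped, an ordinary ASCII name that
happens to start with xn_ would be misread on the way back, and J's use of
underscores for locale suffixes (name_locale_) is not considered. The
copulas need no special treatment here because only names are rewritten.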

On Sat, Feb 12, 2011 at 3:04 PM, Dan Bron <[email protected]> wrote:

> I'm trying to take a first pass at a J framework/utility that will allow us
> to implement JUICE.  Specifically, I want to provide a framework that:
>
>        -  Provides a REPL
>        -  Fully supports Unicode
>        -  Provides lexing (essentially identical to J lexing except Unicode
> identifiers are allowed)
>        -  Provides parsing (identical to J by default but overridable)
>
> I have some ideas on how to approach this, and I would like some feedback
> and advice.  Specifically, I'm considering extending J's rhematics to admit
> Unicode identifiers.  I intend to take a J sentence as input, lex it into
> words as usual, except permitting sequences of certain Unicode characters as
> identifiers (user-assignable or -assigned names), encoding/decoding those
> names as Punycode (⍳  → xn_wkh), and then executing the resulting J as
> usual.
>
> For example:
>
>           ⍳  =: <@#: i.
>           ⍳  2 2
>        +---+---+
>        |0 0|0 1|
>        +---+---+
>        |1 0|1 1|
>        +---+---+
>
> This works by proxying all J I/O through a transformative REPL
> (replacing all non-ASCII with Punycode equivalents), so that what the J
> engine would see is:
>
>           xn_wkh  =: <@#: i.
>           xn_wkh 2 2
>        +---+---+
>        |0 0|0 1|
>        +---+---+
>        |1 0|1 1|
>        +---+---+
>
> I'd like
>
>        -  Feedback on this approach
>        -  Advice on how to implement it
>
> Also, I think I once saw a Unicode spec laying out all the characters the
> consortium considered appropriate to compose programming language
> identifiers.  Am I misremembering?  If not, can someone point me at it, and
> if so, I'd like suggestions on useful word-formation rules for Unicode
> characters.
>
>
> -Dan
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>