Re: [Jprogramming] parsing

Dan Bron Wed, 28 Nov 2007 07:13:20 -0800

When I say "parsing", I usually mean breaking a string into its logical
tokens (which is technically lexing).  Is that what you want?  Or do you mean
parsing in the technical sense, e.g. building an abstract syntax tree?


If you just mean the former, then for simple rhematics you can use  ;.  (cut).
For example, take a look at the standard csv parser:

           load'csv'
           chopcsv
        3 : 0
           dat=. y,','
           b=. dat e. ','
           c=. ~:/\dat='"'
           msk=. b>c
           if. 0=+/msk do. msk=. (#msk){.1 end.
           dat=. msk <;._2 dat
           b=. '"'={.@(1&{.) &> dat
           dat=. b }.each dat
           b=. '"'={.@(_1&{.) &> dat
           dat=. (-b) }.each dat
        )
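
To make the masking step in chopcsv concrete, here is a minimal sketch that
traces its first few lines on a sample string.  The input and the global
names (dat, b, c, msk, fields) are chosen only for illustration; the values
in the comments were worked out by hand from the definitions above.

```j
NB. Sample input; chopcsv appends the trailing comma itself (dat=. y,',').
dat=: 'a,"b,c",d' , ','
b=: dat e. ','          NB. 1 at every comma:        0 1 0 0 1 0 0 1 0 1
c=: ~:/\ dat = '"'      NB. 1 inside a quoted field: 0 0 1 1 1 1 0 0 0 0
msk=: b > c             NB. commas outside quotes:   0 1 0 0 0 0 0 1 0 1
fields=: msk <;._2 dat  NB. three boxes: a   "b,c"   d
```

The running not-equal scan  ~:/\  flips state at every quote, so a comma
that falls inside a quoted field is masked out and does not end a field.  The
remaining lines of chopcsv then strip the surrounding quotes from each box.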

For more complex structures, you can use a finite state machine (FSM) via
dyadic  ;:  .  You can find an interactive introduction to this in the lab
"Sequential Machines".  A good starter example is in the vocabulary page
for  ;:  itself, which provides the specification for monad  ;:  in terms of
dyad  ;:  .  That is, it shows you how J's rhematics are defined (how to lex
J sentences):

        mj=: 256$0                     NB. X other
        mj=: 1 (9,a.i.' ')}mj          NB. S space and tab
        mj=: 2 ((a.i.'Aa')+/i.26)}mj   NB. A A-Z a-z excluding N B
        mj=: 3 (a.i.'N')}mj            NB. N the letter N
        mj=: 4 (a.i.'B')}mj            NB. B the letter B
        mj=: 5 (a.i.'0123456789_')}mj  NB. 9 digits and _
        mj=: 6 (a.i.'.')}mj            NB. D .
        mj=: 7 (a.i.':')}mj            NB. C :
        mj=: 8 (a.i.'''')}mj           NB. Q quote
        
        sj=: _2]\"1 }.".;._2 (0 : 0) 
        ' X    S    A    N    B    9    D    C    Q ']0
         1 1  0 0  2 1  3 1  2 1  6 1  1 1  1 1  7 1  NB. 0 space
         1 2  0 3  2 2  3 2  2 2  6 2  1 0  1 0  7 2  NB. 1 other
         1 2  0 3  2 0  2 0  2 0  2 0  1 0  1 0  7 2  NB. 2 alp/num
         1 2  0 3  2 0  2 0  4 0  2 0  1 0  1 0  7 2  NB. 3 N
         1 2  0 3  2 0  2 0  2 0  2 0  5 0  1 0  7 2  NB. 4 NB
         9 0  9 0  9 0  9 0  9 0  9 0  1 0  1 0  9 0  NB. 5 NB.
         1 4  0 5  6 0  6 0  6 0  6 0  6 0  1 0  7 4  NB. 6 num
         7 0  7 0  7 0  7 0  7 0  7 0  7 0  7 0  8 0  NB. 7 '
         1 2  0 3  2 2  3 2  2 2  6 2  1 2  1 2  7 0  NB. 8 ''
         9 0  9 0  9 0  9 0  9 0  9 0  9 0  9 0  9 0  NB. 9 comment
        )
        
           x=: 0;sj;mj
           y=: 'sum=. (i.3 4)+/ .*0j4+pru 4'
        
           x ;: y
        +---+--+-+--+---+-+-+-+-+-+---+-+---+-+
        |sum|=.|(|i.|3 4|)|+|/|.|*|0j4|+|pru|4|
        +---+--+-+--+---+-+-+-+-+-+---+-+---+-+
           (x ;: y) -: ;: y
        1
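
The sj and mj tables above are fairly dense, so here is a much smaller
sketch of the same mechanism: a two-state machine that splits a string on
runs of spaces.  The names mw, sw and words are made up for this
illustration, and the op-code meanings in the comments follow my reading of
the vocabulary page for  ;:  .

```j
NB. Character classes: column 0 = anything else, column 1 = space.
mw=: ' ' = a.
NB. State table, shape (states, classes, 2); each pair is (next state, op code).
NB. Op codes used here: 0 = no action, 1 = mark word start, 3 = emit word and reset.
sw=: 2 2 2 $ 1 1  0 0   1 0  0 3
NB.          other space
NB. state 0 (between words): start a word / stay
NB. state 1 (inside a word): stay         / emit the word
words=: (0;sw;mw) ;: 'one  two three'   NB. boxes: one   two   three
```

As in the J lexer above, a word still in progress when the input runs out is
emitted at the end, which is why neither example needs a trailing delimiter.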

Devon posted another good example of this yesterday:

    http://www.jsoftware.com/jwiki/Scripts/JavascriptCruncher

And I think Raul is responsible for the example in the HTTP parser:

    http://www.jsoftware.com/jwiki/JWebServer/HttpParser

Unfortunately, the FSM notation is almost a language unto itself, and hard
to get right.  It's fast, but it's very low level.  Oleg has enriched the
community by providing a useful frontend to help build and debug FSMs:

    http://olegykj.sourceforge.net/scrshots/graphviz.html

And, if you're familiar with regexen (which have a well-designed interface),
he also has a lexer parameterized by them (which actually employs  ;.  not
;:  ):

    http://www.jsoftware.com/jwiki/Essays/Regex_Lexer
  
But if you meant parsing in the technical sense, and want to build a
grammar, you'll have a more difficult time finding examples and direction.

I once posted a toy "interpreter" for an ad-hoc language:

    http://www.jsoftware.com/pipermail/programming/2007-January/004756.html

Beyond such trivial models, I'm not aware of any examples.  However, since a
lot of parsing is based on ASTs, maybe an introduction to efficient tree
handling in J would help.  In general, J doesn't make it fast or easy to
process trees, but you might look at the lab "Huffman Coding" or Roger's
essay at:

    http://www.jsoftware.com/jwiki/Essays/Huffman_Coding

I hope this helps.  If not, maybe you could post a simple example of the
problem you're trying to solve?

-Dan
