Re: [Jprogramming] Parsing Pythonish

Raul Miller Wed, 04 Apr 2007 16:38:38 -0700

On 4/4/07, Pascal Jasmin <[EMAIL PROTECTED]> wrote:

There are 2 "mistakes" in your solution. 1. body is treated as a child of head
instead of a sibling. 2. A less important difference is that the tr group has
an unneeded extra level of depth.


These are related, I think.  The underlying issue is that boxing gets used for
two different purposes.  On one level, boxes represent things being sequenced.
On another level, boxes are used to represent a tree structure.  Since
sequences at different levels must remain distinct, the boxes that represent
the parent/child relationship between a line of text and the sequence it
contains must be separate from the boxes that represent that sequence
itself.

In other words, I think simple nesting should look like this:

+---------------+
|+----+--------+|
||html|+------+||
||    ||+----+|||
||    |||head||||
||    ||+----+|||
||    |+------+||
|+----+--------+|
+---------------+

The outer box represents that the result is a single logical entity.

Within this box, <'html' is the name of the line that "owns" that
single logical entity.  Following <'html' is another box which holds
the sequence of the contents of that entity.  (Conceptually, you should
have a second box even when that sequence is empty, but you can
toss these empty boxes with no loss of information.)

In this particular case, that sequence of contents is only one element
long -- again, you have an outer box which represents the logical
element and an inner box which contains the line that "owns" that
logical element.  This inner box could have been followed by another
box, but I've left that off because it would be empty.

(Hypothetically, I could also discard the innermost box for the
case where the second box is elided, but that creates confusion
between "name boxes" and "entity boxes", so I'm not going to do
that.)
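
A minimal sketch of that layout, using Python lists in place of J boxes.
This is only an illustration, not the J representation itself: the names
head, html and child_names are made up here, and the contents list is
always kept, even when empty, rather than tossed.

```python
# Each entity is a two-item list: [owning line, sequence of child entities].
# Unlike the boxed picture above, the (possibly empty) contents list is
# always kept here, so every entity has the same shape.
head = ["head", []]
html = ["html", [head]]


def child_names(entity):
    """Return the owning lines of an entity's direct children."""
    _name, contents = entity
    return [child[0] for child in contents]


print(child_names(html))  # ['head']
print(len(html[1]))       # 1: the contents sequence holds one entity
print(len(html))          # 2: owning line plus contents
```

Because the owning line and the contents live in separate slots, the
length of the contents list is exactly the number of children, no matter
how deep each child goes.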

Also, the python parsing mechanism you have described is rather complicated
to implement.  The key issue is that sometimes the specific indentation
level matters, and sometimes only the relative level matters, and sometimes
both matter.  The logic seems to be built around the concept of a "state
machine with a stack" -- in other words, the natural way of solving this
problem uses a shift/reduce parsing engine.  Here's a first cut:

words=: <@((<@<@}.~;])(0 i.~' '=]));._2
blank=: <,:'';0
parse=: [: value [: crate^:(<:@#)@> [: shift&.>/ blank |.@, words
value=: 0 {:: ,
level=: >:@[EMAIL PROTECTED]&(1&{::)
shift=: [EMAIL PROTECTED](level {:)

NB. x - new word, y - parse stack
enter=: ,~
group=: }:@] , {:@] (,&.>&{. , {:@]) [
leave=: [ group (] bunch 1 >. [: +/ <&(1&{::"1))

bunch=: [EMAIL PROTECTED]:[~ NB. x - levels to package
crate=: (<@{. , {:)@{: stuff }:
stuff=: }:@] , {:@] ((}:@[ , {:@[ ,&.> <@])&.>&{. , {:@[) [

test =: 0 : 0
html
head
body
 p stuff
 p div='adiv'
  more stuff
 table
  tr
   td 1
   td 2
 p last stuff
)

  parse test

In this result, odd levels of nesting represent "logical entities" and
even levels of nesting distinguish owning text from content text.

Perhaps this would be clearer if I arranged things slightly differently.
For example, placing owning text above content text (this change is
trivial because those lists are never longer than two elements):

stuff=: }:@] , {:@] ((}:@[ , {:@[ ,.@,&.> <@])&.>&{. , {:@[) [

Also... feel free to refactor this code for readability.  I just kind of
threw this together, and haven't really thought about how to best
express the "real concepts".  It's quite possible I've overlooked
some drastic simplifications.  (That said, there is a lot going on
here, between the arithmetic behind the indentation and the
multi-level structure manipulation and the basic shift/reduce mechanism).

I believe this all follows from the following two constraints:

[1] Use "python indent"
[2] # sequence must return the number of elements in that sequence
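
To make those two constraints concrete, here is a rough sketch of a
stack-driven indentation parser.  It is written in Python rather than J
and is not a translation of the verbs above.  The names split_indent and
parse, and the [owning line, contents] list layout, are assumptions made
for illustration.  It also simplifies the indent rule: only the relative
level matters, so a dedent closes every entity indented at least as far
as the new line.

```python
def split_indent(line):
    """Return (number of leading spaces, remaining text) for one line."""
    text = line.lstrip(" ")
    return len(line) - len(text), text


def parse(source):
    """Parse indented text into a list of [owning line, contents] entities."""
    root = []
    # Each stack entry is (indent of the owning line, its contents list).
    # The sentinel indent of -1 makes the root the parent of every line.
    stack = [(-1, root)]
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        indent, text = split_indent(line)
        # "Reduce": close entities that cannot be parents of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        # "Shift": attach the new entity to its parent and open it.
        contents = []
        stack[-1][1].append([text, contents])
        stack.append((indent, contents))
    return root


TEST = """html
head
body
 p stuff
 p div='adiv'
  more stuff
 table
  tr
   td 1
   td 2
 p last stuff
"""

tree = parse(TEST)
print([entity[0] for entity in tree])  # ['html', 'head', 'body']
print(len(tree[2][1]))                 # 4: body owns four entities
```

Here html, head and body come out as siblings, and the length of any
contents list is the number of entities it directly holds, which is the
point of constraint [2].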

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
