I made a more generic version: an adverb that takes a conversion function
to be applied to each line (that function was ". in our example).
i2t=: 1 : '; (<@:(<@:u@:{. [`( , $:)@.(*@#@]) }.);.1~ [: (=<./) ('' '' i.&1@:~: ])"1) y'
a -: ". i2t treetest
1
dltb i2t treetest NB. will look the same but keep data as text
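To see what the adverb's argument does on its own, here are the three
conversion verbs used in this thread, each applied to a single line. This is
only a sketch: the sample strings are made up for illustration, and dltb is
assumed to come from the standard library.

```j
   ". '  385'                NB. execute: the text becomes a number
385
   dltb '  p stuff'          NB. delete leading/trailing blanks; data stays text
p stuff
   >@:;: 'p last stuff'      NB. split into words, one word per row
p
last
stuff
   $ >@:;: 'p last stuff'    NB. the result is a 3-by-5 character table
3 5
```

Whatever the verb returns becomes the content of that node's box, which is
why the three trees differ only in what sits inside the leaves.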
test =: > cutLF 0 : 0
html
  head
  body
    p stuff
    p div='adiv'
      more stuff
    table
      tr
        td 1
        td 2
    p last stuff
)
dltb i2t test
┌────┬──────┬──────────────────────────────────────────────────────────────────────────────────────┐
│html│┌────┐│┌────┬─────────┬───────────────────────────┬──────────────────────────┬──────────────┐│
│    ││head│││body│┌───────┐│┌────────────┬────────────┐│┌─────┬──────────────────┐│┌────────────┐││
│    │└────┘││    ││p stuff│││p div='adiv'│┌──────────┐│││table│┌──┬──────┬──────┐│││p last stuff│││
│    │      ││    │└───────┘││            ││more stuff││││     ││tr│┌────┐│┌────┐│││└────────────┘││
│    │      ││    │         ││            │└──────────┘│││     ││  ││td 1│││td 2││││              ││
│    │      ││    │         │└────────────┴────────────┘││     ││  │└────┘│└────┘│││              ││
│    │      ││    │         │                           ││     │└──┴──────┴──────┘││              ││
│    │      ││    │         │                           │└─────┴──────────────────┘│              ││
│    │      │└────┴─────────┴───────────────────────────┴──────────────────────────┴──────────────┘│
└────┴──────┴──────────────────────────────────────────────────────────────────────────────────────┘
>@:;: i2t test
┌────┬──────┬──────────────────────────────────────────────────────────────┐
│html│┌────┐│┌────┬───────┬────────────────┬──────────────────────┬───────┐│
│    ││head│││body│┌─────┐│┌──────┬───────┐│┌─────┬──────────────┐│┌─────┐││
│    │└────┘││    ││p    │││p     │┌─────┐│││table│┌──┬────┬────┐│││p    │││
│    │      ││    ││stuff│││div   ││more ││││     ││tr│┌──┐│┌──┐││││last │││
│    │      ││    │└─────┘││=     ││stuff││││     ││  ││td│││td│││││stuff│││
│    │      ││    │       ││'adiv'│└─────┘│││     ││  ││1 │││2 ││││└─────┘││
│    │      ││    │       │└──────┴───────┘││     ││  │└──┘│└──┘│││       ││
│    │      ││    │       │                ││     │└──┴────┴────┘││       ││
│    │      ││    │       │                │└─────┴──────────────┘│       ││
│    │      │└────┴───────┴────────────────┴──────────────────────┴───────┘│
└────┴──────┴──────────────────────────────────────────────────────────────┘
----- Original Message -----
From: Marshall Lochbaum <[email protected]>
To: 'Pascal Jasmin' via Programming <[email protected]>
Cc:
Sent: Wednesday, June 18, 2014 7:14:24 PM
Subject: Re: [Jprogramming] tree parsing
You can use recursion ($:) to generate the required tree.
a -: {.L:1 $:^:(0<L.)&.>@:tree2 ". leaf <^:(2 %~ 0 i.~' '=]) "1 treetest
Note that {.L:1 is required as otherwise the leaves would be lists of
length 1. I'm not sure how to prevent that from happening within the
verb that uses $: .
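For readers who have not used $: before: it stands for the verb that contains
it, so a tacit verb can call itself. A minimal sketch using the textbook
factorial rather than the tree code (the name fact is hypothetical and not
part of this thread); the agenda @. picks the base case or the recursive case:

```j
   fact =: 1:`(] * $:@<:)@.*    NB. 1 when y is 0, otherwise y times fact of y-1
   fact 5
120
```

The tree verbs in this thread follow the same shape: a gerund with a "stop"
verb and a "recurse" verb, and a test that chooses between them.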
Here's another solution that generates the tree more directly, without
first using the iterative boxing of leaves:
]t1 =. (". ,~ ' ' i.&1@:~: ])"1 treetest
2 176715
4 385
6 5
6 77
8 11
8 7
4 459
6 51
8 3
8 17
6 9
8 3
8 3
a -: ; (<@(<@{:@{. [`(, $:)@.(*@#@]) }.);.1~ (=<./)@:({."1)) t1
1
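The two columns of t1 can be checked on a single row. A small sketch, assuming
a made-up input line: the left column is the index of the first non-blank
character, and the right column is the executed number.

```j
   ' ' i.&1@:~: '    385'             NB. index of first non-blank = indentation
4
   (". ,~ ' ' i.&1@:~: ]) '    385'   NB. indentation first, then the value
4 385
```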
Assigning to verbs, we have
vhtotree =: <@(<@{:@{. [`(, $:)@.(*@#@]) }.);.1~ (=<./)@:({."1)
indenttotree =: [: ;@:vhtotree (". ,~ ' ' i.&1@:~: ])"1
a -: indenttotree treetest
1
vhtotree can be further split into pieces
f =. <@(<@{:@{. [`(, vhtotree)@.(*@#@]) }.)
vhtotree =. f;.1~ (=<./)@:({."1)
The latter part breaks the tree into branches by splitting at each leaf
whose indentation equals the minimum, that is, at the leaves on the top
level of the current subtree. f processes a branch, returning the first
leaf if the branch has only one leaf and otherwise calling vhtotree on
the rest of the branch and appending the result to the first leaf.
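The splitting step can be tried on its own. A small sketch, assuming a made-up
indentation column for the rows below a root (the name lv is hypothetical):

```j
   lv =. 4 6 6 4 6        NB. hypothetical indentations of the rows under a root
   (=<./) lv              NB. 1 marks the rows at the minimum indentation
1 0 0 1 0
   (<;.1~ (=<./)) lv      NB. each marked row starts a new branch
┌─────┬───┐
│4 6 6│4 6│
└─────┴───┘
```

vhtotree does the same cut on the first column of its table, with f in place
of < so that each branch is processed as it is cut.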
Marshall
On Wed, Jun 18, 2014 at 02:42:52PM -0700, 'Pascal Jasmin' via Programming wrote:
> all one line,
>
> treetest =: >(((<' ') ('' ,~ ] #~ [ * <:@:#@:])"0 1~ each [: #S:1 {::) , each <@": S:0) a=: (fac =: 3 : 'q =. q: y if. 1<#q do. (y ; [: fac "0 each y (] , %) ([: */ ] {~ # ?~ <.@:-:@:#)) q else. < y end.') 176715
>
> treetest
>   1767150
>     189
>       9
>         3
>         3
>       21
>         7
>         3
>     9350
>       55
>         5
>         11
>       170
>         17
>         10
>           2
>           5
> ": ,. a
> ┌────────────────────────────────────────────┐
> │1767150                                     │
> ├────────────────────────────────────────────┤
> │┌───┬───────────┬────────────┐              │
> ││189│┌─┬───┬───┐│┌──┬───┬───┐│              │
> ││   ││9│┌─┐│┌─┐│││21│┌─┐│┌─┐││              │
> ││   ││ ││3│││3││││  ││7│││3│││              │
> ││   ││ │└─┘│└─┘│││  │└─┘│└─┘││              │
> ││   │└─┴───┴───┘│└──┴───┴───┘│              │
> │└───┴───────────┴────────────┘              │
> ├────────────────────────────────────────────┤
> │┌────┬─────────────┬───────────────────────┐│
> ││9350│┌──┬───┬────┐│┌───┬────┬────────────┐││
> ││    ││55│┌─┐│┌──┐│││170│┌──┐│┌──┬───┬───┐│││
> ││    ││  ││5│││11││││   ││17│││10│┌─┐│┌─┐││││
> ││    ││  │└─┘│└──┘│││   │└──┘││  ││2│││5│││││
> ││    │└──┴───┴────┘││   │    ││  │└─┘│└─┘││││
> ││    │             ││   │    │└──┴───┴───┘│││
> ││    │             │└───┴────┴────────────┘││
> │└────┴─────────────┴───────────────────────┘│
> └────────────────────────────────────────────┘
>
> The challenge is how to recreate a from treetest. I think a is the "right
> format" for a tree representation, and this has applications in parsing code
> blocks and DSLs like HTML and Python.
>
> Here is a link to Raul's excellent code from years ago:
> http://www.jsoftware.com/pipermail/programming/2007-April/005908.html
>
> but here is an approach that seems simple, except that it is hard to
> generalize:
>
> tree =: ((([: >: [: <./ #S:1@:{::) = #S:1@:{::) <;.1 ])
> tree2 =: > L:1 @:({. , tree)
> tree2 each each tree2 each tree2 ". leaf <^:(2 %~ 0 i.~' '=]) "1 treetest
>
> This is almost identical to a (first term extra boxed), but doing an extra
> "tree" level would require calling tree2 each each each.
>
> Is there a way to generalize a solution here?
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm