Re: [Jprogramming] Ragged Array Shapes are Trees

Raul Miller Tue, 09 Sep 2014 00:25:26 -0700

I've worked with a representation sort of like that, but [in my
opinion] considerably more efficient and flexible.


I'll model the two main routines here:

ssdir=:3 :0
   assert. 1=#$y
   (}:,.2 -~/\])I.(={.)(,{.)y
)

ssndx=:4 :0
   assert. 1=#$y
   assert. 2=#$x
   assert. 2={:$x
   (;<@(+i.)/"1 x){y
)

Example use:

   ssdir ' this is a test'
 0 5
 5 3
 8 2
10 5

   ex=: ' this is a test'
   ((0 3{ssdir) ssndx ]) ex
 this test

In other words, ssndx was analogous to { in design.

Other notes:

(0) the 'ss' part stood for Segmented String.

(1) This was something like 20 years ago, using APL.

(2) we had compiled APL back then, for a rank 0 subset of APL (and
with type inference within that mini language - anything which
constrained a type was equivalent to a type declaration), I think our
first implementations of this code were using that compiled APL
mechanism (but then I think it was redone in assembly language - I"m a
bit foggy on the details at this point, but I do remember that
performance was decent).

(3) There was an additional constraint on the code, in the final
version, such that only type character was supported. We were mostly
interested in working with text (we were building search engines and
related stuff ... Google's emphasis on map-reduce algorithms seems
like a rather straightforward and obvious development from my
perspective). But it's easy enough to build another one (not quite as
fast as the hand assembled version) to work on other types.

[Actually, I'll have to admit that I haven't been really impressed
with a lot of the advancement of state of the art our industry. It's
been looking like a lot of activity but most of the work (including my
own) just has not impressed me as being all that significant. There
have been significant improvements, of course, but not in proportion
to the number of people employed by the industry.]

Anyways... that's just two primitives, but they are quite flexible.
You can replace ssdir with other mechanisms in some contexts, for
example. And you can perform operations on the underlying characters
and then use ssndx to or straight indexing to extract it in a useful
form. (Obviously you'll need a lot more code, but this alleviated a
significant performance bottleneck.)

Thanks,

-- 
Raul

On Tue, Sep 9, 2014 at 1:13 AM, Michal D. <michal.dobrog...@gmail.com> wrote:
> One can imagine that it's possible for
>
>    ] x =. (<@i."0) 1+i.3
> +-+---+-----+
> |0|0 1|0 1 2|
> +-+---+-----+
>
> to be laid out in memory as [0 0 1 0 1 2] with an additional vector
> representing the starting point of each rank-1 array [0 1 3 6].  I would be
> willing to wager a large sum that this is much more computationally
> efficient than having a representation where each rank-1 array is floating
> off somewhere in memory [0] [0 1] [0 1 2] with an array of pointers to them.
>
> Of course this representation is possible, but somewhat verbose, to do
> manually in J (http://dl.acm.org/citation.cfm?id=804488).  That's why I'm
> fantasizing about a language where such a representation would be closer to
> the core of the language.
>
> Something like +/"1 x instead of +/&.> x (obviously there's not much
> difference here).
>
> Like we've both suggested, it may be that boxing with a sufficiently
> optimized interpreter is the solution.
>
> Cheers,
>
> Mike
>
> On Mon, Sep 8, 2014 at 8:10 AM, Raul Miller <rauldmil...@gmail.com> wrote:
>
>> What does this mean "shapes extended to support ragged arrays without
>> boxing"?
>>
>> K has ragged arrays without boxing, but K's concept of shape only
>> applies to the non-ragged prefix dimensions.
>>
>> Meanwhile, it seems to me that the guarantee you suggest would be
>> computationally expensive.
>>
>> (Not saying the ideas shouldn't be explored, but I suspect what you
>> are really after is a more efficient boxing implementation and that
>> will lead in an entirely different direction.)
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>> On Mon, Sep 8, 2014 at 11:01 AM, Michal D. <michal.dobrog...@gmail.com>
>> wrote:
>> > Thanks Pascal, that's a nice formulation for the 2-level deep case.  It
>> > breaks down of course for deeper nesting but could serve as a base.
>> >
>> >     ([:|. $ ;~ $ each) ((i.2 3);(,0)) ; (i. 2 3)
>> > +-+-------+
>> > |2|+-+---+|
>> > | ||2|2 3||
>> > | |+-+---+|
>> > +-+-------+
>> >
>> > There's no specific problem I'm trying to solve directly in J at the
>> > moment.  It's more of a language design problem.  My interest in this is
>> a
>> > J like language, but with shapes extended to support ragged arrays
>> without
>> > boxing.  Failing that, we could have some optimization guarantees like
>> 'an
>> > array of boxes containing equal rank values are allocated contiguously'.
>> >
>> > Cheers,
>> >
>> > Mike
>> >
>> >
>> >
>> > On Sat, Sep 6, 2014 at 9:28 AM, 'Pascal Jasmin' via Programming <
>> > programm...@jsoftware.com> wrote:
>> >
>> >> Hi Michal,
>> >>
>> >> maybe this is an interesting representation for what you want?
>> >>
>> >>    ([:|. $ ;~ $ each) (i. 2 3) ; (,0) ; 0 1
>> >> ┌─┬─────────┐
>> >> │3│┌───┬─┬─┐│
>> >> │ ││2 3│1│2││
>> >> │ │└───┴─┴─┘│
>> >> └─┴─────────┘
>> >>
>> >>
>> >>
>> >> ----- Original Message -----
>> >> From: Michal D. <michal.dobrog...@gmail.com>
>> >> To: programm...@jsoftware.com
>> >> Cc:
>> >> Sent: Saturday, September 6, 2014 12:06 PM
>> >> Subject: [Jprogramming] Ragged Array Shapes are Trees
>> >>
>> >> I just came to the realization that ragged array shapes, regardless of
>> how
>> >> you want to represent them, are trees.  I would be interested in any
>> >> references to prior exploration of this idea.
>> >>
>> >> Some example data (boxes are used to display ragged arrays, but
>> otherwise
>> >> have no semantic meaning):
>> >>
>> >>    ] A =. i. 2 3
>> >> 0 1 2
>> >> 3 4 5
>> >>    ] B =. (< @ i."0) 1+i.2
>> >> +-+---+
>> >> |0|0 1|
>> >> +-+---+
>> >>    ] C =. A ; B
>> >> +-----+-+---+
>> >> |0 1 2|0|0 1|
>> >> |3 4 5| |   |
>> >> +-----+-+---+
>> >>    ] D =. A ; < B
>> >> +-----+-------+
>> >> |0 1 2|+-+---+|
>> >> |3 4 5||0|0 1||
>> >> |     |+-+---+|
>> >> +-----+-------+
>> >>
>> >> The corresponding shape trees (monospace font required, root node is on
>> the
>> >> left):
>> >>
>> >> A: 2 - 3
>> >>
>> >>        1
>> >>      /
>> >> B: 2
>> >>      \
>> >>        2
>> >>
>> >>        2 - 3
>> >>      /
>> >> C: 3 - 1
>> >>      \
>> >>        2
>> >>
>> >>        2 - 3
>> >>      /
>> >> D: 2       1
>> >>      \   /
>> >>        2
>> >>          \
>> >>            2
>> >>
>> >> In some sense the shape tree for A is compressed, the verbose
>> alternative
>> >> being:
>> >>
>> >>        3
>> >>      /
>> >> A: 2
>> >>      \
>> >>        3
>> >>
>> >> Compressing D in a similar manner leaves the full shape of the tree
>> >> ambiguous - there is no way to distinguish the duplicate structure of
>> leaf
>> >> 3 from the structure of leaves 1, 2.
>> >>
>> >>            3
>> >>          /
>> >> D: 2 - 2 - 1
>> >>          \
>> >>            2
>> >>
>> >> We could resolving this ambiguity by listing the shapes of all the items
>> >> explicitly.  The problem here is that 'compression' could very easily
>> lead
>> >> to an increase in representation size, although it is nice and uniform
>> for
>> >> ragged arrays of uniform rank.
>> >>
>> >>             3
>> >>           /
>> >>          |  3
>> >>          | /
>> >> D: 2 - 2 +
>> >>          | \
>> >>          |  1
>> >>           \
>> >>             2
>> >>
>> >> Regardless of compression, I would be interested in prior work in
>> >> representing shapes of ragged arrays as trees.
>> >>
>> >> Cheers,
>> >>
>> >> Mike
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> >>
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>> >>
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Ragged Array Shapes are Trees

Reply via email to