Here are my late-night thoughts on this, derived using only Wikipedia
and J. Feel free to ask for clarification; in fact I'd be astounded if
you understood most of this at a level deeper than "if you say so...".

To begin with, the fast way to represent trees in J is with a list of
parent nodes. This means we keep the node values in one array and keep a
list of the index of each node's parent in another array. For example,
using _1 as the root node, 1 _1 1 would be a single root (1) with two
children.

Now I'm going to write some code to turn a list of points into a
kd-tree. Using a different axis each time, I'll sort the points, split
by the median, generate two kd-trees, combine, and unsort.

First, some needed algorithms for working with trees. To combine two
trees (i.e. put one on the left and one on the right of a parent node),
we want to:
Concatenate them with a _1 in the middle
Change the _1's in the originals to the index of that _1
Increase the indices on the right so that they correspond to the new
indices.

I'll do these in reverse order, calling the arrays l and r. Note that
the _1 in the middle will go at index (#l), so r starts at index (1+#l).
Therefore we can accomplish the last using (>:@#@[ + ]) instead of ].
I'm now writing in tacit code, where [ and ] are the left and right
identities and basically stand in for l and r. @ is functional
composition, with a caveat involving rank. @: is true composition, and
unless I say otherwise, you can assume I'm just using @ to save typing
and that @: would work just as well. The + is a fork, which means that
it applies to the results of >:@#@[ and ] , in the same sense that we
write addition of functions f+g in calculus.

The second step is trivial, so I'll discuss the first, which gets kind
of nasty. First we want to find the locations of the _1's. Easy enough:
([,0,])&(=&_1)
This concatenates the two arguments with a zero in the middle, after
applying (=&_1) to both arguments.
Now we want to use this to choose between either the concatenated thing
that I'll come up with in a minute or #@[ which gives the index of the
root. There are a variety of ways of doing this, and none of them are
completely satisfactory. I'll opt for simplest (which can be replaced
with quicker code). In this, we use { as the choice or conditional
function. Its arguments are the mask we computed, and an array which is
a list of pairs:
#@[ ,.~ ([ , _1 , >:@#@[ + ])
,. concatenates the items of its left argument with those of its right.
The left argument (concatenated thingy, since we're using ~ to swap
arguments) is a full list, but the right is a single number, whose only
item is itself. J replicates the right argument, and we end up with a
bunch of pairs (normal value, index of root). Then we perform a
selection on each of these pairs with each boolean in the mask. For this
we use J's version of map, ("_1). This is a special case of the rank
conjunction; I highly recommend chapter 5 of Henry Rich's book "J for C
Programmers" for a good treatment of rank.

The resulting verb is then
jointree =: ([,0,])&(=&_1)  {"_1  #@[ ,.~ ([,_1,>:@#@[+])

Which is used dyadically like so:
   1 _1 1 jointree 1 _1
1 3 1 _1 5 3

Hooray! One verb done. There's another utility we'll need, which is the
ability to permute a tree while properly updating the indices. With some
work, which I'll ignore entirely, we can show that this requires
applying the permutation to the array representing the tree, and then
applying the inverse permutation to the parent indices. A permutation in
J is (almost always) a simple list of indices like 0 2 3 1. The inverse
can be obtained with the verb "grade", or /:, which gives the
permutation required to sort its argument. If the argument is another
permutation, this means that p and (/:p) are inverses. Now to create the
verb, whose left argument will be the permutation and whose right
argument will be the tree (we want it to be called in the same way as { ):

selecttree =: /:@[ {~ {

But wait! This doesn't preserve the root node _1. When _1 selects from
/: of our permutation, it will give the last element, which is no longer
_1. To remedy this, we force the last element to be _1 by appending it:

selecttree =: (_1 ,~ /:@[) {~ {

Note that I used _1 ,~ /:@[ instead of /:@[,_1 . The latter is actually
invalid due to a quirk of J syntax: if the rightmost element of an
expression is a noun, then verbs are all executed rather than forming
forks. This is necessary so that expressions like 6 * -3 get executed,
but it prevents us from making forks with nouns on the right. In fact,
forks with nouns on the left are a recent (J4, perhaps? Before I started
coding, in any case) addition. Before they existed, people used ("_) to
turn a noun into a verb. Here, that would mean (/:@[ , _1"_) . It's
still useful in a number of places, and you may see it in some older
code. For _1 (any integer between _9 and 9, in fact) the verb _1: is a
shortcut.

We could also straighten things in selecttree out by swapping the
arguments to {~ to get

selecttree =: {  {  _1 ,~ /:@[

I prefer the flow of the former version, but this way gets rid of the
parentheses.

It's getting late, so I'll stop for now (only one line left now,
though!). I'll pick up tomorrow, and bank on the fact that it will take
much longer to read this than it did to write it.

Marshall

On Wed, Sep 12, 2012 at 12:25:38AM -0700, Scott Locklin wrote:
> Mini intro: new to J; only been looking at it for two days now. Very 
> impressed so far. I do finance/machine learning/numerics nerdery and need to 
> stash my larger TS in a proper column DB. As such, JDB is what landed me here 
> (I'm hoping it solves my TSDB problems), but I found a lot more than I 
> expected to keep me here. You guys have done an amazing job with the IDE, 
> core libs, and website/demos/etc, and the language is heady stuff which might 
> make me forget about Lisp.
> 
> My question: does anybody have a bit of sample code for kd-trees (or R* 
> trees) and tree-searcher I could peep at? This is something I use everywhere 
> for instance learning (nearest neighbor type things), among other things. I 
> don't know enough about J to write one just yet, but I figure a sample of 
> something I am familiar with will help me learn. Also provide a good 
> benchmark for how fast the language is (I can do very fast searches using 
> libANN in C++).
> 
> thanks for any help,
> 
> -Scott
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to