Raul:
Can you expand on why trees are bad in J? Coming from a C and Lisp
mentality, tree search wins a lot of problems. kd-trees break down in higher
dimensions (aka, for things like image databases, where I guess locality
hashing will beat a brute force search), but for modest dimensionality
(~10), they seem to do well, with cache misses and everything.
Your technique works, if I understand it correctly, and I'm guessing it
doesn't keep extra copies of the data around to fill up the memory, making
it effectively a sort of tree.
It is going to take me a while to digest the code you and Marshall wrote,
thank you both very kindly for taking the time!
-Scott
Raul Miller-4 wrote:
>
> Tree structures -- especially binary tree structures -- are frequently
> a bad choice in J.
>
> Personally, if I were working with spatial data in J, I'd probably
> start with something like this:
>
> NB. example arbitrary set of points:
> points=: ?.20 3$0
> prepad=: $&__@,~@{:@$, ], $&_@,~@{:@$
> searchable=: /:~@:((|."0 1) ,:^:(2 - #@$))"0 2~ i.@{:@$
> SP=: searchable prepad points
>
> In other words: make n copies of the list of points (where n is the
> number of dimensions) and sort each list so that it can be binary
> searched along one of those dimensions.
>
> Of course, it's not that simple. To find the nearest point, you need
> to search a volume which includes your candidate points. So, in the
> worst case (most points equidistant from the point you are checking)
> you wind up searching the entire data structure.
>
> I've assumed, here, that once we've done a binary search to find a
> close match to the point we are looking for that it's worth searching
> the n nearest points in each direction from that match (where n is the
> number of dimensions). So this means an initial search of n*(1+n)
> points (n lists of candidates each containing 1+n points). We need to
> extend our search if any of the 2n points bounding each of the n lists
> of candidates has a coordinate distance from the point we are looking
> for which is closer than our closest candidate.
>
> This might sound significantly less efficient than a kd-tree, but we
> need to keep in mind that the kd-tree (or r* tree) has significant
> costs from cache misses and from pipeline busting.
>
> That said, there are a few other tricks that might be needed. For
> example, if you are incrementally building the structure, adding to
> sorted lists is inefficient. You can work around this issue by
> maintaining multiple instances of the logical structure and searching
> both (or all) of them and then picking the best result (for example).
> Then occasionally you find some time to merge them together.
>
> That said, I have not had time to draft up a real sample. (The
> project I am currently working on releases this weekend, and in my
> "spare" time I'm taking a class, which leaves me without much time for
> fun stuff like this.)
>
> I hope this helps,
>
> --
> Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
>
--
View this message in context:
http://old.nabble.com/Spatial-trees-tp34422178s24193p34429335.html
Sent from the J Programming mailing list archive at Nabble.com.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm