Raul: Can you expand on why trees are bad in J? Coming from a C and Lisp mentality, tree search wins a lot of problems. kd-trees break down in higher dimensions (aka, for things like image databases, where I guess locality hashing will beat a brute force search), but for modest dimensionality (~10), they seem to do well, with cache misses and everything. Your technique works, if I understand it correctly, and I'm guessing it doesn't keep extra copies of the data around to fill up the memory, making it effectively a sort of tree.
It is going to take me a while to digest the code you and Marshall wrote, thank you both very kindly for taking the time! -Scott Raul Miller-4 wrote: > > Tree structures -- especially binary tree structures -- are frequently > a bad choice in J. > > Personally, if I were working with spatial data in J, I'd probably > start with something like this: > > NB. example arbitrary set of points: > points=: ?.20 3$0 > prepad=: $&__@,~@{:@$, ], $&_@,~@{:@$ > searchable=: /:~@:((|."0 1) ,:^:(2 - #@$))"0 2~ i.@{:@$ > SP=: searchable prepad points > > In other words: make n copies of the list of points (where n is the > number of dimensions) and sort each list so that it can be binary > searched along one of those dimensions. > > Of course, it's not that simple. To find the nearest point, you need > to search a volume which includes your candidate points. So, in the > worst case (most points equidistant from the point you are checking) > you wind up searching the entire data structure. > > I've assumed, here, that once we've done a binary search to find a > close match to the point we are looking for that it's worth searching > the n nearest points in each direction from that match (where n is the > number of dimensions). So this means an initial search of n*(1+n) > points (n lists of candidates each containing 1+n points). We need to > extend our search if any of the 2n points bounding each of the n lists > of candidates has a coordinate distance from the point we are looking > for which is closer than our closest candidate. > > This might sound significantly less efficient than a kd-tree, but we > need to keep in mind that the kd-tree (or r* tree) has significant > costs from cache misses and from pipeline busting. > > That said, there are a few other tricks that might be needed. For > example, if you are incrementally building the structure, adding to > sorted lists is inefficient. You can work around this issue by > maintaining multiple instances of the logical structure and searching > both (or all) of them and then picking the best result (for example). > Then occasionally you find some time to merge them together. > > That said, I have not had time to draft up a real sample. (The > project I am currently working on releases this weekend, and in my > "spare" time I'm taking a class, which leaves me without much time for > fun stuff like this.) > > I hope this helps, > > -- > Raul > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > > -- View this message in context: http://old.nabble.com/Spatial-trees-tp34422178s24193p34429335.html Sent from the J Programming mailing list archive at Nabble.com. ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm