Carlos Ferreira wrote: > Regarding the R.Tree performance problem, > > What is the original problem that is causing slow performance in the > SQlite R-Tree implementation?
I was populating my DB with bad data. In particular, I was first choosing a random prefix length, then filling up that number of bits to create a random prefix. I was then just blindly sticking that into the database. For very short prefix lengths, the chance of an exact collision was therefore very high. E.g. if prefix length 0 is chosen, then the chances of collision are 100%. And prefix length 0 is chosen 1% of the time. Collisions on the large bounding boxes are exactly what you don't want, because of the way an R*Tree works. (Collisions in small bounding boxes only matter for the rare searches that happen to fall within them.) Fixed by checking for the existence of an identical bounding box, and removing it if it does exist. (This population prototype code might eventually feed production code, which will find it more useful to have INSERT OR REPLACE semantics than INSERT OR IGNORE semantics.) The above, however, raises an interesting point. Even when exact collisions are detected and avoided, the above randomized scheme does create a significant "matrioshka" structure that may not be present in real-world datasets. That is, again, there is a 1/128 chance that length 0 is chosen. There is an 8/128 or 6% chance that a prefix of length <= 8 is chosen. I.e. we are creating a lot of large bounding boxes that likely cover smaller ones, and that may or may not reflect reality. -- Eric A. Rubin-Smith Computer programs don't like being anthropomorphized. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users