Re: [sqlite] slowish R*Tree performance

Eric Rubin-Smith Wed, 18 Jun 2014 11:46:50 -0700

Carlos Ferreira wrote: 

> Regarding the R.Tree performance problem, 
> 
> What is the original problem that is causing slow performance in the 
> SQlite R-Tree implementation?


I was populating my DB with bad data.  In particular, I was first 
choosing a random prefix length, then filling up that number of bits to 
create a random prefix.  I was then just blindly sticking that into the 
database.  

For very short prefix lengths, the chance of an exact collision was 
therefore very high.  E.g.  if prefix length 0 is chosen, then the 
chances of collision are 100%.  And prefix length 0 is chosen 1% of the 
time.  

Collisions on the large bounding boxes are exactly what you don't want, 
because of the way an R*Tree works.  (Collisions in small bounding boxes 
only matter for the rare searches that happen to fall within them.)  

Fixed by checking for the existence of an identical bounding box, and 
removing it if it does exist.  (This population prototype code might 
eventually feed production code, which will find it more useful to have 
INSERT OR REPLACE semantics than INSERT OR IGNORE semantics.)  

The above, however, raises an interesting point.  Even when exact 
collisions are detected and avoided, the above randomized scheme does 
create a significant "matrioshka" structure that may not be present 
in real-world datasets.  That is, again, there is a 1/128 chance that 
length 0 is chosen.  There is an 8/128 or 6% chance that a prefix of 
length <= 8 is chosen.  I.e.  we are creating a lot of large bounding 
boxes that likely cover smaller ones, and that may or may not reflect
reality.

-- 
Eric A. Rubin-Smith

Computer programs don't like being anthropomorphized.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] slowish R*Tree performance

Reply via email to