Sure looks like cache to me.  In your slow version you're walking thriugh the 
entire table twice as opposed to interleaving.



You don't say how big your database is.



Increasing cache size may solve the problem.



pragma cache_size=2000 (default cache size in pages...1024 size pages means 2MB 
by default)

So bump it up to the size of your database.



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems

________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Hauptmann Peter [hauptma...@yahoo.com]
Sent: Thursday, January 05, 2012 10:06 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Curious SQLite performance observation

This isn't a question as much as an observation that might be interesting for 
others, too, though I'd share.

Having a table building a tree by each item referencing its parent:

      // Table Items
      CREATE TABLE [Items] (
        [iRowID] INTEGER PRIMARY KEY,
        [iParentRowID] integer NULL,

         ... // some more columns, nothing special
       );
      // Indices on Table Items

      CREATE INDEX [dbidx_Items_parent] ON [Items] ([iParentRowID]);



I have two pieces of code that recurse through all nodes (starting at some root 
node), and I was startled by the immense performance difference. The "better" 
version - using prepared statements, querying only a subset of information -
was more than 10 times slower than the dumb brute-force one that built a giant 
string diagnostic on the way.


I tacked the difference down to the order of queries, in pseudocode:


      Recurse_Fast($row) :=
         SELECT * WHERE rowid=$row
         for each (childid in (SELECT rowID WHERE parentRowID=$row))
            Recurse(childid)

      Recurse_Slow($row)

         SELECT * WHERE rowid=$row
         for each (childid in (SELECT rowID WHERE parentRowID=$row))
           children.push_back(childid)
         for each (childid in children)
           Recurse(childid)

i.e. the slow version first gets all the child node id's before recursing into 
them. (To note: it is not the additional vector allocation, the significant 
time is spent in sqlite3_step.)


In both cases, we have the same number of queries, the number of child nodes 
per node is typically small, and the performance difference is very resistant 
to changes such as changing the selected subset of columns, using prepared 
statements etc.


I presume it is a caching issue, simply because of the significant gap.

-----

As said, no real question, but if someone has some insights on the why, I'd be 
interested.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to