[sqlite] slowish R*Tree performance

2014-06-14 Thread Eric Rubin-Smith
I am exploring a mechanism for finding the longest covering IPv6 prefix for
a given IPv6 address by leveraging SQLite 3.8.5's R*Tree feature.  I'm
getting pretty bad performance and would like to know if I'm doing
something obviously wrong.

A 1-dimensional R*Tree of integers of width 128 bits would be ideal.  But
SQLite R*Trees are only 32 bits wide per dimension.

So I am modeling IPv6 prefixes as a pair of coordinates in 5-dimensional
integer space:

CREATE VIRTUAL TABLE ipIndex USING rtree_i32(
id, -- Integer primary key
minD1, maxD1,
minD2, maxD2,
minD3, maxD3,
minD4, maxD4,
minD5, maxD5
);

CREATE TABLE routeTarget(
id INTEGER PRIMARY KEY,
prefix TEXT NOT NULL,
target TEXT NOT NULL
);

To do this, I have come up with a mapping from an IPv6 prefix to a pair of
coordinates that has the geometric property that we would like: bounding
box 1 contains bounding box 2 if and only if prefix 1 contains prefix 2.
So if a search for bounding boxes containing a particular address's
coordinate returns rows, then each of those rows corresponds to a covering
prefix -- from there, we must simply find the smallest bounding box to find
the longest-matching prefix.

Functionally, this appears to work like a charm.

Performance, unfortunately, leaves much to be desired.  I have seeded this
database with 100k randomly-generated prefixes, and am only able to push
through 250 searches per second.  I am hoping to increase the database size
to ~50m records and would like to hit 50k searches per second.

This is not an unreasonable request based on my hardware, and SQLite has
easily hit this throughput (at least, this order of magnitude in
throughput) in other applications.  For example, the analogue in IPv4
merely requires a 1-dimensional R*Tree, and with that I was able to hit
10kTPS throughput without breaking much of a sweat.

Here is my search query plan:

sqlite> explain query plan SELECT prefix, target FROM routeTarget WHERE id
= (
   ...>SELECT id FROM ipIndex
   ...> WHERE minD1 <= 1220818432 and 1220818432 <= maxD1
   ...>   AND minD2 <= 2120561472 and 2120561472 <= maxD2
   ...>   AND minD3 <= 1685398080 and 1685398080 <= maxD3
   ...>   AND minD4 <= 1685755328 and 1685755328 <= maxD4
   ...>   AND minD5 <= 538331072 and 538331072 <= maxD5
   ...> ORDER BY ((maxD5-minD5)*(maxD4-minD4)*(maxD3-minD3)*
   ...>   (maxD2-minD2)*(maxD1-minD1)) ASC
   ...>LIMIT 1);
0|0|0|SEARCH TABLE routeTarget USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE SCALAR SUBQUERY 1
1|0|0|SCAN TABLE ipIndex VIRTUAL TABLE INDEX 2:B0D1B2D3B4D5B6D7B8D9
1|0|0|USE TEMP B-TREE FOR ORDER BY

I created a profiled SQLite build, and here are the top offenders for a run
on 1000 searches:

Each sample counts as 0.01 seconds.
  %   cumulative   self  self total
 time   seconds   secondscalls  ms/call  ms/call  name
 40.00  1.58 1.58   300179 0.01 0.01  sqlite3VdbeExec
  6.33  1.83 0.25 36628944 0.00 0.00  sqlite3VdbeMemMove
  6.08  2.07 0.24 18314472 0.00 0.00  rtreeColumn
  4.05  2.23 0.16  1665952 0.00 0.00  rtreeStepToLeaf
  2.41  2.33 0.10 55549722 0.00 0.00  sqlite3VdbeMemRelease
  2.28  2.42 0.09  1965104 0.00 0.00
sqlite3BtreeMovetoUnpacked
  2.03  2.50 0.08 20187085 0.00 0.00
rtreeNodeOfFirstSearchPoint
  1.90  2.57 0.08 19981424 0.00 0.00
sqlite3VtabImportErrmsg
  1.77  2.64 0.07  1663952 0.00 0.00  sqlite3BtreeDelete
  1.52  2.70 0.06  5297026 0.00 0.00  sqlite3VdbeSerialGet
  1.52  2.76 0.06  1665952 0.00 0.00  btreeMoveto
  1.27  2.81 0.05 29969136 0.00 0.00  numericType
  1.27  2.86 0.05 22092688 0.00 0.00  sqlite3DbFree
  1.27  2.91 0.05 16649521 0.00 0.00  sqlite3_result_int
  1.27  2.96 0.05  5295009 0.00 0.00  moveToRoot
  1.27  3.01 0.05  1844945 0.00 0.00  nodeAcquire
  1.27  3.06 0.05  1665952 0.00 0.00  insertCell
  1.27  3.11 0.05  1663952 0.00 0.00  dropCell
  1.01  3.15 0.04 21214814 0.00 0.00  sqlite3_free
  0.89  3.19 0.04  3335904 0.00 0.00  pager_write
  0.76  3.22 0.03  9991712 0.00 0.00  sqlite3VdbeSerialType
  0.76  3.25 0.03  3335904 0.00 0.00  sqlite3PagerWrite
  0.76  3.28 0.03   903468 0.00 0.00  pcache1Fetch


I believe I have avoided the common pitfalls: everything is within a single
transaction, my cache_size pragma is large, etc.

I note from SQLite trace callbacks that a particular explicit search
results in a great many implicit SQL calls into the underlying R*Tree
tables.  Three hundred or so per search.  I assume this is expected?  It
seems pretty large.  They all look like this:

-- SELECT data FROM 'main'.'ipIndex_node' WHERE nodeno = :1
-- 

Re: [sqlite] Any way to debug which query/connection is holding a share lock?

2014-06-14 Thread RSmith


On 2014/06/14 02:00, Sohail Somani wrote:


I think I'm pretty certain that my program *is* the culprit. I'd like to add the tracing to SQLite itself. Which functions do you 
suggest? I know you mentioned opening/closing so that would be sqlite3_open* and sqlite3_finalize?


Just to clarify, this is all happening within one process:

Write thread: write
Read thread: read something
Write thread: write
Write thread: write
Read thread: read something <-- this is a "special" query but I can't see 
anything special about it except that it uses FTS
Write thread: hang until app exits

Also, I tried using sqlite3_config to add logging but that didn't seem to do 
much. Time for some printfs...



Hi again Sohail,

The things for which you should log with any returned object/memory pointers 
are:
(Jotting this down from memory as I am not at my desk, so spellings might be 
off, but you'd know the things I refer to)

sqlite3_openXX()  // because I have in the past accidentally opened a connection twice, which is perfectly ok in sqlite terms, but 
may not be the intention of your code.

sqlite3_close()// To match to the opens.
sqlite3_prepareXX()  // whichever you are using
sqlite3_reset()
sqlite3_unprepare()  // Yes I forgot the name for this, _finalize maybe? 
Whatever kills/frees the prepared statement.
sqlite3_execute()  // In case you bypass the whole prepare-step-free business 
by using some direct SQL

Also please log in terms of the TCL or sending any SQL statements like BEGIN or SAVEPOINT or any of their aliases and other 
transaction-flow/ending functions.


That should supply you with a large enough view to trap any out-of-place or 
non-closed/non-finalized stuff.
Anyone else reading this with an experience like the OP's that was solved, 
kindly add to this list, or any other thoughts that may help.

Cheers!

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users