[sqlite] Virtual table API performance

Eleytherios Stamatogiannakis Fri, 28 Feb 2014 08:25:25 -0800

Hello,

First of all, i would like to say that, IMHO, SQLite's overall designand code quality is top notch. Having said that, SQLite's virtual tableAPI, while beautifully designed, it is not very efficient.

We have been hitting our heads for the last few years, on the virtualtable API, to make it work efficiently. During that time, we werethinking that the problem was on our side, in the code that feeds thevirtual table API. It turns out that the virtual table API isn't veryefficient.

The main cause of inefficiency is that it is extremely "chatty". For anexternal stream that contains many columns, "xColumn" can be called-backhundreds of millions of times for the stream to be consumed by SQLite.These callbacks have a very big cost. Let me describe a test that we did.

For our work, we use compressed streams that are being fed in SQLitethrough the virtual table API.

If we load into SQLite, the external compressed stream (containing 3Mrows) through the virtual table API:


create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');

it takes: 55 sec

If we create an external program that inserts into SQLite, the rows inthe compressed stream one by one, using "insert into newtable values...." and the SQLite bind API:


it takes: 19 sec (~3x faster than using the virtual table API)

Another problem with the virtual table API, is that it wrecks havok withVM JIT engines. Especially for tracing JIT engines, the many "small" percolumn callbacks do not permit them to specialize at all, compared tothe specialization that a tracing JIT could achieve with one "big"get_an_entire_row callback.

A suggestion for improving the efficiency of the virtual table APInaturally arises when we look at all the virtual table functions that wehave already created. We have ~15 VT functions dealing with importing"all" from external sources (files, http streams, xml, clipboard, etc),and only one filtering VT function (a multidimensional index) "picking"columns to return.


So most of our queries that use VTs look like this:

create table cliptab as select * from clipboard();

, these queries most of the time select all columns from an external stream.

Based on above, an addition that improves the efficiency of the VT APIwould be an "xRow" function that the SQLite could call to get an entirerow back (like the bind API).

Even better, and to reduce even more the callback count, would be a"xNextRow" function that returns the contents of the next row or EOF.


Regards,

estama.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Virtual table API performance

Reply via email to