Hello,

First of all, i would like to say that, IMHO, SQLite's overall design and code quality is top notch. Having said that, SQLite's virtual table API, while beautifully designed, it is not very efficient.

We have been hitting our heads for the last few years, on the virtual table API, to make it work efficiently. During that time, we were thinking that the problem was on our side, in the code that feeds the virtual table API. It turns out that the virtual table API isn't very efficient.

The main cause of inefficiency is that it is extremely "chatty". For an external stream that contains many columns, "xColumn" can be called-back hundreds of millions of times for the stream to be consumed by SQLite. These callbacks have a very big cost. Let me describe a test that we did.

For our work, we use compressed streams that are being fed in SQLite through the virtual table API.

If we load into SQLite, the external compressed stream (containing 3M rows) through the virtual table API:

create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');

it takes: 55 sec


If we create an external program that inserts into SQLite, the rows in the compressed stream one by one, using "insert into newtable values ...." and the SQLite bind API:

it takes: 19 sec (~3x faster than using the virtual table API)


Another problem with the virtual table API, is that it wrecks havok with VM JIT engines. Especially for tracing JIT engines, the many "small" per column callbacks do not permit them to specialize at all, compared to the specialization that a tracing JIT could achieve with one "big" get_an_entire_row callback.

A suggestion for improving the efficiency of the virtual table API naturally arises when we look at all the virtual table functions that we have already created. We have ~15 VT functions dealing with importing "all" from external sources (files, http streams, xml, clipboard, etc), and only one filtering VT function (a multidimensional index) "picking" columns to return.

So most of our queries that use VTs look like this:

create table cliptab as select * from clipboard();

, these queries most of the time select all columns from an external stream.

Based on above, an addition that improves the efficiency of the VT API would be an "xRow" function that the SQLite could call to get an entire row back (like the bind API).

Even better, and to reduce even more the callback count, would be a "xNextRow" function that returns the contents of the next row or EOF.

Regards,

estama.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to