Hello,
First of all, i would like to say that, IMHO, SQLite's overall design
and code quality is top notch. Having said that, SQLite's virtual table
API, while beautifully designed, it is not very efficient.
We have been hitting our heads for the last few years, on the virtual
table API, to make it work efficiently. During that time, we were
thinking that the problem was on our side, in the code that feeds the
virtual table API. It turns out that the virtual table API isn't very
efficient.
The main cause of inefficiency is that it is extremely "chatty". For an
external stream that contains many columns, "xColumn" can be called-back
hundreds of millions of times for the stream to be consumed by SQLite.
These callbacks have a very big cost. Let me describe a test that we did.
For our work, we use compressed streams that are being fed in SQLite
through the virtual table API.
If we load into SQLite, the external compressed stream (containing 3M
rows) through the virtual table API:
create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');
it takes: 55 sec
If we create an external program that inserts into SQLite, the rows in
the compressed stream one by one, using "insert into newtable values
...." and the SQLite bind API:
it takes: 19 sec (~3x faster than using the virtual table API)
Another problem with the virtual table API, is that it wrecks havok with
VM JIT engines. Especially for tracing JIT engines, the many "small" per
column callbacks do not permit them to specialize at all, compared to
the specialization that a tracing JIT could achieve with one "big"
get_an_entire_row callback.
A suggestion for improving the efficiency of the virtual table API
naturally arises when we look at all the virtual table functions that we
have already created. We have ~15 VT functions dealing with importing
"all" from external sources (files, http streams, xml, clipboard, etc),
and only one filtering VT function (a multidimensional index) "picking"
columns to return.
So most of our queries that use VTs look like this:
create table cliptab as select * from clipboard();
, these queries most of the time select all columns from an external stream.
Based on above, an addition that improves the efficiency of the VT API
would be an "xRow" function that the SQLite could call to get an entire
row back (like the bind API).
Even better, and to reduce even more the callback count, would be a
"xNextRow" function that returns the contents of the next row or EOF.
Regards,
estama.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users