Thank you for the explanation.
Answers inline.
On 04/03/14 16:16, Hick Gunter wrote:
Your VT1 table already has an xColumn implementation, possibly doing something
like
switch( p_column )
{
case 1: sqlite3_result_xxx( p_ctx, v_rec->f1, ...); break;
...
case n: sqlite3_result_xxx( p_ctx, v_rec->fn, ...); break;
}
This needs to have two cases added:
case n+1: sqlite3_result_int64( p_ctx, (sqlite3_int64)(uintptr_t)v_rec ); break;
case n+2: sqlite3_result_int64( p_ctx, (sqlite3_int64)(uintptr_t)func ); break;
where
static int func( void *p_rec, sqlite3_context *p_ctx, int p_column );
calls
xColumn( v_cursor, p_ctx, p_column );
with a dummy cursor structure as defined for your table.
The VT2 table can then prepare "select __rec,__func from VT1", and in its
xColumn implementation it calls
v_rec = (void *)(uintptr_t)sqlite3_column_int64( v_stmt, 0 ); /* can be stored, and cleared in the xNext function */
v_func = (int (*)(void *, sqlite3_context *, int))(uintptr_t)sqlite3_column_int64( v_stmt, 1 ); /* can be stored once */
v_func( v_rec, p_ctx, p_column );
I see, so you use a similar trick to what we do when passing Python's generators as values in SQLite.
As for your second example, as written it does not suffer from the effect
because you are already selecting c1, c2 and c3 at the bottom level.
Rewritten as
Select processrow(c1,c2,c3) from VT2(select * from VT1);
Without knowing what VT2 will do, I don't think that this rewriting can happen. For example, "processrow" might return generators (nested tables) that get expanded by VT2. If you moved it outside VT2, then the generators would not be expanded.
Regards,
l.
results in the VT1 xColumn function getting called (via the VT2 xColumn
function) just 3 times per row.
Additionally, you may like to "select __func from VT1 limit 1" and store the result in your xFilter implementation, and then "select __rec from VT1" in your xNext implementation, so that sqlite3_result_int64() is called half as often.
HTH
-----Original Message-----
From: Eleytherios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Tuesday, March 4, 2014 14:15
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance
Could you explain some more your solution?
Does it work in this following case:
select * from VT2(select * from VT1);
by directly passing the rows from VT1 to VT2 (short-circuiting SQLite)?
What would happen in the following case?:
select * from VT2(select processrow(c1,c2,c3) from VT1);
Regards,
l.
On 03/03/14 14:17, Hick Gunter wrote:
We have gotten around this problem by defining "virtual" fields that contain a reference to the "current record" and the entry point of a wrapper around the xColumn function. That way only two fields get passed upwards through the virtual table stack, and the top-level virtual table's xColumn implementation calls straight through to the bottom layer's wrapper.
It does take some care to avoid sorting in between the layers and re-preparation of statements on schema changes.
-----Original Message-----
From: Elefterios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Sunday, March 2, 2014 20:39
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance
We have both input and output virtual tables that avoid hitting the hard disk
and are also able to compress the incoming and outgoing data.
We have a virtual table that takes as input a query and sends the data to a port on another
machine. This virtual table is called "OUTPUT". And another virtual table that takes as
input data from another port and forwards it into SQLite. Let's call it "INPUT". A query
that uses these two virtual tables would look like this in madIS:
OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');
We actually use queries like the one above (though we don't go directly over ports but over buffered named pipes that are then forwarded via netcat) to run distributed queries on clusters, connecting all the local SQLite/madIS instances on the different machines together.
The main point I want to make with the above explanation is that we don't view SQLite only as a traditional database. We also view it as a data stream processing machine that doesn't require the data to be stored on a hard disk.
Under this view, the efficiency of the virtual table API is very important.
The query above uses only 2 VTs, but we have other queries that use a lot more VTs than that.
estama
On 2/3/2014 9:34 PM, Max Vlasov wrote:
On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
<est...@gmail.com> wrote:
Our main test case is TPCH, a standard DB benchmark. The "lineitem"
table of TPCH contains 16 columns, which for 10M rows would require
160M xColumn callbacks, to pass it through the virtual table API.
These callbacks are very expensive, especially when at the other end
sits a VM (CPython or PyPy) handling them.
OK, I am not saying that a performance improvement is impossible, but I will explain why I'm a little skeptical about it.
For every bulk insert we have a theoretical maximum we'd all be glad to see SQLite reach: the speed of a simple file copy. SQLite can't be faster than that, but being on par is a good goal.
This is not possible when an insert also means modifying other parts of the file, for example when an index is involved. But let's set that aside. Ultimately, when new data is added, SQLite has to write a number of database pages; the cost of that part is entirely in the hands of the media and the OS (drivers). But for every database page written there is also a price to pay in CPU, for the many actions SQLite must perform before the value the developer provided is translated into what actually appears on disk.
An illustration of the CPU price is the following example:

CREATE TABLE t(Value)

On my SSD drive, repeated inserts (thousands of them) of the form

insert into t (Value) values ('123456689....')  -- a long string, for example 1024 characters

performed at about 30 MB/sec, but the query

insert into t (Value) values (100000)  -- a small integer value

managed only 3 MB/sec.
Both show almost full CPU load. Why such a difference? Because with the latter query the system could do more than 30 MB of writes per second, but it has to wait while SQLite spends 10 seconds on preparations. The former is better because the CPU cost of passing a large text value to SQLite is comparatively low next to the time spent on I/O while writing it to disk.
So the CPU price isn't avoidable, and notice that this example does not involve the virtual table API; it is the bind API. I suppose the CPU price we pay in the virtual table API is on par with the average price paid in SQLite as a whole. This means that if I transform the above queries into inserts from virtual tables, the final speed difference will be similar. And it also means that for your comparison tests (where you got the 3x difference), the CPU price SQLite pays inside the bind API and in its code wrapping the xColumn call is probably similar. The rest is the share your code pays.
Well, I know that there are differences between CPU architectures, and there are probably platforms where the compiled code for the bind API and the virtual table API behaves a little differently, making the costs diverge more. But imagine the hard task of fine-tuning and refactoring just to get a noticeable difference on one particular platform.
Max
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
-----------------------------------------------------------------------
Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna,
Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at
This e-mail is confidential and may well also be legally privileged.
If you have received it in error, you are on notice as to its status and
accordingly please notify us immediately by reply e-mail and then delete this
message from your system. Please do not copy it or use it for any purposes, or
disclose its contents to any person as to do so could be a breach of
confidence. Thank you for your cooperation.