On 4/3/2014 11:33 PM, Alek Paunov wrote:
On 04.03.2014 20:25, Eleytherios Stamatogiannakis wrote:
On 04/03/14 20:11, Alek Paunov wrote:
On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:
On 03/03/14 03:01, Alek Paunov wrote:
It seems that the easiest optimization for this (very common) VT use
case (bulk streaming) is an SQLite add-on written in _C_, implementing
a vtable interface specialization containing an xNextPage that buffers,
say, 4K rows or, even better, 16KB of data (in addition to your initial
proposal of xNextRow).

The technical question is: how should the rows be encoded? You said
initially that you use some compressed format. But for such an extension
to gain traction in the future, it would probably be better to choose a
more standard format:

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some widely used encoding close to SQLite's types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited to
vtables ;-).

SQLite doesn't need any special format for the records passed over to
it. It already has the "bind" API, which would be very suitable for the
xNextRow function too.


It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?

In the link that you posted above, look for all the sqlite3_bind_xxxxx
functions. In SQLite, the bind API is used to pass parameters to prepared
statements. The way the bind API works is that you have a statement
parameter "row", and you fill it by saying:

Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to a float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API
"way" the program calls SQLite to fill in a row's values.
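The same row-at-a-time binding pattern can be sketched with Python's
stdlib sqlite3 module (a minimal illustration of the idea, not the C-level
sqlite3_bind_* calls themselves): the caller supplies a whole row of
parameter values in one step, instead of SQLite calling back once per
column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b REAL)")

# Each "?" is a statement parameter; the tuple binds all columns
# of one row in a single call, bind-API style.
conn.execute("INSERT INTO t VALUES (?, ?)", (10, 3.5))
conn.commit()

print(conn.execute("SELECT a, b FROM t").fetchone())  # (10, 3.5)
```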


Ah, OK. Let's see if I have finally understood your idea:

- exec "insert into t select * from vt"
- VDBE calls xNextRow
- in the xNextRow implementation, the Python code calls something like
bind_xxx for each column with the scalar addresses (allocated by you)
- xNextRow returns, the VDBE inserts the row, and you clean up on the next step
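The steps above can be modelled in Python (a hypothetical sketch only:
`next_row` is a stand-in for the proposed xNextRow callback, and the real
mechanism would live inside the VDBE's "insert into t select * from vt"
loop, not in application code):

```python
import sqlite3

def next_row(rows):
    # Stand-in for xNextRow: produces one complete row per call,
    # rather than one xColumn callback per cell.
    yield from rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")

vt_rows = [(1, "x"), (2, "y"), (3, "z")]  # pretend virtual-table contents
for row in next_row(vt_rows):
    # The whole row is bound in one step, then inserted.
    conn.execute("INSERT INTO t VALUES (?, ?)", row)
conn.commit()

print(conn.execute("SELECT count(*) FROM t").fetchone()[0])  # 3
```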

Questions:

What stops you from making this wrapper right now (e.g. as an apsw patch
or a standalone SQLite add-on loaded via PyPy's FFI)?

Yes, these things can be done. We have written our own APSW-compatible FFI for PyPy, in which we have tried techniques like the one you describe. The end result wasn't much faster: no matter how many wrappers sit above the VT API, its efficiency characteristics are still there.

How do you expect this model (managing one scalar allocation per cell
during the query) to perform in comparison with passing encoded row pages
(memory chunks) between SQLite and the script engine, especially when the
engine is not PyPy or LuaJIT?


Essentially, in my first post about the VT API's efficiency, the comparison was between the bind API (similar to your "encoded pages" idea, but for a single row) and the current xColumn-based VT API.

In our tests with Python, the difference between the two APIs was 3x. I've seen another post here by somebody else who was using a natively compiled VT implementation saying that the difference was 2x. Both of these observations suggest that there is some inefficiency in the current VT API.
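A rough micro-benchmark along these lines can be sketched with the stdlib
sqlite3 module (this is not the original measurement; absolute numbers
vary by machine, and the Python-level overhead differs from the C VT API,
but it illustrates how per-row binding overhead compares with a batched,
page-like hand-off via executemany):

```python
import sqlite3
import time

rows = [(i, float(i)) for i in range(100_000)]

def per_row(conn):
    # One bind-and-step call per row.
    for r in rows:
        conn.execute("INSERT INTO t VALUES (?, ?)", r)

def batched(conn):
    # The whole batch handed over in one call, amortizing call overhead.
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

for name, fn in [("per-row", per_row), ("batched", batched)]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
    t0 = time.perf_counter()
    fn(conn)
    conn.commit()
    print(f"{name}: {time.perf_counter() - t0:.3f}s")
```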

If this API were also used within SQLite's VM, maybe adding an xNextRow opcode would speed up SQLite's VM too?

Best,

l.

Regards,
Alek


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
