On 4/3/2014 11:33 PM, Alek Paunov wrote:
On 04.03.2014 20:25, Eleytherios Stamatogiannakis wrote:
On 04/03/14 20:11, Alek Paunov wrote:
On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:
On 03/03/14 03:01, Alek Paunov wrote:
It seems that the easiest optimization for this (very common) VT use
case (bulk streaming) is an SQLite add-on written in _C_, implementing
a vtable interface specialization containing an xNextPage that buffers,
say, 4K rows or, even better, 16KB of data (in addition to your initial
proposal of xNextRow).

The technical question is: how should the rows be encoded? You said
initially that you use some compressed format. But for such an extension
to gain traction in the future, it would probably be better to choose a
more standard format:

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some widely used encoding close to SQLite's types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited to
vtables ;-).

SQLite doesn't need any special format for the records passed over to
it. It already has the "bind" API, which would be very suitable for the
xNextRow function too.


It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?

In the link that you posted above, look for all the sqlite3_bind_xxxxx
functions. In SQLite, the bind API is used to pass parameters to prepared
statements. The way the bind API works is that you have a statement
parameter "row", and you fill it by saying:

Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to a float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API
"way" the program calls SQLite to fill in a row's values.
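The same row-at-a-time binding pattern can be sketched with Python's
stdlib sqlite3 module (a minimal illustration of the idea, not the C-level
sqlite3_bind_* calls themselves): the caller supplies a whole row of
parameter values in one step, instead of SQLite calling back once per
column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b REAL)")

# Each "?" is a statement parameter; the tuple binds all columns
# of one row in a single call, bind-API style.
conn.execute("INSERT INTO t VALUES (?, ?)", (10, 3.5))
conn.commit()

print(conn.execute("SELECT a, b FROM t").fetchone())  # (10, 3.5)
```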


Ah, OK. Let's see if I have finally understood your idea:

- exec "insert into t select * from vt"
- VDBE calls xNextRow
- in the xNextRow implementation, the Python code calls something like
bind_xxx for each column with the scalar addresses (allocated by you)
- xNextRow returns, the VDBE inserts the row, and you clean up on the next step
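The steps above can be modelled in Python (a hypothetical sketch only:
`next_row` is a stand-in for the proposed xNextRow callback, and the real
mechanism would live inside the VDBE's "insert into t select * from vt"
loop, not in application code):

```python
import sqlite3

def next_row(rows):
    # Stand-in for xNextRow: produces one complete row per call,
    # rather than one xColumn callback per cell.
    yield from rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")

vt_rows = [(1, "x"), (2, "y"), (3, "z")]  # pretend virtual-table contents
for row in next_row(vt_rows):
    # The whole row is bound in one step, then inserted.
    conn.execute("INSERT INTO t VALUES (?, ?)", row)
conn.commit()

print(conn.execute("SELECT count(*) FROM t").fetchone()[0])  # 3
```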

Questions:

What stops you from making this wrapper right now (e.g. as an apsw patch
or a standalone SQLite add-on loaded via PyPy's FFI)?

Yes, these things can be done. We have written our own APSW-compatible FFI for PyPy, in which we have tried techniques like the one you describe. The end result wasn't much faster: no matter how many wrappers sit above the VT API, its efficiency characteristics are still there.

How do you expect this model (managing one scalar allocation per cell
during the query) to perform in comparison with passing encoded row pages
(memory chunks) between SQLite and the script engine, especially when the
engine is not PyPy or LuaJIT?


Essentially, in my first post about the VT API's efficiency, the comparison was between the bind API (similar to your "encoded pages" idea, but for a single row) and the current xColumn-based VT API.

In our tests with Python, the difference between the two APIs was 3x. I've seen another post here by somebody else who was using a natively compiled VT implementation saying that the difference was 2x. Both of these observations suggest that there is some inefficiency in the current VT API.
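A rough micro-benchmark along these lines can be sketched with the stdlib
sqlite3 module (this is not the original measurement; absolute numbers
vary by machine, and the Python-level overhead differs from the C VT API,
but it illustrates how per-row binding overhead compares with a batched,
page-like hand-off via executemany):

```python
import sqlite3
import time

rows = [(i, float(i)) for i in range(100_000)]

def per_row(conn):
    # One bind-and-step call per row.
    for r in rows:
        conn.execute("INSERT INTO t VALUES (?, ?)", r)

def batched(conn):
    # The whole batch handed over in one call, amortizing call overhead.
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

for name, fn in [("per-row", per_row), ("batched", batched)]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
    t0 = time.perf_counter()
    fn(conn)
    conn.commit()
    print(f"{name}: {time.perf_counter() - t0:.3f}s")
```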

If this API were also used within SQLite's VM, maybe adding an xNextRow opcode would speed up SQLite's VM too?

Best,

l.

Regards,
Alek


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
