Re: [sqlite] Article about using sqlite3 in Python

Keith Medcalf Mon, 23 Oct 2017 13:07:19 -0700

According to the documentation, it is the cache_statements keyword to the 
sqlite3 connect function:


The sqlite3 module internally uses a statement cache to avoid SQL parsing 
overhead. If you want to explicitly set the number of statements that are 
cached for the connection, you can set the cached_statements parameter. The 
currently implemented default is to cache 100 statements.

According to the documentation for APSW, it is the statementcachesize keyword 
to the Connection function.

Both default to 100.

And yes, in this particularly simple case where there is "practically nothing" 
done inside the database engine, the overhead of jumping out of the "C" 
interface code to back to python is noticeable.

Of course, if you build a carray containing the 10,000,000 integers and pass 
that to a "create table ... as select ..." and let the database do all the 
work, the time for executing the statement is under 1 second ... (but the time 
to build the carray object in python is about 5 seconds).

So it is all a trade-off.  For example parsing CSV files into an SQLITE3 
database is far more efficient using the csv extension which can import and 
process about 100,000 rows/second.  executemany is not an option because a 
"great deal" of processing must be done on each row as it is inserted, versus 
individual row insertion which does about 10,000 rows per second.  This is 
because most of the processing can be pushed down to the database engine when 
using the csv extension and it is very efficient compared to python.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of David Raymond
>Sent: Monday, 23 October, 2017 11:54
>To: SQLite mailing list
>Subject: Re: [sqlite] Article about using sqlite3 in Python
>
>How does one enable or disable statement caching then? For something
>as basic as just inserting integers executemany is about 3 times
>faster.
>
>Python version: 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30)
>[MSC v.1500 32 bit (Intel)]
>Python sqlite3 module version: 2.6.0
>sqlite3.dll version: 3.20.1
>executemany takes: 8.674 seconds for 10,000,000 inserts
>execute takes 30.173 seconds for 10,000,000 inserts
>
>
>import sqlite3
>import sys
>import time
>
>numInts = 10000000
>
>def genFunc():
>    for x in xrange(numInts):
>        yield (x,)
>
>def execute():
>    startTime = time.clock()
>    with sqlite3.connect(":memory:", isolation_level = None) as conn:
>        conn.execute("begin transaction;")
>        conn.execute("create table foo (bar int);")
>        for x in xrange(numInts):
>            conn.execute("insert into foo values (?);", (x,))
>        conn.commit()
>    conn.close()
>    endTime = time.clock()
>    return endTime - startTime
>
>def executemany():
>    startTime = time.clock()
>    with sqlite3.connect(":memory:", isolation_level = None) as conn:
>        conn.execute("begin transaction;")
>        conn.execute("create table foo (bar int);")
>        conn.executemany("insert into foo values (?);", genFunc())
>        conn.commit()
>    conn.close()
>    endTime = time.clock()
>    return endTime - startTime
>
>if __name__ == "__main__":
>    print "Python version: {0}".format(sys.version)
>    print "Python sqlite3 module version:
>{0}".format(sqlite3.version)
>    print "sqlite3.dll version: {0}".format(sqlite3.sqlite_version)
>    print "executemany takes: {0:,.3f} seconds for {1:,d}
>inserts".format(executemany(), numInts)
>    print "execute takes {0:,.3f} seconds for {1:,d}
>inserts".format(execute(), numInts)
>
>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf
>Sent: Monday, October 23, 2017 11:55 AM
>To: SQLite mailing list
>Subject: Re: [sqlite] Article about using sqlite3 in Python
>
>execute vs executemany is merely syntactic sugar.  If and only if you
>have disabled statement caching will the statement be recompiled each
>time.  In all cases the sequence of sequences parameter is used as a
>binding to multiple calls to execute.  executemany is the equivalent
>of:
>
>executemany(sql, param)
>->
>for args in param:
>    execute(sql, args)
>
>and has no (or very little) benefit over doing the same thing in
>Python itself (yes, the iteration is run in C, but it is still
>entirely calling into the Python VM to do all its work, so there is
>very little if any benefit there -- the only real benefit over the
>above is that locating the cached statement is only done once when
>you use executemany and the same C stmt pointer is used over and over
>again without looking it up in the cache, which does save a few
>cycles -- however since since the bulk of the time is spent inside
>the I/O bound SQLite3 library code where the GIL is released, this
>effectively makes minimal difference in the long run).
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Article about using sqlite3 in Python

Reply via email to