Looking at the generated program for a multi-tuple INSERT INTO ... VALUES 
yields the following structure:

Only 1 tuple (standard case)

<prolog>
<build record>
<insert into table>
<epilog>

2 tuples

<prolog>
<build record 1>
<yield to INSERT coroutine>
<build record 2>
<yield to INSERT coroutine>
<epilog>
<insert into table>
<yield to RECORD coroutine>

Each further tuple adds a <build record> segment (one opcode per field, plus 
the data) and a yield opcode to the generated VDBE program, and a corresponding 
amount to the parse tree, all of which tends to decrease locality and increase 
the required working set of pages in memory.
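This growth is easy to observe: EXPLAIN returns one row per VDBE opcode of the compiled statement. A minimal sketch using Python's sqlite3 module (the table name t and the tuple counts are made up for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a, b)")

def opcode_count(n_tuples):
    # EXPLAIN returns one row per VDBE opcode of the compiled statement,
    # so the row count measures program length.
    values = ",".join(["(1, 2)"] * n_tuples)
    return len(con.execute(f"EXPLAIN INSERT INTO t VALUES {values}").fetchall())

for n in (1, 2, 4, 8):
    print(f"{n} tuple(s): {opcode_count(n)} opcodes")
```

Each doubling of the tuple count should show the program growing by a roughly fixed number of opcodes per added tuple, matching the per-tuple <build record> plus yield segments above.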

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Keith Medcalf
Sent: Wednesday, 31 May 2017 04:45
To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Subject: Re: [sqlite] Importing from single-insert-statement SQL dump is 61 
times slower than importing from SQL dump with one statement per row

On Tuesday, 30 May, 2017 10:33, R Smith <rsm...@rsweb.co.za> said:

> Keith, I think the OP meant he inserted the values using one single
> statement, not one single transaction, as in he did one ginormous
> INSERT INTO t(v1, v2...) VALUES (1, 2),(2,3),(..),........,(100000000,
> 297829872); - 180MB or so worth... Probably lots of data in few rows,
> because he is not hitting any SQLite statement limits - unless those
> were disabled.
>
> Meaning that it is probably just one VDBE program. That's unless I am
> reading wrong or assuming wrong from the original post - which is
> always possible.

You are correct.  I modified the dump so that it would execute one insert per 
table with all the data in the single insert statement.  It is far slower than 
even inserting each row in its own transaction (autocommit).  Of course, that 
means that some of the insert statements are approaching several hundred 
megabytes long.  It works, but it is very slow.  Most of the time is spent in 
the parser as one would expect.
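As a rough illustration of where that time goes, here is a hedged sketch in Python (table name, row count, and values are invented for the example) contrasting one giant literal INSERT, where the parser must tokenize every value, against a short prepared statement re-bound per row inside a single transaction:

```python
import sqlite3
import time

N = 10_000
rows = [(i, i * 2) for i in range(N)]

def load(insert):
    # Fresh in-memory database per run; returns (seconds, rows inserted).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t(a, b)")
    start = time.perf_counter()
    with con:  # one transaction around the whole load
        insert(con)
    elapsed = time.perf_counter() - start
    return elapsed, con.execute("SELECT count(*) FROM t").fetchone()[0]

def giant_literal(con):
    # Every value inlined as SQL text: the parser must touch each literal.
    con.execute("INSERT INTO t VALUES " + ",".join(f"({a},{b})" for a, b in rows))

def bound_parameters(con):
    # One short statement parsed once, then re-bound per row.
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)

for fn in (giant_literal, bound_parameters):
    secs, count = load(fn)
    print(f"{fn.__name__}: {count} rows in {secs:.3f} s")
```

Both runs insert the same data; the difference is how much text the parser and code generator have to chew through per row.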

> On 2017/05/30 6:19 PM, Keith Medcalf wrote:
> > I find quite the opposite.  Using a DUMP file to create a database
> > where
> the first test uses the standard dump (which does the load in a single
> transaction) is enormously faster than the second one, where the BEGIN
> TRANSACTION and COMMIT have been commented out, and thus each insert
> is performed in its own transaction.  The structure does have the
> indexes created while loading ...
> >
> >> wc -l xmltv.sql
> >   5425040    xmltv.sql
> >
> > 2017-05-30  09:43       446,700,424 xmltv.sql
> >
> > With the commands all being loaded in a single transaction:
> >
> >> timethis sqlite test1.db < xmltv.sql
> > TimeThis :  Command Line :  sqlite test1.db
> > TimeThis :    Start Time :  Tue May 30 09:41:12 2017
> > TimeThis :      End Time :  Tue May 30 09:42:14 2017
> > TimeThis :  Elapsed Time :  00:01:02.005
> >
> >
> > With the commands being in their own individual autocommit transactions:
> >   --- still running after 10 minutes
> >   --- still running after 20 minutes
> >   --- gave up after 30 minutes (and it was only 1/100th through
> inserting all the data)
> >
> >
> > The fact of the matter is that you are much better off running larger
> transactions than smaller ones.  The hardware limits the transaction
> rate (max 60 transactions/second on spinning rust -- unless you have
> lying hardware that does not flush properly).  Apparently somewhat
> limited on a machine with a 5GB/s SSD as well...since the write rate
> was hugely less than the channel limit, but the CPU was fully consumed.
> >
> > And just to be clear, putting multiple statements in a transaction
> > does
> not mean they are executed as a single VDBE program.  They are still
> executed one at a time.  The only difference is the number of
> transactions (and hence the commit/flush to disk count).
> >
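That transaction-rate point can be sketched as follows (Python, against an on-disk database so each COMMIT actually has to reach the file; the row count and schema are arbitrary):

```python
import os
import sqlite3
import tempfile
import time

def timed_load(n, one_transaction):
    # Fresh on-disk database so each COMMIT actually flushes to the file.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path, isolation_level=None)  # true autocommit mode
    con.execute("CREATE TABLE t(a, b)")
    start = time.perf_counter()
    if one_transaction:
        con.execute("BEGIN")
    for i in range(n):
        # One statement (one VDBE program) per row in both modes;
        # only the number of commit/flush cycles differs.
        con.execute("INSERT INTO t VALUES (?, ?)", (i, i))
    if one_transaction:
        con.execute("COMMIT")
    elapsed = time.perf_counter() - start
    count = con.execute("SELECT count(*) FROM t").fetchone()[0]
    con.close()
    os.remove(path)
    return elapsed, count

for batched in (False, True):
    secs, count = timed_load(500, batched)
    label = "single transaction" if batched else "autocommit per row"
    print(f"{label}: {count} rows in {secs:.3f} s")
```

On spinning disks the autocommit run is bounded by the flush rate, not by the inserts themselves, which is exactly why the batched dump above finished in a minute while the autocommit one did not.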
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at


