Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-20 Thread Mario M. Westphal
> Unrelated to your question, but, take a look at "external content" FTS4
table
> they dramatically cut down the amount of duplicated data [1])

Thanks for the tip. I'll definitely check that.
Currently I build the contents for FTS from several other tables, combining,
splitting, merging data via SQL as needed when INSERTing into the FTS
tables.
Maybe I can safe some of this by linking to the original data tables.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-20 Thread Dominique Devienne
On Sun, Jan 19, 2014 at 10:59 PM, Mario M. Westphal  wrote:

> If I set wal_autocheckpoint=1, I will get 1/10 of the synchs and WAL
> file of about 10 MB, correct?


http://www.sqlite.org/pragma.html#pragma_wal_autocheckpoint states it's a
page count, so that depends on
http://www.sqlite.org/pragma.html#pragma_page_size

With a 1KB page size, sure, that's 10MB. --DD
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-19 Thread Mario M. Westphal
> Unrelated to your question, but, take a look at "external content" 
> FTS4 table they dramatically cut down the amount of duplicated data 
> [1])

Thanks for the tip. I'll definitely check that.
Currently I build the contents for FTS dynamically from several other
tables, combining, splitting, merging data via SQL as needed when INSERTing
into the FTS tables. Maybe I can safe some of these efforts and reduce the
amount of data in the FTS tables.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-19 Thread Mario M. Westphal

>
If you want to try running with synchronous=NORMAL, you might try setting
PRAGMA wal_autocheckpoint=10; (from the default of 1000) which will
make for dramatically larger WAL files, but also dramatically fewer syncs.
Then the syncs will use just 5 or 6 minutes instead of 4.5 hours. Hopefully.
<

Thanks for the tip! I will add that and combine it with synchronous=NORMAL.

The wal_autocheckpoint documentation is not that clear (IMHO) about how this
setting can impact performance.
Maybe adding a sentence to the documentation, explaining the relation
between wal_autocheckpoint, synch frequency and performance, would help
other (new) users.

If I set wal_autocheckpoint=1, I will get 1/10 of the synchs and WAL
file of about 10 MB, correct?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-19 Thread Petite Abeille

On Jan 19, 2014, at 3:00 PM, Mario M. Westphal  wrote:

> Also FTS4 is used, which also creates large tables.

(Unrelated to your question, but, take a look at "external content" FTS4 tableā€¦ 
they dramatically cut down the amount of duplicated data [1])

> During an ingest phase, my application pumps in hundreds of thousands of
> records into multiple tables.

For initial, bulk loading, I tend to use the following pragma combo:

pragma journal_mode = off
pragma locking_mode = exclusive
pragma synchronous = off


[1] http://www.sqlite.org/fts3.html#section_6_2_2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-19 Thread Richard Hipp
In WAL mode with synchronous=NORMAL, SQLite only syncs (FlushFileBuffers()
on windows) when it does a checkpoint operation.  Checkpoints should be
happening automatically whenever the WAL file exceeds about 1MB in size.

For an 8GB database, probably there are about 8000 sync operations,
therefore.  If each takes about 2 seconds, that would pretty much account
for the extra 4.5 hours.

If you are creating a new database from scratch, it is safe to set
synchronous=OFF.  If you lose power in the middle, your database file will
probably be corrupt, but since you were creating it from scratch you can
easily recover just be starting the database creation process over again
from the beginning.

If you want to try running with synchronous=NORMAL, you might try setting
PRAGMA wal_autocheckpoint=10; (from the default of 1000) which will
make for dramatically larger WAL files, but also dramatically fewer syncs.
Then the syncs will use just 5 or 6 minutes instead of 4.5 hours. Hopefully.



On Sun, Jan 19, 2014 at 9:00 AM, Mario M. Westphal  wrote:

> I have a performance effect which I don't quite understand.
> Maybe I'm using the wrong settings or something. Sorry for the long post,
> but I wanted to include all the info that may be important.
>
> My software is written in C++, runs on Windows 7/8, the SQLite database
> file
> is either on a local SATA RAID disk or a SSD.
> Typical database sizes are between 2 GB and 8 GB.
> The largest tables hold several million entries. Also FTS4 is used, which
> also creates large tables.
> Fast internal RAID disks, SDD. Four Xeon cores. 8 GB RAM.
>
> I'm using SQLite 3.8.0.2
> WAL mode, shared cache enabled.
> locking_mode=NORMAL
> checkpoint_fullfsync=0
> pragma page_size=4096
> pragma cache_size=16384
>
> General (retrieval) performance is excellent!
>
>
> During an ingest phase, my application pumps in hundreds of thousands of
> records into multiple tables.
> There are massive amounts of writes during that phase, different record
> sizes, tables with one to four indices etc.
>
> My application is multi-threaded and inserts data into the database
> concurrently from multiple threads.
> The threads process data in batches, and use SQLite transactions to process
> all records of a batch into the database. Transactions gain a lot of speed,
> which outweighs the side effects of potential blocking.
> The threads monitor the execution times of the database operations and
> adjust the batch size to balance speed and transaction lock duration.
> Slower
> operations cause smaller batches, which results in shorter database locks
> and better concurrency. The system adapts fairly well to system performance
> and data structure.
>
> The performance was not that bad, but far from good.
>
> For a given set of input data (100,000 "elements"), the execution estimate
> was about 5 hours.
> Database on a high-speed SSD.
> The largest table holds about 5 million entries afterwards.
>
> ***With one single change*** I improved the execution time from 5 hours
> down
> to about 30 minutes!
>
> I changed
>
> PRAGMA synchronous=NORMAL
>
> to
>
> PRAGMA synchronous=OFF
>
> Also all other database write operations just 'fly' now.
> I'm even more impressed with SQLite than before, but I wonder why is the
> change so _dramatic_ ?
>
> From the docs my impression was that using WAL mode is ideal for bulk
> inserts. That wrapping large bulks of data into smaller batches, wrapped in
> BEGIN/COMMT is best for performance etc. That using synchronous=NORMAL
> limits the file system flush/wait operations certain really important
> operations.
>
> I logged the execution times of various operations in this phase to a text
> file. Everything was fast, the processing, the INSERTs etc.
> But COMMIT operations sometimes took 20s, then 0.2s, then again 10s. That's
> the time SQLite spends in the execute call with "COMMIT".
>
> Of course the amount of data written in each transaction block varied, but
> in general, 85% of the total execution time of my code was spent in the
> COMMIT call.
> My application was the only application with measurable disk I/O at the
> time. No virus checker etc.
>
> Changing to synchronous=OFF made the commits 10 times faster.
>
> Is this the expected behavior or am I missing something obvious?
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-19 Thread Simon Slavin

On 19 Jan 2014, at 2:00pm, Mario M. Westphal  wrote:

> I logged the execution times of various operations in this phase to a text
> file. Everything was fast, the processing, the INSERTs etc.
> But COMMIT operations sometimes took 20s, then 0.2s, then again 10s. That's
> the time SQLite spends in the execute call with "COMMIT".

First, I want to check that you've read



I can't answer your question but only a big UPDATE or DELETE would legitimately 
take 20s.  If you're seeing INSERT times of 10s or 20s then you're seeing the 
result of two threads clashing over database access.  One thread has to back 
off and wait for the other to finish, and the retry times eventually reach 10s 
and 20s before they get so long SQLite gives up and returns an error.  So 
you're not seeing a process take 20s to do useful stuff, you're seeing one 
thread keep the database busy -- so busy that it's always busy when the other 
thread tries to write to it.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Dramatic performance difference between different "PRAGMA synchronous" settings for bulk inserts

2014-01-19 Thread Mario M. Westphal
I have a performance effect which I don't quite understand.
Maybe I'm using the wrong settings or something. Sorry for the long post,
but I wanted to include all the info that may be important.
 
My software is written in C++, runs on Windows 7/8, the SQLite database file
is either on a local SATA RAID disk or a SSD.
Typical database sizes are between 2 GB and 8 GB.
The largest tables hold several million entries. Also FTS4 is used, which
also creates large tables.
Fast internal RAID disks, SDD. Four Xeon cores. 8 GB RAM.
 
I'm using SQLite 3.8.0.2
WAL mode, shared cache enabled.
locking_mode=NORMAL
checkpoint_fullfsync=0
pragma page_size=4096
pragma cache_size=16384
 
General (retrieval) performance is excellent!
 
 
During an ingest phase, my application pumps in hundreds of thousands of
records into multiple tables.
There are massive amounts of writes during that phase, different record
sizes, tables with one to four indices etc.
 
My application is multi-threaded and inserts data into the database
concurrently from multiple threads.
The threads process data in batches, and use SQLite transactions to process
all records of a batch into the database. Transactions gain a lot of speed,
which outweighs the side effects of potential blocking.
The threads monitor the execution times of the database operations and
adjust the batch size to balance speed and transaction lock duration. Slower
operations cause smaller batches, which results in shorter database locks
and better concurrency. The system adapts fairly well to system performance
and data structure.
 
The performance was not that bad, but far from good.
 
For a given set of input data (100,000 "elements"), the execution estimate
was about 5 hours.
Database on a high-speed SSD.
The largest table holds about 5 million entries afterwards.
 
***With one single change*** I improved the execution time from 5 hours down
to about 30 minutes!
 
I changed 
 
PRAGMA synchronous=NORMAL
 
to 
 
PRAGMA synchronous=OFF
 
Also all other database write operations just 'fly' now.
I'm even more impressed with SQLite than before, but I wonder why is the
change so _dramatic_ ?
 
>From the docs my impression was that using WAL mode is ideal for bulk
inserts. That wrapping large bulks of data into smaller batches, wrapped in
BEGIN/COMMT is best for performance etc. That using synchronous=NORMAL
limits the file system flush/wait operations certain really important
operations.

I logged the execution times of various operations in this phase to a text
file. Everything was fast, the processing, the INSERTs etc.
But COMMIT operations sometimes took 20s, then 0.2s, then again 10s. That's
the time SQLite spends in the execute call with "COMMIT".

Of course the amount of data written in each transaction block varied, but
in general, 85% of the total execution time of my code was spent in the
COMMIT call.
My application was the only application with measurable disk I/O at the
time. No virus checker etc.
 
Changing to synchronous=OFF made the commits 10 times faster.
 
Is this the expected behavior or am I missing something obvious?
 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users