On 01/17/2015 12:04 AM, Jan Slodicka wrote:
Simon Slavin-3 wrote
Thanks to your post I discovered multiple-row inserts so that I now
understand what you asked.
Just a note that multiple-row inserts were added to SQLite relatively
recently (2012-03-20 (3.7.11)) and, because SQLite does only
database-level locking, its overhead for INSERTs is far less than that of
SQL Server.  It might be faster to use them but I would expect it to be so
much faster than many inserts as part of one transaction.
I made a fast, perhaps oversimplified, TestA:

Create an empty database. (file based)
CREATE TABLE Test (city NVARCHAR(120) NULL COLLATE NOCASE)
BEGIN
Then I inserted N records using commands such as
INSERT INTO Test VALUES('_random_'). (_random was a random 8 character long
string.)
COMMIT
Measure WAL size
Close the DB
Measure DB size

TestB was performed with the same data except the records were grouped
INSERT INTO Test VALUES('_random_'), VALUES('_random1')...

I tested different groupings (1,2,5,10) and different N values (100000 -
2500000).

Results:
- The more records are grouped, the faster.
- Grouping of 10 records was more than 2x faster than no grouping at all.
- WAL size did not depend on the grouping used and was just slightly larger
than the DB size.

Then I modified the test by adding an index on the single column. I run 2
sets of tests - one where the index was created before first insert and the
one with the index created after all inserts finished.

Results:
- Active index: WAL size ~ DB size
- Inactive index: WAL size ~ 50% of the DB size
- Tests with an active index were slower by 15-20%

Conclusion:
Switching off the indexing during a bulk insert brings minor advantages.
Multi-row inserts may bring larger advantages.
The reason of the "huge WAL problem" remains unclear.


Is it correct that you have a single transaction inserting lots of data into a table with multiple indexes on it? Something like 1GB?

When an SQL write transaction is performed, SQLite begins by modifying database pages within its internal page-cache. If the transaction modifies only a few pages, all dirty pages stay in the cache until the user executes "COMMIT", at which point they are appended to the *-wal file. However, the page-cache is of limited size (by default 2000 pages), and once it is completely full of dirty pages SQLite begins appending them to the *-wal file mid-transaction in order to free up space. At this point each time SQLite needs to modify a page that is not already in the cache it must select a dirty page to write out to the *-wal file so as to free up space to load the new page into the cache where it can be modified. This means that a transaction with a large working set may append more than one copy of a single page to the *-wal file. Maybe many, many copies.

SQLite indexes are b-trees. Each b-tree node is stored on a database page. So if you're inserting keys in random order into a large index (one too large to fit entirely within the page-cache), then virtually all inserts result in an existing dirty page being flushed from the cache and appended to the *-wal file.

To avoid this it seems like there are two options - make the page-cache large enough to hold all the indexes or insert keys into the indexes in sorted order. Inserting keys in sorted order avoids the problem because all writes go to the right-most leaf node of the index b-tree, which will almost always be present in the page-cache.

To change the size of the page-cache, see the "PRAGMA cache_size" and "PRAGMA page_size" commands.

One way to get keys to be inserted in order is to create all indexes after populating the table. SQLite sorts the data before creating the index b-tree in this case.

The other is to create a temp (or non temp) table with *exactly the same* columns and indexes as the table to be populated and insert the new rows into it. Then running:

  INSERT INTO target_tbl SELECT * FROM temp_tbl;

In this case, SQLite detects the similar schemas and copies keys in sorted order from the indexes on "temp_tbl" to the corresponding index on "target_tbl".

Dan.








Note:
The tests were programmed in c# code that called native SQLite.dll and were
run on a W7 desktop. The results may not apply to other environments.





--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Huge-WAL-log-tp79991p80070.html
Sent from the SQLite mailing list archive at Nabble.com.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to