Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread Clemens Ladisch
Scott Hess wrote:
> WRT #3, you could also consider journal_mode off (or memory, if your code
> requires transactions to work right).  In that case, the database state is
> indeterminate if you have an app-level crash, but you should be fine if you
> make it to the end.

It would be a better idea to switch to a rollback journal mode (DELETE/
TRUNCATE/PERSIST) instead: it is guaranteed to work correctly in all
situations, and when an initially small database is being filled, the
journal cannot become large, because it stores only the _old_ data.
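With Python's sqlite3, for example, that is just a pragma around the load
(a minimal sketch; "wiki.db" and switching back to WAL afterwards are my
assumptions):

    import sqlite3

    con = sqlite3.connect("wiki.db", isolation_level=None)  # autocommit; manage transactions manually
    con.execute("PRAGMA journal_mode=TRUNCATE")  # rollback journal for the bulk load
    con.execute("BEGIN")
    # ... bulk INSERTs ...
    con.execute("COMMIT")
    con.execute("PRAGMA journal_mode=WAL")       # back to WAL once the load succeeds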


Regards,
Clemens


Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread Scott Hess
What I'd do:

1) Look at the indices, and make sure the input is sorted to insert in
index order.  Also drop any unnecessary indices and add them back at the
end (see the sketch after this list).  [Read the code for vacuum to see
what kinds of things make sense to defer.]

2) Bump up the cache a lot.  Inserting in sorted order makes this less
essential, but for a one-off like this it can't hurt.

3) Turn synchronous off while loading, and commit to starting over if you
get an OS-level crash.  Even with synchronous off, things should be
correctly ordered for app-level crashes.

4) Maybe use exclusive locking?  That wouldn't have a huge impact if you're
batching inserts, I expect.
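A rough sketch of 1)-4) with Python's sqlite3 (the table, index, and file
names are invented here, and the pragma values are only starting points):

    import sqlite3

    con = sqlite3.connect("wiki.db", isolation_level=None)  # manage transactions manually
    con.execute("PRAGMA cache_size=-1000000")     # 2) ~1 GB page cache (negative = KiB)
    con.execute("PRAGMA synchronous=OFF")         # 3) no fsyncs; restart the load after an OS crash
    con.execute("PRAGMA locking_mode=EXCLUSIVE")  # 4) take the file lock once and keep it

    con.execute("CREATE TABLE IF NOT EXISTS articles"
                "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    con.execute("DROP INDEX IF EXISTS articles_by_title")  # 1) defer secondary indices

    rows = [(1, "Aardvark", "..."), (2, "Abacus", "...")]  # stand-in; feed real rows sorted by id
    con.execute("BEGIN")
    con.executemany("INSERT INTO articles(id, title, body) VALUES (?, ?, ?)", rows)
    con.execute("COMMIT")

    con.execute("CREATE INDEX articles_by_title ON articles(title)")  # 1) rebuild at the end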

---

WRT #3, you could also consider journal_mode off (or memory, if your code
requires transactions to work right).  In that case, the database state is
indeterminate if you have an app-level crash, but you should be fine if you
make it to the end.
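(Concretely, reusing the connection from the sketch above, that is one
pragma before the load:)

    con.execute("PRAGMA journal_mode=OFF")     # no journal: fastest, but state is indeterminate after a crash
    # or, if ROLLBACK has to keep working:
    con.execute("PRAGMA journal_mode=MEMORY")  # journal kept in RAM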

WRT #1, I would consider fts as an index for these purposes, but it may
require schema changes to make it possible to selectively disable/enable
the indexing.  See https://sqlite.org/fts5.html#external_content_tables .
I'm not sure there would be much gain from disabling fts during loading,
though, as long as your bulk transactions are large.
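With an external-content table, the idea would be to load the content
table first and build the fts index once at the end (a sketch, continuing
the connection above; the schema is invented):

    con.executescript("""
        -- full-text index over articles.body without storing a second copy of the text
        CREATE VIRTUAL TABLE IF NOT EXISTS articles_fts
            USING fts5(body, content='articles', content_rowid='id');
    """)
    # ... bulk-load the articles table itself ...
    con.execute("INSERT INTO articles_fts(articles_fts) VALUES ('rebuild')")  # repopulate the fts index in one pass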

-scott


On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database in
> WAL mode. What’s the fastest way to import it?


Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread John Found
Do you have any indices on the table? Drop them before the huge insert and
recreate them afterwards. It is faster.
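Roughly (the index name is invented):

    import sqlite3

    con = sqlite3.connect("wiki.db")
    con.execute("DROP INDEX IF EXISTS articles_by_title")  # before the bulk insert
    # ... bulk insert ...
    con.execute("CREATE INDEX articles_by_title ON articles(title)")  # recreate it afterwards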

On Mon, 24 Apr 2017 11:00:00 -0700
Jens Alfke wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia dump, 
> about 60GB, one article per row — into a brand new SQLite database in WAL 
> mode. What’s the fastest way to import it?


-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 


[sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread Jens Alfke
I’m importing a large data set with a lot of rows — an entire Wikipedia dump, 
about 60GB, one article per row — into a brand new SQLite database in WAL mode. 
What’s the fastest way to import it?

I started with one big transaction, but noted that (of course) the WAL file was 
growing rapidly while the main database file stayed tiny. I figured this would 
become inefficient, so I stopped the run and adjusted my code to commit and 
re-open a transaction every 10,000 rows.
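In outline, the loop now looks like this (a sketch; read_articles() stands
in for my dump reader):

    import sqlite3

    BATCH = 10_000
    con = sqlite3.connect("wiki.db", isolation_level=None)  # manage transactions manually
    con.execute("BEGIN")
    for n, (title, body) in enumerate(read_articles(), 1):  # read_articles() is a hypothetical row source
        con.execute("INSERT INTO articles(title, body) VALUES (?, ?)", (title, body))
        if n % BATCH == 0:
            con.execute("COMMIT")  # commit and re-open a transaction every 10,000 rows
            con.execute("BEGIN")
    con.execute("COMMIT")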

With that, the import started quickly, but as time went on the commits were 
taking longer and longer, so the process was spending most of its time 
committing. (I wasn’t able to finish the job, as it ran into an unrelated fatal 
error in my code about ⅔ of the way through.)

Would it have been faster to use a single transaction? Even if the commit at 
the end is epic-length, it wouldn’t be rewriting the b-tree nodes over and over 
again. If so, would periodic WAL checkpoints help?
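(A periodic checkpoint would slot into the loop above roughly like this:)

    if n % BATCH == 0:
        con.execute("COMMIT")
        con.execute("PRAGMA wal_checkpoint(TRUNCATE)")  # fold the WAL into the main db file and truncate it
        con.execute("BEGIN")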

—Jens