Re: [sqlite] Best practices for huge batch inserts with WAL?
Scott Hess wrote:
> WRT #3, you could also consider journal_mode off (or memory, if your code
> requires transactions to work right). In that case, the database state is
> indeterminate if you have an app-level crash, but you should be fine if you
> make it to the end.

It would be a better idea to change to a rollback journal mode (DELETE/TRUNCATE/PERSIST): it's guaranteed to work correctly in all situations, and because the database being filled starts out small, the rollback journal cannot become large, since it stores only the _old_ data.

Regards,
Clemens
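For example, set before starting the bulk load (a sketch; any of the three rollback modes behaves the same way for this purpose):

  PRAGMA journal_mode = TRUNCATE;   -- or DELETE / PERSIST; the pragma returns the mode now in effect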
Re: [sqlite] Best practices for huge batch inserts with WAL?
What I'd do:

1) Look at the indices, and make sure the input is sorted so you insert in index order. Also drop any unnecessary indices and add them back at the end. [Read the code for VACUUM to see what kinds of things make sense to defer.]

2) Bump up the cache a lot. Inserting in sorted order makes this less essential, but for a one-off like this it can't hurt.

3) Turn synchronous off while loading, and commit to starting over if you get an OS-level crash. Even with synchronous off, things should be correctly ordered for app-level crashes.

4) Maybe use exclusive locking? I expect that wouldn't have a huge impact if you're batching inserts.

---

WRT #3, you could also consider journal_mode off (or memory, if your code requires transactions to work right). In that case, the database state is indeterminate if you have an app-level crash, but you should be fine if you make it to the end.

WRT #1, I would consider FTS an index for these purposes, but it may require schema changes to make it possible to selectively disable/enable the indexing. See https://sqlite.org/fts5.html#external_content_tables . I'm not sure there would be much gain from disabling FTS while loading, though, as long as your bulk transactions are large.

(A rough pragma sketch for #2-#4 and the journal_mode idea follows below the quoted message.)

-scott

On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke wrote:
> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database in
> WAL mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL
> file was growing rapidly while the main database file stayed tiny. I
> figured this would become inefficient, so I stopped the run and adjusted my
> code to commit and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits
> were taking longer and longer, so the process was spending most of its time
> committing. (I wasn’t able to finish the job, as it ran into an unrelated
> fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit
> at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over
> and over again. If so, would periodic WAL checkpoints help?
>
> —Jens
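A rough sketch of #2-#4 plus the journal_mode suggestion as pragmas (values are illustrative, not recommendations; set them on the loading connection before the import):

  PRAGMA cache_size = -1000000;     -- ~1 GB of page cache (a negative value is KiB)
  PRAGMA synchronous = OFF;         -- an OS-level crash can corrupt the db; plan to restart the load
  PRAGMA journal_mode = MEMORY;     -- or OFF, if your code never needs ROLLBACK to work
  PRAGMA locking_mode = EXCLUSIVE;  -- optional: take the write lock once and keep it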
Re: [sqlite] Best practices for huge batch inserts with WAL?
Do you have any indices on the table? Drop them before the huge insert and recreate them afterwards. It is faster.

On Mon, 24 Apr 2017 11:00:00 -0700 Jens Alfke wrote:
> I’m importing a large data set with a lot of rows — an entire Wikipedia dump,
> about 60GB, one article per row — into a brand new SQLite database in WAL
> mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL file
> was growing rapidly while the main database file stayed tiny. I figured this
> would become inefficient, so I stopped the run and adjusted my code to commit
> and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits were
> taking longer and longer, so the process was spending most of its time
> committing. (I wasn’t able to finish the job, as it ran into an unrelated
> fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit at
> the end is epic-length, it wouldn’t be rewriting the b-tree nodes over and
> over again. If so, would periodic WAL checkpoints help?
>
> —Jens

--
http://fresh.flatassembler.net
http://asm32.info
John Found
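The pattern, with made-up table and index names (a sketch, not the poster's actual schema):

  DROP INDEX IF EXISTS articles_title_idx;
  -- ... bulk INSERTs into articles, in large transactions ...
  CREATE INDEX articles_title_idx ON articles(title);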
[sqlite] Best practices for huge batch inserts with WAL?
I’m importing a large data set with a lot of rows — an entire Wikipedia dump, about 60GB, one article per row — into a brand new SQLite database in WAL mode. What’s the fastest way to import it?

I started with one big transaction, but noted that (of course) the WAL file was growing rapidly while the main database file stayed tiny. I figured this would become inefficient, so I stopped the run and adjusted my code to commit and re-open a transaction every 10,000 rows.

With that, the import started quickly, but as time went on the commits were taking longer and longer, so the process was spending most of its time committing. (I wasn’t able to finish the job, as it ran into an unrelated fatal error in my code about ⅔ of the way through.)

Would it have been faster to use a single transaction? Even if the commit at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over and over again. If so, would periodic WAL checkpoints help?

—Jens
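A sketch of the batched-commit-plus-checkpoint variant being asked about (the batch size and pragma calls are illustrative, not from the original post):

  PRAGMA wal_autocheckpoint = 0;      -- checkpoint manually instead of every 1000 pages
  BEGIN;
  -- ... ~10,000 INSERTs ...
  COMMIT;
  PRAGMA wal_checkpoint(TRUNCATE);    -- fold the WAL into the main database and reset the WAL file
  -- repeat BEGIN / INSERTs / COMMIT / checkpoint until the import finishes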