On 08/07/2015 12:35 AM, Roger Binns wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 08/06/2015 09:27 AM, Dan Kennedy wrote:
>>> Is it using more CPU cycles in mmap mode or just taking longer?
>>> If the former, does [time] attribute them to "user" or "system"?
> It is taking longer.  I have 3 XML dumps which I turn into JSON
> (incrementally) and then denormalise and insert into SQLite across
> several tables.  While all this work is going on, the code prints out
> statistics about how long it is running and about how many records per
> second are being processed.  The final database size after commit and
> vacuum is ~8GB.  There are a lot of foreign keys too, although all
> simple (referring to an INTEGER PRIMARY KEY column).
>
> I gave mmap a try - ie the *only* change was to add an extra pragma
> before the beginning of the transaction:
>
>     "pragma mmap_size="+str(2*1024*1024*1024)
>
> In hard numbers, without that line I was doing ~1,118 records per
> second and with it it does ~300.  A normal run takes about 1h20m but
> the mmap one was still running 3 hours later when I aborted it.
>
> (BTW this is all on a tmpfs filesystem on 64 bit Linux with swap
> spread across two ssds, and 32GB of ram.  ie the actual storage
> hardware isn't a factor.  Also single threaded because XML.)


When the b-tree layer requests a page reference in mmap mode, SQLite 
first needs to figure out whether it should use regular in-memory page 
(data cached in heap memory) or a mmap page (data is actually a pointer 
into mmap'd address space). If a write-transaction is open, it cannot 
use a mmap page if:

   1) there is an entry for the requested page in the wal file, or
   2) there is an entry (possibly a dirty one) for the requested page in 
the cache.

If the wal file is really large, as in this case, then test (1) can be 
quite slow.

One interesting thing is that the wal-file lookup is done before the 
cache lookup. Which doesn't seem quite right. Source code archeology and 
testing have failed to reveal why it is that way. The branch here swaps 
the two tests around:

   http://www.sqlite.org/src/info/3a82c8e6cb7227fe

Does that improve performance any in your case?

Thanks,
Dan.









>
>>> How large are you letting the wal file grow between checkpoints?
> Pretty much the entire database size.  For the tests I was starting
> with a deleted database directory (ie no pre-existing files), and then
> doing these pragmas:
>
>            "pragma page_size=4096",
> #         "pragma mmap_size="+str(2*1024*1024*1024),
>            "pragma journal_mode=wal",
>            "pragma wal_autocheckpoint=10000",
>            "pragma foreign_keys=on",
>
> Then I start a transaction, and do the importing within that
> transaction.  The database file is 4kb during that process, the wal
> file gets to be about 10GB.
>
> If I use gdb to periodically break into the running process in the
> mmap case, then it was always in sqlite3WalFindFrame.
>
> I don't need any help fixing my importing process (eg don't need a
> journal on an empty database anyway).  But it is frustrating that mmap
> only goes up to a few kb shy of 2GB even for 64 bit, and I have one
> example (ie anecdote not data) showing that mmap hurts for inserts on
>> 2GB databases.  Perhaps it is worth others testing to see if this
>> is
> a systemic problem, or just bad luck for me :-)
>
> It may also be relevant that tables add columns over time.  I
> dynamically add them after encountering previously unseen fields in
> the JSON.  However I'd expect the schema to be final a few thousand
> records in.  Most tables have 3 to 6 million records.
>
> Roger
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iEYEARECAAYFAlXDmukACgkQmOOfHg372QSVtgCbBihGgIuZqS3Yy2JARXZ1+Q59
> GmwAoMG53XxuLNhcMIw1PV46fD/Z/5tT
> =luXx
> -----END PGP SIGNATURE-----
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to