On Sat, Jul 18, 2009 at 12:22:08AM -0700, Brian Dantes scratched on the wall:

> I have a largish DB around 1GB in size. There is a table with 5 million
> rows in it that has a 3-key index on it. This database file is
> fragmented -- to what degree I'm not sure.
>
> Using sqlite 3.6.14, dropping and recreating the index under WinXP Pro
> (on a local disk) with no other activity takes about 90 seconds. With
> the same database on Red Hat Linux 64-bit with no other activity and a
> local disk, the index recreation takes almost 30 *minutes*. The
> activity is completely I/O bound. If I vacuum the database, the Windows
> time drops to 70 seconds, and the Linux time drops to 7 minutes.
Assuming your data import is semi-random (in terms of index order), and assuming you haven't raised the default cache size, chances are you're thrashing the page cache. On nearly any system, bumping the page cache size up anywhere from 2x to 100x will almost always have a dramatic effect on index build speed.

If you're using a more modern version of SQLite, the Windows build defaults the page size to the allocation block size of the file system, which on most Windows systems is 4K. The UNIX system is very likely using the default 1K page size, unless you've changed it. Since the page cache is sized in pages, not bytes, the page cache on the Windows box is effectively 4x bigger. An index also makes fairly efficient use of page space (assuming none of your indexed fields are large strings), so the Windows box is doing roughly a quarter of the I/O. Matching pages to file-system blocks may also help raw I/O performance.

There is also the question of how much memory the two systems have and how much the OS can use for its file cache, but I'm guessing that's not as critical as the difference in page size.

The other issue is disk speed. Because SQLite waits for the drive to report that data has been physically written to the platters, there is a direct link between disk rotation speed and I/O performance. If one system has a 5400 RPM disk and the other a 7200 RPM disk, you're likely to notice a difference.

> Conversely, with this same table and index starting from empty, if I
> start inserting rows with the index in place, on Windows the insertion
> speed drops dramatically after about 100-200K rows and takes about 6-7
> *hours* to complete. On Linux 64-bit, the same experiment takes less
> than an hour to complete and the insertion speed seems fairly constant.

It sounds like you're not using transactions to batch the inserts, so each row is paying for its own implicit transaction and the disk syncs that go with it.
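As a rough illustration of the page-size and cache-size tuning above (shown via Python's sqlite3 module; the file name is hypothetical), note that page_size only takes effect before the database has content, or after a VACUUM rebuilds the file, while cache_size can be raised at any time:

```python
import sqlite3

# Hypothetical database file. PRAGMA page_size only applies to a new
# database, or takes effect when VACUUM rewrites an existing one.
conn = sqlite3.connect("bulk.db")
conn.execute("PRAGMA page_size = 4096")    # match the file system block size
conn.execute("PRAGMA cache_size = 20000")  # counted in pages, not bytes
conn.execute("VACUUM")                     # rebuilds the file at the new page size

print(conn.execute("PRAGMA page_size").fetchone()[0])   # 4096
print(conn.execute("PRAGMA cache_size").fetchone()[0])  # 20000
conn.close()
```

With a 4K page size, cache_size = 20000 is roughly 80MB of cache, which is the kind of 10x-40x bump over the default that tends to pay off on a 5-million-row index build.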
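The difference batching makes can be sketched like this (Python's sqlite3 module; the table and data are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.isolation_level = None  # autocommit; we issue BEGIN/COMMIT ourselves

rows = [(i, i * 2, i * 3) for i in range(100000)]

# Slow pattern: each INSERT outside a transaction gets its own implicit
# transaction and, on a real disk, its own sync:
#   for row in rows:
#       conn.execute("INSERT INTO t VALUES (?, ?, ?)", row)

# Fast pattern: batch a few thousand rows per transaction.
BATCH = 5000
for start in range(0, len(rows), BATCH):
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                     rows[start:start + BATCH])
    conn.execute("COMMIT")

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 100000
```

On an in-memory database the two patterns are close; on a rotating disk, where every commit waits for the platters, the batched version is the difference between hours and minutes.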
For bulk inserts, it is much more efficient to batch anywhere from 100 to 10,000+ inserts into a single transaction (BEGIN; INSERT...; INSERT...; ... COMMIT;) to reduce the I/O load. That should reduce your import times on both platforms, and I'd expect the two times to become much more similar once you use transactions.

I'm a lot more surprised by this set of results, but I'm guessing the root of it is still the difference in page size. There might also be some issue on the Windows platform, uncovered by the different access pattern, that further limits performance. Try bumping the page size on the Linux DB up to 4K and see what happens. Also try using transactions for the inserts, and verify the disk speeds are similar (although using transactions should make that matter much less).

   -j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
 a protractor." "I'll go home and see if I can scrounge up a ruler and
 a piece of string." --from Anathem by Neal Stephenson

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users