Re: [sqlite] Commit frequency and performance

2009-02-04 Thread Jay A. Kreibich
On Wed, Feb 04, 2009 at 10:10:15PM -0700, Gerry Snyder scratched on the wall:
> pri...@gmail.com wrote:
> > Yes, there are 3 indexes being created
> >
> > I'll post again after figuring out which of the changes improved the
> > performance.  Thanks for the clues!
>   
> Since you are able to do some experimenting, try entering the data 
> without the indices, and then create them.

  Generally, this won't make a huge difference.  It is more or less the
  same as doing all the inserts in one commit.  Dealing with sorting
  that much data all at once can actually be slower.

  What will make the biggest difference is how much memory you have
  available.  As you noticed, part of that is page size, but a big part
  is also the size of the page cache.  The default settings are 1K
  pages w/ 2000 page cache.  Each page in the cache has ~0.5K overhead,
  so that's a 3MB cache.  If you're doing this on a beefy desktop with
  the default 1K pages, crank the cache up to 100,000 pages or more.
  Adjust accordingly for larger pages.
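
  In Python's sqlite3 module, those pragmas might be set like this (a
  sketch only; the sizes are the values suggested above, and note that
  page_size must be set before the database is populated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # or a file path

# page_size must be set before any tables are created; cache_size is
# measured in pages, so 100,000 x 4K pages is roughly a 400MB cache.
conn.execute("PRAGMA page_size = 4096")
conn.execute("PRAGMA cache_size = 100000")

conn.execute("CREATE TABLE t (x INTEGER)")
print(conn.execute("PRAGMA page_size").fetchone()[0])  # page size is now fixed
```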

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Commit frequency and performance

2009-02-04 Thread Gerry Snyder
pri...@gmail.com wrote:
> Yes, there are 3 indexes being created
>
> I'll post again after figuring out which of the changes improved the
> performance.  Thanks for the clues!
>   
Since you are able to do some experimenting, try entering the data 
without the indices, and then create them.
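
A minimal sketch of that experiment in Python's sqlite3 (the `files` 
table and its columns are made up here for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, size INTEGER, mtime REAL)")

rows = [("/data/%06d" % i, i % 4096, float(i)) for i in range(1000)]

# Bulk-load first, in a single transaction...
with conn:
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)

# ...then build the indices in one pass over the loaded data, instead
# of updating each index B-tree on every insert.
conn.execute("CREATE INDEX files_size_idx ON files(size)")
conn.execute("CREATE INDEX files_mtime_idx ON files(mtime)")
```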


Gerry


Re: [sqlite] Commit frequency and performance

2009-02-04 Thread prirun
Yes, there are 3 indexes being created.  For 2 indexes, the data is in
order.  For the 3rd index, the data is not in order.  I can understand
how that will cause extra seeking.  It's a good point.

I just ran another test after downloading the latest SQLite version,
changed the page size to 4K on the main database (the 230MB file),
kept pragma synchronous=off, and changed the commit interval back to 5
seconds (even though it shouldn't be doing anything because of
synchronous=off).  This improved performance from 190 minutes to 106
minutes, and disk I/O and CPU overlap was fairly steady throughout,
with the CPU waiting for I/O about 30% of the time.  That's higher
than the sequential file case because of the extra seeking needed to
build the trees/indexes, and it's less than double the best-case
sequential time, so I'm okay with that.  With the slower times I
reported before, there would be long periods of time, like 10-15
seconds, where the CPU was completely idle while data was being
written at relatively slow speeds, fewer than 1,000 I/Os per second.  I'm
assuming these were mostly random I/Os, hence the slow rate.  With the
latest changes, it doesn't do that.
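
The periodic-commit loop described here might look roughly like this in
Python (the schema and the record source are hypothetical; only the
pragmas and the 5-second interval come from the test above):

```python
import sqlite3
import time

def load(conn, records, commit_interval=5.0):
    """Insert records, committing whenever commit_interval seconds elapse."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    last_commit = time.monotonic()
    for rec in records:
        cur.execute("INSERT INTO files VALUES (?, ?)", rec)
        now = time.monotonic()
        if now - last_commit >= commit_interval:
            conn.commit()
            cur.execute("BEGIN")
            last_commit = now
    conn.commit()  # flush whatever is left in the final transaction

conn = sqlite3.connect(":memory:")
conn.isolation_level = None             # manage transactions by hand
conn.execute("PRAGMA page_size = 4096")
conn.execute("PRAGMA synchronous = OFF")
conn.execute("CREATE TABLE files (path TEXT, size INTEGER)")
load(conn, (("/data/%d" % i, i) for i in range(10000)))
```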

I'll post again after figuring out which of the changes improved the
performance.  Thanks for the clues!

Jim

On 2/4/09, Gerry Snyder  wrote:
> Jim Wilcoxson wrote:
>> I am creating an SQLite database via Python, and trying to understand
>> some performance issues.
>
> Wild guess--are you creating an index (or indices) on data being entered
> out of order?
>
> HTH,
>
>
> Gerry


Re: [sqlite] Commit frequency and performance

2009-02-04 Thread Roger Binns
pri...@gmail.com wrote:
> I still can't understand how 1 long transaction (no syncs) can be
> slower than periodic 5-second commits.  Any ideas?

I'd suggest posting a link to your code/representative data so that
others can try to reproduce it.

Roger


Re: [sqlite] Commit frequency and performance

2009-02-04 Thread prirun
Thanks Roger.  There is no other activity on this machine.  I have run
vmstat 5 during all of the different cases.  For plain sequential
files, Linux is able to almost completely overlap I/O with CPU, so the
real time is close to user+sys time.  With SQLite, there are often
periods where the CPU is stalled waiting on I/O.

I still can't understand how 1 long transaction (no syncs) can be
slower than periodic 5-second commits.  Any ideas?

Jim

On 2/4/09, Roger Binns  wrote:
>
> Jim Wilcoxson wrote:
>> Can anyone shed light on why building a database inside a single
>> transaction would be slower than periodically committing?
>
> If you look at the user and sys times, each approach is almost the
> same.  The fact that the real times differ so much points to other
> activity on the machine, as well as I/O wait.


Re: [sqlite] Commit frequency and performance

2009-02-04 Thread Roger Binns
Jim Wilcoxson wrote:
> Can anyone shed light on why building a database inside a single
> transaction would be slower than periodically committing?

If you look at the user and sys times, each approach is almost the
same.  The fact that the real times differ so much points to other
activity on the machine, as well as I/O wait.

Assuming you are using Linux, one gotcha with ext3 filesystems is that
fsync/fdatasync turns into a full sync - i.e., it causes all outstanding
data for the entire filesystem to be written, not just the file named in
the fsync request.  Consequently *any* other filesystem activity will slow
your SQLite activity and give the kind of varying real times you see.  I
like to run "vmstat 1" to get a better picture of what is going on with
disk activity.
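
A rough way to see this from Python is to time individual fsync calls
while the rest of the system is busy (a sketch; it measures one call,
and the interesting part is how the latency varies under concurrent
write load from other processes):

```python
import os
import tempfile
import time

# Write a small amount of data, then time the fsync.  On ext3 in its
# default data=ordered mode, the fsync can end up flushing unrelated
# dirty pages for the whole filesystem, so this number grows with
# overall system write activity, not just with this file's size.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 4096)
    start = time.monotonic()
    os.fsync(fd)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    print("fsync of 4KB took %.2f ms" % elapsed_ms)
finally:
    os.close(fd)
    os.unlink(path)
```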

Roger


[sqlite] Commit frequency and performance

2009-02-04 Thread Jim Wilcoxson
I am creating an SQLite database via Python, and trying to understand
some performance issues.

This application does 3.8M inserts.  Most inserts are to a main
database that ends up being around 293MB.  Around 740K of the inserts
with larger data records are to 25 related databases of around 600MB
each.  The main database and 1 subdatabase are active together, then
the subdatabase is closed when it gets to around 600MB and a new one
is opened.  This is an archiving application.

As a performance baseline, I changed the application to just write out
plain sequential text files, and it takes around 62 minutes to
complete.  This includes all the Python overhead and the raw hard
drive overhead to write out the same amount of data in a "best case"
scenario:

  Time: 3754.16 seconds
  Files: 708120 Bytes: 31565490710

  real62m34.335s
  user53m55.492s
  sys 2m58.305s

If I use SQLite and commit every 5 seconds, I get this performance:

  Time: 11383.95 seconds
  Files: 708120 Bytes: 31565490710

  real189m45.061s
  user55m58.638s
  sys 4m46.528s

If I commit every 30 seconds, I get this:

  Time: 13021.34 seconds
  Files: 708120 Bytes: 31565490710

  real217m2.078s
  user56m9.647s
  sys 4m59.850s

I believe fsync/fdatasync are significant performance issues, so I
thought it should improve performance if I start one transaction,
do ALL of the inserts, then commit.  Since I'm starting with an empty
database, the journal should stay mostly empty, the database will be
built without any syncs, and when I commit, the small journal will be
deleted.  I also added pragma synchronous=off.  (All of these tests are
run with pragma locking_mode=exclusive.)  But when I tried this, the
performance was slowest of all:

  Time: 15356.42 seconds
  Files: 708120 Bytes: 31565490710

  real255m57.523s
  user55m51.215s
  sys 4m22.173s
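
For reference, the single-transaction variant described above reduces
to something like this (the schema is hypothetical; the pragmas match
the ones used in the test):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None              # explicit transaction control
conn.execute("PRAGMA locking_mode = EXCLUSIVE")
conn.execute("PRAGMA synchronous = OFF")
conn.execute("CREATE TABLE files (path TEXT, data BLOB)")

# One transaction around the entire load: no intermediate commits, so
# nothing forces the journal to be written out until the very end.
conn.execute("BEGIN")
for i in range(10000):
    conn.execute("INSERT INTO files VALUES (?, ?)", ("/data/%d" % i, b"payload"))
conn.commit()
```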

Can anyone shed light on why building a database inside a single
transaction would be slower than periodically committing?

Thanks,
Jim