Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-02-04 Thread Cédric Villemain
On 3 February 2012 19:48, Robert Haas robertmh...@gmail.com wrote:
> 2012/1/22 Tomas Vondra t...@fuzzy.cz:
>> That's suspiciously similar to the checkpoint timeout (which was set to
>> 4 minutes), but why should this matter for minimal WAL level and not for
>> archive?
>
> I went through and looked at all the places where we invoke
> XLogIsNeeded().  When XLogIsNeeded(), we:
>
> 1. WAL log creation of the _init fork of an unlogged table or an index
> on an unlogged table (otherwise, an fsync is enough)
> 2. WAL log index builds
> 3. WAL log changes to max_connections, max_prepared_xacts,
> max_locks_per_xact, and/or wal_level
> 4. skip calling posix_fadvise(POSIX_FADV_DONTNEED) when closing a WAL file
> 5. skip supplying O_DIRECT when writing WAL, if wal_sync_method is
> open_sync or open_datasync
> 6. refuse to create named restore points
> 7. WAL log CLUSTER
> 8. WAL log COPY FROM into a newly created/truncated relation
> 9. WAL log ALTER TABLE .. SET TABLESPACE
> 10. WAL log cleanup info before doing an index vacuum (this one should
> probably be changed to happen only in HS mode)
> 11. WAL log SELECT INTO
>
> It's hard to see how generating more WAL could cause a performance
> improvement, unless there's something about full page flushes being
> more efficient than partial page flushes or something like that.  And
> none of the stuff above looks likely to happen very often anyway.
> However, items #4 and #5 on that list look like things that could
> potentially be causing a problem: if WAL files are being reused
> regularly, then calling POSIX_FADV_DONTNEED on them could represent a
> regression.  It might be worth compiling with POSIX_FADV_DONTNEED
> undefined to see whether that changes anything.

It would be valuable to know the kernel version, and also to confirm
that the same behavior happens with XFS.


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training



Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-02-04 Thread Tomas Vondra
On 4.2.2012 17:04, Cédric Villemain wrote:
> On 3 February 2012 19:48, Robert Haas robertmh...@gmail.com wrote:
>> 2012/1/22 Tomas Vondra t...@fuzzy.cz:
>>> That's suspiciously similar to the checkpoint timeout (which was set to
>>> 4 minutes), but why should this matter for minimal WAL level and not for
>>> archive?
>>
>> I went through and looked at all the places where we invoke
>> XLogIsNeeded().  When XLogIsNeeded(), we:
>>
>> 1. WAL log creation of the _init fork of an unlogged table or an index
>> on an unlogged table (otherwise, an fsync is enough)
>> 2. WAL log index builds
>> 3. WAL log changes to max_connections, max_prepared_xacts,
>> max_locks_per_xact, and/or wal_level
>> 4. skip calling posix_fadvise(POSIX_FADV_DONTNEED) when closing a WAL file
>> 5. skip supplying O_DIRECT when writing WAL, if wal_sync_method is
>> open_sync or open_datasync
>> 6. refuse to create named restore points
>> 7. WAL log CLUSTER
>> 8. WAL log COPY FROM into a newly created/truncated relation
>> 9. WAL log ALTER TABLE .. SET TABLESPACE
>> 10. WAL log cleanup info before doing an index vacuum (this one should
>> probably be changed to happen only in HS mode)
>> 11. WAL log SELECT INTO
>>
>> It's hard to see how generating more WAL could cause a performance
>> improvement, unless there's something about full page flushes being
>> more efficient than partial page flushes or something like that.  And
>> none of the stuff above looks likely to happen very often anyway.
>> However, items #4 and #5 on that list look like things that could
>> potentially be causing a problem: if WAL files are being reused
>> regularly, then calling POSIX_FADV_DONTNEED on them could represent a
>> regression.  It might be worth compiling with POSIX_FADV_DONTNEED
>> undefined to see whether that changes anything.
>
> It would be valuable to know the kernel version, and also to confirm
> that the same behavior happens with XFS.

The kernel is 3.1.5; more precisely, uname -a gives this:

Linux rimmer 3.1.5-gentoo #1 SMP PREEMPT Sun Dec 25 14:11:19 CET 2011
x86_64 Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz GenuineIntel GNU/Linux

I plan to rerun the test with various settings. I'll add XFS results
(so far everything was on ext4) and post an update to this thread.

Tomas



Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-02-03 Thread Robert Haas
2012/1/22 Tomas Vondra t...@fuzzy.cz:
> That's suspiciously similar to the checkpoint timeout (which was set to
> 4 minutes), but why should this matter for minimal WAL level and not for
> archive?

I went through and looked at all the places where we invoke
XLogIsNeeded().  When XLogIsNeeded(), we:

1. WAL log creation of the _init fork of an unlogged table or an index
on an unlogged table (otherwise, an fsync is enough)
2. WAL log index builds
3. WAL log changes to max_connections, max_prepared_xacts,
max_locks_per_xact, and/or wal_level
4. skip calling posix_fadvise(POSIX_FADV_DONTNEED) when closing a WAL file
5. skip supplying O_DIRECT when writing WAL, if wal_sync_method is
open_sync or open_datasync
6. refuse to create named restore points
7. WAL log CLUSTER
8. WAL log COPY FROM into a newly created/truncated relation
9. WAL log ALTER TABLE .. SET TABLESPACE
10. WAL log cleanup info before doing an index vacuum (this one should
probably be changed to happen only in HS mode)
11. WAL log SELECT INTO

It's hard to see how generating more WAL could cause a performance
improvement, unless there's something about full page flushes being
more efficient than partial page flushes or something like that.  And
none of the stuff above looks likely to happen very often anyway.
However, items #4 and #5 on that list look like things that could
potentially be causing a problem: if WAL files are being reused
regularly, then calling POSIX_FADV_DONTNEED on them could represent a
regression.  It might be worth compiling with POSIX_FADV_DONTNEED
undefined to see whether that changes anything.
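
For anyone unfamiliar with item #4, here is a minimal sketch of the
kernel hint in question. It is written in Python purely for illustration
(PostgreSQL does this in C), and the helper name drop_from_page_cache is
hypothetical, not a PostgreSQL function. It only shows the call that is
skipped when XLogIsNeeded() is true; whether dropping the cached pages of
a segment that is about to be recycled actually costs anything is exactly
the open question above.

```python
import os
import tempfile


def drop_from_page_cache(path):
    """Ask the kernel to drop cached pages of `path` (hypothetical helper).

    Returns True if the hint was issued, False if this platform has no
    posix_fadvise (for example Windows or macOS).
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDWR)
    try:
        # Dirty pages are generally not dropped by DONTNEED, so flush first.
        os.fsync(fd)
        # offset=0, len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    # Stand-in for a WAL segment: a small temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 8192)
        segment = f.name
    print("hint issued:", drop_from_page_cache(segment))
    os.unlink(segment)
```

The hint is advisory, so the kernel is free to ignore it; that is one
more reason to test with it compiled out rather than reason about it.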

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-01-22 Thread Tomas Vondra
On 17.1.2012 01:29, Tomas Vondra wrote:
> On 16.1.2012 23:35, Greg Smith wrote:
>> On 01/12/2012 06:17 PM, Tomas Vondra wrote:
>>> I've run a series of pgbench benchmarks with the aim to see the effect
>>> of moving the WAL logs to a separate drive, and one thing that really
>>> surprised me is that the archive log level seems to give much better
>>> performance than minimal log level.
>>
>> How repeatable is this?  If you always run minimal first and then
>> archive, that might be the actual cause of the difference.  In this
>> situation I would normally run this 12 times, with this sort of pattern:
>>
>> minimal
>> minimal
>> minimal
>> archive
>> archive
>> archive
>> minimal
>> minimal
>> minimal
>> archive
>> archive
>> archive
>>
>> To make sure the difference wasn't some variation on "gets slower after
>> each run".  pgbench suffers a lot from problems in that class.

So, I've rerun the whole benchmark (varying fsync method and wal level),
and the results are exactly the same as before ...

See this:

  http://www.fuzzy.cz/tmp/fsync/tps.html
  http://www.fuzzy.cz/tmp/fsync/latency.html

Each row represents one of the fsync methods; the first column is the
archive level and the second column is the minimal level. Notice that the
performance with archive level steadily increases and is noticeably
better than with the minimal WAL level. In some cases (e.g. fdatasync)
the difference is up to 15%. That's a lot.

This is a 20-minute pgbench read-write run that is executed after a
20-minute read-only pgbench run (to warm up the caches etc.)

The latencies seem generally the same, except that with minimal WAL level
there's a 4-minute interval of significantly higher latencies at the
beginning.

That's suspiciously similar to the checkpoint timeout (which was set to
4 minutes), but why should this matter for minimal WAL level and not for
archive?
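
A rough sketch of how one might check whether the slow interval really
lines up with the checkpoint timeout: bucket the latencies into
one-minute windows and flag the windows that are well above the typical
one. The input format (pairs of seconds since start and latency in ms),
the function names and the 1.5x threshold are all assumptions made for
illustration, and the sample data is made up rather than taken from the
runs above.

```python
from collections import defaultdict
from statistics import median


def mean_latency_per_window(samples, window_s=60):
    """Average latency per time window.

    samples: iterable of (seconds_since_start, latency_ms) pairs.
    Returns {window_index: mean_latency_ms}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t, latency in samples:
        w = int(t // window_s)
        totals[w] += latency
        counts[w] += 1
    return {w: totals[w] / counts[w] for w in sorted(totals)}


def slow_windows(per_window, factor=1.5):
    """Windows whose mean latency exceeds factor * the median window mean."""
    threshold = factor * median(per_window.values())
    return [w for w, mean in per_window.items() if mean > threshold]


if __name__ == "__main__":
    # Made-up data: 20 minutes at one sample per second, with the first
    # 4 minutes twice as slow as the rest.
    samples = [(t, 10.0 if t < 240 else 5.0) for t in range(1200)]
    print(slow_windows(mean_latency_per_window(samples)))
```

With the made-up data this flags windows 0 through 3, i.e. the first four
minutes; on real data the interesting part is whether the flagged range
moves when checkpoint_timeout is changed.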

Tomas



Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-01-16 Thread Greg Smith

On 01/12/2012 06:17 PM, Tomas Vondra wrote:

> I've run a series of pgbench benchmarks with the aim to see the effect
> of moving the WAL logs to a separate drive, and one thing that really
> surprised me is that the archive log level seems to give much better
> performance than minimal log level.


How repeatable is this?  If you always run minimal first and then 
archive, that might be the actual cause of the difference.  In this 
situation I would normally run this 12 times, with this sort of pattern:


minimal
minimal
minimal
archive
archive
archive
minimal
minimal
minimal
archive
archive
archive

To make sure the difference wasn't some variation on "gets slower after
each run".  pgbench suffers a lot from problems in that class.
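
The pattern above can be generated rather than typed out. This is only a
sketch: the function name and its parameters are made up for
illustration, and the defaults simply reproduce the 12-run list.

```python
def interleaved_schedule(settings=("minimal", "archive"), block=3, rounds=2):
    """Return the run order: `block` runs per setting, repeated `rounds` times."""
    schedule = []
    for _ in range(rounds):
        for setting in settings:
            schedule.extend([setting] * block)
    return schedule


if __name__ == "__main__":
    for run_number, wal_level in enumerate(interleaved_schedule(), start=1):
        print(run_number, wal_level)
```

If the second block of minimal runs is slower than the first block of
archive runs by about the same margin, the effect follows the setting; if
every block is slower than the one before it, the effect follows run
order.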


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [PERFORM] wal_level=archive gives better performance than minimal - why?

2012-01-16 Thread Tomas Vondra
On 16.1.2012 23:35, Greg Smith wrote:
> On 01/12/2012 06:17 PM, Tomas Vondra wrote:
>> I've run a series of pgbench benchmarks with the aim to see the effect
>> of moving the WAL logs to a separate drive, and one thing that really
>> surprised me is that the archive log level seems to give much better
>> performance than minimal log level.
>
> How repeatable is this?  If you always run minimal first and then
> archive, that might be the actual cause of the difference.  In this
> situation I would normally run this 12 times, with this sort of pattern:
>
> minimal
> minimal
> minimal
> archive
> archive
> archive
> minimal
> minimal
> minimal
> archive
> archive
> archive
>
> To make sure the difference wasn't some variation on "gets slower after
> each run".  pgbench suffers a lot from problems in that class.

AFAIK it's well repeatable - the primary goal of the benchmark was to
see the benefit of moving the WAL to a separate device (with various WAL
levels and device types - SSD and HDD).

I plan to rerun the whole thing this week with a bit more detail logged,
to rule out basic configuration mistakes etc.

Each run is completely separate (rebuilt from scratch) and takes about 1
hour to complete. Each pgbench run consists of these steps:

  1) rebuild the data from scratch
  2) 10-minute warmup (read-only run)
  3) 20-minute read-only run
  4) checkpoint
  5) 20-minute read-write run

and the results are very stable.
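
For reference, the five steps could be scripted roughly like this. This
is a sketch, not the script actually used: the database name, scale
factor and client count are placeholders, and "rebuild from scratch" is
assumed here to mean a plain pgbench -i (it may well have involved more,
such as a fresh cluster). The commands are only built and printed, not
executed.

```python
import shlex


def benchmark_steps(dbname="bench", scale=100, clients=8):
    """Return (description, argv) pairs for one benchmark run.

    dbname, scale and clients are placeholder values, not the settings
    used in the runs discussed in this thread.
    """
    c = str(clients)
    return [
        ("rebuild the data from scratch",
         ["pgbench", "-i", "-s", str(scale), dbname]),
        ("10-minute warmup (read-only run)",
         ["pgbench", "-S", "-T", "600", "-c", c, dbname]),
        ("20-minute read-only run",
         ["pgbench", "-S", "-T", "1200", "-c", c, dbname]),
        ("checkpoint",
         ["psql", "-c", "CHECKPOINT", dbname]),
        ("20-minute read-write run",
         ["pgbench", "-T", "1200", "-c", c, dbname]),
    ]


if __name__ == "__main__":
    for number, (description, argv) in enumerate(benchmark_steps(), start=1):
        print(f"{number}) {description}: {shlex.join(argv)}")
```

Passing each argv list to subprocess.run(argv, check=True) would execute
the steps in order and stop at the first failure.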

Tomas
