Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread YAMAMOTO Takashi
hi,

 On Wed, Nov 30, 2011 at 1:37 AM, YAMAMOTO Takashi
 y...@mwd.biglobe.ne.jp wrote:
 Yes, I would expect that.  What kind of increase are you seeing?  Is
 it causing a problem for you, or are you just making an observation?

 i was curious because my application uses async commits mainly to
 avoid frequent fsync.  i have no numbers right now.
 
 Oh, that's interesting.  Why do you want to avoid frequent fsyncs?

simply because it was expensive on my environment.

 I
 thought the point of synchronous_commit=off was to move the fsyncs to
 the background, but not necessarily to decrease the frequency.

it makes sense.
but it's normal for users to abuse features. :)

YAMAMOTO Takashi

 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread Andres Freund
Hi Robert, 
On Wednesday, November 30, 2011 02:10:00 PM Robert Haas wrote:
 On Wed, Nov 30, 2011 at 1:37 AM, YAMAMOTO Takashi
 
 y...@mwd.biglobe.ne.jp wrote:
  Yes, I would expect that.  What kind of increase are you seeing?  Is
  it causing a problem for you, or are you just making an observation?
  
  i was curious because my application uses async commits mainly to
  avoid frequent fsync.  i have no numbers right now.
 Oh, that's interesting.  Why do you want to avoid frequent fsyncs?  I
 thought the point of synchronous_commit=off was to move the fsyncs to
 the background, but not necessarily to decrease the frequency.
Is that so? If it wouldn't avoid fsyncs how could you reach multiple thousand 
TPS in a writing pgbench run on a pretty ordinary system with fsync=on?

Andres



Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread Robert Haas
On Thu, Dec 1, 2011 at 4:09 AM, Andres Freund and...@anarazel.de wrote:
 Oh, that's interesting.  Why do you want to avoid frequent fsyncs?  I
 thought the point of synchronous_commit=off was to move the fsyncs to
 the background, but not necessarily to decrease the frequency.
 Is that so? If it wouldn't avoid fsyncs how could you reach multiple thousand
 TPS in a writing pgbench run on a pretty ordinary system with fsync=on?

Eh, well, what would stop you from achieving that?  An fsync operation
that occurs in the background doesn't block further transactions from
completing.  Meanwhile, getting the WAL records on disk faster allows
us to set hint bits sooner, which is a significant win, as shown by
the numbers I posted upthread.

One possible downside of trying to kick off the fsync more quickly is
that if there is a continuous stream of background fsyncs going on, a
process that needs to do an XLogFlush in the foreground (i.e. a
synchronous_commit=on transaction in the middle of many
synchronous_commit=off transactions) might be more likely to find an
fsync already in progress and therefore need to wait until it
completes before starting the next one, slowing things down.  But I'm
a bit reluctant to believe that is a real effect without some data.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread Andres Freund
On Thursday, December 01, 2011 03:11:43 PM Robert Haas wrote:
 On Thu, Dec 1, 2011 at 4:09 AM, Andres Freund and...@anarazel.de wrote:
  Oh, that's interesting.  Why do you want to avoid frequent fsyncs?  I
  thought the point of synchronous_commit=off was to move the fsyncs to
  the background, but not necessarily to decrease the frequency.
  
  Is that so? If it wouldn't avoid fsyncs how could you reach multiple
  thousand TPS in a writing pgbench run on a pretty ordinary system with
  fsync=on?
 Eh, well, what would stop you from achieving that?  An fsync operation
 that occurs in the background doesn't block further transactions from
 completing. 
But it will slow down overall system I/O. For one, an fsync() on Linux will
cause a drain of the I/O submit queue. For another, it counts against the
total random I/O operations a device can do, which in turn will slow down
anything else doing synchronous random I/O, i.e. read(2).

 Meanwhile, getting the WAL records on disk faster allows
 us to set hint bits sooner, which is a significant win, as shown by
 the numbers I posted upthread.
Oh, that part I don't doubt. Sorry for that.


Andres



Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread Jeff Janes
On Thu, Dec 1, 2011 at 6:11 AM, Robert Haas robertmh...@gmail.com wrote:

 One possible downside of trying to kick off the fsync more quickly is
 that if there is a continuous stream of background fsyncs going on, a
 process that needs to do an XLogFlush in the foreground (i.e. a
 synchronous_commit=on transaction in the middle of many
 synchronous_commit=off transactions) might be more likely to find an
 fsync already in progress and therefore need to wait until it
 completes before starting the next one, slowing things down.

Waiting until the other one completes is how it currently is
implemented, but is it necessary from a correctness view?  It seems
like the WALWriteLock only needs to protect the write, and not the
sync (assuming the sync method allows those to be separate actions),
and that there could be multiple fsync requests from different
processes pending at the same time without a correctness problem.
After dropping the WALWriteLock and doing the fsync, it would then
have to take the lock again or maybe just a spinlock to update the
accounting for how far the log has been flushed.  So rather than one
committing process blocking on fsync and a bunch of others blocking on
WALWriteLock, you could have all of them blocking on different fsyncs
and let the kernel deal with waking them up.  I don't know at all
whether this would actually be an improvement, assuming it would even
be safe.  Reading the xlog.c code, it is hard to tell which
designs/features are there for safety and which ones are there for
suspected performance reasons.
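
A minimal sketch of the locking pattern described above, assuming a much-simplified WAL state. WalState, wal_init, wal_flush, do_write and do_fsync are hypothetical names for illustration, not PostgreSQL's xlog.c; the write and sync are placeholders, and whether this arrangement is actually safe or faster is exactly the open question in the message.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified WAL state; not PostgreSQL's XLogCtl. */
typedef struct {
    pthread_mutex_t write_lock;  /* stands in for WALWriteLock */
    pthread_mutex_t flush_lock;  /* stands in for a spinlock on flushed_upto */
    uint64_t written_upto;       /* bytes handed to the kernel */
    uint64_t flushed_upto;       /* bytes known to be durable */
} WalState;

void wal_init(WalState *w)
{
    pthread_mutex_init(&w->write_lock, NULL);
    pthread_mutex_init(&w->flush_lock, NULL);
    w->written_upto = 0;
    w->flushed_upto = 0;
}

/* Placeholder for write(2): only advances the write position. */
static void do_write(WalState *w, uint64_t upto)
{
    if (upto > w->written_upto)
        w->written_upto = upto;
}

/* Placeholder for fsync(2); a real call would block on I/O here. */
static void do_fsync(void)
{
}

/* Make WAL durable up to at least `upto`; returns the flushed position. */
uint64_t wal_flush(WalState *w, uint64_t upto)
{
    uint64_t synced;
    uint64_t result;

    /* Fast path: someone else already flushed far enough. */
    pthread_mutex_lock(&w->flush_lock);
    result = w->flushed_upto;
    pthread_mutex_unlock(&w->flush_lock);
    if (result >= upto)
        return result;

    /* 1. Write under the write lock and note how far the write got. */
    pthread_mutex_lock(&w->write_lock);
    do_write(w, upto);
    synced = w->written_upto;
    pthread_mutex_unlock(&w->write_lock);

    /* 2. Sync with no lock held, so several callers can sync at once. */
    do_fsync();

    /* 3. Publish progress; never move the flushed position backwards. */
    pthread_mutex_lock(&w->flush_lock);
    if (synced > w->flushed_upto)
        w->flushed_upto = synced;
    result = w->flushed_upto;
    pthread_mutex_unlock(&w->flush_lock);
    return result;
}
```

The point of the sketch is only the ordering: the position to publish is captured while the write lock is held, so a sync issued afterwards covers at least that much, and the accounting update is the only step that needs mutual exclusion again.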

Sorry for hijacking your topic, it is just something I had been
thinking about for a while.


Cheers,

Jeff



Re: [HACKERS] synchronous commit vs. hint bits

2011-12-01 Thread Robert Haas
On Thu, Dec 1, 2011 at 9:58 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 Waiting until the other one completes is how it currently is
 implemented, but is it necessary from a correctness view?  It seems
 like the WALWriteLock only needs to protect the write, and not the
 sync (assuming the sync method allows those to be separate actions),
 and that there could be multiple fsync requests from different
 processes pending at the same time without a correctness problem.

I've wondered about that, too.  At least on Linux, the overhead of a
system call seems to be pretty low - e.g. the ridiculous number of
lseek calls we do on a pgbench -S doesn't seem to create much overhead
until the inode mutex starts to become contended; and that problem
should be fixed in Linux 3.2.  But I'm not sure if system calls are
similarly cheap on all platforms, or even if it's true on Linux for
fsync() in particular.

There's another possible approach here, too: instead of waiting to set
hint bits until the commit record hits the disk, we could allow the
hint bits to be set immediately, on the condition that we don't write the
page out until the commit record hits the disk.  Bumping the page LSN would
do that, but I think that might be problematic since setting hint bits
isn't WAL-logged.  If so, we could possibly fix that by storing a
second LSN for the page out of line, e.g. in the buffer descriptor.
That might be even faster than speeding up the WAL flush.
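
The out-of-line second LSN could look roughly like the following. This is a sketch under stated assumptions: Lsn, BufDesc, hint_lsn, note_hint_bit and can_write_page are hypothetical names, and PostgreSQL's real buffer descriptor is not claimed to have such a field.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;  /* hypothetical stand-in for a WAL position */

/* Hypothetical buffer descriptor with a second, out-of-line LSN. */
typedef struct {
    Lsn page_lsn;  /* LSN of the last WAL-logged change to the page */
    Lsn hint_lsn;  /* newest commit LSN whose hint bit was set on the page */
} BufDesc;

/* Set a commit hint bit right away, remembering the commit record's LSN. */
void note_hint_bit(BufDesc *buf, Lsn commit_lsn)
{
    if (commit_lsn > buf->hint_lsn)
        buf->hint_lsn = commit_lsn;
}

/* The page may be written only once WAL is flushed past both LSNs. */
bool can_write_page(const BufDesc *buf, Lsn flushed_upto)
{
    Lsn need = buf->page_lsn > buf->hint_lsn ? buf->page_lsn : buf->hint_lsn;
    return flushed_upto >= need;
}
```

Keeping hint_lsn outside the page leaves the on-disk page LSN untouched, which sidesteps the concern that hint-bit updates are not WAL-logged.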

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-30 Thread YAMAMOTO Takashi
hi,

 On Tue, Nov 29, 2011 at 1:42 AM, YAMAMOTO Takashi
 y...@mwd.biglobe.ne.jp wrote:
 On Mon, Nov 7, 2011 at 5:26 PM, Simon Riggs si...@2ndquadrant.com wrote:

 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.

 I extracted this from my current patch for you to test.

 is it expected to produce more frequent fsync?
 
 Yes, I would expect that.  What kind of increase are you seeing?  Is
 it causing a problem for you, or are you just making an observation?

i was curious because my application uses async commits mainly to
avoid frequent fsync.  i have no numbers right now.

YAMAMOTO Takashi

 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-30 Thread Robert Haas
On Wed, Nov 30, 2011 at 1:37 AM, YAMAMOTO Takashi
y...@mwd.biglobe.ne.jp wrote:
 Yes, I would expect that.  What kind of increase are you seeing?  Is
 it causing a problem for you, or are you just making an observation?

 i was curious because my application uses async commits mainly to
 avoid frequent fsync.  i have no numbers right now.

Oh, that's interesting.  Why do you want to avoid frequent fsyncs?  I
thought the point of synchronous_commit=off was to move the fsyncs to
the background, but not necessarily to decrease the frequency.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-29 Thread Robert Haas
On Tue, Nov 29, 2011 at 1:42 AM, YAMAMOTO Takashi
y...@mwd.biglobe.ne.jp wrote:
 On Mon, Nov 7, 2011 at 5:26 PM, Simon Riggs si...@2ndquadrant.com wrote:

 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.

 I extracted this from my current patch for you to test.

 is it expected to produce more frequent fsync?

Yes, I would expect that.  What kind of increase are you seeing?  Is
it causing a problem for you, or are you just making an observation?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-08 Thread Robert Haas
On Tue, Nov 8, 2011 at 1:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Please continue to expect that, I just haven't finished it yet...

OK.

So here's the deal: this is an effective, mostly automatic solution to
the performance problem noted in the original post.  For example, at
32 clients, the original test case gives about 7800-8300 tps with
wal_writer_delay=200ms, and about 10100 tps with
wal_writer_delay=20ms.  With wal_writer_delay=200ms but the patch
applied, median of three five minute pgbench runs is 9952 tps; all
three runs are under 1 tps.  So it's not quite as good as
adjusting wal_writer_delay downward, but it gives you roughly 90% of
the benefit automatically, without needing to adjust any settings.
That seems very worthwhile.
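
As a rough check of the "roughly 90%" figure, the share of the available gain can be computed from the numbers quoted above. The helper name fraction_of_gain is made up for this illustration.

```c
/* Share of the gap between a baseline and a target that a result closes. */
double fraction_of_gain(double baseline, double target, double result)
{
    return (result - baseline) / (target - baseline);
}
```

With the patched median of 9952 tps and the 20ms figure of about 10100 tps as the target, a baseline of 7800 tps gives about 0.94 and a baseline of 8300 tps gives about 0.92, so both ends of the quoted baseline range land a little above 90%.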

At 1 client, 8 clients, and 80 clients, the results were even better.
The patched code with wal_writer_delay=200ms slightly outperformed the
unpatched code with wal_writer_delay=20ms (and outperformed the
unpatched code with wal_writer_delay=200ms even more).  It's possible
that some of that is random variation, but maybe not all of it - e.g.
at 1 client:

unpatched, wal_writer_delay = 200ms: 602, 604, 607 tps
unpatched, wal_writer_delay = 20ms: 614, 616, 616 tps
patched, wal_writer_delay = 200ms: 633, 634, 636 tps

The fact that those numbers aren't bouncing around much suggests that
it might be a real effect.

I have also reviewed the code and it seems OK.

So +1 from me for applying this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Robert Haas
On Mon, Nov 7, 2011 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:
 So, what could we do about this?  Ideas:

 1. Set the hint bits right away, and avoid letting the page be flushed
 to disk until the commit record is durably on disk (by bumping the
 page LSN?).
 2. Improve CLOG concurrency or performance in some way so that
 consulting it repeatedly doesn't slow us down so much.
 3. Do more backend-private XID status caching - in particular, for
 commits, since this isn't a problem for aborts.
 4. (Crazy idea) Have something that's like a hint bit, but stored in
 the buffer header rather than the data block itself.  We allocate an
 array large enough to hold 2 bits per tuple (for the maximum number of
 tuples that can exist on a page), with one bit indicating that xmin is
 async-committed and the other indicating that xmax is async-committed.

 There are probably other options as well.

5. Make the WAL writer more responsive, maybe using latches, so that
it doesn't take as long for the commit record to make it out to disk.
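
Idea 4 in the list above amounts to a small per-buffer bitmap. A minimal sketch, assuming 291 as the maximum number of heap tuples on an 8 kB page; AsyncHints and the helper functions are hypothetical names, not existing PostgreSQL structures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TUPLES_PER_PAGE  291  /* assumed maximum for 8 kB pages */
#define ASYNC_BITS_PER_TUPLE 2
#define ASYNC_XMIN 0              /* bit 0: xmin is async-committed */
#define ASYNC_XMAX 1              /* bit 1: xmax is async-committed */

/* Hypothetical side array kept with the buffer header, not in the page. */
typedef struct {
    uint8_t bits[(MAX_TUPLES_PER_PAGE * ASYNC_BITS_PER_TUPLE + 7) / 8];
} AsyncHints;

void async_hints_clear(AsyncHints *h)
{
    memset(h->bits, 0, sizeof h->bits);
}

void async_hint_set(AsyncHints *h, unsigned tuple, unsigned which)
{
    unsigned bit = tuple * ASYNC_BITS_PER_TUPLE + which;

    h->bits[bit / 8] |= (uint8_t) (1u << (bit % 8));
}

bool async_hint_get(const AsyncHints *h, unsigned tuple, unsigned which)
{
    unsigned bit = tuple * ASYNC_BITS_PER_TUPLE + which;

    return ((h->bits[bit / 8] >> (bit % 8)) & 1u) != 0;
}
```

Under that assumption the array costs 73 bytes per buffer, which gives a feel for the memory overhead the idea would carry.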

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED
 hints until the commit record has been durably flushed to disk.  It
 turns out that can cause a major performance regression on systems
 with many CPU cores.

It seems to me that you've jumped to proposing solutions before you know
where the problem actually is --- or at least, if you do know where the
problem is, you didn't explain it.  Is the cost in repeating clog
lookups, or in testing to determine whether it's safe to set the bit
yet, or is it contention associated with one or the other of those?

regards, tom lane



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Merlin Moncure
On Mon, Nov 7, 2011 at 8:31 AM, Robert Haas robertmh...@gmail.com wrote:
 I've long considered synchronous_commit=off to be one of our best
 performance features.  Certainly, it's not applicable in every
 situation, but there are many applications where losing a second or so
 worth of transactions is an acceptable price to pay for not needing to
 wait for the disk to spin around for every commit.  However, recent
 experimentation has convinced me that it's got a serious downside:
 SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED
 hints until the commit record has been durably flushed to disk.  It
 turns out that can cause a major performance regression on systems
 with many CPU cores.  I fixed this for temporary and unlogged tables
 in commit 53f1ca59b5875f1d3e95ee709ecaddcbdfdbd175, but the same issue
 exists (without any clear fix) for permanent tables.

What's the source of the regression? Is it coming from losing the hint
bit and being forced out to clog?  How likely is it really going to
happen in non synthetic real world cases?

Thinking about the hint bit cache I was playing with a while back, I
guess you could have put the hint bit in the cache but refrained from
marking it in the page in the TransactionIdIsValid(xid)=false case --
in the first implementation I had only put the bit in the cache when
it was valid -- since TransactionIdIsValid(xid) is not necessarily
cheap though, maybe it's worth reserving an extra bit for the
transaction being valid in the cache if you went down that road.

Another way to attack this problem is to re-check and set the hint bit
if you can do it in the bgwriter -- there's a good chance you will
catch it in oltp environments like pgbench although it's not clear if
the cost to the general case would be too high.

merlin



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Merlin Moncure
On Mon, Nov 7, 2011 at 9:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED
 hints until the commit record has been durably flushed to disk.  It
 turns out that can cause a major performance regression on systems
 with many CPU cores.

 It seems to me that you've jumped to proposing solutions before you know
 where the problem actually is --- or at least, if you do know where the
 problem is, you didn't explain it.  Is the cost in repeating clog
 lookups, or in testing to determine whether it's safe to set the bit
 yet, or is it contention associated with one or the other of those?

In the current code, if you get to the IsValid check and fail to set
the bit, you've essentially done all the work for no reason.   I
tested this pretty well a few months back, and (recalling from
memory), the IsValid check is maybe 25% of the entire cost when you
fail through the hint bit -- this is why I organized the cache to only
store the bit if the xid was known good -- then you get to skip the
check in the known good case and immediately set the bit (w/o marking
dirty) and move on.  As noted above, the cache I was playing with was
a win from performance point of view, but would require another bit to
address Robert's proposed case, and should really be compared against
alternative solutions (like marking the bit in the bgwriter) before
being seriously considered.

merlin



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Robert Haas
On Mon, Nov 7, 2011 at 10:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED
 hints until the commit record has been durably flushed to disk.  It
 turns out that can cause a major performance regression on systems
 with many CPU cores.

 It seems to me that you've jumped to proposing solutions before you know
 where the problem actually is --- or at least, if you do know where the
 problem is, you didn't explain it.  Is the cost in repeating clog
 lookups, or in testing to determine whether it's safe to set the bit
 yet, or is it contention associated with one or the other of those?

Good question.  One possibly informative fact is that, on unlogged
tables, the same change doesn't seem to make any difference.  Here are
the benchmark results with unlogged tables, configuration otherwise
identical to the OP:

[unpatched]
tps = 10624.851704 (including connections establishing)
tps = 10507.024822 (including connections establishing)
tps = 10714.411389 (including connections establishing)
[test whacked out]
tps = 10779.704540 (including connections establishing)
tps = 10523.863100 (including connections establishing)
tps = 10654.102699 (including connections establishing)

The difference might be noise, or it may be a very small real effect,
but it's clearly tiny compared to the change for permanent tables (but
note that this was not true prior to commit
53f1ca59b5875f1d3e95ee709ecaddcbdfdbd175).  This seems to me to be
fairly compelling evidence that the problem is in the clog lookups
themselves, rather than the test that determines whether or not it's
safe to set the bit.  However, I don't know whether the problem is the
cost of the test itself or some kind of associated contention.  I
don't see much difference in CPU utilization between the patched and
unpatched code, but that's not really accurate enough to be certain.

I just reran both tests with LWLOCK_STATS defined.  Again, five minute test run:

lwlock 11: shacq 87323748 exacq 3708555 blk 1932719 [unpatched]
lwlock 11: shacq 11682513 exacq 4769472 blk 677534 [patched]

11 = CLogControlLock, so you can see that the unpatched code is
acquiring CLogControlLock in shared mode more than 7x as often and blocks
on the lock about 3x as often, despite processing fewer transactions.
The patched code has more exclusive-acquires, but that's at least
partly just because it's processing more transactions.  Unfortunately,
I don't have oprofile access on this box and can't see exactly where
the time is being spent.  However, I am not sure how much it matters.
With that much of an increase in CLOG traffic, something's gotta give.
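
The ratios quoted above can be reproduced directly from the two LWLOCK_STATS lines. A small sketch; LwStats and parse_lwlock_line are hypothetical helpers written for this illustration, and the format string simply mirrors the lines as they appear in this message.

```c
#include <stdio.h>

/* Counters from one "lwlock N: shacq ... exacq ... blk ..." line. */
typedef struct {
    int id;
    unsigned long shacq;  /* shared acquisitions */
    unsigned long exacq;  /* exclusive acquisitions */
    unsigned long blk;    /* times the caller had to block */
} LwStats;

/* Returns 1 if all four fields were parsed, 0 otherwise. */
int parse_lwlock_line(const char *line, LwStats *out)
{
    return sscanf(line, "lwlock %d: shacq %lu exacq %lu blk %lu",
                  &out->id, &out->shacq, &out->exacq, &out->blk) == 4;
}
```

Dividing the unpatched counters by the patched ones gives a shared-acquire ratio near 7.5 and a blocking ratio near 2.9, matching the "more than 7x" and "about 3x" in the text.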

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Simon Riggs
On Mon, Nov 7, 2011 at 2:45 PM, Robert Haas robertmh...@gmail.com wrote:

 2. Improve CLOG concurrency or performance in some way so that
 consulting it repeatedly doesn't slow us down so much.

We should also ask what makes the clog slow. I think it shows physical
contention as well as logical contention on the lwlock. Since we have
2 bits per transaction that means we will see at least 256
transactions fitting in each cacheline in the clog. Consecutive
transactions are currently stored next to each other in the clog, so
that the current cacheline needs to be passed around between 256
transactions, one at a time. That is a problem if they all finish near
enough the same time.

My proposal is to stripe the clog, so that consecutive xids are not
adjacent in the clog, such that xids are always at least 64 bytes
apart on a 8192 byte clog page. That allows 128 commits with
consecutive xids to complete concurrently, with respect to physical
access to memory.

That's just a one line change in the defines at top of clog.c, so
easy enough to play with.

#define CACHELINE_SZ 64
#define CACHELINES_PER_BLOCK (BLCKSZ / CACHELINE_SZ)
#define CLOG_XACTS_PER_CACHELINE (CLOG_XACTS_PER_BYTE * CACHELINE_SZ)
#define TransactionIdToByte(xid) \
    ((CACHELINES_PER_BLOCK * \
      (TransactionIdToPgIndex(xid) / CLOG_XACTS_PER_CACHELINE)) + \
     (TransactionIdToPgIndex(xid) % CLOG_XACTS_PER_CACHELINE))

plus a few extra lines to fix the other defines.
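
For experimenting with the idea, here is a separate, hypothetical mapping written only to meet the stated goal: consecutive xids land at least 64 bytes apart on an 8192-byte page, and every 2-bit slot on the page is used exactly once. It is not the macro from this message and not committed code; striped_slot, striped_byte and striped_bshift are made-up names, and BLCKSZ and the CLOG constants are redefined locally so the sketch stands alone.

```c
#include <stdint.h>

#define BLCKSZ               8192
#define CLOG_BITS_PER_XACT   2
#define CLOG_XACTS_PER_BYTE  4
#define CLOG_XACTS_PER_PAGE  (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CACHELINE_SZ         64
#define CACHELINES_PER_BLOCK (BLCKSZ / CACHELINE_SZ)
#define CLOG_XACTS_PER_CACHELINE (CLOG_XACTS_PER_BYTE * CACHELINE_SZ)

/* 2-bit slot (0 .. CLOG_XACTS_PER_PAGE - 1) used by an xid on its page. */
static inline uint32_t striped_slot(uint32_t xid)
{
    uint32_t idx  = xid % CLOG_XACTS_PER_PAGE;   /* position within the page */
    uint32_t line = idx % CACHELINES_PER_BLOCK;  /* neighbours get different lines */
    uint32_t pos  = idx / CACHELINES_PER_BLOCK;  /* slot inside that line */

    return line * CLOG_XACTS_PER_CACHELINE + pos;
}

/* Byte offset within the page holding the xid's status bits. */
uint32_t striped_byte(uint32_t xid)
{
    return striped_slot(xid) / CLOG_XACTS_PER_BYTE;
}

/* Bit shift of the xid's status bits within that byte. */
uint32_t striped_bshift(uint32_t xid)
{
    return (striped_slot(xid) % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
}
```

With this layout xids 0, 1 and 2 map to bytes 0, 64 and 128, and xid 128 wraps back to byte 0 with a shift of 2, so any run of 128 consecutive xids on a page touches 128 different cache lines.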


 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

I'm working on this already as part of the update for power
reduction/group commit/replication performance.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Robert Haas
On Mon, Nov 7, 2011 at 12:26 PM, Simon Riggs si...@2ndquadrant.com wrote:
 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.

OK.  Here's an interesting result on that front that I so far can't
explain.  I lowered wal_writer_delay to 20 ms and repeated my test:

tps = 10175.265689 (including connections establishing) [unpatched]
tps = 10159.597727 (including connections establishing) [patched]

Now, that's odd.  I expect that to improve performance on the
unpatched code, by reducing the amount of time we have to wait for the
commit record to hit the disk.  I did *not* expect it to improve the
performance of the patched code, since one would think that setting
the hint bit the first time through would be about as good as we could
possibly do.  And yet, those are the numbers.  Apparently, there's
some other effect whereby a more responsive walwriter improves
performance on this setup (beats me what it is, though).

Here it is with wal_writer_delay=50 ms:

tps = 9964.225358 (including connections establishing) [unpatched]
tps = 10048.396729 (including connections establishing) [patched]

And back to wal_writer_delay=200ms:

tps = 8119.121633 (including connections establishing) [unpatched]
tps = 9602.645495 (including connections establishing) [patched]

So it seems like there is quite a bit of win available here, though at
the moment I don't know why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Merlin Moncure
On Mon, Nov 7, 2011 at 9:25 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Mon, Nov 7, 2011 at 8:31 AM, Robert Haas robertmh...@gmail.com wrote:
 I've long considered synchronous_commit=off to be one of our best
 performance features.  Certainly, it's not applicable in every
 situation, but there are many applications where losing a second or so
 worth of transactions is an acceptable price to pay for not needing to
 wait for the disk to spin around for every commit.  However, recent
 experimentation has convinced me that it's got a serious downside:
 SetHintBits() can't set HEAP_XMIN_COMMITTED or HEAP_XMAX_COMMITTED
 hints until the commit record has been durably flushed to disk.  It
 turns out that can cause a major performance regression on systems
 with many CPU cores.  I fixed this for temporary and unlogged tables
 in commit 53f1ca59b5875f1d3e95ee709ecaddcbdfdbd175, but the same issue
 exists (without any clear fix) for permanent tables.

 What's the source of the regression? Is it coming from losing the hint
 bit and being forced out to clog?  How likely is it really going to
 happen in non synthetic real world cases?

 Thinking about the hint bit cache I was playing with a while back, I
 guess you could have put the hint bit in the cache but refrained from
 marking it in the page in the TransactionIdIsValid(xid)=false case --
 in the first implementation I had only put the bit in the cache when
 it was valid -- since TransactionIdIsValid(xid) is not necessarily
 cheap though, maybe it's worth reserving an extra bit for the
 transaction being valid in the cache if you went down that road.

 Another way to attack this problem is to re-check and set the hint bit
 if you can do it in the bgwriter -- there's a good chance you will
 catch it in oltp environments like pgbench although it's not clear if
 the cost to the general case would be too high.

Thinking about this more, the backend local cache approach is probably
going to be useless in terms of addressing this problem -- mostly due
to the fact that the cache is, well, local.  Even if backend A takes
the time to mark the bit in its own cache, backends B-Z haven't yet
and presumably by the time they do the page has been rolled out
anyways so you get no benefit.  The cache helps when a backend sees
the same transaction spread out over a number of tuples/pages --
that's simply not the case in OLTP.

Doing the work in the bgwriter might do the trick assuming the
bgwriter consistently loses the race against both transaction
resolution and the wal, and the extra clog lookup (when you win the
race) penalty doesn't sting too much... possibly do this in conjunction
with the clog striping Simon is thinking about.

merlin



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Robert Haas
On Mon, Nov 7, 2011 at 1:08 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Thinking about this more, the backend local cache approach is probably
 going to be useless in terms of addressing this problem -- mostly due
 to the fact that the cache is, well, local.  Even if backend A takes
 the time to mark the bit in its own cache, backends B-Z haven't yet
 and presumably by the time they do the page has been rolled out
 anyways so you get no benefit.  The cache helps when a backend sees
 the same transaction spread out over a number of tuples/pages --
 that's simply not the case in OLTP.

Ah, right.  Good point.

 Doing the work in the bgwriter might do the trick assuming the
 bgwriter consistently loses the race against both transaction
 resolution and the wal, and the extra clog lookup (when you win the
 race) penalty doesn't sting too much...

But I can't see how this can work.  The background writer is only
designed to do one thing: ensuring a supply of clean buffers for
backends that need to allocate new ones.   I'm not sure the background
writer is going to do anything at all on this test, since the data set
fits inside shared_buffers and therefore there's no buffer eviction
happening.  But even if it does, it's certainly not going to scan all
1 million shared buffers anywhere near quick enough to matter; it's
going to be limited to at most 100 buffers every 200 ms, which means
that even if it ran at top speed for the entire test, it would only
get through about 15% of the buffer pool even *once* before the test
ended.  That's not even slightly close to what would be needed to move
the needle here; you would need to visit any given buffer within a few
hundred milliseconds of the relevant transaction commit.
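
The 15% figure follows from simple arithmetic, sketched here under the assumption that the test is one of the five-minute runs mentioned elsewhere in the thread. The function name pool_fraction_scanned is made up for this illustration.

```c
/* Fraction of the buffer pool a rate-limited scan can cover in a given time. */
double pool_fraction_scanned(double buffers_per_round, double round_ms,
                             double duration_s, double pool_buffers)
{
    double rounds = duration_s * 1000.0 / round_ms;

    return buffers_per_round * rounds / pool_buffers;
}
```

At 100 buffers per 200 ms round, a 300-second run allows 1500 rounds, or 150,000 buffers, which is 0.15 of a pool of 1 million buffers.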

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Merlin Moncure
On Mon, Nov 7, 2011 at 12:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 7, 2011 at 1:08 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Thinking about this more, the backend local cache approach is probably
 going to be useless in terms of addressing this problem -- mostly due
 to the fact that the cache is, well, local.  Even if backend A takes
 the time to mark the bit in its own cache, backends B-Z haven't yet
 and presumably by the time they do the page has been rolled out
 anyways so you get no benefit.  The cache helps when a backend sees
 the same transaction spread out over a number of tuples/pages --
 that's simply not the case in OLTP.

 Ah, right.  Good point.

 Doing the work in the bgwriter might do the trick assuming the
 bgwriter consistently loses the race against both transaction
 resolution and the wal, and the extra clog lookup (when you win the
 race) penalty doesn't sting too much...

 But I can't see how this can work.  The background writer is only
 designed to do one thing: ensuring a supply of clean buffers for
 backends that need to allocate new ones.   I'm not sure the background
 writer is going to do anything at all on this test, since the data set
 fits inside shared_buffers and therefore there's no buffer eviction
 happening.  But even if it does, it's certainly not going to scan all
 1 million shared buffers anywhere near quick enough to matter; it's
 going to be limited to at most 100 buffers every 200 ms, which means
 that even if it ran at top speed for the entire test, it would only
 get through about 15% of the buffer pool even *once* before the test
 ended.  That's not even slightly close to what would be needed to move
 the needle here; you would need to visit any given buffer within a few
 hundred milliseconds of the relevant transaction commit.

Well, I'd argue that in most real world, high write intensity
databases there is constant pressure on pages to be flushed out to
make room for new ones being written to and the database size is much,
much larger than shared buffers -- pgbench is 100% update and pretty
novel in that respect.  I guess I said 'bgwriter' when I really meant
'generally upon eviction, either by bgwriter or an evicting backend'.
But even given that, probably the lag is too long to be of useful
benefit to your problem.

merlin



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Simon Riggs
On Mon, Nov 7, 2011 at 5:26 PM, Simon Riggs si...@2ndquadrant.com wrote:

 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.

I extracted this from my current patch for you to test.

Rather useful actually 'cos it's allowed me a sensible phasing of the
development.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


walwriter_latch.v2.patch
Description: Binary data



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Robert Haas
On Mon, Nov 7, 2011 at 6:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Nov 7, 2011 at 5:26 PM, Simon Riggs si...@2ndquadrant.com wrote:

 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.

 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.

 I extracted this from my current patch for you to test.

Thank you!

 Rather useful actually 'cos it's allowed me a sensible phasing of the
 development.

+1.

*reads patch*

Hmm, this is different than what I was expecting, although that's not
necessarily bad.  What this does is retain wal_writer_delay, but allow
the WAL writer to be woken up more frequently if there's enough WAL to
justify it. What I was expecting you to do is eliminate
wal_writer_delay altogether and drive the wakeups entirely off of the
latch.  I think you could get away with that, because SetLatch is
ridiculously cheap if the latch is already set.
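
The two wakeup schemes described above can be sketched roughly as follows. This is a single-threaded toy model, not the patch: `threading.Event` stands in for a latch, and the class name, `wake_threshold_bytes`, and the byte counters are all hypothetical.

```python
import threading


class WalWriterSketch:
    """Toy WAL writer that sleeps on a latch with an optional timeout."""

    def __init__(self, delay_s, wake_threshold_bytes):
        self.delay_s = delay_s  # plays the role of wal_writer_delay; None = no timeout
        self.wake_threshold_bytes = wake_threshold_bytes  # hypothetical knob
        self.latch = threading.Event()
        self.pending_bytes = 0
        self.flushed_bytes = 0

    def backend_insert(self, nbytes):
        """A backend adds WAL and wakes the writer once enough has piled up."""
        self.pending_bytes += nbytes
        if self.pending_bytes >= self.wake_threshold_bytes:
            self.latch.set()  # setting an already-set Event is a cheap no-op

    def run_one_cycle(self):
        """Sleep until the latch is set or the delay expires, then flush."""
        woken_early = self.latch.wait(self.delay_s)
        self.latch.clear()  # reset before doing the work, so no wakeup is lost
        self.flushed_bytes += self.pending_bytes
        self.pending_bytes = 0
        return "latch" if woken_early else "timeout"


writer = WalWriterSketch(delay_s=0.01, wake_threshold_bytes=8192)
writer.backend_insert(100)        # below threshold: writer waits out the delay
print(writer.run_one_cycle())
writer.backend_insert(10_000)     # above threshold: writer wakes at once
print(writer.run_one_cycle())
```

Keeping `delay_s` corresponds to retaining wal_writer_delay as a fallback. Passing `delay_s=None` with a threshold of 1 byte models the latch-only variant, where the writer sleeps until some backend sets the latch.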

Anyway, I'll give this a spin as you have it and see what falls out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Robert Haas
On Nov 7, 2011, at 9:35 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 7, 2011 at 6:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, Nov 7, 2011 at 5:26 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
 5. Make the WAL writer more responsive, maybe using latches, so that
 it doesn't take as long for the commit record to make it out to disk.
 
 I'm working on this already as part of the update for power
 reduction/group commit/replication performance.
 
 I extracted this from my current patch for you to test.
 
 Thank you!
 
 Rather useful actually 'cos it's allowed me a sensible phasing of the
 development.
 
 +1.
 
 *reads patch*
 
 Hmm, this is different than what I was expecting, although that's not
 necessarily bad.  What this does is retain wal_writer_delay, but allow
 the WAL writer to be woken up more frequently if there's enough WAL to
 justify it. What I was expecting you to do is eliminate
 wal_writer_delay altogether and drive the wakeups entirely off of the
 latch.

Oh, I think I see why you didn't do that...

Anyway, I'll try to post test results in the morning.

...Robert



Re: [HACKERS] synchronous commit vs. hint bits

2011-11-07 Thread Simon Riggs
On Tue, Nov 8, 2011 at 2:35 AM, Robert Haas robertmh...@gmail.com wrote:

 What I was expecting you to do is eliminate
 wal_writer_delay altogether and drive the wakeups entirely off of the
 latch.

Please continue to expect that, I just haven't finished it yet...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
