Re: [HACKERS] CommitDelay performance improvement

2001-02-27 Thread Zeugswetter Andreas SB


  I agree that 30k looks like the magic delay, and probably 30/5 would be a
  good conservative choice. But now that I think about the choice of number,
  I think it must vary with the speed of the machine and length of the
  transactions; at 20tps, each TX is completing in around 50ms.

I think disk speed should probably be the main factor.
After the first run, 30k/5 also seemed the best here, but running the test
again shows that the results are only reproducible after a new initdb.
Does anybody else see reproducible results without a previous initdb?

One thing I noticed is that WAL_FILES needs to be at least 4, because
one run fills up to 3 logfiles, and we don't want to measure WAL formatting.

Andreas



Re: [HACKERS] CommitDelay performance improvement

2001-02-27 Thread Philip Warner

At 10:56 27/02/01 +0100, Zeugswetter Andreas SB wrote:

  I agree that 30k looks like the magic delay, and probably 30/5 would be a
  good conservative choice. But now that I think about the choice of number,
  I think it must vary with the speed of the machine and length of the
  transactions; at 20tps, each TX is completing in around 50ms.

 I think disk speed should probably be the main factor.
 After the first run, 30k/5 also seemed the best here, but running the test
 again shows that the results are only reproducible after a new initdb.
 Does anybody else see reproducible results without a previous initdb?


I think we want something that reflects the chance of a time-saving as a
result of a wait, which is why I suggested having each backend monitor
commits/sec, then basing the delay on some % of that number. e.g. if
commits/sec = 1, then it's either low load or long TXs; in either case
CommitDelay won't help. Similarly, if we have 1000 commits/sec, then we
have a very fast system and/or disk, and a CommitDelay of 10ms is clearly
glacial.

AFAICS, dynamically monitoring commits/sec (or a similar statistic) is
TOWTG, but in all cases we need to set a max on CommitDelay to prevent
individual TXs from getting too long (although I am unsure if the latter is
*really* necessary, it is far better to be safe).

Note: commits/sec needs to be kept for each backend so we can remove the
contribution of the backend that is considering waiting.
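
A toy sketch of this heuristic (all names and constants invented for
illustration; this is not PostgreSQL code), following the "half the average
TX time, 100 msec max" rule suggested earlier in the thread:

#include <stdio.h>

/* Invented illustration: turn an observed commit rate into a delay. */
static long
delay_from_commit_rate(double commits_per_sec)
{
    long avg_tx_usec, delay;

    if (commits_per_sec <= 1.0)
        return 0;               /* low load or long TXs: delaying won't help */

    avg_tx_usec = (long) (1000000.0 / commits_per_sec);
    delay = avg_tx_usec / 2;    /* wait about half an average transaction */
    if (delay > 100000)
        delay = 100000;         /* hard cap: 100 msec */
    return delay;
}

int
main(void)
{
    printf("20 tps   -> %ld usec\n", delay_from_commit_rate(20.0));
    printf("1000 tps -> %ld usec\n", delay_from_commit_rate(1000.0));
    return 0;
}

At 20 tps this yields a 25 msec delay, in the same ballpark as the 30k
sweet spot seen in Tom's graphs.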






Re: [HACKERS] CommitDelay performance improvement

2001-02-26 Thread Zeugswetter Andreas SB


One thing that I remember from a performance test we once did is that the
results are a lot more realistic, better, and more stable if you try to
decouple the startup of the different clients a little bit, so they are
not all in the same section of code at the same time.

We inserted random usleeps; I forgot the exact range, but 10 ms seems
reasonable to me.

This was another database, but it might also apply here.
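
As a sketch, the decoupling could be as simple as the following (an invented
helper, not code from the test Andreas describes):

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Give each test client a random 0..10 msec start-up offset so they are
 * not all in the same section of code at the same instant. */
static void
stagger_startup(void)
{
    srand((unsigned) time(NULL) ^ (unsigned) getpid());
    usleep((useconds_t) (rand() % 10000));
}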

Andreas



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Nathan Myers

On Sun, Feb 25, 2001 at 12:41:28AM -0500, Tom Lane wrote:
 Attached are graphs from more thorough runs of pgbench with a commit
 delay that occurs only when at least N other backends are running active
 transactions. ...
 It's not entirely clear what set of parameters is best, but it is
 absolutely clear that a flat zero-commit-delay policy is NOT best.
 
 The test conditions are postmaster options -N 100 -B 1024, pgbench scale
 factor 10, pgbench -t (transactions per client) 100.  (Hence the results
 for a single client rely on only 100 transactions, and are pretty noisy.
 The noise level should decrease as the number of clients increases.)

It's hard to interpret these results.  In particular, "delay 10k, sibs 20"
(10k,20), or cyan-triangle, is almost the same as "delay 50k, sibs 1" 
(50k,1), or green X.  Those are pretty different parameters to get such
similar results.

The only really bad performers were (0), (10k,1), (100k,20).  The best
were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
Why would 30k be a magic delay, regardless of siblings?  What happened
at 40?

At low loads, it seems (100k,1) (brown +) did best by far, which seems
very odd.  Even more odd, it did pretty well at very high loads but had 
problems at intermediate loads.  

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Philip Warner

At 00:42 25/02/01 -0800, Nathan Myers wrote:

 The only really bad performers were (0), (10k,1), (100k,20).  The best
 were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
 Why would 30k be a magic delay, regardless of siblings?  What happened
 at 40?


I had assumed that 40 was one of the glitches - it would be good if Tom (or
someone else) could rerun the suite, to see if we see the same dip.

I agree that 30k looks like the magic delay, and probably 30/5 would be a
good conservative choice. But now that I think about the choice of number,
I think it must vary with the speed of the machine and length of the
transactions; at 20tps, each TX is completing in around 50ms. Probably the
delay needs to be set at a value related to the average TX duration, and
since that is not really a known figure, perhaps we should go with 30% of
TX duration, with a max of 100k.

Alternatively, can PG monitor the commits/second, then set the delay to
reflect half of the average TX time (or 100ms, whichever is smaller)? Is
this too baroque?

 




RE: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Hiroshi Inoue

 -Original Message-
 From: Tom Lane
 
 Attached are graphs from more thorough runs of pgbench with a commit
 delay that occurs only when at least N other backends are running active
 transactions.
 
 My initial try at this proved to be too noisy to tell much.  The noise
 seems to be coming from WAL checkpoints that occur during a run and
 push down the reported TPS value for the particular case that's running.
 While we'd need to include WAL checkpoints to make an honest performance
 comparison against another RDBMS, I think they are best ignored for the
 purpose of figuring out what the commit-delay behavior ought to be.
 Accordingly, I modified my test script to minimize the occurrence of
 checkpoint activity during runs (see attached script).  There are still
 some data points that are unexpectedly low compared to their neighbors;
 presumably these were affected by checkpoints or other system activity.
 
 It's not entirely clear what set of parameters is best, but it is
 absolutely clear that a flat zero-commit-delay policy is NOT best.
 
 The test conditions are postmaster options -N 100 -B 1024, pgbench scale
 factor 10, pgbench -t (transactions per client) 100.  (Hence the results
 for a single client rely on only 100 transactions, and are pretty noisy.
 The noise level should decrease as the number of clients increases.)
 
 Comments anyone?


How about the case with scaling factor 1?  I.e., could your
proposal detect lock conflicts in reality?  If so, I agree with
your proposal.

BTW, there seems to be a misunderstanding about CommitDelay,
i.e.

 CommitDelay is completely a waste of time unless there's
 an overlap of commit.

If other backends use the delay (CPU cycles), the delay is never
a total waste of time.

Regards,
Hiroshi Inoue



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Tom Lane

"Hiroshi Inoue" [EMAIL PROTECTED] writes:
 How about the case with scaling factor 1 ?  i.e Could your
 proposal detect lock conflicts in reality ?

The code is set up to not count backends that are waiting on locks.
That is, to do a commit delay there must be at least N other backends
that are in transactions, have written at least one XLOG entry in
their transaction (so it's not a read-only xact and will need to
write a commit record), and are not waiting on a lock.

Is that what you meant?

 BTW there seems to be a misunderstanding about CommitDelay,
 i.e
 CommitDelay is completely a waste of time unless there's
 an overlap of commit.
 If other backends use the delay(cpu cycle)  the delay is never
 a waste of time totally.

Good point.  In fact, if we measure only the total throughput in
transactions per second then the commit delay will not appear to be
hurting performance no matter how long it is, so long as other backends
are in the RUN state for the whole delay.  This suggests that pgbench
should also measure the average transaction time seen by any one client.
Is that a simple change?
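
A sketch of what such a measurement might look like (invented names; not
actual pgbench source):

#include <stdio.h>
#include <sys/time.h>

static double total_xact_usec = 0.0;
static long   xact_count = 0;

/* Call with gettimeofday() readings taken just before the BEGIN is sent
 * and just after the COMMIT response is received. */
static void
record_xact(const struct timeval *start, const struct timeval *end)
{
    total_xact_usec += (end->tv_sec - start->tv_sec) * 1000000.0
                     + (end->tv_usec - start->tv_usec);
    xact_count++;
}

static void
report_avg_xact_time(void)
{
    if (xact_count > 0)
        printf("average transaction time = %.3f msec\n",
               total_xact_usec / xact_count / 1000.0);
}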

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Tom Lane

Philip Warner [EMAIL PROTECTED] writes:
 At 00:42 25/02/01 -0800, Nathan Myers wrote:
 The only really bad performers were (0), (10k,1), (100k,20).  The best
 were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
 Why would 30k be a magic delay, regardless of siblings?  What happened
 at 40?

 I had assumed that 40 was one of the glitches - it would be good if Tom (or
 someone else) could rerun the suite, to see if we see the same dip.

Yes, I assumed the same.  I posted the script; could someone else make
the same run?  We really need more than one test case ;-)

 I agree that 30k looks like the magic delay, and probably 30/5 would be a
 good conservative choice. But now that I think about the choice of number,
 I think it must vary with the speed of the machine and length of the
 transactions; at 20tps, each TX is completing in around 50ms.

Yes, I think so too.  This machine is able to do about 40 pgbench tr/sec
single-client with fsync off, so the computational load is right about
25msec per transaction.  That's presumably why 30msec looks like a good
delay number.  What interested me was that there doesn't seem to be a
very sharp peak; anything from 10 to 100 msec yields fairly comparable
results.  This is a good thing ... if there *were* a sharp peak at the
average xact length, tuning the delay parameter would be an impossible
task in real-world cases where the transactions aren't all alike.

On the data so far, I'm inclined to go with 10k/5 as the default, so as
not to risk wasting time with overly long delays on machines that are
faster than this one.  But we really need some data from other machines
before deciding.  It'd be nice to see some results with 10k delays too,
from a machine where the kernel supports better-than-10msec delay
resolution.  Where's the Alpha contingent??

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Tom Lane

[EMAIL PROTECTED] (Nathan Myers) writes:
 At low loads, it seems (100k,1) (brown +) did best by far, which seems
 very odd.  Even more odd, it did pretty well at very high loads but had 
 problems at intermediate loads.  

In theory, all these variants should behave exactly the same for a
single client, since there will be no commit delay in any of 'em in
that case.  I'm inclined to write off the aberrant result for 100k/1
as due to outside factors --- maybe the WAL file happened to be located
in a particularly convenient place on the disk during that run, or
some such.  Since there's only 100 transactions in that test, it wouldn't
take much to affect the result.

Likewise, the places where one mid-load datapoint is well below either
neighbor are probably due to outside factors --- either a background
WAL checkpoint or other activity on the machine, mail arrival for
instance.  I left the machine alone during the test, but I didn't bother
to shut down the usual system services.

My feeling is that this test run tells us that zero commit delay is
inferior to nonzero under these test conditions, but there's too much
noise to pick out one of the nonzero-delay parameter combinations as
being clearly better than the rest.  (BTW, I did repeat the zero-delay
series just to be sure it wasn't itself an outlier...)

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Hiroshi Inoue
Tom Lane wrote:
 
 Philip Warner [EMAIL PROTECTED] writes:
  At 00:42 25/02/01 -0800, Nathan Myers wrote:
  The only really bad performers were (0), (10k,1), (100k,20).  The best
  were (30k,1) and (30k,10), although (30k,5) also did well except at 40.
  Why would 30k be a magic delay, regardless of siblings?  What happened
  at 40?
 
  I had assumed that 40 was one of the glitches - it would be good if Tom (or
  someone else) could rerun the suite, to see if we see the same dip.
 
 Yes, I assumed the same.  I posted the script; could someone else make
 the same run?  We really need more than one test case ;-)
 

I could find the script but seem to have missed your change
about commit_siblings. Where could I get it?

Regards,
Hiroshi Inoue


Re: [HACKERS] CommitDelay performance improvement

2001-02-25 Thread Dominique Quatravaux

 Basically, I am not sure how much we lose by doing the delay after
 returning COMMIT, and I know we gain quite a bit by enabling us to group
 fsync calls.
 
 If included, this should be an option only, and not the default option. 

  Sure it should never become the default, because the "D" in ACID is just
about forbidding this kind of behaviour...

-- 
  Dominique



Re: [HACKERS] CommitDelay performance improvement

2001-02-24 Thread Nathan Myers

On Sat, Feb 24, 2001 at 01:07:17AM -0500, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  I see, I had it backwards: N=0 corresponds to "always delay", and 
  N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
  not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
  where M is the number of backends, or the number of backends with begun 
  transactions, or something.  N=10 would be conservative (and maybe 
  pointless) just because it would hardly ever trigger a delay.
 
 Why is N=1 not interesting?  That requires at least one other backend
 to be in a transaction before you'll delay.  That would seem to be
 the minimum useful value --- N=0 (always delay) seems clearly to be
 too stupid to be useful.

N=1 seems arbitrarily aggressive.  It assumes any open transaction will 
commit within a few milliseconds; otherwise the delay is wasted.  On a 
fairly busy system, it seems to me to impose a strict upper limit on 
transaction rate for any client, regardless of actual system I/O load.  
(N=0 would impose that strict upper limit even for a single client.)

Delaying isn't free, because it means that the client can't turn around 
and do even a cheap query for a while.  In a sense, when you delay you are 
charging the committer a tax to try to improve overall throughput.  If the 
delay lets you reduce I/O churn enough to increase the total bandwidth, 
then it was worthwhile; if not, you just cut system performance, and 
responsiveness to each client, for nothing.

The above suggests that maybe N should depend on recent disk I/O activity,
so you get a larger N (and thus less likely delay and more certain payoff) 
for a more lightly-loaded system.  On a system that has maxed its I/O 
bandwidth, clients will suffer delays anyhow, so they might as well 
suffer controlled delays that result in better total throughput.  On a 
lightly-loaded system there's no need, or payoff, for such throttling.

Can we measure disk system load by averaging the times taken for fsyncs?
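
A self-contained sketch of that idea (invented names, not patch code):
wrap fsync() in a timer and keep an exponentially weighted average as a
crude disk-load signal.

#include <sys/time.h>
#include <unistd.h>

static double fsync_avg_usec = 0.0;     /* smoothed fsync duration */

static int
timed_fsync(int fd)
{
    struct timeval t0, t1;
    double elapsed;
    int rc;

    gettimeofday(&t0, NULL);
    rc = fsync(fd);
    gettimeofday(&t1, NULL);

    elapsed = (t1.tv_sec - t0.tv_sec) * 1000000.0
            + (t1.tv_usec - t0.tv_usec);

    /* keep 7/8 of the old average, mix in 1/8 of the new sample */
    fsync_avg_usec = 0.875 * fsync_avg_usec + 0.125 * elapsed;
    return rc;
}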

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-24 Thread Tom Lane

Attached are graphs from more thorough runs of pgbench with a commit
delay that occurs only when at least N other backends are running active
transactions.

My initial try at this proved to be too noisy to tell much.  The noise
seems to be coming from WAL checkpoints that occur during a run and
push down the reported TPS value for the particular case that's running.
While we'd need to include WAL checkpoints to make an honest performance
comparison against another RDBMS, I think they are best ignored for the
purpose of figuring out what the commit-delay behavior ought to be.
Accordingly, I modified my test script to minimize the occurrence of
checkpoint activity during runs (see attached script).  There are still
some data points that are unexpectedly low compared to their neighbors;
presumably these were affected by checkpoints or other system activity.

It's not entirely clear what set of parameters is best, but it is
absolutely clear that a flat zero-commit-delay policy is NOT best.

The test conditions are postmaster options -N 100 -B 1024, pgbench scale
factor 10, pgbench -t (transactions per client) 100.  (Hence the results
for a single client rely on only 100 transactions, and are pretty noisy.
The noise level should decrease as the number of clients increases.)

Comments anyone?

regards, tom lane


 hppabench.gif

#! /bin/sh

# Expected postmaster options: -N 100 -B 1024 -c checkpoint_timeout=1800
# Recommended pgbench setup: pgbench -i -s 10 bench

for del in 0 ; do
for sib in 1 ; do
for cli in 1 10 20 30 40 50 ; do
echo "commit_delay = $del"
echo "commit_siblings = $sib"
psql -c "vacuum branches; vacuum tellers; delete from history; vacuum history; 
checkpoint;" bench
PGOPTIONS="-c commit_delay=$del -c commit_siblings=$sib" \
pgbench -c $cli -t 100 -n bench
done
done
done

for del in 1 3 5 10 ; do
for sib in 1 5 10 20 ; do
for cli in 1 10 20 30 40 50 ; do
echo "commit_delay = $del"
echo "commit_siblings = $sib"
psql -c "vacuum branches; vacuum tellers; delete from history; vacuum history; 
checkpoint;" bench
PGOPTIONS="-c commit_delay=$del -c commit_siblings=$sib" \
pgbench -c $cli -t 100 -n bench
done
done
done



Re: [HACKERS] CommitDelay performance improvement

2001-02-24 Thread Philip Warner

At 00:41 25/02/01 -0500, Tom Lane wrote:

Comments anyone?


Don't suppose you could post the original data?






Re: [HACKERS] CommitDelay performance improvement

2001-02-24 Thread Tom Lane

Philip Warner [EMAIL PROTECTED] writes:
 Don't suppose you could post the original data?

Sure.

regards, tom lane

commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.996953(including connections establishing)
tps = 11.051216(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.779923(including connections establishing)
tps = 17.924390(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 17.289815(including connections establishing)
tps = 17.429343(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 17.292171(including connections establishing)
tps = 17.432905(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 17.733478(including connections establishing)
tps = 17.913251(excluding connections establishing)
commit_delay = 0
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 18.325273(including connections establishing)
tps = 18.534556(excluding connections establishing)
commit_delay = 1
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 10.449347(including connections establishing)
tps = 10.500278(excluding connections establishing)
commit_delay = 1
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 17.865721(including connections establishing)
tps = 18.015078(excluding connections establishing)
commit_delay = 1
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 17.980234(including connections establishing)
tps = 18.131986(excluding connections establishing)
commit_delay = 1
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 30
number of transactions per client: 100
number of transactions actually processed: 3000/3000
tps = 18.858489(including connections establishing)
tps = 19.027436(excluding connections establishing)
commit_delay = 1
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 40
number of transactions per client: 100
number of transactions actually processed: 4000/4000
tps = 19.320221(including connections establishing)
tps = 19.496999(excluding connections establishing)
commit_delay = 1
commit_siblings = 1
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 50
number of transactions per client: 100
number of transactions actually processed: 5000/5000
tps = 19.440978(including connections establishing)
tps = 19.621221(excluding connections establishing)
commit_delay = 1
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
tps = 11.298701(including connections establishing)
tps = 11.357102(excluding connections establishing)
commit_delay = 1
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.722266(including connections establishing)
tps = 19.903373(excluding connections establishing)
commit_delay = 1
commit_siblings = 5
CHECKPOINT
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 19.042737(including connections establishing)
tps = 19.214042(excluding connections establishing)
commit_delay = 1
commit_siblings = 5

[HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Looking at the XLOG stuff, I notice that we already have a field
(logRec) in the per-backend PROC structures that shows whether a
transaction is currently in progress with at least one change made
(ie at least one XLOG entry written).

It would be very easy to extend the existing code so that the commit
delay is not done unless there is at least one other backend with
nonzero logRec --- or, more generally, at least N other backends with
nonzero logRec.  We cannot tell if any of them are actually nearing
their commits, but this seems better than just blindly waiting.  Larger
values of N would presumably improve the odds that at least one of them
is nearing its commit.

A further refinement, still quite cheap to implement since the info is
in the PROC struct, would be to not count backends that are blocked
waiting for locks.  These guys are less likely to be ready to commit
in the next few milliseconds than the guys who are actively running;
indeed they cannot commit until someone else has committed/aborted to
release the lock they need.
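
In outline, the check might look like this toy version (invented types and
names; the real code would examine the shared PROC array):

typedef struct
{
    int has_xlog_entry;     /* stand-in for a nonzero logRec */
    int waiting_on_lock;    /* stand-in for "blocked on a lock" */
} BackendSlot;

/* Count other backends that are in a transaction which has written
 * XLOG and is not blocked waiting for a lock. */
static int
count_active_siblings(const BackendSlot *slots, int nslots, int self)
{
    int i, count = 0;

    for (i = 0; i < nslots; i++)
    {
        if (i == self)
            continue;
        if (slots[i].has_xlog_entry && !slots[i].waiting_on_lock)
            count++;
    }
    return count;
}

/* Used just before flushing the commit record, e.g.:
 *   if (count_active_siblings(slots, n, me) >= CommitSiblings)
 *       sleep for CommitDelay before the fsync;
 */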

Comments?  What should the threshold N be ... or do we need to make
that a tunable parameter?

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  Why not just set a flag in there when someone nears commit and clear
  when they are about to commit?
 
 Define "nearing commit", in such a way that you can specify where you
 plan to set that flag.

Is there significant time between entry of CommitTransaction() and the
fsync()?  Maybe not.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Looking at the XLOG stuff, I notice that we already have a field
 (logRec) in the per-backend PROC structures that shows whether a
 transaction is currently in progress with at least one change made
 (ie at least one XLOG entry written).
 
 It would be very easy to extend the existing code so that the commit
 delay is not done unless there is at least one other backend with
 nonzero logRec --- or, more generally, at least N other backends with
 nonzero logRec.  We cannot tell if any of them are actually nearing
 their commits, but this seems better than just blindly waiting.  Larger
 values of N would presumably improve the odds that at least one of them
 is nearing its commit.

Why not just set a flag in there when someone nears commit and clear
when they are about to commit?




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Why not just set a flag in there when someone nears commit and clear
 when they are about to commit?

Define "nearing commit", in such a way that you can specify where you
plan to set that flag.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  Is there significant time between entry of CommitTransaction() and the
  fsync()?  Maybe not.
 
 I doubt it.  No I/O anymore, anyway, unless the commit record happens to
 overrun an xlog block boundary.

That's what I was afraid of.  Since we don't write the dirty blocks to
the kernel anymore, we don't really have much happening before someone
says they are about to commit.  In the old days, we were write()'ing
those buffers, and we had some delay and kernel calls in there.

Guess that idea is dead.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:
 A further refinement, still quite cheap to implement since the info is
 in the PROC struct, would be to not count backends that are blocked
 waiting for locks.  These guys are less likely to be ready to commit
 in the next few milliseconds than the guys who are actively running;
 indeed they cannot commit until someone else has committed/aborted to
 release the lock they need.
 
 Comments?  What should the threshold N be ... or do we need to make
 that a tunable parameter?

Once you make it tuneable, you're stuck with it.  You can always add
a knob later, after somebody discovers a real need.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:
  A further refinement, still quite cheap to implement since the info is
  in the PROC struct, would be to not count backends that are blocked
  waiting for locks.  These guys are less likely to be ready to commit
  in the next few milliseconds than the guys who are actively running;
  indeed they cannot commit until someone else has committed/aborted to
  release the lock they need.
  
  Comments?  What should the threshold N be ... or do we need to make
  that a tunable parameter?
 
 Once you make it tuneable, you're stuck with it.  You can always add
 a knob later, after somebody discovers a real need.

I wonder if Tom should implement it, but leave it at zero until people
can report that a non-zero helps.  We already have the parameter, we can
just make it smarter and let people test it.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

[EMAIL PROTECTED] (Nathan Myers) writes:
 Comments?  What should the threshold N be ... or do we need to make
 that a tunable parameter?

 Once you make it tuneable, you're stuck with it.  You can always add
 a knob later, after somebody discovers a real need.

If we had a good idea what the default level should be, I'd be willing
to go without a knob.  I'm thinking of a default of about 5 (ie, at
least 5 other active backends to trigger a commit delay) ... but I'm not
so confident of that that I think it needn't be tunable.  It's really
dependent on your average and peak transaction lengths, and that's
going to vary across installations, so unless we want to try to make it
self-adjusting, a knob seems like a good idea.

A self-adjusting delay might well be a great idea, BTW, but I'm trying
to be conservative about how much complexity we should add right now.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 [EMAIL PROTECTED] (Nathan Myers) writes:
  Comments?  What should the threshold N be ... or do we need to make
  that a tunable parameter?
 
  Once you make it tuneable, you're stuck with it.  You can always add
  a knob later, after somebody discovers a real need.
 
 If we had a good idea what the default level should be, I'd be willing
 to go without a knob.  I'm thinking of a default of about 5 (ie, at
 least 5 other active backends to trigger a commit delay) ... but I'm not
 so confident of that that I think it needn't be tunable.  It's really
 dependent on your average and peak transaction lengths, and that's
 going to vary across installations, so unless we want to try to make it
 self-adjusting, a knob seems like a good idea.
 
 A self-adjusting delay might well be a great idea, BTW, but I'm trying
 to be conservative about how much complexity we should add right now.

Oh, so you are saying N backends should have dirtied buffers before
doing the delay?  Hmm, that seems almost untunable to me.

Let's suppose we decide to sleep.  When we wake up, can we know that
someone else has fsync'ed for us?  And if they have, should we be more
likely to fsync() in the future?




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

  And if they have, should we be more
  likely to fsync() in the future?

I meant more likely to sleep().

 You mean less likely.  My thought for a self-adjusting delay was to
 ratchet the delay up a little every time it succeeds in avoiding an
 fsync, and down a little every time it fails to do so.  No change when
 we don't delay at all (because of no other active backends).  But
 testing this and making sure it behaves reasonably seems like more work
 than we should try to accomplish before 7.1.

It could be tough.  Imagine the delay increasing to 3 seconds?  Seems
there has to be an upper bound on the sleep.  The more you delay, the
more likely you will be to find someone to fsync you.  Are we waking
processes up after we have fsync()'ed them?  If so, we can keep
increasing the delay.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 A self-adjusting delay might well be a great idea, BTW, but I'm trying
 to be conservative about how much complexity we should add right now.

 OH, so you are saying N backends should have dirtied buffers before
 doing the delay?  Hmm, that seems almost untunable to me.

 Let's suppose we decide to sleep.  When we wake up, can we know that
 someone else has fsync'ed for us?

XLogFlush will find that it has nothing to do, so yes we can.

 And if they have, should we be more
 likely to fsync() in the future?

You mean less likely.  My thought for a self-adjusting delay was to
ratchet the delay up a little every time it succeeds in avoiding an
fsync, and down a little every time it fails to do so.  No change when
we don't delay at all (because of no other active backends).  But
testing this and making sure it behaves reasonably seems like more work
than we should try to accomplish before 7.1.
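
For illustration only, the ratchet could be as simple as this (invented
names, step size, and cap; the cap reflects the runaway-delay worry raised
elsewhere in the thread):

#define DELAY_MIN_USEC       0
#define DELAY_MAX_USEC  100000      /* hard upper bound on the sleep */
#define DELAY_STEP_USEC   1000

static long commit_delay_usec = 10000;

/* Call only when a delay was actually performed; pass whether the delay
 * let some other backend's fsync cover our commit record. */
static void
adjust_commit_delay(int fsync_was_avoided)
{
    commit_delay_usec += fsync_was_avoided ? DELAY_STEP_USEC
                                           : -DELAY_STEP_USEC;
    if (commit_delay_usec > DELAY_MAX_USEC)
        commit_delay_usec = DELAY_MAX_USEC;
    if (commit_delay_usec < DELAY_MIN_USEC)
        commit_delay_usec = DELAY_MIN_USEC;
}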

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 It could be tough.  Imagine the delay increasing to 3 seconds?  Seems
 there has to be an upper bound on the sleep.  The more you delay, the
 more likely you will be to find someone to fsync you.

Good point, and an excellent illustration of the fact that
self-adjusting algorithms aren't that easy to get right the first
time ;-)

 Are we waking processes up after we have fsync()'ed them?

Not at the moment.  That would be another good mechanism to investigate
for 7.2; but right now there's no infrastructure that would allow a
backend to discover which other ones were sleeping for fsync.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 05:18:19PM -0500, Tom Lane wrote:
 [EMAIL PROTECTED] (Nathan Myers) writes:
  Comments?  What should the threshold N be ... or do we need to make
  that a tunable parameter?
 
  Once you make it tuneable, you're stuck with it.  You can always add
  a knob later, after somebody discovers a real need.
 
 If we had a good idea what the default level should be, I'd be willing
 to go without a knob.  I'm thinking of a default of about 5 (ie, at
 least 5 other active backends to trigger a commit delay) ... but I'm not
 so confident of that that I think it needn't be tunable.  It's really
 dependent on your average and peak transaction lengths, and that's
 going to vary across installations, so unless we want to try to make it
 self-adjusting, a knob seems like a good idea.
 
 A self-adjusting delay might well be a great idea, BTW, but I'm trying
 to be conservative about how much complexity we should add right now.

When thinking about tuning N, I like to consider what are the interesting 
possible values for N:

  0: Ignore any other potential committers.
  1: The minimum possible responsiveness to other committers.
  5: Tom's guess for what might be a good choice.
  10: Harry's guess.
  ~0: Always delay.

I would rather release with N=1 than with 0, because it actually responds
to conditions.  What N might best be, >1, probably varies on a lot of
hard-to-guess parameters.

It seems to me that comparing various choices (and other, more interesting,
algorithms) to the N=1 case would be more productive than comparing them 
to the N=0 case, so releasing at N=1 would yield better statistics for 
actually tuning in 7.2.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 When thinking about tuning N, I like to consider what are the interesting 
 possible values for N:
 
   0: Ignore any other potential committers.
   1: The minimum possible responsiveness to other committers.
   5: Tom's guess for what might be a good choice.
   10: Harry's guess.
   ~0: Always delay.
 
 I would rather release with N=1 than with 0, because it actually responds
 to conditions.  What N might best be, >1, probably varies on a lot of
 hard-to-guess parameters.
 
 It seems to me that comparing various choices (and other, more interesting,
 algorithms) to the N=1 case would be more productive than comparing them 
 to the N=0 case, so releasing at N=1 would yield better statistics for 
 actually tuning in 7.2.

We don't release code because it has better tuning opportunities for
later releases.  What we can do is give people parameters where the
default is safe, and they can play and report to us.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  It could be tough.  Imagine the delay increasing to 3 seconds?  Seems
  there has to be an upper bound on the sleep.  The more you delay, the
  more likely you will be to find someone to fsync you.
 
 Good point, and an excellent illustration of the fact that
 self-adjusting algorithms aren't that easy to get right the first
 time ;-)

I see.  I am concerned that anything done to 7.1 at this point may cause
problems with performance under certain circumstances.  Let's see what
the new code shows our testers.

 
  Are we waking processes up after we have fsync()'ed them?
 
 Not at the moment.  That would be another good mechanism to investigate
 for 7.2; but right now there's no infrastructure that would allow a
 backend to discover which other ones were sleeping for fsync.

Can we put the backends to sleep waiting for a lock, and have them wake
up later?




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Can we put the backends to sleep waiting for a lock, and have them wake
 up later?

Locks don't have timeouts.  There is no existing mechanism that will
serve this purpose; we'll have to create a new one.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 06:37:06PM -0500, Bruce Momjian wrote:
  When thinking about tuning N, I like to consider what are the interesting 
  possible values for N:
  
0: Ignore any other potential committers.
1: The minimum possible responsiveness to other committers.
5: Tom's guess for what might be a good choice.
10: Harry's guess.
~0: Always delay.
  
  I would rather release with N=1 than with 0, because it actually
  responds to conditions. What N might best be, >1, probably varies on
  a lot of hard-to-guess parameters.
 
  It seems to me that comparing various choices (and other, more
  interesting, algorithms) to the N=1 case would be more productive
  than comparing them to the N=0 case, so releasing at N=1 would yield
  better statistics for actually tuning in 7.2.

 We don't release code because it has better tuning opportunities for
 later releases. What we can do is give people parameters where the
 default is safe, and they can play and report to us.

Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
that was nevertheless preferable to N=0.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

   It seems to me that comparing various choices (and other, more
   interesting, algorithms) to the N=1 case would be more productive
   than comparing them to the N=0 case, so releasing at N=1 would yield
   better statistics for actually tuning in 7.2.
 
  We don't release code because it has better tuning opportunities for
  later releases. What we can do is give people parameters where the
  default is safe, and they can play and report to us.
 
 Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
 that was nevertheless preferable to N=0.

I think zero delay is the conservative choice at this point, unless we
hear otherwise from testers.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  Can we put the backends to sleep waiting for a lock, and have them wake
  up later?
 
 Locks don't have timeouts.  There is no existing mechanism that will
 serve this purpose; we'll have to create a new one.

That is what I suspected.

Having thought about it, we currently have a few options:

1) let every backend fsync on its own
2) try to delay backends so they all fsync() at the same time
3) delay fsync until after commit

Items 2 and 3 attempt to bunch up fsyncs.  Option 2 has backends waiting
to fsync() on the expectation that some other backend may commit soon.
Option 3 may turn out to be the best solution.  No matter how smart we
make the code, we will never know for sure if someone is about to commit
and whether it is worth waiting.

My idea would be to let committing backends return "COMMIT" to the user,
and set a need_fsync flag that is guaranteed to cause an fsync within X
milliseconds.  This way, if other backends commit in the next X
millisecond, they can all use one fsync().

Now, I know many will complain that we are returning commit while not
having the stuff on the platter.  But consider, we only lose data from an
OS crash or hardware failure.  Do people who commit something, and then
the machine crashes 2 milliseconds after the commit, really expect the
data to be on the disk when they restart?  Maybe they do, but it seems
the benefit of grouped fsyncs() is large enough that many will say they
would rather have this option.

This was my point long ago that we could offer sub-second reliability
with no-fsync performance if we just had some process running that wrote
dirty pages and fsynced every 20 milliseconds.
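
A toy sketch of such a flusher process (invented names; the flag and file
descriptor here merely stand in for shared state):

#include <unistd.h>

static volatile int need_fsync = 0;  /* would live in shared memory */
static int xlog_fd = -1;             /* stand-in for the WAL file */

static void
fsync_daemon(void)
{
    for (;;)
    {
        usleep(20000);               /* 20 msec window for commits to bunch */
        if (need_fsync)
        {
            need_fsync = 0;          /* clear first: a commit arriving
                                      * during the fsync re-arms the flag */
            fsync(xlog_fd);          /* one fsync covers every commit
                                      * since the previous flush */
        }
    }
}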




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Philip Warner

At 21:31 23/02/01 -0500, Bruce Momjian wrote:
 Now, I know many will complain that we are returning commit while not
 having the stuff on the platter.

You're definitely right there.

 Maybe they do, but it seems
 the benefit of grouped fsyncs() is large enough that many will say they
 would rather have this option.

I'd prefer to wait for a lock manager that supports timeouts and contention
notification.






Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 At 21:31 23/02/01 -0500, Bruce Momjian wrote:
 Now, I know many will complain that we are returning commit while not
 having the stuff on the platter. 
 
 You're definitely right there.
 
 Maybe they do, but it seems
 the benefit of grouped fsyncs() is large enough that many will say they
 would rather have this option.
 
 I'd prefer to wait for a lock manager that supports timeouts and contention
 notification.

I understand, and I would agree if that was going to fix the problem
completely, but it isn't.  It is just going to allow us more flexibility
at guessing who may be about to commit.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Philip Warner

At 11:32 23/02/01 -0500, Tom Lane wrote:
 Looking at the XLOG stuff, I notice that we already have a field
 (logRec) in the per-backend PROC structures that shows whether a
 transaction is currently in progress with at least one change made
 (ie at least one XLOG entry written).

Would it be worth adding a field 'waiting for fsync since xxx', so the
second process can (a) log that it is expecting someone else to FSYNC (for
perf stats, if we want them), and (b) wait for (xxx + delta)ms/us etc?
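
A sketch of the (b) arithmetic (invented layout; 'xxx' is the recorded
start-of-wait time):

#include <sys/time.h>

/* How much longer to wait so we wake at (since + delta_usec), given the
 * first waiter started waiting at 'since'. */
static long
usec_until_deadline(const struct timeval *since, long delta_usec)
{
    struct timeval now;
    long waited_usec;

    gettimeofday(&now, NULL);
    waited_usec = (now.tv_sec - since->tv_sec) * 1000000L
                + (now.tv_usec - since->tv_usec);
    return waited_usec >= delta_usec ? 0 : delta_usec - waited_usec;
}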








Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Philip Warner

At 23:14 23/02/01 -0500, Bruce Momjian wrote:

 There is one more thing.  Even though the kernel says the data is on the
 platter, it still may not be there.

This is true, but it does not mean we should say 'the disk is slightly
unreliable, so we can be too'. Also, IIRC, the last time this was
discussed, someone commented that buying expensive disks and a UPS gets you
reliability (barring a direct lightning strike) - it had something to do
with write-ordering and hardware caches. In any case, I'd hate to see DB
design decisions based closely on hardware capability. At least two of my
customers use high-performance RAM disks for databases - do these also
suffer from 'flush is not really flush' problems?

 Basically, I am not sure how much we lose by doing the delay after
 returning COMMIT, and I know we gain quite a bit by enabling us to group
 fsync calls.

If included, this should be an option only, and not the default option. In
fact I'd quite like to see such a feature, although I'd not only do a
'flush every X ms', but I'd also do a 'flush every X transactions' - this
way a DBA can say 'I don't mind losing the last 20 TXs in a crash'. Bear in
mind that on a fast system, 20ms is a lot of transactions.
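
A toy version of that policy (invented names, mirroring the two knobs
Philip describes):

static int  flush_every_xacts = 20;     /* "lose at most the last 20 TXs" */
static long flush_every_usec  = 20000;  /* ... or at most 20 msec of work */

static int
time_to_flush(int xacts_since_flush, long usec_since_flush)
{
    return xacts_since_flush >= flush_every_xacts ||
           usec_since_flush  >= flush_every_usec;
}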







Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Nathan Myers

On Fri, Feb 23, 2001 at 09:05:20PM -0500, Bruce Momjian wrote:
It seems to me that comparing various choices (and other, more
interesting, algorithms) to the N=1 case would be more productive
than comparing them to the N=0 case, so releasing at N=1 would yield
better statistics for actually tuning in 7.2.
  
   We don't release code because it has better tuning opportunities for
   later releases. What we can do is give people parameters where the
   default is safe, and they can play and report to us.
  
  Perhaps I misunderstood.  I had perceived N=1 as a conservative choice
  that was nevertheless preferable to N=0.
 
 I think zero delay is the conservative choice at this point, unless we
 hear otherwise from testers.

I see, I had it backwards: N=0 corresponds to "always delay", and 
N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
where M is the number of backends, or the number of backends with begun 
transactions, or something.  N=10 would be conservative (and maybe 
pointless) just because it would hardly ever trigger a delay.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 At 23:14 23/02/01 -0500, Bruce Momjian wrote:
 
 There is one more thing.  Even though the kernel says the data is on the
 platter, it still may not be there.
 
 This is true, but it does not mean we should say 'the disk is slightly
 unreliable, so we can be too'. Also, IIRC, the last time this was
 discussed, someone commented that buying expensive disks and a UPS gets you
 reliability (barring a direct lightning strike) - it had something to do
 with write-ordering and hardware caches. In any case, I'd hate to see DB
 design decisions based closely on hardware capability. At least two of my
 customers use high-performance RAM disks for databases - do these also
 suffer from 'flush is not really flush' problems?

Well, I am saying we are being pretty rigid here when we may be on top
of a system that is not, meaning that our rigidity is buying us little.

 
 Basically, I am not sure how much we lose by doing the delay after
 returning COMMIT, and I know we gain quite a bit by enabling us to group
 fsync calls.
 
 If included, this should be an option only, and not the default option. In
 fact I'd quite like to see such a feature, although I'd not only do a
 'flush every X ms', but I'd also do a 'flush every X transactions' - this
 way a DBA can say 'I don't mind losing the last 20 TXs in a crash'. Bear in
 mind that on a fast system, 20ms is a lot of transactions.

Yes, I can see this as a good option for many users.  My old complaint
was that we allowed only two very extreme options, fsync() all the time,
or fsync() never and recover from a crash.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Bruce Momjian [EMAIL PROTECTED] writes:
  My idea would be to let committing backends return "COMMIT" to the user,
  and set a need_fsync flag that is guaranteed to cause an fsync within X
  milliseconds.  This way, if other backends commit in the next X
  millisecond, they can all use one fsync().
 
 Guaranteed by what?  We have no mechanism available to make an fsync
 happen while the backend is waiting for input.

We would need a separate binary that can look at shared memory and fsync
if someone requested it.  Again, nothing for 7.1.X.

  Now, I know many will complain that we are returning commit while not
  having the stuff on the platter.
 
 I think that's unacceptable on its face.  A remote client may take
 action on the basis that COMMIT was returned.  If the server then
 crashes, the client is unlikely to realize this for some time (certainly
 at least one TCP timeout interval).  It won't look like a "milliseconds
 later" situation to that client.  In fact, the client might *never*
 realize there was a problem; what if it disconnects after getting the
 COMMIT?
 
 If the dbadmin thinks he doesn't need fsync before commit, he'll likely
 be running with fsync off anyway.  For the ones who do think they need
 fsync, I don't believe that we get to rearrange the fsync to occur after
 commit.

I can see someone wanting some fsync but not wanting to take the full hit.
My argument is that with this ability, there would be no need to turn off
fsync.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 My idea would be to let committing backends return "COMMIT" to the user,
 and set a need_fsync flag that is guaranteed to cause an fsync within X
 milliseconds.  This way, if other backends commit in the next X
 millisecond, they can all use one fsync().

Guaranteed by what?  We have no mechanism available to make an fsync
happen while the backend is waiting for input.

 Now, I know many will complain that we are returning commit while not
 having the stuff on the platter.

I think that's unacceptable on its face.  A remote client may take
action on the basis that COMMIT was returned.  If the server then
crashes, the client is unlikely to realize this for some time (certainly
at least one TCP timeout interval).  It won't look like a "milliseconds
later" situation to that client.  In fact, the client might *never*
realize there was a problem; what if it disconnects after getting the
COMMIT?

If the dbadmin thinks he doesn't need fsync before commit, he'll likely
be running with fsync off anyway.  For the ones who do think they need
fsync, I don't believe that we get to rearrange the fsync to occur after
commit.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 At 21:31 23/02/01 -0500, Bruce Momjian wrote:
 Now, I know many will complain that we are returning commit while not
 having the stuff on the platter. 
 
 You're definitely right there.
 
 Maybe they do, but it seems
 the benefit of grouped fsyncs() is large enough that many will say they
 would rather have this option.
 
 I'd prefer to wait for a lock manager that supports timeouts and contention
 notification.
 

There is one more thing.  Even though the kernel says the data is on the
platter, it still may not be there.  Some OS's may return from fsync
when the data is _queued_ to the disk, rather than actually waiting for
the drive return code to say it completed.  Second, some disks report
back that the data is on the disk when it is actually in the disk memory
buffer, not really on the disk.

Basically, I am not sure how much we lose by doing the delay after
returning COMMIT, and I know we gain quite a bit by enabling us to group
fsync calls.




Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Philip Warner [EMAIL PROTECTED] writes:
 It may have been much earler in the debate, but has anyone checked to see
 what the maximum possible gains might be - or is it self-evident to people
 who know the code?

fsync off provides an upper bound to the speed achievable from being
smarter about when to fsync... I doubt that fsync-once-per-checkpoint
would be much different.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

Preliminary results from experimenting with an
N-transactions-must-be-running-to-cause-commit-delay heuristic are
attached.  It seems to be a pretty definite win.  I'm currently running
a more extensive set of cases on another machine for comparison.

The test case is pgbench, unmodified, but run at scalefactor 10
to reduce write contention on the 'branch' rows.  Postmaster
parameters are -N 100 -B 1024 in all cases.  The fsync-off (with,
of course, no commit delay either) case is shown for comparison.
"commit siblings" is the number of other backends that must be
running active (unblocked, at least one XLOG entry made) transactions
before we will do a precommit delay.

commit delay=1 is effectively a 10msec delay on this hardware, since the
kernel rounds any nonzero sleep up to one 10msec clock tick.
Interestingly, it seems that we can push the delay up
to two or three clock ticks without degradation, given positive N.

regards, tom lane


 hppabench.gif


Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Tom Lane

[EMAIL PROTECTED] (Nathan Myers) writes:
 I see, I had it backwards: N=0 corresponds to "always delay", and 
 N=infinity (~0) is "never delay", or what you call zero delay.  N=1 is 
 not interesting.  N=M/2 or N=sqrt(M) or N=log(M) might be interesting, 
 where M is the number of backends, or the number of backends with begun 
 transactions, or something.  N=10 would be conservative (and maybe 
 pointless) just because it would hardly ever trigger a delay.

Why is N=1 not interesting?  That requires at least one other backend
to be in a transaction before you'll delay.  That would seem to be
the minimum useful value --- N=0 (always delay) seems clearly to be
too stupid to be useful.

regards, tom lane



Re: [HACKERS] CommitDelay performance improvement

2001-02-23 Thread Bruce Momjian

 Philip Warner [EMAIL PROTECTED] writes:
  It may have been much earler in the debate, but has anyone checked to see
  what the maximum possible gains might be - or is it self-evident to people
  who know the code?
 
 fsync off provides an upper bound to the speed achievable from being
 smarter about when to fsync... I doubt that fsync-once-per-checkpoint
 would be much different.

That was my point, people should be doing fsync once per checkpoint
rather than never.
