Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-07 Thread Greg Smith

Erik Rijkers wrote:

Everything together: the raid is what Areca call 'raid10(1E)'.
(to be honest I don't remember what that 1E exactly means -
extra flexibility in the number of disks, I think).
  


Standard RAID10 only supports an even number of disks.  The 1E variants 
also allow putting an odd number in.  If you're using an even number, 
like in your case, the results are the same until you start losing 
drives, at which point the performance degradation pattern changes a bit 
due to differences in how things are striped.  See these for more info:


http://bytepile.com/raid_class.php and 
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#IBM_ServeRAID_1E

http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=/diricinfo/fqy0_craid1e.html

I don't think using RAID1E has anything to do with your results, but it 
is possible given your test configuration that part of the difference 
you're seeing relates to where on disk blocks are stored.  If you take a 
hard drive and write two copies of something onto it, the second copy 
will be a little slower than the first.  That's just because of how drive 
speed drops across the surface as you move further along it.  There are some 
examples of what that looks like at 
http://projects.2ndquadrant.it/sites/default/files/pg-hw-bench-2010.pdf 
on pages 21-23.


Returning to your higher level results, one of the variables you weren't 
sure how to account for was caching effects on the standby--the 
possibility that it just didn't have the data cached the same way as the 
master, which would impact results.  The usual way I look for that is by 
graphing the pgbench TPS over time.  In that situation, you can see it shoot 
upwards over time in a very obvious pattern.  Example at 
http://projects.2ndquadrant.it/sites/default/files/pgbench-intro.pdf on 
pages 36-37.
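
If you want to produce that graph yourself, here is a minimal sketch that 
buckets a pgbench transaction log (from -l) into per-second TPS.  It assumes 
the 8.4/9.0-era log line format (client_id transaction_no latency file_no 
time_epoch time_us) and input roughly ordered by completion time, so check 
your pgbench version's documentation before trusting it:

#include <stdio.h>

int
main(void)
{
	long	client, txno, latency, logfile, epoch, usec;
	long	cur_sec = -1, count = 0;

	/* read pgbench_log lines from stdin, count transactions per second */
	while (scanf("%ld %ld %ld %ld %ld %ld",
				 &client, &txno, &latency, &logfile, &epoch, &usec) == 6)
	{
		if (epoch != cur_sec)
		{
			if (cur_sec >= 0)
				printf("%ld\t%ld\n", cur_sec, count);	/* second, TPS */
			cur_sec = epoch;
			count = 0;
		}
		count++;
	}
	if (cur_sec >= 0)
		printf("%ld\t%ld\n", cur_sec, count);
	return 0;
}

Concatenate the pgbench_log.* files, pipe them through it, and plot the two 
columns; a standby that is still warming its cache shows up as a clearly 
rising curve.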


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers
Hi Simon,

In another thread you mentioned you were lacking information from me:

On Tue, May 4, 2010 17:10, Simon Riggs wrote:

 There is no evidence that Erik's strange performance has anything to do
 with HS; it hasn't been seen elsewhere and he didn't respond to
 questions about the test setup to provide background. The profile didn't
 fit any software problem I can see.


I'm sorry if I missed requests for things that were not already mentioned.

Let me repeat:
  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)

It goes without saying (I hope) that apart from the pgbench tests
and a few ssh sessions (myself), the machine was idle.

It would be interesting if anyone repeated these simple tests and produced
evidence that these are non-HS issues.

(Unfortunately, I have at the moment not much time for more testing)


thanks,

Erik Rijkers



On Sun, April 25, 2010 21:07, Simon Riggs wrote:
 On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote:

 Sorry if it's too much data, but to me at least it was illuminating;
 I now understand the effects of the different parameters better.

 That's great, many thanks.

 A few observations

 * Standby performance is actually slightly above normal running. This is
 credible because of the way snapshots are now taken. We don't need to
 scan the procarray looking for write transactions, since we know
 everything is read only. So we scan just the knownassignedxids, which if
 no activity from primary will be zero-length, so snapshots will actually
 get taken much faster in this case on standby. The snapshot performance
 on standby is O(n) where n is the number of write transactions
 currently on primary (transfer delays blur the word currently).

 * The results for scale factor  100 are fine, and the results for 100
 with few connections get thrown out by long transaction times. With
 larger numbers of connections the wait problems seem to go away. Looks
 like Erik (and possibly Hot Standby in general) has an I/O problem,
 though from what is not yet determined. It could be just hardware, or
 might be hardware plus other factors.

 --
  Simon Riggs   www.2ndQuadrant.com





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Simon Riggs
On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
 It would be interesting if anyone repeated these simple tests and
 produced evidence that these are non-HS issues.
 
 (Unfortunately, I have at the moment not much time for more testing)

Would you be able to make those systems available for further testing?

First, I'd perform the same test with the systems swapped, so we know
more about the symmetry of the issue. After that, I'd like to look
more into the internals.

Is it possible to set up SystemTap and dtrace on these systems?

Thanks, either way.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers
On Tue, May 4, 2010 18:19, Simon Riggs wrote:
 On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
 It would be interesting if anyone repeated these simple tests and
 produced evidence that these are non-HS issues.

 (Unfortunately, I have at the moment not much time for more testing)

 Would you be able to make those systems available for further testing?

No. sorry.

 First, I'd perform the same test with the systems swapped, so we know
 more about the symmetry of the issue. After that, I'd like to look
 more into the internals.

You mean systems swapped, as in primary and standby?  Primary and standby
were on the same machine in these tests (even the same raid).

I can eventually move the standby (the 'slow' side, as it stands) to another, 
quite similar
machine.  Not in the coming days though...


 Is it possible to setup SytemTap and dtrace on these systems?

I did install systemtap last week. dtrace is not installed, I think (I've never 
used either).



Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Greg Smith

Erik Rijkers wrote:

  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)
  


To fill in from data you already mentioned upthread:
32 GB RAM
CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

Thanks for all the reporting you've done here, really helpful.  
Questions to make sure I'm trying to duplicate the right thing here:


Is your disk array all configured as one big RAID10 volume, so 
essentially a 6-disk stripe with redundancy, or something else?  In 
particular I want to know whether the WAL/database/archives are split onto 
separate volumes or all on one big one when you were testing. 


Is this on ext3 with standard mount parameters?

Also, can you confirm that every test you ran only had a single pgbench 
worker thread (-j 1 or not specified)?  That looked to be the case from 
the ones I saw where you posted the whole command used.  It would not 
surprise me to find that the CPU usage profile of a standby is just 
different enough from the primary that it results in the pgbench program 
not being scheduled enough time, due to the known Linux issues in that 
area.  Not going to assume that, of course, just one thing I want to 
check when trying to replicate what you've run into. 

I didn't see any glaring HS performance issues like you've been 
reporting on the last time I tried performance testing in this area, just a 
small percentage drop.  But I didn't specifically go looking for it 
either.  With your testing rig out of service, we're going to try and 
replicate that on a system here.  My home server is like a scaled down 
version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5 
disks instead of 12) and it's running the same CentOS version.  If the 
problem's really universal I should see it here too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Stefan Kaltenbrunner

Erik Rijkers wrote:

Hi Simon,

In another thread you mentioned you were lacking information from me:

On Tue, May 4, 2010 17:10, Simon Riggs wrote:

There is no evidence that Erik's strange performance has anything to do
with HS; it hasn't been seen elsewhere and he didn't respond to
questions about the test setup to provide background. The profile didn't
fit any software problem I can see.



I'm sorry if I missed requests for things that were not already mentioned.

Let me repeat:
  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)

It goes without saying (I hope) that apart from the pgbench tests
and a few ssh sessions (myself), the machine was idle.

It would be interesting if anyone repeated these simple tests and produced
evidence that these are non-HS issues.

(Unfortunately, I have at the moment not much time for more testing)


FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks 
kinda related.


using -j 16 -c 16 -T 120 I get either ~100k tps and around 660k 
contextswitches per second or on some runs I end up with ~150k tps and 
around 1M contextswitches/s sustained. I mostly get the 100k result but 
once in a while I get the 150k one. And one can even anticipate the 
final transaction rate from watching vmstat 1...


I'm not sure yet what is causing that behaviour, but that is with 
9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a 
pure in-memory workload (scale = 20 with 48GB RAM).



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers
On Tue, May 4, 2010 20:26, Greg Smith wrote:
 Erik Rijkers wrote:
   OS: Centos 5.4
   2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
   Areca 1280ML
   primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
  Barracuda ES.2)


 To fill in from data you already mentioned upthread:
 32 GB RAM
 CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

 Thanks for all the reporting you've done here, really helpful.
 Questions to make sure I'm trying to duplicate the right thing here:

 Is your disk array all configured as one big RAID10 volume, so
 essentially a 6-disk stripe with redundancy, or something else?  In
 particular I want to know whether the WAL/database/archives are split onto
 separate volumes or all on one big one when you were testing.

Everything together: the raid is what Areca call 'raid10(1E)'.
(to be honest I don't remember what that 1E exactly means -
extra flexibility in the number of disks, I think).

Btw, some of my emails contained the postgresql.conf of both instances.


 Is this on ext3 with standard mount parameters?

ext3 noatime

 Also, can you confirm that every test you ran only had a single pgbench
 worker thread (-j 1 or not specified)?  That looked to be the case from
 the ones I saw where you posted the whole command used.  It would not

yes; the literal cmd:
time /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin/pgbench -h /tmp 
-p 6565 -U rijkers
-n -S -c 20 -T 900 -j 1 replicas

To avoid wrapping in the emails I just removed '-h /tmp', '-U rijkers', and 
'replicas'.

(I may have run the primary's pgbench binary also against the slave - don't 
think
that should make any difference)

 surprise me to find that the CPU usage profile of a standby is just
 different enough from the primary that it results in the pgbench program
 not being scheduled enough time, due to the known Linux issues in that
 area.  Not going to assume that, of course, just one thing I want to
 check when trying to replicate what you've run into.

 I didn't see any glaring HS performance issues like you've been
 reporting on the last time I tried performance testing in this area, just a
 small percentage drop.  But I didn't specifically go looking for it

Here, it seems repeatable, but does not occur with all scales.

Hm, maybe I should just dump *all* of my results on the wiki for reference.  
(I'll look at that
later).

 either.  With your testing rig out of service, we're going to try and
 replicate that on a system here.  My home server is like a scaled down
 version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5
 disks instead of 12) and it's running the same CentOS version.  If the
 problem's really universal I should see it here too.


Thanks,


Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Simon Riggs
On Tue, 2010-05-04 at 21:34 +0200, Stefan Kaltenbrunner wrote:

 FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks 
 kinda related.
 
 using -j 16 -c 16 -T 120 I get either ~100k tps and around 660k 
 contextswitches per second or on some runs I end up with ~150k tps and 
 around 1M contextswitches/s sustained. I mostly get the 100k result but 
 once in a while I get the 150k one. And one can even anticipate the 
 final transaction rate from watching vmstat 1...
 
 I'm not sure yet what is causing that behaviour, but that is with 
 9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a 
 pure in-memory workload (scale = 20 with 48GB RAM).

Educated guess at a fix: please test this patch. It's good for
performance testing, but doesn't work correctly at failover, which would
obviously be addressed prior to any commit.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 158,163  static XLogRecPtr LastRec;
--- 158,164 
   * known, need to check the shared state.
   */
  static bool LocalRecoveryInProgress = true;
+ static bool KnownRecoveryState = false;
  
  /*
   * Local state for XLogInsertAllowed():
***
*** 6524,6537  CheckRecoveryConsistency(void)
  bool
  RecoveryInProgress(void)
  {
! 	/*
! 	 * We check shared state each time only until we leave recovery mode. We
! 	 * can't re-enter recovery, so there's no need to keep checking after the
! 	 * shared variable has once been seen false.
! 	 */
! 	if (!LocalRecoveryInProgress)
! 		return false;
! 	else
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
--- 6525,6533 
  bool
  RecoveryInProgress(void)
  {
! 	if (KnownRecoveryState)
! 		return LocalRecoveryInProgress;
! 	else		
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
***
*** 6541,6546  RecoveryInProgress(void)
--- 6537,6544 
  		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
  		SpinLockRelease(&xlogctl->info_lck);
  
+ 		KnownRecoveryState = true;
+ 
  		/*
  		 * Initialize TimeLineID and RedoRecPtr when we discover that recovery
  		 * is finished. InitPostgres() relies upon this behaviour to ensure

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-28 Thread Simon Riggs
On Tue, 2010-04-27 at 20:13 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-04-27 at 18:08 -0400, Tom Lane wrote:
  Huh?  How is a filter as coarse as an oldest-running-XID filter going
  to prevent that?  And aren't we initializing from trustworthy data in
  ProcArrayApplyRecoveryInfo, anyway?
  
  I still say it's useless.
 
  Quite possibly. You're looking at other code outside of this patch. I'm
  happy that you do so, but is it immediately related? I can have another
  look when we finish this.
 
 Well, it's nearby anyway.  I've committed the present patch (with a
 number of fixes).  While I was looking at it I came across several
 things in the existing code that I think are either wrong or at least
 inadequately documented --- the above complaint is just the tip of the
 iceberg.  I'm going to make another pass over it to see if I'm just
 missing things, and then report back.

Thank you for your input and changes.

You're welcome to share my iceberg.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 v3 attached

This patch changes KnownAssignedXidsRemove() so that failure to find
the target XID is elog(ERROR) (ie, a PANIC, since this is in the
startup process).  However, this comment is still there:

/*
 * We can fail to find an xid if the xid came from a subtransaction that
 * aborts, though the xid hadn't yet been reported and no WAL records have
 * been written using the subxid. In that case the abort record will
 * contain that subxid and we haven't seen it before.
 */

WTF?  Either the comment is wrong or this should not be an elog
condition.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 13:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  v3 attached
 
 This patch changes KnownAssignedXidsRemove() so that failure to find
 the target XID is elog(ERROR) (ie, a PANIC, since this is in the
 startup process).  

Not in all cases. The code is correct, as far as I am aware from
testing.

 However, this comment is still there:
   /*
* We can fail to find an xid if the xid came from a subtransaction that
* aborts, though the xid hadn't yet been reported and no WAL records have
* been written using the subxid. In that case the abort record will
* contain that subxid and we haven't seen it before.
*/
 
 WTF?  Either the comment is wrong or this should not be an elog
 condition.

That section of code has been rewritten many times. I think it is now
inaccurate and should be removed. I left it there because the
unfortunate history of the project has been the removal of comments and
then later rediscovery of the truth, sometimes more than once. I could
no longer reproduce that error; someone else may know differently.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Hmm ... there's another point here, which is that the array size creates
a hard maximum on the number of entries, whereas the hash table was a
bit more forgiving.  What is the proof that the array won't overflow?
The fact that the equivalent data structure on the master can't hold
more than this many entries doesn't seem to me to prove that, because
we will add intermediate not-observed XIDs to the array.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 14:53 -0400, Tom Lane wrote:
 Hmm ... there's another point here, which is that the array size
 creates
 a hard maximum on the number of entries, whereas the hash table was a
 bit more forgiving.  What is the proof that the array won't overflow?
 The fact that the equivalent data structure on the master can't hold
 more than this many entries doesn't seem to me to prove that, because
 we will add intermediate not-observed XIDs to the array.

We know that not-observed xids have actually been allocated on the
primary. We log an assignment record every 64 subtransactions, so that
the peak size of the array is 65 xids per connection.

It's possible for xids to stay in the array for longer, in the event of
a FATAL error that doesn't log an abort record. We clean those up every
checkpoint, if they exist. The potential number of them is unbounded, so
making special allowance for them doesn't remove the theoretical risk.
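
To put numbers on that bound, a small stand-alone sketch (illustrative names
and an example max_connections value, not the real constants or GUC handling):

#include <stdio.h>

#define SUBXIDS_PER_ASSIGNMENT	64	/* an assignment record is logged this often */

int
main(void)
{
	int		max_connections = 100;	/* example setting on the primary */
	int		per_backend = SUBXIDS_PER_ASSIGNMENT + 1;	/* 64 subxids + top-level xid */
	int		worst_case = per_backend * max_connections;

	printf("worst-case KnownAssignedXids entries: %d\n", worst_case);	/* 6500 */
	return 0;
}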

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-04-27 at 13:52 -0400, Tom Lane wrote:
 WTF?  Either the comment is wrong or this should not be an elog
 condition.

 That section of code has been rewritten many times. I think it is now
 inaccurate and should be removed. I left it there because the
 unfortunate history of the project has been the removal of comments and
 then later rediscovery of the truth, sometimes more than once. I could
 no longer reproduce that error; someone else may know differently.

I haven't tested this, but it appears to me that the failure would occur
in overflow situations.  If we have too many subxacts, we'll generate
XLOG_XACT_ASSIGNMENT, which will cause the subxids to be removed from
KnownAssignedXids[].  Then later when the top-level xact commits or
aborts we'll try to remove them again as a consequence of processing
the top-level's commit/abort record.  No?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 16:18 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-04-27 at 13:52 -0400, Tom Lane wrote:
  WTF?  Either the comment is wrong or this should not be an elog
  condition.
 
  That section of code has been rewritten many times. I think it is now
  inaccurate and should be removed. I left it there because the
  unfortunate history of the project has been the removal of comments and
  then later rediscovery of the truth, sometimes more than once. I could
  no longer reproduce that error; someone else may know differently.
 
 I haven't tested this, but it appears to me that the failure would occur
 in overflow situations.  If we have too many subxacts, we'll generate
 XLOG_XACT_ASSIGNMENT, which will cause the subxids to be removed from
 KnownAssignedXids[].  Then later when the top-level xact commits or
 aborts we'll try to remove them again as a consequence of processing
 the top-level's commit/abort record.  No?

Yes, thank you for clear thinking. Anyway, looks like the comment was
right after all and the new code to throw an error is wrong in some
cases. It was useful for testing, at least. The comment was slightly
misleading, which is a good reason to rewrite it.

It seems like it might be possible to identify which xids could cause an
error and which won't. Much harder than that. We still have the possible
case where we have 64 subtransactions allocated but many of them abort
and we are left with a final commit of 64 subtransactions.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Isn't the snapshotOldestActiveXid filter in
RecordKnownAssignedTransactionIds completely wrong/useless/bogus?

AFAICS, snapshotOldestActiveXid is only set once at the start of
recovery.  This means it will soon be too old to provide any useful
filtering.  But what's far worse is that the XID space will eventually
wrap around, and that test will start filtering *everything*.

I think we should just lose that test, as well as the variable.
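
For illustration, a stand-alone sketch of that failure mode using the usual
circular comparison (a simplified version in the spirit of
TransactionIdPrecedes, without the special-xid handling the real function
does):

#include <stdio.h>
#include <stdint.h>

/* xid1 "precedes" xid2 in circular, modulo-2^32 terms */
static int
xid_precedes(uint32_t id1, uint32_t id2)
{
	return (int32_t) (id1 - id2) < 0;
}

int
main(void)
{
	uint32_t	cutoff = 1000;				/* a cutoff captured once at startup */
	uint32_t	soon = 5000;				/* xid seen shortly afterwards */
	uint32_t	later = cutoff + 2200000000u;	/* xid seen ~2.2 billion xacts later */

	printf("%d\n", xid_precedes(soon, cutoff));		/* 0: not filtered */
	printf("%d\n", xid_precedes(later, cutoff));	/* 1: wrongly treated as old */
	return 0;
}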

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 17:24 -0400, Tom Lane wrote:

 Isn't the snapshotOldestActiveXid filter in
 RecordKnownAssignedTransactionIds completely wrong/useless/bogus?
 
 AFAICS, snapshotOldestActiveXid is only set once at the start of
 recovery.  This means it will soon be too old to provide any useful
 filtering.  But what's far worse is that the XID space will eventually
 wrap around, and that test will start filtering *everything*.
 
 I think we should just lose that test, as well as the variable.

Yes, though it looks like it is still necessary in creating a valid
initial state because otherwise we may have xids in KnownAssigned array
that are already complete. The comment there talks about wasting memory,
though it appears to be a correctness issue.

So perhaps a similar test is required in ProcArrayApplyRecoveryInfo()
but not in RecordKnownAssignedTransactionIds(). That way it is applied,
but only once at initialisation.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-04-27 at 17:24 -0400, Tom Lane wrote:
 I think we should just lose that test, as well as the variable.

 Yes, though it looks like it is still necessary in creating a valid
 initial state because otherwise we may have xids in KnownAssigned array
 that are already complete.

Huh?  How is a filter as coarse as an oldest-running-XID filter going
to prevent that?  And aren't we initializing from trustworthy data in
ProcArrayApplyRecoveryInfo, anyway?

I still say it's useless.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 18:08 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-04-27 at 17:24 -0400, Tom Lane wrote:
  I think we should just lose that test, as well as the variable.
 
  Yes, though it looks like it is still necessary in creating a valid
  initial state because otherwise we may have xids in KnownAssigned array
  that are already complete.
 
 Huh?  How is a filter as coarse as an oldest-running-XID filter going
 to prevent that?  And aren't we initializing from trustworthy data in
 ProcArrayApplyRecoveryInfo, anyway?
 
 I still say it's useless.

Quite possibly. You're looking at other code outside of this patch. I'm
happy that you do so, but is it immediately related? I can have another
look when we finish this.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-04-27 at 18:08 -0400, Tom Lane wrote:
 Huh?  How is a filter as coarse as an oldest-running-XID filter going
 to prevent that?  And aren't we initializing from trustworthy data in
 ProcArrayApplyRecoveryInfo, anyway?
 
 I still say it's useless.

 Quite possibly. You're looking at other code outside of this patch. I'm
 happy that you do so, but is it immediately related? I can have another
 look when we finish this.

Well, it's nearby anyway.  I've committed the present patch (with a
number of fixes).  While I was looking at it I came across several
things in the existing code that I think are either wrong or at least
inadequately documented --- the above complaint is just the tip of the
iceberg.  I'm going to make another pass over it to see if I'm just
missing things, and then report back.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Fujii Masao
On Mon, Apr 26, 2010 at 3:25 AM, Erik Rijkers e...@xs4all.nl wrote:
 FWIW, here are some more results from pgbench comparing
 primary and standby (both with Simon's patch).

Was there a difference in CPU  utilization between the primary
and standby?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Simon Riggs
On Sun, 2010-04-25 at 19:18 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  [ v2 patch ]
 
 I've been studying this some more while making notes for improved
 comments, and I've about come to the conclusion that having readers
 move the tail pointer (at the end of KnownAssignedXidsGetAndSetXmin)
 is overly tricky and probably not a performance improvement anyway.
 The code is in fact wrong as it stands: it's off-by-one about setting
 the new tail value.  And there's potential for contention with multiple
 readers all wanting to move the tail pointer at once.  

OK, since contention was my concern, I want to avoid that.

 And most
 importantly, KnownAssignedXidsSearch can't move the tail pointer so
 we might expend many inefficient searches while never moving the tail
 pointer.

 I think we should get rid of that and just have the two functions that
 can mark entries invalid (which they must do with exclusive lock)
 advance the tail pointer when they invalidate the current tail element.

OK

 Then we have the very simple rule that only the startup process ever
 changes this data structure.
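
For reference, a stand-alone sketch of that tail-advance rule, with plain
local arrays standing in for the shared-memory structure and the exclusive
lock assumed to be held by the caller that just invalidated an entry:

#include <stdio.h>
#include <stdbool.h>

static unsigned int xids[8]  = {100, 101, 103, 104};
static bool         valid[8] = {true, true, true, true};
static int          tail = 0, head = 4;

/* skip the tail forward over entries already marked invalid */
static void
advance_tail(void)
{
	while (tail < head && !valid[tail])
		tail++;
}

int
main(void)
{
	valid[0] = false;	/* xid 100 finishes */
	valid[1] = false;	/* xid 101 finishes */
	advance_tail();
	printf("tail=%d (next live xid %u)\n", tail, xids[tail]);	/* tail=2, xid 103 */
	return 0;
}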

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Simon Riggs
On Sun, 2010-04-25 at 23:52 +0200, Erik Rijkers wrote:

 I'll try to repeat this pattern on other hardware; although
 if my tests were run with faulty hardware I wouldn't know how/why
 that would give the above effect (such a 'regular aberration').

 testing is more difficult than I thought...

Thanks again for your help.

Please can you confirm:
* Are the standby tests run while the primary is completely quiet?
* What OS is this? Can we use dtrace scripts?

Can anyone else confirm these test results: large scale factor and small
number of sessions?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Erik Rijkers
On Mon, April 26, 2010 08:52, Fujii Masao wrote:
 On Mon, Apr 26, 2010 at 3:25 AM, Erik Rijkers e...@xs4all.nl wrote:
 FWIW, here are some more results from pgbench comparing
 primary and standby (both with Simon's patch).

 Was there a difference in CPU  utilization between the primary
 and standby?


I haven't monitored it..


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Erik Rijkers
On Mon, April 26, 2010 09:43, Simon Riggs wrote:
 On Sun, 2010-04-25 at 23:52 +0200, Erik Rijkers wrote:

 I'll try to repeat this pattern on other hardware; although
 if my tests were run with faulty hardware I wouldn't know how/why
 that would give the above effect (such a 'regular aberration').

 testing is more difficult than I thought...

 Thanks again for your help.

 Please can you confirm:
 * Are the standby tests run while the primary is completely quiet?

autovacuum was on, which is probably not a good idea - I'll try a few runs 
without it.

 * What OS is this? Can we use dtrace scripts?

Centos 5.4.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Simon Riggs
On Sun, 2010-04-25 at 13:51 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sun, 2010-04-25 at 13:33 -0400, Tom Lane wrote:
  If you like I'll have a go at rewriting the comments for this patch,
  because I am currently thinking that the problem is not so much with
  the code as with the poor explanation of what it's doing.  Sometimes
  the author is too close to the code to understand why other people
  have a hard time understanding it.
 
  That would help me, thank you.
 
 OK.  You said you were currently working some more on the patch, so
 I'll wait for v3 and then work on it.

v3 attached

Changes:
* Strange locking in KnownAssignedXidsAdd() moved to RecordKnown...
* KnownAssignedXidsAdd() reordered, assert-ish code added
* Tail movement during snapshots no longer possible
* Tail movement during xid removal added to KnownAssignedXidsSearch()
* Major comment hacking

Little bit rough, definitely needs a re-read of all comments, so good
time to send over.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1200,1205  StandbyTransactionIdIsPrepared(TransactionId xid)
--- 1200,1208 
  
  	Assert(TransactionIdIsValid(xid));
  
+ 	if (max_prepared_xacts <= 0)
+ 		return false;	/* nothing to do */
+ 
  	/* Read and validate file */
  	buf = ReadTwoPhaseFile(xid, false);
  	if (buf == NULL)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6454,6459  CheckRecoveryConsistency(void)
--- 6454,6465 
  	}
  }
  
+ bool
+ XLogConsistentState(void)
+ {
+ 	return reachedMinRecoveryPoint;
+ }
+ 
  /*
   * Is the system still in recovery?
   *
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 52,57 
--- 52,58 
  #include access/twophase.h
  #include miscadmin.h
  #include storage/procarray.h
+ #include storage/spin.h
  #include storage/standby.h
  #include utils/builtins.h
  #include utils/snapmgr.h
***
*** 64,73  typedef struct ProcArrayStruct
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	int			numKnownAssignedXids;	/* current number of known assigned
! 		 * xids */
! 	int			maxKnownAssignedXids;	/* allocated size of known assigned
! 		 * xids */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
--- 65,84 
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	/*
! 	 * Known assigned xids handling
! 	 */
! 	int			maxKnownAssignedXids;	/* allocated size */
! 
! 	/*
! 	 * Callers must hold either ProcArrayLock in Exclusive mode or
! 	 * ProcArrayLock in Shared mode *and* known_assigned_xids_lck
! 	 * to update these values.
! 	 */
! 	int			numKnownAssignedXids;	/* currrent # valid entries */
! 	int			tailKnownAssignedXids;	/* current tail */
! 	int			headKnownAssignedXids;	/* current head */
! 	slock_t		known_assigned_xids_lck;	/* shared protection lock */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
***
*** 87,93  static ProcArrayStruct *procArray;
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static HTAB *KnownAssignedXidsHash;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
--- 98,105 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static TransactionId *KnownAssignedXids;
! static bool *KnownAssignedXidsValid;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
***
*** 142,150  static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
--- 154,166 
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
! bool exclusive_lock);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid);
! static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
! 	  TransactionId *subxids);
! static void KnownAssignedXidsCompress(bool full);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
***
*** 204,228  

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 v3 attached

Thanks, will work on this tomorrow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Robert Haas
On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Both Heikki and I objected to that patch.

 Please explain your objection, based upon the patch and my explanations.

Well, we objected to the locking.  Having reread the patch a few times
though, I think I'm starting to wrap my head around it so, I don't
know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
exclusive mode instead of having a separate spinlock, to see how that
performs?

 And apparently it doesn't
 fix the problem, either.  So, -1 from me.

 There is an issue observed in Erik's later tests, but my interpretation
 of the results so far is that the sorted array patch successfully
 removes the initially reported loss of performance.

Is it possible the remaining spikes are due to fights over the spinlock?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
 On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Both Heikki and I objected to that patch.
 
  Please explain your objection, based upon the patch and my explanations.
 
 Well, we objected to the locking.  Having reread the patch a few times
 though, I think I'm starting to wrap my head around it so, I don't
 know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
 exclusive mode instead of having a separate spinlock, to see how that
 performs?

They both perform well. Heikki and I agree that spinlocks perform well.

The point of taking the ProcArrayLock in Shared rather than Exclusive
mode is the reduction in contention that gives. Adding a
KnownAssignedXid is exactly analogous to starting a transaction. We go
to some trouble to avoid taking a lock when we start a transaction and I
want to do a similar thing on the standby. I think it's even more
important we reduce contention on the standby than it is on the primary
because if the Startup process is delayed then it slows down recovery.
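
As a rough sketch of the discipline being described (a fragment using the
field names from the v3 patch, not the committed code; it is only meaningful
inside the backend, where these locks and arrays exist):

/*
 * Add one xid while holding ProcArrayLock only in shared mode.  This is
 * safe only because the startup process is the sole writer; the spinlock
 * protects the head/tail values that readers look at under shared lock.
 */
static void
KnownAssignedXidsAddOne(TransactionId xid)
{
	LWLockAcquire(ProcArrayLock, LW_SHARED);

	/* fill the slot just past the current head */
	KnownAssignedXids[procArray->headKnownAssignedXids] = xid;
	KnownAssignedXidsValid[procArray->headKnownAssignedXids] = true;

	/*
	 * Publish it by bumping the head under the spinlock.  (Whether the slot
	 * store is guaranteed visible before the head bump on weak-memory
	 * machines is exactly the ordering question discussed later in this
	 * thread.)
	 */
	SpinLockAcquire(&procArray->known_assigned_xids_lck);
	procArray->headKnownAssignedXids++;
	procArray->numKnownAssignedXids++;
	SpinLockRelease(&procArray->known_assigned_xids_lck);

	LWLockRelease(ProcArrayLock);
}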

  And apparently it doesn't
  fix the problem, either.  So, -1 from me.
 
  There is an issue observed in Erik's later tests, but my interpretation
  of the results so far is that the sorted array patch successfully
  removes the initially reported loss of performance.
 
 Is it possible the remaining spikes are due to fights over the spinlock?

That one specific spinlock? I considered it. I think it's doubtful.

The spikes all involved a jump from 5ms to much more than that, often
as long as a second. That looks like a pure I/O problem or an LWlock
held across a slow I/O or platform specific issues of some kind.

Erik's tests were with 1 backend, so that would mean that at most 2
processes might want the spinlock you mention. The spinlock is only
taken at snapshot time, so once per transaction in Erik's tests. So if
the spinlock were the problem then one or other of the processes
would need to grab and hold the spinlock for a very long time and then
release it sometime later. Since the spinlock code has been with us for
some time, I doubt there is anything wrong there.

I don't have any evidence as to whether the problem is solely on Erik's
system, nor whether it is an issue already in the codebase or introduced
by this patch. It seems unlikely to me that this patch introduces the
problem, and even more unlikely that the issue is caused by two
processes fighting over that particular spinlock. 

The patch is very effective at reducing overall cost of taking snapshots
and so I think it needs to be applied to allow more performance data to
be collected. I don't suppose this is the last performance tuning we
will need to do.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Robert Haas
On Sun, Apr 25, 2010 at 8:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
 On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Both Heikki and I objected to that patch.
 
  Please explain your objection, based upon the patch and my explanations.

 Well, we objected to the locking.  Having reread the patch a few times
 though, I think I'm starting to wrap my head around it so, I don't
 know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
 exclusive mode instead of having a separate spinlock, to see how that
 performs?

 They both perform well. Heikki and I agree that spinlocks perform well.

 The point of taking the ProcArrayLock in Shared rather than Exclusive
 mode is the reduction in contention that gives. Adding a
 KnownAssignedXid is exactly analogous to starting a transaction. We go
 to some trouble to avoid taking a lock when we start a transaction and I
 want to do a similar thing on the standby. I think it's even more
 important we reduce contention on the standby than it is on the primary
 because if the Startup process is delayed then it slows down recovery.

  And apparently it doesn't
  fix the problem, either.  So, -1 from me.
 
  There is an issue observed in Erik's later tests, but my interpretation
  of the results so far is that the sorted array patch successfully
  removes the initially reported loss of performance.

 Is it possible the remaining spikes are due to fights over the spinlock?

 That one specific spinlock? I considered it. I think it's doubtful.

 The spikes all involved a jump from 5ms to much more than that, often
 as long as a second. That looks like a pure I/O problem or an LWlock
 held across a slow I/O or platform specific issues of some kind.

 Erik's tests were with 1 backend, so that would mean that at most 2
 processes might want the spinlock you mention. The spinlock is only
 taken at snapshot time, so once per transaction in Erik's tests. So if
 the spinlock were the problem then one or other of the processes
 would need to grab and hold the spinlock for a very long time and then
 release it sometime later. Since the spinlock code has been with us for
 some time, I doubt there is anything wrong there.

 I don't have any evidence as to whether the problem is solely on Erik's
 system, nor whether it is an issue already in the codebase or introduced
 by this patch. It seems unlikely to me that this patch introduces the
 problem, and even more unlikely that the issue is caused by two
 processes fighting over that particular spinlock.

 The patch is very effective at reducing overall cost of taking snapshots
 and so I think it needs to be applied to allow more performance data to
 be collected. I don't suppose this is the last performance tuning we
 will need to do.

OK, fair enough.  I withdraw my objections.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Both Heikki and I objected to that patch.
 
 Please explain your objection, based upon the patch and my explanations.

 Well, we objected to the locking.

Not only is the locking overly complex, but it's outright wrong;
or at least the comments are.  KnownAssignedXidsAdd in particular
has a comment that is 100% misleading, and an actual behavior that
I find extremely likely to provoke bugs (eg, consider a caller calling
it twice under the illusion it still has only shared lock).  I'm not
even convinced that it works correctly, since it will happily write
into KnownAssignedXids[] while holding only shared ProcArrayLock.
What happens when two processes are doing that concurrently?

This needs a redesign before it can be considered committable.  I don't
really care whether it makes things faster; it's too complicated and too
poorly documented to be maintainable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 [ v2 patch ]

BTW, while I'm looking at this, why bother with the separate
KnownAssignedXidsValid[] array?  Wouldn't it be cheaper
(certainly so in storage, probably so in access/update times)
to have just the KnownAssignedXids[] array and store
InvalidTransactionId in unused entries?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 11:50 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Both Heikki and I objected to that patch.
  
  Please explain your objection, based upon the patch and my explanations.
 
  Well, we objected to the locking.
 
 Not only is the locking overly complex, but it's outright wrong;
 or at least the comments are.  KnownAssignedXidsAdd in particular
 has a comment that is 100% misleading, and an actual behavior that
 I find extremely likely to provoke bugs (eg, consider a caller calling
 it twice under the illusion it still has only shared lock).  

I will reword that comment.

 I'm not
 even convinced that it works correctly, since it will happily write
 into KnownAssignedXids[] while holding only shared ProcArrayLock.
 What happens when two processes are doing that concurrently?

Heikki and I discussed this upthread. Only the Startup process ever
calls KnownAssignedXids, so the situation you describe does not happen.
We currently rely on the fact that we have only a single writer in other
places in Hot Standby, so this is not a restriction we are imposing on
the system just to support this change.

Both you and Heikki have previously spoken against having recovery run
in parallel, so the likelihood of having multiple readers in the future
seems low.

If we were to allow multiple callers of the routine in future, the use
of spinlocks to protect the head of the queue means we can safely
reserve slots and so KnownAssignedXidsAdd can easily be made to accept
multiple writers in the future. I think it's possible to future-proof
that now without significant change or performance drop, so I'll do
that.

 This needs a redesign before it can be considered committable.  I don't
 really care whether it makes things faster; it's too complicated and too
 poorly documented to be maintainable.

It is sufficiently complex to achieve its objective, which is keeping
track of xids during recovery, using the same locking as we do during
normal running. This is the third re-write of the code, developed over
more than 20 months and specifically modularised by me to make future
changes drop-in replacements. Initially we had an insert-sorted array,
then a hash table and now this current proposal. Recently, Heikki and I
independently came up with almost exactly the same design, with the
algorithmic characteristics we need for performance. The call points,
frequency and workload of calls are documented and well understood by
multiple people.

The only remaining point of discussion has been the code that takes
ProcArrayLock in shared mode. The difference being discussed here is
whether we access the head and tail of the array with both Shared+spinlock
or just Exclusive. The patch would be almost identical with
that change, but would not meet its locking objectives. So making the
locking stricter isn't going to make it more maintainable by a
measurable amount.

There are more than 60 lines of header comment explaining in detail how
this works, with a full algorithmic analysis. The remaining code is
commented to project standards, with easily more than 100 lines of
comments. 

I will recheck every comment and see if other comments can be usefully
added, and further review the code to see if there's anything that can
be done to improve it further. I was going to do this anyway, since I
will be adding an agreed Assert() into the patch.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-04-25 at 11:50 -0400, Tom Lane wrote:
 This needs a redesign before it can be considered committable.  I don't
 really care whether it makes things faster; it's too complicated and too
 poorly documented to be maintainable.

 There are more than 60 lines of header comment explaining in detail how
 this works, with a full algorithmic analysis. The remaining code is
 commented to project standards, with easily more than 100 lines of
 comments. 

If the comments were correct, I wouldn't be complaining.  They're
misleading or outright wrong on many points.  In particular, I don't
think you actually understand the weak-memory-ordering issue, because
the comments about that are entirely wrong.  I don't think they are
adequate on other points either --- for example, I now understand why my
complaint of a few minutes ago about KnownAssignedXidsValid is wrong,
but the comments certainly didn't help me on that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 12:43 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  [ v2 patch ]
 
 BTW, while I'm looking at this, why bother with the separate
 KnownAssignedXidsValid[] array?  Wouldn't it be cheaper
 (certainly so in storage, probably so in access/update times)
 to have just the KnownAssignedXids[] array and store
 InvalidTransactionId in unused entries?

Well, that was exactly my first design.

Heikki came up with the additional array and I think it is an inspired
suggestion, because it allows the search() to use a simple binary
search. I attempted to write the binary-search-with-holes and it was
going to be very ugly code, so I went for another algorithm which was
also quite cool, but not as cool as Heikki's idea - which I was going to
give credit for.
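
A stand-alone sketch of that idea, using plain integer comparison instead of
circular xid comparison and a small fixed array in place of the shared-memory
one:

#include <stdio.h>
#include <stdbool.h>

static unsigned int xids[]  = {100, 103, 104, 107, 110};	/* kept sorted */
static bool         valid[] = {true, false, true, true, true};	/* 103 was removed */

static bool
xid_is_known_assigned(unsigned int xid, int head)
{
	int		lo = 0, hi = head - 1;

	while (lo <= hi)
	{
		int		mid = lo + (hi - lo) / 2;

		if (xids[mid] == xid)
			return valid[mid];	/* found the slot; has it been invalidated? */
		if (xids[mid] < xid)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return false;
}

int
main(void)
{
	printf("%d\n", xid_is_known_assigned(104, 5));	/* 1 */
	printf("%d\n", xid_is_known_assigned(103, 5));	/* 0: hole */
	return 0;
}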

The array adds 520*max_connections bytes of shared memory, but seems a
good tradeoff for the additional performance it allows. I guess we could
use a bit array, but that's slower to access. The bit array would be
9*max_connections bytes of shared memory, so a 40kB saving.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 12:51 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sun, 2010-04-25 at 11:50 -0400, Tom Lane wrote:
  This needs a redesign before it can be considered committable.  I don't
  really care whether it makes things faster; it's too complicated and too
  poorly documented to be maintainable.
 
  There are more than 60 lines of header comment explaining in detail how
  this works, with a full algorithmic analysis. The remaining code is
  commented to project standards, with easily more than 100 lines of
  comments. 
 
 If the comments were correct, I wouldn't be complaining.  They're
 misleading or outright wrong on many points.  In particular, I don't
 think you actually understand the weak-memory-ordering issue, because
 the comments about that are entirely wrong.  

The comment says: on CPUs with
+ * weak-memory ordering we can't reliably move pointers atomically, so the
+ * rule is that updates of head and tail of the array require ProcArrayLock
+ * in exclusive mode or (shared mode and known_assigned_xids_lck spinlock)

I will reword this, so it is clear that I'm talking about the head and
tail of the array, not pointers in general.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-04-25 at 12:51 -0400, Tom Lane wrote:
 If the comments were correct, I wouldn't be complaining.  They're
 misleading or outright wrong on many points.  In particular, I don't
 think you actually understand the weak-memory-ordering issue, because
 the comments about that are entirely wrong.  

 The comment says: on CPUs with
 + * weak-memory ordering we can't reliably move pointers atomically, so the
 + * rule is that updates of head and tail of the array require ProcArrayLock
 + * in exclusive mode or (shared mode and known_assigned_xids_lck spinlock)

 I will reword this, so it is clear that I'm talking about the head and
 tail of the array, not pointers in general.

It's not about whether the pointers can be assigned atomically; on most
hardware they can.  It's about whether other processors will see that
happen in the correct sequence relative to the changes in the array
elements.
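
To make that concrete, here is a deliberately simplified illustration
(made-up names, not the patch itself) of the two sides of the hazard:

/* Writer (startup process) publishing a new entry: */
static void
publish_sketch(TransactionId *xids, int *head, TransactionId xid)
{
    xids[*head] = xid;          /* (1) fill the slot              */
    *head = *head + 1;          /* (2) advance head to publish it */
}

/* Reader running on another CPU: */
static TransactionId
read_newest_sketch(const TransactionId *xids, const int *head)
{
    int     h = *head;          /* may already observe the new head ...  */

    return xids[h - 1];         /* ... yet read a stale slot value, since
                                 * nothing forces (1) to become visible
                                 * before (2) on a weak-memory machine;
                                 * the spinlock (or ProcArrayLock) taken
                                 * around both sides supplies that order. */
}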

If you like I'll have a go at rewriting the comments for this patch,
because I am currently thinking that the problem is not so much with
the code as with the poor explanation of what it's doing.  Sometimes
the author is too close to the code to understand why other people
have a hard time understanding it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 13:33 -0400, Tom Lane wrote:

 If you like I'll have a go at rewriting the comments for this patch,
 because I am currently thinking that the problem is not so much with
 the code as with the poor explanation of what it's doing.  Sometimes
 the author is too close to the code to understand why other people
 have a hard time understanding it.

That would help me, thank you.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-04-25 at 13:33 -0400, Tom Lane wrote:
 If you like I'll have a go at rewriting the comments for this patch,
 because I am currently thinking that the problem is not so much with
 the code as with the poor explanation of what it's doing.  Sometimes
 the author is too close to the code to understand why other people
 have a hard time understanding it.

 That would help me, thank you.

OK.  You said you were currently working some more on the patch, so
I'll wait for v3 and then work on it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Erik Rijkers
On Sat, April 24, 2010 01:17, Erik Rijkers wrote:
 On Sat, April 24, 2010 00:39, Simon Riggs wrote:
 On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.

 Wow.  Good detective work.

 While we haven't fully established the source of those problems, I am
 now happy that these test results don't present any reason to avoid
 committing the main patch tested by Erik (not the smaller additional one
 I sent). I expect to commit that on Sunday.


 yes, that (main) patch seems to have largely
 closed the gap between primary and standby; here
 are some results from a lower scale (10):

FWIW, here are some more results from pgbench comparing
primary and standby (both with Simon's patch).

Sorry if it's too much data, but to me at least it was illuminating;
I now understand the effects of the different parameters better.



pgbench results, all -S (SELECT-only).

  scale: 10, 100, 500, 1000
clients: 1, 10, 20, 40, 60, 90

for each: 4x primary, 4x standby, for 15 minutes:

(port 6565=primary, 6566=standby)

scale: 10  clients:  1  tps =  6404.768132  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6435.072983  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6458.958549  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6432.088099  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1

scale: 10  clients:  1  tps =  6442.519035  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6449.837931  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6419.879026  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6428.290175  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1

---
scale: 10  clients: 10  tps = 27624.339871  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27604.261750  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28015.093466  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28422.561280  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1

scale: 10  clients: 10  tps = 27254.806526  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27686.470866  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28078.904035  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27101.622337  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1


scale: 10  clients: 20  tps = 23106.795587  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23101.681155  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22893.364004  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23038.577109  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1

scale: 10  clients: 20  tps = 22903.578552  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22970.691946  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22999.473318  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22884.854749  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1


scale: 10  clients: 40  tps = 23522.499429  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23611.319191  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23616.905302  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23572.213990  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1

scale: 10  clients: 40  tps = 23714.721220  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23711.781175  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.867023  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.699231  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1


scale: 10  clients: 60  tps = 21987.497095  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21950.344204  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22006.461447  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21824.071303  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1

scale: 10  clients: 60  tps = 22149.415231  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22211.064402  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22164.238081  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22174.585736  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1


scale: 10  clients: 90  tps = 18751.213002  pgbench -p 6565 -n 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Erik Rijkers e...@xs4all.nl writes:
 FWIW, here are some more results from pgbench comparing
 primary and standby (both with Simon's patch).

That seems weird.  Why do most of the runs show primary and standby
as having comparable speed, but a few show the standby as much slower?
The parameters for those runs don't seem obviously different from cases
where it's fast.  I think there might have been something else going on
on the standby during those runs.  Or do you think those represent
cases where the mystery slowdown event happened?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote:

 Sorry if it's too much data, but to me at least it was illuminating;
 I now understand the effects of the different parameters better.

That's great, many thanks.

A few observations

* Standby performance is actually slightly above normal running. This is
credible because of the way snapshots are now taken. We don't need to
scan the procarray looking for write transactions, since we know
everything is read only. So we scan just the KnownAssignedXids array,
which, if there is no activity from the primary, will be zero-length, so
snapshots actually get taken much faster in this case on the standby.
Snapshot performance on the standby is O(n) where n is the number of
write transactions currently on the primary (transfer delays blur the
word "currently"); a rough sketch of that scan follows these observations.

* The results for scale factor > 100 are fine, and the results for 100
with few connections get thrown out by long transaction times. With
larger numbers of connections the wait problems seem to go away. Looks
like Erik (and possibly Hot Standby in general) has an I/O problem,
though its source is not yet determined. It could be just hardware, or
might be hardware plus other factors.
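
As promised in the first observation, a rough sketch of the scan being
described (illustrative C with made-up names, not the patch text): the
standby snapshot just walks the known-assigned window, and with no write
activity on the primary that window is empty.

static int
snapshot_scan_sketch(const TransactionId *xids, const bool *valid,
                     int tail, int head, TransactionId *out)
{
    int     count = 0;
    int     i;

    for (i = tail; i < head; i++)
    {
        if (!valid[i])
            continue;           /* entry already invalidated        */
        out[count++] = xids[i]; /* write xid still running on primary */
    }
    return count;               /* O(head - tail) work overall      */
}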

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Erik Rijkers
On Sun, April 25, 2010 20:55, Tom Lane wrote:

 That seems weird.  Why do most of the runs show primary and standby
 as having comparable speed, but a few show the standby as much slower?
 The parameters for those runs don't seem obviously different from cases
 where it's fast.  I think there might have been something else going on
 on the standby during those runs.  Or do you think those represent
 cases where the mystery slowdown event happened?


The strange case is the scale 100 standby's slow start, followed by
a steady increase during -c 1, then -c 10, and finally getting up to
speed with -c 20 (and up).  These slow-but-growing standby series are
interspersed with normal (high-speed) primary series.

I'll try to repeat this pattern on other hardware, although if my tests
were run on faulty hardware I don't see how or why that would give the
above effect (such a 'regular' aberration).


testing is more difficult than I thought...


Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 [ v2 patch ]

I've been studying this some more while making notes for improved
comments, and I've about come to the conclusion that having readers
move the tail pointer (at the end of KnownAssignedXidsGetAndSetXmin)
is overly tricky and probably not a performance improvement anyway.
The code is in fact wrong as it stands: it's off-by-one about setting
the new tail value.  And there's potential for contention with multiple
readers all wanting to move the tail pointer at once.  And most
importantly, KnownAssignedXidsSearch can't move the tail pointer so
we might expend many inefficient searches while never moving the tail
pointer.

I think we should get rid of that and just have the two functions that
can mark entries invalid (which they must do with exclusive lock)
advance the tail pointer when they invalidate the current tail element.
Then we have the very simple rule that only the startup process ever
changes this data structure.
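
Sketched out (illustrative names, not actual patch code), that rule would
look something like this in the two invalidation paths, which already
hold ProcArrayLock exclusively:

static void
invalidate_sketch(bool *valid, int *tail, int head, int i)
{
    valid[i] = false;                       /* invalidate the entry       */
    while (*tail < head && !valid[*tail])
        (*tail)++;                          /* reclaim dead leading slots */
}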

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-24 Thread Simon Riggs
On Fri, 2010-04-23 at 19:07 -0400, Robert Haas wrote:
 On Fri, Apr 23, 2010 at 6:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
  
   99% of transactions happen in similar times between primary and standby,
   everything dragged down by rare but severe spikes.
  
   We're looking for something that would delay something that normally
   takes 0.1ms into something that takes 100ms, yet does eventually
   return. That looks like a severe resource contention issue.
 
  Wow.  Good detective work.
 
  While we haven't fully established the source of those problems, I am
  now happy that these test results don't present any reason to avoid
   committing the main patch tested by Erik (not the smaller additional one
  I sent). I expect to commit that on Sunday.
 
 Both Heikki and I objected to that patch.  

Please explain your objection, based upon the patch and my explanations.

 And apparently it doesn't
 fix the problem, either.  So, -1 from me.

There is an issue observed in Erik's later tests, but my interpretation
of the results so far is that the sorted array patch successfully
removes the initially reported loss of performance.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Simon Riggs
On Thu, 2010-04-22 at 23:45 +0100, Simon Riggs wrote:
 On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
  On Sun, April 18, 2010 13:01, Simon Riggs wrote:
 
  any comment is welcome...
 
 Please can you re-run with -l and post me the file of times

Erik has sent me details of a test run. My analysis of that is:

I'm seeing the response time profile on the standby as
99%    < 110us
99.9%  < 639us
99.99% < 615ms

0.052% (52 samples) are > 5ms elapsed and account for 24 s, which is
about 45% of elapsed time.

Of the 52 samples > 5ms, 50 of them are > 100ms and 2 are > 1s.

99% of transactions happen in similar times between primary and standby,
everything dragged down by rare but severe spikes.

We're looking for something that would delay something that normally
takes 0.1ms into something that takes 100ms, yet does eventually
return. That looks like a severe resource contention issue.
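
For anyone who wants to reproduce this kind of profile, here is a small
standalone sketch (not part of any patch) that reads one latency value
per line, in microseconds, and prints a few percentiles. Pulling the
latency column out of the pgbench -l log beforehand (with cut or awk) is
assumed.

#include <stdio.h>
#include <stdlib.h>

static int
cmp_double(const void *a, const void *b)
{
    double  da = *(const double *) a;
    double  db = *(const double *) b;

    return (da > db) - (da < db);
}

int
main(void)
{
    double *v = NULL;
    size_t  n = 0,
            cap = 0;
    double  x;

    /* slurp one latency value per line from stdin */
    while (scanf("%lf", &x) == 1)
    {
        if (n == cap)
        {
            cap = cap ? cap * 2 : 1024;
            v = realloc(v, cap * sizeof(double));
            if (v == NULL)
                return 1;
        }
        v[n++] = x;
    }
    if (n == 0)
        return 1;

    qsort(v, n, sizeof(double), cmp_double);

    printf("99%%    <= %.0f us\n", v[(size_t) (0.99 * (n - 1))]);
    printf("99.9%%  <= %.0f us\n", v[(size_t) (0.999 * (n - 1))]);
    printf("99.99%% <= %.0f us\n", v[(size_t) (0.9999 * (n - 1))]);

    free(v);
    return 0;
}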

This effect happens when running just a single read-only session on
standby from pgbench. No confirmation as yet as to whether recovery is
active or dormant, and what other activitity if any occurs on standby
server at same time. So no other clues as yet as to what the contention
might be, except that we note the standby is writing data and the
database is large.

 Please also rebuild using --enable-profile so we can see what's
 happening.
 
 Can you also try the enclosed patch which implements prefetching during
 replay of btree delete records. (Need to set effective_io_concurrency)

As yet, no confirmation that the attached patch is even relevant. It was
just a wild guess at some tuning, while we wait for further info.

 Thanks for your further help.

Some kind of contention is best we can say at present.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-04-22 at 23:45 +0100, Simon Riggs wrote:
 On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
  On Sun, April 18, 2010 13:01, Simon Riggs wrote:

  any comment is welcome...

 Please can you re-run with -l and post me the file of times

 Erik has sent me details of a test run. My analysis of that is:

 I'm seeing the response time profile on the standby as
 99%    < 110us
 99.9%  < 639us
 99.99% < 615ms

 0.052% (52 samples) are > 5ms elapsed and account for 24 s, which is
 about 45% of elapsed time.

 Of the 52 samples > 5ms, 50 of them are > 100ms and 2 are > 1s.

 99% of transactions happen in similar times between primary and standby,
 everything dragged down by rare but severe spikes.

 We're looking for something that would delay something that normally
 takes 0.1ms into something that takes 100ms, yet does eventually
 return. That looks like a severe resource contention issue.

Wow.  Good detective work.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Marko Kreen
On 4/18/10, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2010-04-17 at 16:48 -0400, Tom Lane wrote:
   There are some places where we suppose that a *single* write into shared
   memory can safely be done without a lock, if we're not too concerned
   about how soon other transactions will see the effects.  But what you
   are proposing here requires more than one related write.
  
   I've been burnt by this myself:
   http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php


 W O W - thank you for sharing.

  What I'm not clear on is why you've used a spinlock everywhere when only
  weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
  macro that does does nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).

Um, you have been burned by exactly this on x86 also:

  http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 Um, you have been burned by exactly this on x86 also:
   http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php

Yeah, we never did figure out exactly how come you were observing that
failure on Intel-ish hardware.  I was under the impression that Intel
machines didn't have weak-memory-ordering behavior.

I wonder whether your compiler had rearranged the code in ProcArrayAdd
so that the increment happened before the array element store at the
machine-code level.  I think it would be entitled to do that under
standard C semantics, since that ProcArrayStruct pointer isn't marked
volatile.
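
For illustration only (structure and field names simplified, this is not
the actual ProcArrayAdd): with an ordinary pointer the compiler may emit
the counter increment ahead of the element store, whereas going through a
volatile-qualified pointer pins the order in which the two stores are
emitted (it still says nothing about hardware reordering).

static void
proc_array_add_sketch(ProcArrayStruct *arrayP, PGPROC *proc)
{
    volatile ProcArrayStruct *a = arrayP;

    a->procs[a->numProcs] = proc;   /* store the element first ...    */
    a->numProcs++;                  /* ... then publish the new count */
}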

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Marko Kreen
On 4/23/10, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
   Um, you have been burned by exactly this on x86 also:
 http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php


  Yeah, we never did figure out exactly how come you were observing that
  failure on Intel-ish hardware.  I was under the impression that Intel
  machines didn't have weak-memory-ordering behavior.

  I wonder whether your compiler had rearranged the code in ProcArrayAdd
  so that the increment happened before the array element store at the
  machine-code level.  I think it would be entitled to do that under
  standard C semantics, since that ProcArrayStruct pointer isn't marked
  volatile.

Sounds likely.

Which seems to hint it's better to handle all processors as weakly
ordered and work with explicit locks/memory barriers, than to sprinkle
the code with 'volatile' to suppress optimizations on Intel and then
still fail on non-Intel.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Simon Riggs
On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.
 
 Wow.  Good detective work.

While we haven't fully established the source of those problems, I am
now happy that these test results don't present any reason to avoid
commiting the main patch tested by Erik (not the smaller additional one
I sent). I expect to commit that on Sunday.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 6:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.

 Wow.  Good detective work.

 While we haven't fully established the source of those problems, I am
 now happy that these test results don't present any reason to avoid
 committing the main patch tested by Erik (not the smaller additional one
 I sent). I expect to commit that on Sunday.

Both Heikki and I objected to that patch.  And apparently it doesn't
fix the problem, either.  So, -1 from me.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Erik Rijkers
On Sat, April 24, 2010 00:39, Simon Riggs wrote:
 On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.

 Wow.  Good detective work.

 While we haven't fully established the source of those problems, I am
 now happy that these test results don't present any reason to avoid
 committing the main patch tested by Erik (not the smaller additional one
 I sent). I expect to commit that on Sunday.


yes, that (main) patch seems to have largely
closed the gap between primary and standby; here
are some results from a lower scale (10):

  scale: 10
clients: 10, 20, 40, 60, 90
for each: 4x primary, 4x standby:  (6565=primary, 6566=standby)
-
scale: 10  clients: 10  tps = 27624.339871  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27604.261750  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28015.093466  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28422.561280  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1

scale: 10  clients: 10  tps = 27254.806526  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27686.470866  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28078.904035  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27101.622337  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1

-
scale: 10  clients: 20  tps = 23106.795587  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23101.681155  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22893.364004  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23038.577109  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1

scale: 10  clients: 20  tps = 22903.578552  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22970.691946  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22999.473318  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22884.854749  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1

-
scale: 10  clients: 40  tps = 23522.499429  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23611.319191  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23616.905302  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23572.213990  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1

scale: 10  clients: 40  tps = 23714.721220  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23711.781175  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.867023  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.699231  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1

-
scale: 10  clients: 60  tps = 21987.497095  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21950.344204  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22006.461447  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21824.071303  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1

scale: 10  clients: 60  tps = 22149.415231  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22211.064402  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22164.238081  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22174.585736  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1

-
scale: 10  clients: 90  tps = 18751.213002  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18757.115811  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18692.942329  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18765.390154  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1

scale: 10  clients: 90  tps = 18929.462104  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18999.851184  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18972.321607  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18924.058827  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1


The higher scales still have that other standby slowness.  It may be a
caching effect (as Mark Kirkwood suggested): the idea being that the
primary's data is pre-cached because of the initial create, while the
standby's data needs to be read from disk for the first time.

Does that make sense?

I will try to confirm this.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 08:57 +0300, Heikki Linnakangas wrote:
  
  I think the assert is a good idea.  If there's no real problem here,
  the assert won't trip.  It's just a safety precaution.
 
 Right. And assertions also act as documentation, they are a precise and
 compact way to document invariants we assume to hold. A comment
 explaining why the cyclic nature of XIDs is not a problem would be nice
 too, in addition or instead of the assertions.

Agreed. I was going to reply just that earlier but have been distracted
on other things.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Erik Rijkers
On Sun, April 18, 2010 13:01, Simon Riggs wrote:

 OK, I'll put a spinlock around access to the head of the array.

 v2 patch attached


knownassigned_sortedarray.v2.diff applied to cvs HEAD (2010.04.21 22:36)

I have done a few smaller tests (scale 500, clients 1, 20):

init:
  pgbench -h /tmp -p 6565 -U rijkers -i -s 500 replicas


4x primary, clients 1:
scale: 500 clients:  1  tps = 11496.372655  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps = 11580.141685  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps = 11478.294747  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps = 11741.432016  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1

4x standby, clients 1:
scale: 500 clients:  1  tps =   727.217672  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps =   785.431011  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps =   825.291817  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps =   868.107638  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1


4x primary, clients 20:
scale: 500 clients: 20  tps = 34963.054102  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps = 34818.985407  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps = 34964.545013  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps = 34959.210687  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1

4x standby, clients 20:
scale: 500 clients: 20  tps =  1099.808192  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps =   905.926703  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps =   943.531989  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps =  1082.215913  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1


This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).

I have no idea what could cause this; is no one else seeing this?

(if I have time I'll repeat this on other hardware over the weekend)

any comment is welcome...


Erik Rijkers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Greg Smith

Erik Rijkers wrote:

This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).
  


Any possibility the standby is built with assertions turned on?  That's
often the cause of this type of difference between pgbench results on
two systems, which is easy to introduce when everyone is building from
source.  You should try this on both systems:


psql -c "show debug_assertions"


just to rule that out.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Mark Kirkwood

Greg Smith wrote:

Erik Rijkers wrote:
This is the same behaviour (i.e. extreme slow standby) that I saw 
earlier (and which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many

hours maybe even days (without bouncing either primary or standby).
  


Any possibility the standby is built with assertions turned on?
That's often the cause of this type of difference between pgbench
results on two systems, which is easy to introduce when everyone is
building from source.  You should try this on both systems:


psql -c "show debug_assertions"




Or even:

pg_config --configure

on both systems might be worth checking.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Erik Rijkers
On Thu, April 22, 2010 23:54, Mark Kirkwood wrote:
 Greg Smith wrote:
 Erik Rijkers wrote:
 This is the same behaviour (i.e. extreme slow standby) that I saw
 earlier (and which caused the
 original post, btw).  In that earlier instance, the extreme slowness
 disappeared later, after many
 hours maybe even days (without bouncing either primary or standby).


 Any possibility the standby is built with assertions turned on?
 That's often the cause of this type of difference between pgbench
 results on two systems, which is easy to introduce when everyone is
 building from source.  You should try this on both systems:

 psql -c "show debug_assertions"



 Or even:

 pg_config --configure

 on both systems might be worth checking.

(these instances are on a single server, btw)

primary:

$ pg_config
BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin
DOCDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
HTMLDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
INCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
PKGINCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
INCLUDEDIR-SERVER = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include/server
LIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
PKGLIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
LOCALEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/locale
MANDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/man
SHAREDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share
SYSCONFDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/etc
PGXS = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary' 
'--with-pgport=6565'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml' 
'--with-libxslt'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.0devel-sr_primary
[data:port:db   PGPORT=6565   
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data  
PGDATABASE=replicas]
2010.04.22 20:55:28 rijk...@denkraam:~/src/perl/85devel [0]
$ time ./run_test_suite.sh
[data:port:db   PGPORT=6565   
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data  
PGDATABASE=replicas]
2010.04.22 21:00:26 rijk...@denkraam:~/src/perl/85devel [1]

standby:

$ pg_config
BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin
DOCDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
HTMLDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
INCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
PKGINCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
INCLUDEDIR-SERVER = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include/server
LIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
PKGLIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
LOCALEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/locale
MANDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/man
SHAREDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share
SYSCONFDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/etc
PGXS = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary' 
'--with-pgport=6565'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml' 
'--with-libxslt'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.0devel-sr_primary



$ grep -Ev '(^[[:space:]]*#)|(^$)' pgsql.sr_*ry/data/postgresql.conf
pgsql.sr_primary/data/postgresql.conf:data_directory =
'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data'
pgsql.sr_primary/data/postgresql.conf:port = 6565
pgsql.sr_primary/data/postgresql.conf:max_connections = 100
pgsql.sr_primary/data/postgresql.conf:shared_buffers = 256MB
pgsql.sr_primary/data/postgresql.conf:checkpoint_segments = 50
pgsql.sr_primary/data/postgresql.conf:archive_mode = 'on'
pgsql.sr_primary/data/postgresql.conf:archive_command= 'cp %p
/var/data1/pg_stuff/dump/replication_archive/%f'

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
 On Sun, April 18, 2010 13:01, Simon Riggs wrote:

 any comment is welcome...

Please can you re-run with -l and post me the file of times

Please also rebuild using --enable-profile so we can see what's
happening.

Can you also try the enclosed patch which implements prefetching during
replay of btree delete records. (Need to set effective_io_concurrency)

Thanks for your further help.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index f4c7bf4..9918688 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -578,6 +578,8 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	OffsetNumber 	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	TransactionId	htupxid = InvalidTransactionId;
+	TransactionId	oldestxmin = GetCurrentOldestXmin(true, true);
+	TransactionId	latestCompletedXid;
 	int i;
 
 	/*
@@ -586,8 +588,12 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	 * That returns InvalidTransactionId, and so will conflict with
 	 * users, but since we just worked out that's zero people, its OK.
 	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
+	if (!TransactionIdIsValid(oldestxmin))
+		return oldestxmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
+	LWLockRelease(ProcArrayLock);
 
 	/*
 	 * Get index page
@@ -603,6 +609,27 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	 */
 	unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
 
+	/*
+	 * Prefetch the heap buffers.
+	 */
+	for (i = 0; i < xlrec->nitems; i++)
+	{
+		/*
+		 * Identify the index tuple about to be deleted
+		 */
+		iitemid = PageGetItemId(ipage, unused[i]);
+		itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+		/*
+		 * Locate the heap page that the index tuple points at
+		 */
+		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
+		XLogPrefetchBuffer(xlrec->hnode, MAIN_FORKNUM, hblkno);
+	}
+
+	/*
+	 * Read through the heap tids
+	 */
	for (i = 0; i < xlrec->nitems; i++)
 	{
 		/*
@@ -659,6 +686,16 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 latestRemovedXid = htupxid;
 
 			htupxid = HeapTupleHeaderGetXmax(htuphdr);
+
+			/*
+			 * Stop searching when we've found a recent xid
+			 */
+			if (TransactionIdFollowsOrEquals(htupxid,latestCompletedXid))
+			{
+				UnlockReleaseBuffer(hbuffer);
+				break;
+			}
+
 			if (TransactionIdFollows(htupxid, latestRemovedXid))
 latestRemovedXid = htupxid;
 		}
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 9ee2036..3ea3a40 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -342,6 +342,16 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+void
+XLogPrefetchBuffer(RelFileNode rnode, ForkNumber forknum,
+	BlockNumber blkno)
+{
+	Relation reln = CreateFakeRelcacheEntry(rnode);
+
+	reln->rd_istemp = false;
+
+	PrefetchBuffer(reln, forknum, blkno);
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5a214c8..bb23c16 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -933,6 +933,21 @@ TransactionIdIsActive(TransactionId xid)
 TransactionId
 GetOldestXmin(bool allDbs, bool ignoreVacuum)
 {
+	TransactionId result = GetCurrentOldestXmin(allDbs, ignoreVacuum);
+
+	/*
+	 * Compute the cutoff XID, being careful not to generate a permanent XID
+	 */
+	result -= vacuum_defer_cleanup_age;
+	if (!TransactionIdIsNormal(result))
+		result = FirstNormalTransactionId;
+
+	return result;
+}
+
+TransactionId
+GetCurrentOldestXmin(bool allDbs, bool ignoreVacuum)
+{
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId result;
 	int			index;
@@ -985,13 +1000,6 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 
 	LWLockRelease(ProcArrayLock);
 
-	/*
-	 * Compute the cutoff XID, being careful not to generate a permanent XID
-	 */
-	result -= vacuum_defer_cleanup_age;
-	if (!TransactionIdIsNormal(result))
-		result = FirstNormalTransactionId;
-
 	return result;
 }
 
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 8477f88..caa8aa3 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -28,6 +28,9 @@ extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
 extern Buffer XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init);
 extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	   BlockNumber blkno, ReadBufferMode mode);
+extern void XLogPrefetchBuffer(RelFileNode rnode, ForkNumber forknum,
+		BlockNumber blkno);
+
 
 extern Relation 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Mark Kirkwood

Erik Rijkers wrote:


This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).

I have no idea what could cause this; is no one else seeing this?

(if I have time I'll repeat this on other hardware over the weekend)

any comment is welcome...


  


I wonder if what you are seeing is perhaps due to the tables on the 
primary being almost completely cached (from the initial create) and 
those on the standby being at best partially so. That would explain why 
the standby performance catches up after a while ( when its tables are 
equivalently cached).


One way to test this is to 'pre-cache' the standby by selecting every 
row from its tables before running the pgbench test.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 15:09 +1200, Mark Kirkwood wrote:

 I did some testing of this patch (v2). Unfortunately I don't have access 
 to hardware capable of doing tests at the same scale as Erik used. 
 However I was still able to show a consistent difference (I think) 
 between standby performance with and without the patch applied.

...

 Overall looks like the patch gets standby read only performance close to 
 the master - at least in the case where there are minimal master 
 transactions being tracked by the standby (I had to leave the master 
 idle whilst running the standby case, as they shared the machine). Hope 
 this info is useful.

Thanks very much for the report; always good to get confirmation.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
 On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere when only
 weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
 macro that does nada when the hardware already protects us? (i.e. a
 spinlock only for the hardware that needs it).
 Well, we could certainly consider that, if we had enough places where
 there was a demonstrable benefit from it.  I couldn't measure any real
 slowdown from adding a spinlock in that sinval code, so I didn't propose
 doing so at the time --- and I'm pretty dubious that this code is
 sufficiently performance-critical to justify the work, either.
 OK, I'll put a spinlock around access to the head of the array.
 
 v2 patch attached

The locking seems overly complex to me. Looking at
KnownAssignedXidsAdd(), isn't it possible for two processes to call it
concurrently in exclusive_lock==false mode and get the same 'head'
value, and step on each other's toes? I guess KnownAssignedXidsAdd() is
only called by the startup process, but it seems like an accident
waiting to happen.

Spinlocks are fast, if you have to add an if-branch to decide whether to
acquire it, I suspect you've lost any performance gain to be had
already. Let's keep it simple. And acquiring ProcArrayLock in exclusive
mode while adding entries to the array seems OK to me as well. It only
needs to be held very briefly, and getting this correct and keeping it
simple is much more important at this point than squeezing out every
possible CPU cycle from the system.

Just require that the caller holds ProcArrayLock in exclusive mode when
calling an operation that modifies the array, and in shared mode when
reading it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
 On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere when only
 weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
 macro that does nada when the hardware already protects us? (i.e. a
 spinlock only for the hardware that needs it).
 Well, we could certainly consider that, if we had enough places where
 there was a demonstrable benefit from it.  I couldn't measure any real
 slowdown from adding a spinlock in that sinval code, so I didn't propose
 doing so at the time --- and I'm pretty dubious that this code is
 sufficiently performance-critical to justify the work, either.
 OK, I'll put a spinlock around access to the head of the array.
 
 v2 patch attached

Given the discussion about the cyclic nature of XIDs, it would be good
to add an assertion that when a new XID is added to the array, it is

a) larger than the biggest value already in the array
(TransactionIdFollows(new, head)), and
b) not too far from the smallest value in the array to confuse binary
search (TransactionIdFollows(new, tail)).
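
In code, that amounts to something like the following, with head_xid and
tail_xid standing for the newest and oldest entries currently in the
array (illustrative names, not from the patch):

Assert(TransactionIdFollows(new_xid, head_xid));    /* (a) newer than anything present    */
Assert(TransactionIdFollows(new_xid, tail_xid));    /* (b) not wrapped past the oldest xid */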

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 15:20 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
  On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
  What I'm not clear on is why you've used a spinlock everywhere when only
  weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
   macro that does nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).
  Well, we could certainly consider that, if we had enough places where
  there was a demonstrable benefit from it.  I couldn't measure any real
  slowdown from adding a spinlock in that sinval code, so I didn't propose
  doing so at the time --- and I'm pretty dubious that this code is
  sufficiently performance-critical to justify the work, either.
  OK, I'll put a spinlock around access to the head of the array.
  
  v2 patch attached
 
 The locking seems overly complex to me. 

 Looking at
 KnownAssignedXidsAdd(), isn't it possible for two processes to call it
 concurrently in exclusive_lock==false mode and get the same 'head'
 value, and step on each other's toes? I guess KnownAssignedXidsAdd() is
 only called by the startup process, but it seems like an accident
 waiting to happen.

Not at all. That assumption is also used elsewhere, so it is safe to use
here.

 Spinlocks are fast, if you have to add an if-branch to decide whether to
 acquire it, I suspect you've lost any performance gain to be had
 already.

I think you're misreading the code. One caller already holds exclusive
lock, one does not. The if test is to determine whether to acquire the
lock or not.

 Let's keep it simple. And acquiring ProcArrayLock in exclusive
 mode while adding entries to the array seems OK to me as well. It only
 needs to be held very briefly, and getting this correct and keeping it
 simple is much more important at this point than squeezing out every
 possible CPU cycle from the system.

I don't understand what you're saying: you say I'm wasting a few cycles
on an if test and should change that, but at the same time you say I
shouldn't worry about a few cycles.

 Just require that the caller holds ProcArrayLock in exclusive mode when
 calling an operation that modifies the array, and in shared mode when
 reading it.

The point of the code you're discussing is to remove the exclusive lock
requests, not to save a few cycles. Spinlocks are fast, as you say.

Exclusive lock requests block under a heavy load of shared lock holders,
we know that already. It is worth removing the contention that can occur
by minimising the number of exclusive locks required. This patch shows
how and I don't see any reason from what you have said to avoid
committing it. I'm willing to hear some sound reasons, if any exist.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 15:27 +0300, Heikki Linnakangas wrote:

 Given the discussion about the cyclic nature of XIDs, it would be good
 to add an assertion that when a new XID is added to the array, it is
 
 a) larger than the biggest value already in the array
 (TransactionIdFollows(new, head)), and
 b) not too far from the smallest value in the array to confuse binary
 search (TransactionIdFollows(new, tail)).

We discussed this in November. You convinced me it isn't possible for
older xids to stay in the standby because anti-wraparound vacuums would
conflict and kick them out. The primary can't run with wrapped xids and
neither can the standby. I think that is correct.

Adding an assertion isn't going to do much because it's unlikely anybody
is going to be running for 2^31 transactions with asserts enabled.

Worrying about things like this seems strange when real and negative
behaviours are right in our faces elsewhere. Performance and scalability
are real world concerns.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Robert Haas
On Wed, Apr 21, 2010 at 9:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 15:27 +0300, Heikki Linnakangas wrote:

 Given the discussion about the cyclic nature of XIDs, it would be good
 to add an assertion that when a new XID is added to the array, it is

 a) larger than the biggest value already in the array
 (TransactionIdFollows(new, head)), and
 b) not too far from the smallest value in the array to confuse binary
 search (TransactionIdFollows(new, tail)).

 We discussed this in November. You convinced me it isn't possible for
 older xids to stay in the standby because anti-wraparound vacuums would
 conflict and kick them out. The primary can't run with wrapped xids and
 neither can the standby. I think that is correct.

 Adding an assertion isn't going to do much because it's unlikely anybody
 is going to be running for 2^31 transactions with asserts enabled.

 Worrying about things like this seems strange when real and negative
 behaviours are right in our faces elsewhere. Performance and scalability
 are real world concerns.

I think the assert is a good idea.  If there's no real problem here,
the assert won't trip.  It's just a safety precaution.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Robert Haas
On Wed, Apr 21, 2010 at 8:20 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The locking seems overly complex to me.

I tend to agree.

!   /*
!* Callers must hold either ProcArrayLock in Exclusive mode or
!* ProcArrayLock in Shared mode *and* known_assigned_xids_lck
!* to update these values.
!*/

I'm not convinced that this is either (a) correct or (b) performant.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
 
  Adding an assertion isn't going to do much because it's unlikely anybody
  is going to be running for 2^31 transactions with asserts enabled.
 

 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

If you believe that, then I think you should add this to all the other
places in the current server where that assumption is made without
assertion being added. As a safety precaution.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread marcin mank
On Wed, Apr 21, 2010 at 4:12 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
 
  Adding an assertion isn't going to do much because it's unlikely anybody
  is going to be running for 2^31 transactions with asserts enabled.
 

 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

 If you believe that, then I think you should add this to all the other
 places in the current server where that assumption is made without
 assertion being added. As a safety precaution.


Is it not a good idea that (at least for dev builds, like with
--enable-cassert) the xid counter start at something like 2^31 - 1000?
It could help catch some bugs.

Greetings
Marcin Mańk

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 16:22 +0200, marcin mank wrote:

 Is it not a good idea that (at least for dev builds, like with
 --enable-cassert) the xid counter start at something like 2^31 - 1000?
 It could help catch some bugs.

It is a good idea, I'm sure that would help catch bugs.

It wouldn't help here because the case in doubt is whether it's possible
to have an xid still showing in memory arrays from the last time the
cycle wrapped. It isn't. These things aren't random. These numbers are
extracted directly from activity that was occurring on the primary and
regularly checked and cleaned as the standby runs.

So you'll need to do 2^31 transactions to prove this isn't true, which
isn't ever going to happen in testing with an assert build and nobody
with that many transactions would run an assert build anyway.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Robert Haas
On Wed, Apr 21, 2010 at 10:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
 
  Adding an assertion isn't going to do much because it's unlikely anybody
  is going to be running for 2^31 transactions with asserts enabled.
 

 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

 If you believe that, then I think you should add this to all the other
 places in the current server where that assumption is made without
 assertion being added. As a safety precaution.

I feel like this conversation is getting a little heated.  We are just
trying to solve a technical problem here.  Perhaps I am misreading -
tone doesn't come through very well in email.

I think the assumptions that are being made in this particular case
are different from the ones made elsewhere in the server.  Generally,
we don't assume transaction IDs are arriving in ascending order - in
fact, we usually explicitly have to deal with the fact that they might
not be.   So if we have a situation where we ARE relying on them
arriving in order, because we have extrinsic reasons why we know it
has to happen that way, adding an assertion to make sure that things
are happening the way we expect doesn't seem out of line.  This code
is fairly complex.

There is arguably less value in asserting that the newly added xid
follows the tail as well as the head, but I still like the idea.  Not
sure whether that's rational or not.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Florian Pflug
On Apr 21, 2010, at 16:49 , Simon Riggs wrote:
 On Wed, 2010-04-21 at 16:22 +0200, marcin mank wrote:
 
 Is that not a good idea that (at least for dev-builds, like with
 enable-cassert) the xid counter start at like 2^31 - 1000 ? It could
 help catch some bugs.
 
 It is a good idea, I'm sure that would help catch bugs.
 
 It wouldn't help here because the case in doubt is whether it's possible
 to have an xid still showing in memory arrays from the last time the
 cycle wrapped. It isn't. These things aren't random. These numbers are
 extracted directly from activity that was occurring on the primary and
 regularly checked and cleaned as the standby runs.
 
 So you'll need to do 2^31 transactions to prove this isn't true, which
 isn't ever going to happen in testing with an assert build and nobody
 with that many transactions would run an assert build anyway.


ISTM that there's no need to actually execute 2^31 transactions to trigger this
bug (if it actually exists); it'd be sufficient to increment the xid counter by
more than one each time an xid is assigned, no?

Or would that trip snapshot creation on the standby?

best regards,
Florian Pflug





Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Heikki Linnakangas
Robert Haas wrote:
 On Wed, Apr 21, 2010 at 9:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 15:27 +0300, Heikki Linnakangas wrote:

 Given the discussion about the cyclic nature of XIDs, it would be good
 to add an assertion that when a new XID is added to the array, it is

 a) larger than the biggest value already in the array
 (TransactionIdFollows(new, head)), and
 b) not too far from the smallest value in the array to confuse binary
 search (TransactionIdFollows(new, tail)).
 We discussed this in November. You convinced me it isn't possible for
 older xids to stay in the standby because anti-wraparound vacuums would
 conflict and kick them out. The primary can't run with wrapped xids and
 neither can the standby. I think that is correct.

 Adding an assertion isn't going to do much because it's unlikely anybody
 is going to be running for 2^31 transactions with asserts enabled.

 Worrying about things like this seems strange when real and negative
 behaviours are right in our faces elsewhere. Performance and scalability
 are real world concerns.
 
 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

Right. And assertions also act as documentation, they are a precise and
compact way to document invariants we assume to hold. A comment
explaining why the cyclic nature of XIDs is not a problem would be nice
too, in addition or instead of the assertions.
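
For concreteness, a minimal sketch of the two checks being proposed, using
illustrative names for the array and its head/tail offsets (not the actual
fields in the patch under review):

    /*
     * Hypothetical sanity checks when appending new_xid at the head of the
     * known-assigned array: (a) entries stay in ascending xid order;
     * (b) the live span from tail to the new head stays under 2^31 xids,
     * so a binary search cannot be confused by wraparound.
     */
    Assert(TransactionIdFollows(new_xid, KnownAssignedXids[head - 1]));  /* (a) */
    Assert(TransactionIdFollows(new_xid, KnownAssignedXids[tail]));      /* (b) */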

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-20 Thread Robert Haas
On Sun, Apr 18, 2010 at 4:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sun, 2010-04-18 at 13:16 -0700, David Fetter wrote:
 On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
  On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
   On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere
 when only weak-memory thang CPUs are a problem. Why not have a
 weak-memory-protect macro that does does nada when the
 hardware already protects us? (i.e. a spinlock only for the
 hardware that needs it).
   
Well, we could certainly consider that, if we had enough places
where there was a demonstrable benefit from it.  I couldn't
measure any real slowdown from adding a spinlock in that sinval
code, so I didn't propose doing so at the time --- and I'm
pretty dubious that this code is sufficiently
performance-critical to justify the work, either.
  
   OK, I'll put a spinlock around access to the head of the array.
 
  v2 patch attached

 If you've committed this, or any other patch you've sent here,
 *please* mention so on the same thread.

 I haven't yet. I've written two patches - this is a major module rewrite
 and is still under discussion. The other patch has nothing to do with
 this (though I did accidentally include a couple of changes from this
 patch and immediately revoked them).

 I will wait awhile to see if anybody has some independent test results.

So, does anyone have a few cycles to test this out?  We are down to
handful of remaining open items, so getting this tested and committed
sooner = beta sooner.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-20 Thread Mark Kirkwood

Robert Haas wrote:


So, does anyone have a few cycles to test this out?  We are down to
handful of remaining open items, so getting this tested and committed
sooner = beta sooner.


  


I did some testing of this patch (v2). Unfortunately I don't have access 
to hardware capable of doing tests at the same scale as Erik used. 
However I was still able to show a consistent difference (I think) 
between standby performance with and without the patch applied.


Setup:

host: 2.7 Ghz dual core amd64 with 4G ram and 1 sata drive,
code: cvs head from 2010-04-14.
pgbench:  scale=100, 4 clients, 1 (select) transactions each.

Results:

Master performance (with and without patch applied ):
tps = 10903.612340 - 14070.109951 (including connections establishing)

Standby performance without patch:
tps = 8288.119913 - 9722.245178 (including connections establishing)

Standby performance with patch applied:
tps = 11592.922637 - 14065.553214 (including connections establishing)

I performed 8 runs of each, and results would start at the low range and 
climb up to the high one, where they would stabilize. In between runs I 
cleared the os buffer cache and (partially) reloaded it by selecting 
counts from the pgbench tables (i.e I was trying to ensure each run had 
the same or similar os cache contents).
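
For anyone wanting to reproduce this, the runs above correspond roughly to
something like the following; the per-client transaction count did not
survive in the archive, so treat it as a placeholder:

    pgbench -i -s 100 pgbench                  # initialize scale-100 tables
    pgbench -S -c 4 -t 10000 -p 5432 pgbench   # select-only run, 4 clients
    # point -p at the primary or the standby port for the two cases;
    # one common way to clear the OS cache between runs on Linux:
    #   sync; echo 3 > /proc/sys/vm/drop_caches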


Overall looks like the patch gets standby read only performance close to 
the master - at least in the case where there are minimal master 
transactions being tracked by the standby (I had to leave the master 
idle whilst running the standby case, as they shared the machine). Hope 
this info is useful.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-20 Thread Robert Haas
On Tue, Apr 20, 2010 at 11:09 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 Robert Haas wrote:

 So, does anyone have a few cycles to test this out?  We are down to
 handful of remaining open items, so getting this tested and committed
 sooner = beta sooner.




 I did some testing of this patch (v2). Unfortunately I don't have access to
 hardware capable of doing tests at the same scale as Erik used. However I
 was still able to show a consistent difference (I think) between standby
 performance with and without the patch applied.

 Setup:

 host: 2.7 Ghz dual core amd64 with 4G ram and 1 sata drive,
 code: cvs head from 2010-04-14.
 pgbench:  scale=100, 4 clients, 1 (select) transactions each.

 Results:

 Master performance (with and without patch applied ):
 tps = 10903.612340 - 14070.109951 (including connections establishing)

 Standby performance without patch:
 tps = 8288.119913 - 9722.245178 (including connections establishing)

 Standby performance with patch applied:
 tps = 11592.922637 - 14065.553214 (including connections establishing)

 I performed 8 runs of each, and results would start at the low range and
 climb up to the high one, where they would stabilize. In between runs I
 cleared the os buffer cache and (partially) reloaded it by selecting counts
 from the pgbench tables (i.e I was trying to ensure each run had the same or
 similar os cache contents).

 Overall looks like the patch gets standby read only performance close to the
 master - at least in the case where there are minimal master transactions
 being tracked by the standby (I had to leave the master idle whilst running
 the standby case, as they shared the machine). Hope this info is useful.

Thanks, that sounds promising.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread Simon Riggs
On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  What I'm not clear on is why you've used a spinlock everywhere when only
  weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
  macro that does does nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).
 
 Well, we could certainly consider that, if we had enough places where
 there was a demonstrable benefit from it.  I couldn't measure any real
 slowdown from adding a spinlock in that sinval code, so I didn't propose
 doing so at the time --- and I'm pretty dubious that this code is
 sufficiently performance-critical to justify the work, either.

OK, I'll put a spinlock around access to the head of the array.

Thanks for your input.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread Simon Riggs
On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
 On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
   What I'm not clear on is why you've used a spinlock everywhere when only
   weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
   macro that does does nada when the hardware already protects us? (i.e. a
   spinlock only for the hardware that needs it).
  
  Well, we could certainly consider that, if we had enough places where
  there was a demonstrable benefit from it.  I couldn't measure any real
  slowdown from adding a spinlock in that sinval code, so I didn't propose
  doing so at the time --- and I'm pretty dubious that this code is
  sufficiently performance-critical to justify the work, either.
 
 OK, I'll put a spinlock around access to the head of the array.

v2 patch attached

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1200,1205  StandbyTransactionIdIsPrepared(TransactionId xid)
--- 1200,1208 
  
  	Assert(TransactionIdIsValid(xid));
  
+ 	if (max_prepared_xacts <= 0)
+ 		return false;	/* nothing to do */
+ 
  	/* Read and validate file */
  	buf = ReadTwoPhaseFile(xid, false);
  	if (buf == NULL)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6453,6458  CheckRecoveryConsistency(void)
--- 6453,6464 
  	}
  }
  
+ bool
+ XLogConsistentState(void)
+ {
+ 	return reachedMinRecoveryPoint;
+ }
+ 
  /*
   * Is the system still in recovery?
   *
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 52,57 
--- 52,58 
  #include access/twophase.h
  #include miscadmin.h
  #include storage/procarray.h
+ #include storage/spin.h
  #include storage/standby.h
  #include utils/builtins.h
  #include utils/snapmgr.h
***
*** 64,73  typedef struct ProcArrayStruct
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	int			numKnownAssignedXids;	/* current number of known assigned
! 		 * xids */
! 	int			maxKnownAssignedXids;	/* allocated size of known assigned
! 		 * xids */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
--- 65,84 
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	/*
! 	 * Known assigned xids handling
! 	 */
! 	int			maxKnownAssignedXids;	/* allocated size */
! 
! 	/*
! 	 * Callers must hold either ProcArrayLock in Exclusive mode or
! 	 * ProcArrayLock in Shared mode *and* known_assigned_xids_lck
! 	 * to update these values.
! 	 */
! 	int			numKnownAssignedXids;	/* current # valid entries */
! 	int			tailKnownAssignedXids;	/* current tail */
! 	int			headKnownAssignedXids;	/* current head */
! 	slock_t		known_assigned_xids_lck;	/* shared protection lock */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
***
*** 87,93  static ProcArrayStruct *procArray;
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static HTAB *KnownAssignedXidsHash;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
--- 98,105 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static TransactionId *KnownAssignedXids;
! static bool *KnownAssignedXidsValid;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
***
*** 142,150  static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
--- 154,166 
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
! bool exclusive_lock);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid);
! static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
! 	  TransactionId *subxids);
! static void KnownAssignedXidsCompress(bool full);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
***
*** 204,228  CreateSharedProcArray(void)
		procArray->numKnownAssignedXids = 0;
		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
  		

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread David Fetter
On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
 On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
  On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
   Simon Riggs si...@2ndquadrant.com writes:
What I'm not clear on is why you've used a spinlock everywhere
when only weak-memory thang CPUs are a problem. Why not have a
weak-memory-protect macro that does does nada when the
hardware already protects us? (i.e. a spinlock only for the
hardware that needs it).
   
   Well, we could certainly consider that, if we had enough places
   where there was a demonstrable benefit from it.  I couldn't
   measure any real slowdown from adding a spinlock in that sinval
   code, so I didn't propose doing so at the time --- and I'm
   pretty dubious that this code is sufficiently
   performance-critical to justify the work, either.
  
  OK, I'll put a spinlock around access to the head of the array.
 
 v2 patch attached

If you've committed this, or any other patch you've sent here,
*please* mention so on the same thread.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread Simon Riggs
On Sun, 2010-04-18 at 13:16 -0700, David Fetter wrote:
 On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
  On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
   On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere
 when only weak-memory thang CPUs are a problem. Why not have a
 weak-memory-protect macro that does does nada when the
 hardware already protects us? (i.e. a spinlock only for the
 hardware that needs it).

Well, we could certainly consider that, if we had enough places
where there was a demonstrable benefit from it.  I couldn't
measure any real slowdown from adding a spinlock in that sinval
code, so I didn't propose doing so at the time --- and I'm
pretty dubious that this code is sufficiently
performance-critical to justify the work, either.
   
   OK, I'll put a spinlock around access to the head of the array.
  
  v2 patch attached
 
 If you've committed this, or any other patch you've sent here,
 *please* mention so on the same thread.

I haven't yet. I've written two patches - this is a major module rewrite
and is still under discussion. The other patch has nothing to do with
this (though I did accidentally include a couple of changes from this
patch and immediately revoked them).

I will wait awhile to see if anybody has some independent test results.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread David Fetter
On Sun, Apr 18, 2010 at 09:22:21PM +0100, Simon Riggs wrote:
 On Sun, 2010-04-18 at 13:16 -0700, David Fetter wrote:
  On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
   On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  What I'm not clear on is why you've used a spinlock
  everywhere when only weak-memory thang CPUs are a problem.
  Why not have a weak-memory-protect macro that does does
  nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).
 
 Well, we could certainly consider that, if we had enough
 places where there was a demonstrable benefit from it.  I
 couldn't measure any real slowdown from adding a spinlock in
 that sinval code, so I didn't propose doing so at the time
 --- and I'm pretty dubious that this code is sufficiently
 performance-critical to justify the work, either.

OK, I'll put a spinlock around access to the head of the
array.
   
   v2 patch attached
  
  If you've committed this, or any other patch you've sent here,
  *please* mention so on the same thread.
 
 I haven't yet. I've written two patches - this is a major module
 rewrite and is still under discussion. The other patch has nothing
 to do with this (though I did accidentally include a couple of
 changes from this patch and immediately revoked them).
 
 I will wait awhile to see if anybody has some independent test
 results.

Thanks :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Fri, 2010-04-16 at 11:10 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2010-04-16 at 10:39 -0400, Tom Lane wrote:
  I think you're outsmarting yourself there.  A binary search will in fact
  *not work* with circular xid comparison (this is exactly why there's no
  btree opclass for XID).
 
  I don't understand the exact, please explain more.
  I'm not using bsearch() just a quick chop based upon xid comparison,
  which looks to me like it will work.
 
 Implementing it yourself doesn't get you out of the fact that it won't
 work.  Consider
 
 	1 2 3 ... 3000000000 ... 3999999999
 
 and suppose that 3000000000 is the array's middle element.  If you
 search for 100, your first probe will conclude that it is to the right
 of 3000000000, which is the wrong direction.  Binary search, or indeed
 the entire concept that the array is sorted in the first place,
 depends on a transitive comparison operator.  TransactionIdFollows does
 not satisfy the transitive law.

AFAICS the example you give isn't correct.

We would lay out the values like this

W-3 W-2 W-1 W 3 4 5

where W is the wrap value and in this example we have 7 values in the
current array, with tail at W-3 and head at 5. Note the gap between W
and 3 where we would skip the values 1 and 2 because they are special.
Each element's xid is TransactionIdAdvanced(previous element).
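
(The gap arises because xid advancement never hands out the special ids
below FirstNormalTransactionId; a sketch of what the TransactionIdAdvance
macro in access/transam.h does, for illustration:

    xid++;
    if (xid < FirstNormalTransactionId)		/* ids 0, 1, 2 are reserved */
        xid = FirstNormalTransactionId;		/* i.e. skip straight to 3 */

so the element after W is 3, never 0, 1 or 2.)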

So when we search for value 3 we would start from W, then decide it is
to the right, which is correct and continue from there.

The values are laid out in TransactionIdFollows order, not in numeric
order, hence we need to use TransactionIdFollows to decide which way to
branch.

As long as it works I'm not worried if the array is not technically
sorted.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 AFAICS the example you give isn't correct.

 We would lay out the values like this

 W-3 W-2 W-1 W 3 4 5

 where W is the wrap value

Stop right there, you're already failing to think clearly.  There is no
unique wrap value, all values act the same in circular XID space.

The fundamental problem here is that if the set of XIDs involved spans a
sufficiently long range, your array will have a[0] < a[1] < ... < a[n]
but it will fail to be true that a[0] < a[n].  If that's also true for,
say, a[0] vs the midpoint element, then a binary search for a[0] will
fail because it will make the wrong decision while probing the midpoint.

 The values are laid out in TransactionIdFollows order,

They *cannot* be laid out in TransactionIdFollows order.  It's not
possible, because that relationship isn't a total ordering.

Now it might be the case that this is OK for HS purposes because the set
of XIDs that are relevant at any one instant should never span more than
half of the XID space.  But I'd just as soon not use that assumption
here if it's unnecessary.  It'd be cheaper anyway to sort and search the
array using plain <, so why are you so eager to use
TransactionIdFollows?
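
To see why TransactionIdFollows cannot totally order an array whose entries
span more than half the xid space, here is a small self-contained
illustration of the underlying comparison (the same signed-difference test
that TransactionIdPrecedes/Follows apply to normal xids); it is sketch code,
not PostgreSQL source:

    #include <stdint.h>
    #include <stdio.h>

    /* "a precedes b" iff the signed 32-bit difference is negative */
    static int xid_precedes(uint32_t a, uint32_t b)
    {
        return (int32_t) (a - b) < 0;
    }

    int main(void)
    {
        /* not transitive: 1 precedes 1500000000, 1500000000 precedes
         * 3000000000, yet 1 does not precede 3000000000 */
        printf("%d %d %d\n",
               xid_precedes(1u, 1500000000u),
               xid_precedes(1500000000u, 3000000000u),
               xid_precedes(1u, 3000000000u));		/* prints 1 1 0 */

        /* the wrong-direction probe from the example upthread: 100
         * appears to *follow* 3000000000, so the search walks right */
        printf("%d\n", xid_precedes(100u, 3000000000u));	/* prints 0 */
        return 0;
    }

Plain <, by contrast, is transitive, which is what makes an ordinary sorted
array and binary search valid.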

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Robert Haas
On Sat, Apr 17, 2010 at 11:13 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 AFAICS the example you give isn't correct.

 We would lay out the values like this

 W-3 W-2 W-1 W 3 4 5

 where W is the wrap value

 Stop right there, you're already failing to think clearly.  There is no
 unique wrap value, all values act the same in circular XID space.

Or to put it a different way, what happens when the wrap value
changes?  An array that was in order under one wrap value can cease to
be in order under another.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Sat, 2010-04-17 at 11:13 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  AFAICS the example you give isn't correct.
 
  We would lay out the values like this
 
  W-3 W-2 W-1 W 3 4 5
 
  where W is the wrap value
 
 Stop right there, you're already failing to think clearly.  There is no
 unique wrap value, all values act the same in circular XID space.
 
 The fundamental problem here is that if the set of XIDs involved spans a
 sufficiently long range, your array will have a[0] < a[1] < ... < a[n]
 but it will fail to be true that a[0] < a[n].  If that's also true for,
 say, a[0] vs the midpoint element, then a binary search for a[0] will
 fail because it will make the wrong decision while probing the midpoint.

I understand that the xids are circular, but there is only one point
where the next xid has a lower integer value but is still later from
the perspective of TransactionIdFollows. Apart from that single
discontinuity all other values are monotonic. From your insistence I
presume I must be missing something, but I just don't see it. Perhaps we
are just misunderstanding each other about the sufficiently long
range.

  The values are laid out in TransactionIdFollows order,
 
 They *cannot* be laid out in TransactionIdFollows order.  It's not
 possible, because that relationship isn't a total ordering.
 
 Now it might be the case that this is OK for HS purposes because the set
 of XIDs that are relevant at any one instant should never span more than
 half of the XID space.

Agree that is true.

  But I'd just as soon not use that assumption
 here if it's unnecessary. 

I understand what your saying. I think it is necessary here.

  It'd be cheaper anyway to sort and search the
 array using plain <, so why are you so eager to use
 TransactionIdFollows?

The array grows to the right and is laid out one xid per element,
resulting in a sequence of values that are transactionid-monotonic. As
the values increase there will eventually reach the discontinuity where
they cease being normally monotonic. Doing it this way means that we can
add rows past the head of the array and then move the head atomically,
so that we can make adding xids lock-free.

If we try to actually sort the values then the algorithm is both more
complex and requires locking. It would be easier to just remember where
the discontinuity is and act accordingly.

So I'm not eager to use either way, but I only currently see one way
that would work.

If there's a different way, that gives the same or better algorithmic
characteristics, I will be happy to code it.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sat, 2010-04-17 at 11:13 -0400, Tom Lane wrote:
 It'd be cheaper anyway to sort and search the
 array using plain <, so why are you so eager to use
 TransactionIdFollows?

 The array grows to the right and is laid out one xid per element,
 resulting in a sequence of values that are transactionid-monotonic.

How do you know that just adding items at the right will produce a
sorted array?  It seems quite unlikely to me that transactions can be
guaranteed to arrive at this code in XID order.  I think you need to do
an explicitly sorted insertion.

 ... Doing it this way means that we can
 add rows past the head of the array and then move the head atomically,
 so that we can make adding xids lock-free.

... and even without that issue, this seems like utter fantasy.  How
are you going to do that atomically?  Have you considered what will
happen on weak-memory-ordering machines like PPC, in particular?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Sat, 2010-04-17 at 15:45 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sat, 2010-04-17 at 11:13 -0400, Tom Lane wrote:
  It'd be cheaper anyway to sort and search the
  array using plain <, so why are you so eager to use
  TransactionIdFollows?
 
  The array grows to the right and is laid out one xid per element,
  resulting in a sequence of values that are transactionid-monotonic.
 
 How do you know that just adding items at the right will produce a
 sorted array?  It seems quite unlikely to me that transactions can be
 guaranteed to arrive at this code in XID order.  I think you need to do
 an explicitly sorted insertion.

Xids don't arrive in sequence, but known assigned xids are added in
sequence because we infer the existence of the intermediate xids and
assuming they are running for the snapshot.

  ... Doing it this way means that we can
  add rows past the head of the array and then move the head atomically,
  so that we can make adding xids lock-free.
 
 ... and even without that issue, this seems like utter fantasy.  How
 are you going to do that atomically?  Have you considered what will
 happen on weak-memory-ordering machines like PPC, in particular?

We search the array between tail and head. If the head moves by integer
overwrite just as already happens for xid assignment, then we would use
the new head for the search. The code is careful to fetch only once.

I would freely admit I know absolutely nothing about details of
weak-memory-ordering machines and have not considered them at all. How
would what I have proposed fail to work, yet what we already rely on
work correctly? Do the circumstances differ?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sat, 2010-04-17 at 15:45 -0400, Tom Lane wrote:
 How do you know that just adding items at the right will produce a
 sorted array?

 Xids don't arrive in sequence, but known assigned xids are added in
 sequence because we infer the existence of the intermediate xids and
 assuming they are running for the snapshot.

Hm.  Okay, maybe that will work.

 ... and even without that issue, this seems like utter fantasy.  How
 are you going to do that atomically?  Have you considered what will
 happen on weak-memory-ordering machines like PPC, in particular?

 We search the array between tail and head. If the head moves by integer
 overwrite just as already happens for xid assignment, then we would use
 the new head for the search. The code is careful to fetch only once.

... but this will not.  You need to use a lock, because there is
otherwise no guarantee that other processors see the write into the
array element before they see the change in the head pointer.

 I would freely admit I know absolutely nothing about details of
 weak-memory-ordering machines and have not considered them at all. How
 would what I have proposed fail to work, yet what we already rely on
 work correctly? Do the circumstances differ?

Yes.  We have memory ordering instructions inserted in the lock
acquisition/release code.  Trying to access and modify a shared-memory
data structure without any locking will not work.

There are some places where we suppose that a *single* write into shared
memory can safely be done without a lock, if we're not too concerned
about how soon other transactions will see the effects.  But what you
are proposing here requires more than one related write.

I've been burnt by this myself:
http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php
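
As a rough sketch of the two-related-writes problem being described (the
field names echo the patch excerpts upthread, but the code below is
illustrative, not the actual patch):

    /* writer side (startup process) */
    KnownAssignedXids[head] = xid;			/* write 1: fill the new slot */
    SpinLockAcquire(&procArray->known_assigned_xids_lck);
    procArray->headKnownAssignedXids = head + 1;	/* write 2: publish it */
    SpinLockRelease(&procArray->known_assigned_xids_lck);

    /*
     * Readers take the same spinlock (or ProcArrayLock exclusively) when
     * fetching the head pointer; the lock's acquire/release acts as the
     * memory barrier that guarantees any slot below head already holds
     * the value written above, even on weak-memory-ordering hardware.
     */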

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Sat, 2010-04-17 at 16:48 -0400, Tom Lane wrote:

  We search the array between tail and head. If the head moves by integer
  overwrite just as already happens for xid assignment, then we would use
  the new head for the search. The code is careful to fetch only once.
 
 ... but this will not.  You need to use a lock, because there is
 otherwise no guarantee that other processors see the write into the
 array element before they see the change in the head pointer.
 
  I would freely admit I know absolutely nothing about details of
  weak-memory-ordering machines and have not considered them at all. How
  would what I have proposed fail to work, yet what we already rely on
  work correctly? Do the circumstances differ?
 
 Yes.  We have memory ordering instructions inserted in the lock
 acquisition/release code.  Trying to access and modify a shared-memory
 data structure without any locking will not work.
 
 There are some places where we suppose that a *single* write into shared
 memory can safely be done without a lock, if we're not too concerned
 about how soon other transactions will see the effects.  But what you
 are proposing here requires more than one related write.
 
 I've been burnt by this myself:
 http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php

W O W - thank you for sharing.

What I'm not clear on is why you've used a spinlock everywhere when only
weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
macro that does does nada when the hardware already protects us? (i.e. a
spinlock only for the hardware that needs it).

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere when only
 weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
 macro that does does nada when the hardware already protects us? (i.e. a
 spinlock only for the hardware that needs it).

Well, we could certainly consider that, if we had enough places where
there was a demonstrable benefit from it.  I couldn't measure any real
slowdown from adding a spinlock in that sinval code, so I didn't propose
doing so at the time --- and I'm pretty dubious that this code is
sufficiently performance-critical to justify the work, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
 A quick fix would be to check if there's any entries in the hash table
 before scanning it. That would eliminate the overhead when there's no
 in-progress transactions in the master. But as soon as there's even one,
 the overhead comes back.
 
 Any fix should be fairly quick because of the way its modularised - with
 something like this in mind.
 
 I'll try a circular buffer implementation, with fastpath.

I started experimenting with a sorted array based implementation on
Tuesday but got carried away with other stuff. I now got back to that
and cleaned it up.

How does the attached patch look like? It's probably similar to what you
had in mind.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 88169b4..4a04051 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -64,8 +64,10 @@ typedef struct ProcArrayStruct
 	int			numProcs;		/* number of valid procs entries */
 	int			maxProcs;		/* allocated size of procs array */
 
-	int			numKnownAssignedXids;	/* current number of known assigned
-		 * xids */
+	int			firstKnownAssigned;		/* location of first valid known
+		 * assigned xid in the array */
+	int			lastKnownAssigned;		/* location of last valid known
+		 * assigned xid in the array */
 	int			maxKnownAssignedXids;	/* allocated size of known assigned
 		 * xids */
 
@@ -87,7 +89,8 @@ static ProcArrayStruct *procArray;
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
-static HTAB *KnownAssignedXidsHash;
+static TransactionId *KnownAssignedXidsArray;
+static bool *KnownAssignedXidsValidArray;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
 /*
@@ -201,28 +204,33 @@ CreateSharedProcArray(void)
 		/* Normal processing */
 		procArray->numProcs = 0;
 		procArray->maxProcs = PROCARRAY_MAXPROCS;
-		procArray->numKnownAssignedXids = 0;
 		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
+		procArray->firstKnownAssignedXid = 0;
+		procArray->lastKnownAssignedXid = 0;
 		procArray->lastOverflowedXid = InvalidTransactionId;
 	}
 
 	if (XLogRequestRecoveryConnections)
 	{
-		/* Create or attach to the KnownAssignedXids hash table */
-		HASHCTL		info;
-
-		MemSet(&info, 0, sizeof(info));
-		info.keysize = sizeof(TransactionId);
-		info.entrysize = sizeof(TransactionId);
-		info.hash = tag_hash;
-
-		KnownAssignedXidsHash = ShmemInitHash("KnownAssignedXids Hash",
-			  TOTAL_MAX_CACHED_SUBXIDS,
-			  TOTAL_MAX_CACHED_SUBXIDS,
-			  &info,
-			  HASH_ELEM | HASH_FUNCTION);
-		if (!KnownAssignedXidsHash)
-			elog(FATAL, "could not initialize known assigned xids hash table");
+		Size size;
+		/* Create or attach to the KnownAssignedXids arrays */
+		size = mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS);
+		KnownAssignedXidsArray = ShmemInitStruct("KnownAssignedXidsArray",
+ size,
+ &found);
+		if (!KnownAssignedXidsArray)
+			elog(FATAL, "could not initialize known assigned xids array");
+		if (!found)
+			MemSet(KnownAssignedXidsArray, 0, size);
+
+		size = mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS);
+		KnownAssignedXidsValidArray = ShmemInitStruct("KnownAssignedXidsValidArray",
+	  size,
+	  &found);
+		if (!KnownAssignedXidsValidArray)
+			elog(FATAL, "could not initialize known assigned xids used array");
+		if (!found)
+			MemSet(KnownAssignedXidsValidArray, 0, size);
 	}
 }
 
@@ -2162,7 +2170,7 @@ DisplayXidCache(void)
  *
  * During recovery we do not fret too much about the distinction between
  * top-level xids and subtransaction xids. We hold both together in
- * a hash table called KnownAssignedXids. In backends, this is copied into
+ * an array called KnownAssignedXids. In backends, this is copied into
  * snapshots in GetSnapshotData(), taking advantage
  * of the fact that XidInMVCCSnapshot() doesn't care about the distinction
  * either. Subtransaction xids are effectively treated as top-level xids
@@ -2207,7 +2215,7 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
 	 * We can see WAL records before the running-xacts snapshot that contain
 	 * XIDs that are not in the running-xacts snapshot, but that we know to
 	 * have finished before the running-xacts snapshot was taken. Don't waste
-	 * precious shared memory by keeping them in the hash table.
+	 * precious shared memory by keeping them in the array.
 	 *
 	 * We can also see WAL records before the running-xacts snapshot that
 	 * contain XIDs that are not in the running-xacts snapshot for a different
@@ -2330,24 +2338,30 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 /*
  * Private module functions to manipulate KnownAssignedXids
  *
- * There are 3 main users of the KnownAssignedXids data structure:
+ * We use a fixed-size sorted array in 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
  A quick fix would be to check if there's any entries in the hash table
  before scanning it. That would eliminate the overhead when there's no
  in-progress transactions in the master. But as soon as there's even one,
  the overhead comes back.
  
  Any fix should be fairly quick because of the way its modularised - with
  something like this in mind.
  
  I'll try a circular buffer implementation, with fastpath.
 
 I started experimenting with a sorted array based implementation on
 Tuesday but got carried away with other stuff. I now got back to that
 and cleaned it up.
 
 How does the attached patch look like? It's probably similar to what you
 had in mind.

It looks like a second version of what I'm working on and about to
publish. I'll take that as a compliment!

My patch is attached here also, for discussion.

The two patches look same in their main parts, though I have quite a few
extra tweaks in there, which you can read about in comments. One tweak I
don't have is the use of the presence array that allows a sensible
bsearch, so I'll alter my patch to use that idea but keep the rest of
my code.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3cc9bdd..d432c9d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1200,6 +1200,9 @@ StandbyTransactionIdIsPrepared(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
+	if (max_prepared_xacts <= 0)
+		return false;	/* nothing to do */
+
 	/* Read and validate file */
 	buf = ReadTwoPhaseFile(xid, false);
 	if (buf == NULL)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3adbee2..9245727 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6453,6 +6453,12 @@ CheckRecoveryConsistency(void)
 	}
 }
 
+bool
+XLogConsistentState(void)
+{
+	return reachedMinRecoveryPoint;
+}
+
 /*
  * Is the system still in recovery?
  *
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 88169b4..050c547 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -68,6 +68,10 @@ typedef struct ProcArrayStruct
 		 * xids */
 	int			maxKnownAssignedXids;	/* allocated size of known assigned
 		 * xids */
+	int			tailKnownAssignedXids;	/* current tail of known assigned
+		 * xids */
+	int			headKnownAssignedXids;	/* current head of known assigned
+		 * xids */
 
 	/*
 	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
@@ -87,7 +91,7 @@ static ProcArrayStruct *procArray;
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
-static HTAB *KnownAssignedXidsHash;
+static TransactionId *KnownAssignedXids;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
 /*
@@ -142,9 +146,12 @@ static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
 static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
 			   TransactionId xmax);
 static bool KnownAssignedXidsExist(TransactionId xid);
-static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
-static void KnownAssignedXidsRemove(TransactionId xid);
-static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
+static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid, bool have_lock);
+static void KnownAssignedXidsRemove(TransactionId xid, int *index);
+static void KnownAssignedXidsRemoveMany(TransactionId xid);
+static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
+	  TransactionId *subxids);
+static void KnownAssignedXidsCompress(bool full);
 static void KnownAssignedXidsDisplay(int trace_level);
 
 /*
@@ -204,10 +211,13 @@ CreateSharedProcArray(void)
 		procArray->numKnownAssignedXids = 0;
 		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
 		procArray->lastOverflowedXid = InvalidTransactionId;
+		procArray->tailKnownAssignedXids = 0;
+		procArray->headKnownAssignedXids = 0;
 	}
 
 	if (XLogRequestRecoveryConnections)
 	{
+#ifdef KNOWNASSIGNED_USE_HASH
 		/* Create or attach to the KnownAssignedXids hash table */
 		HASHCTL		info;
 
@@ -223,6 +233,12 @@ CreateSharedProcArray(void)
 			  HASH_ELEM | HASH_FUNCTION);
 		if (!KnownAssignedXidsHash)
 			elog(FATAL, "could not initialize known assigned xids hash table");
+#else
+		KnownAssignedXids = (TransactionId *)
+				ShmemInitStruct("KnownAssignedXids",
+		mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS),
+		&found);
+#endif
 	}
 }
 
@@ -417,6 +433,8 @@ void
 ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
 {
 	snapshotOldestActiveXid = oldestActiveXid;
+
+//	KnownAssignedXidsTest();
 }
 
 /*
@@ 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
 How does the attached patch look like? It's probably similar to what you
 had in mind.
 
 It looks like a second version of what I'm working on and about to
 publish. I'll take that as a compliment!
 
 My patch is attached here also, for discussion.
 
 The two patches look same in their main parts, though I have quite a few
 extra tweaks in there, which you can read about in comments.

Yeah. Yours seems a lot more complex with all those extra tweaks, I
would suggest to keep it simple. I did realize one bug in my patch: I
didn't handle xid wraparound correctly in the binary search, need to use
TransactionIdFollows instead of plain <.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 14:47 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
  How does the attached patch look like? It's probably similar to what you
  had in mind.
  
  It looks like a second version of what I'm working on and about to
  publish. I'll take that as a compliment!
  
  My patch is attached here also, for discussion.
  
  The two patches look same in their main parts, though I have quite a few
  extra tweaks in there, which you can read about in comments.
 
 Yeah. Yours seems a lot more complex with all those extra tweaks, I
 would suggest to keep it simple. I did realize one bug in my patch: I
 didn't handle xid wraparound correctly in the binary search, need to use
 TransactionIdFollows instead of plain <.

Almost done, yes, much simpler. I wrote a lot of that in the wee small
hours last night, so the difference is amusing.

And I spotted that bug, plus the off by one error also. Just rewritten
all other parts, so no worries.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I didn't handle xid wraparound correctly in the binary search, need to use
 TransactionIdFollows instead of plain <.

I think you're outsmarting yourself there.  A binary search will in fact
*not work* with circular xid comparison (this is exactly why there's no
btree opclass for XID).  You need to use plain <, and make sure the
array you're searching is ordered that way too.  The other way might
accidentally fail to malfunction if you only tested ranges of XIDs that
weren't long enough to wrap around, but that doesn't make it right.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 10:39 -0400, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  I didn't handle xid wraparound correctly in the binary search, need to use
  TransactionIdFollows instead of plain <.
 
 I think you're outsmarting yourself there.  A binary search will in fact
 *not work* with circular xid comparison (this is exactly why there's no
 btree opclass for XID).  You need to use plain <, and make sure the
 array you're searching is ordered that way too.  The other way might
 accidentally fail to malfunction if you only tested ranges of XIDs that
 weren't long enough to wrap around, but that doesn't make it right.

I don't understand the exact, please explain more.

I'm not using bsearch() just a quick chop based upon xid comparison,
which looks to me like it will work.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2010-04-16 at 10:39 -0400, Tom Lane wrote:
 I think you're outsmarting yourself there.  A binary search will in fact
 *not work* with circular xid comparison (this is exactly why there's no
 btree opclass for XID).

 I don't understand the exact, please explain more.
 I'm not using bsearch() just a quick chop based upon xid comparison,
 which looks to me like it will work.

Implementing it yourself doesn't get you out of the fact that it won't
work.  Consider

	1 2 3 ... 3000000000 ... 3999999999

and suppose that 3000000000 is the array's middle element.  If you
search for 100, your first probe will conclude that it is to the right
of 3000000000, which is the wrong direction.  Binary search, or indeed
the entire concept that the array is sorted in the first place,
depends on a transitive comparison operator.  TransactionIdFollows does
not satisfy the transitive law.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 13:00 +0100, Simon Riggs wrote:
 Almost done

Here's the finished patch.

Internal changes only, no functional or user visible changes, so no
docs, just detailed explanatory comments.

Expectation is
* performance no longer depends upon max_connections
* recovery will be about same or slightly faster
* snapshots should be about equivalent primary/standby

Main changes are
* lock free add to sorted array (equivalent to GetNewTransactionId)
* bsearch for xid removal at commit/abort/TransactionIdIsInProgress
* faster, more compact approach to snapshots, with self-trimming
* some minor API changes to facilitate above
* new code to ignore deletion failures only when !consistent

Erik,

Could you retest please? 

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1200,1205  StandbyTransactionIdIsPrepared(TransactionId xid)
--- 1200,1208 
  
  	Assert(TransactionIdIsValid(xid));
  
+ 	if (max_prepared_xacts <= 0)
+ 		return false;	/* nothing to do */
+ 
  	/* Read and validate file */
  	buf = ReadTwoPhaseFile(xid, false);
  	if (buf == NULL)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6453,6458  CheckRecoveryConsistency(void)
--- 6453,6464 
  	}
  }
  
+ bool
+ XLogConsistentState(void)
+ {
+ 	return reachedMinRecoveryPoint;
+ }
+ 
  /*
   * Is the system still in recovery?
   *
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 68,73  typedef struct ProcArrayStruct
--- 68,77 
  		 * xids */
  	int			maxKnownAssignedXids;	/* allocated size of known assigned
  		 * xids */
+ 	int			tailKnownAssignedXids;	/* current tail of known assigned
+ 		 * xids */
+ 	int			headKnownAssignedXids;	/* current head of known assigned
+ 		 * xids */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
***
*** 87,93  static ProcArrayStruct *procArray;
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static HTAB *KnownAssignedXidsHash;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
--- 91,98 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static TransactionId *KnownAssignedXids;
! static bool *KnownAssignedXidsValid;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
***
*** 142,150  static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
--- 147,158 
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid, bool have_lock);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid);
! static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
! 	  TransactionId *subxids);
! static void KnownAssignedXidsCompress(bool full);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
***
*** 204,228  CreateSharedProcArray(void)
  		procArray->numKnownAssignedXids = 0;
  		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
  		procArray->lastOverflowedXid = InvalidTransactionId;
  	}
  
  	if (XLogRequestRecoveryConnections)
  	{
! 		/* Create or attach to the KnownAssignedXids hash table */
! 		HASHCTL		info;
! 
! 		MemSet(&info, 0, sizeof(info));
! 		info.keysize = sizeof(TransactionId);
! 		info.entrysize = sizeof(TransactionId);
! 		info.hash = tag_hash;
! 
! 		KnownAssignedXidsHash = ShmemInitHash("KnownAssignedXids Hash",
! 			  TOTAL_MAX_CACHED_SUBXIDS,
! 			  TOTAL_MAX_CACHED_SUBXIDS,
! 			  &info,
! 			  HASH_ELEM | HASH_FUNCTION);
! 		if (!KnownAssignedXidsHash)
! 			elog(FATAL, "could not initialize known assigned xids hash table");
  	}
  }
  
--- 212,231 
  		procArray->numKnownAssignedXids = 0;
  		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
  		procArray->lastOverflowedXid = InvalidTransactionId;
+ 		procArray->tailKnownAssignedXids = 0;
+ 		procArray->headKnownAssignedXids = 0;
  	}
  
  	if (XLogRequestRecoveryConnections)
  	{
! 		KnownAssignedXids = (TransactionId *)
! 				ShmemInitStruct("KnownAssignedXids",
! 		

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-14 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  Changing the KnownAssignedXids data structure from
 hash table into something that's quicker to scan. Preferably something
 with O(N), where N is the number of entries in the data structure, not
 the maximum number of entries it can hold as it is with the hash table
 currently.

So that's pretty good news that Red-Black Trees made it in 9.0, isn't it? :)

 A quick fix would be to check if there's any entries in the hash table
 before scanning it. That would eliminate the overhead when there's no
 in-progress transactions in the master. But as soon as there's even one,
 the overhead comes back.

Does not sound typical, does it?
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-14 Thread Simon Riggs
On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  I could reproduce this on my laptop, standby is about 20% slower. I ran
  oprofile, and what stands out as the difference between the master and
  standby is that on standby about 20% of the CPU time is spent in
  hash_seq_search(). The callpath is GetSnapshotData() ->
  KnownAssignedXidsGetAndSetXmin() -> hash_seq_search(). That explains the
  difference in performance.
 
 The slowdown is proportional to the max_connections setting in the
 standby. 20% slowdown might still be acceptable, but if you increase
 max_connections to say 1000, things get really slow. I wouldn't
 recommend max_connections=1000, of course, but I think we need to do
 something about this. Changing the KnownAssignedXids data structure from
 hash table into something that's quicker to scan. Preferably something
 with O(N), where N is the number of entries in the data structure, not
 the maximum number of entries it can hold as it is with the hash table
 currently.

There's a tradeoff here to consider. KnownAssignedXids faces two
workloads: one for each WAL record where we check if the xid is already
known assigned, one for snapshots. The current implementation is
optimised towards recovery performance, not snapshot performance.

 A quick fix would be to check if there's any entries in the hash table
 before scanning it. That would eliminate the overhead when there's no
 in-progress transactions in the master. But as soon as there's even one,
 the overhead comes back.
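
For reference, the quick fix quoted above would amount to a fastpath in front
of the existing sequential scan. A hedged sketch against the pre-patch hash
table, using the existing dynahash calls and abbreviating the rest of
KnownAssignedXidsGetAndSetXmin():

	HASH_SEQ_STATUS status;
	TransactionId *knownXid;
	int			count = 0;

	/* Fastpath: skip the scan entirely when nothing is in progress. */
	if (hash_get_num_entries(KnownAssignedXidsHash) == 0)
		return 0;

	hash_seq_init(&status, KnownAssignedXidsHash);
	while ((knownXid = (TransactionId *) hash_seq_search(&status)) != NULL)
	{
		/* ... copy *knownXid into the caller's xid array as before ... */
		count++;
	}

	return count;

As noted, that only helps while the table is empty; once a single xid is
present, hash_seq_search() still walks a structure whose size is governed by
max_connections, which is what the sorted-array/circular-buffer approach
avoids.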

Any fix should be fairly quick because of the way it's modularised - with
something like this in mind.

I'll try a circular buffer implementation, with fastpath.

Have something in a few hours.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

