Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-07 Thread Greg Smith

Erik Rijkers wrote:

Everything together: the raid is what Areca call 'raid10(1E)'.
(to be honest I don't remember what that 1E exactly means -
extra flexibility in the number of disks, I think).
  


Standard RAID10 only supports an even number of disks.  The 1E variants 
also allow putting an odd number in.  If you're using an even number, 
like in your case, the results are the same until you start losing 
drives, at which point the performance degradation pattern changes a bit 
due to differences in how things are striped.  See these for more info:


http://bytepile.com/raid_class.php and 
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#IBM_ServeRAID_1E

http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=/diricinfo/fqy0_craid1e.html

I don't think using RAID1E has anything to do with your results, but it 
is possible given your test configuration that part of the difference 
you're seeing relates to where on disk blocks are stored.  If you take a 
hard drive and write two copies of something onto it, the second copy 
will be a little slower than the first.  That's just because of how drive 
speed drops across the surface as you move further along it.  There are some 
examples of what that looks like at 
http://projects.2ndquadrant.it/sites/default/files/pg-hw-bench-2010.pdf 
on pages 21-23.


Returning to your higher level results, one of the variables you weren't 
sure how to account for was caching effects on the standby--the 
possibility that it just didn't have the data cached the same way as the 
master, which would impact results.  The usual way I look for that is by 
graphing the pgbench TPS over time.  In that situation, you can see it shoot 
upwards over time in a very obvious pattern.  Example at 
http://projects.2ndquadrant.it/sites/default/files/pgbench-intro.pdf on 
pages 36-37.
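
If you want to produce that graph yourself, here is a minimal sketch that 
buckets a pgbench transaction log (from -l) into per-second TPS.  It assumes 
the 8.4/9.0-era log line format (client_id transaction_no latency file_no 
time_epoch time_us) and input roughly ordered by completion time, so check 
your pgbench version's documentation before trusting it:

#include <stdio.h>

int
main(void)
{
	long	client, txno, latency, logfile, epoch, usec;
	long	cur_sec = -1, count = 0;

	/* read pgbench_log lines from stdin, count transactions per second */
	while (scanf("%ld %ld %ld %ld %ld %ld",
				 &client, &txno, &latency, &logfile, &epoch, &usec) == 6)
	{
		if (epoch != cur_sec)
		{
			if (cur_sec >= 0)
				printf("%ld\t%ld\n", cur_sec, count);	/* second, TPS */
			cur_sec = epoch;
			count = 0;
		}
		count++;
	}
	if (cur_sec >= 0)
		printf("%ld\t%ld\n", cur_sec, count);
	return 0;
}

Concatenate the pgbench_log.* files, pipe them through it, and plot the two 
columns; a standby that is still warming its cache shows up as a clearly 
rising curve.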


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers
Hi Simon,

In another thread you mentioned you were lacking information from me:

On Tue, May 4, 2010 17:10, Simon Riggs wrote:

 There is no evidence that Erik's strange performance has anything to do
 with HS; it hasn't been seen elsewhere and he didn't respond to
 questions about the test setup to provide background. The profile didn't
 fit any software problem I can see.


I'm sorry if I missed requests for things that were not already mentioned.

Let me repeat:
  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)

It goes without saying (I hope) that apart from the pgbench tests
and a few ssh sessions (myself), the machine was idle.

It would be interesting if anyone repeated these simple tests and produced
evidence that these are non-HS issues.

(Unfortunately, I have at the moment not much time for more testing)


thanks,

Erik Rijkers



On Sun, April 25, 2010 21:07, Simon Riggs wrote:
 On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote:

 Sorry if it's too much data, but to me at least it was illuminating;
 I now understand the effects of the different parameters better.

 That's great, many thanks.

 A few observations

 * Standby performance is actually slightly above normal running. This is
 credible because of the way snapshots are now taken. We don't need to
 scan the procarray looking for write transactions, since we know
 everything is read only. So we scan just the knownassignedxids, which if
 no activity from primary will be zero-length, so snapshots will actually
 get taken much faster in this case on standby. The snapshot performance
 on standby is O(n) where n is the number of write transactions
 currently on primary (transfer delays blur the word currently).

 * The results for scale factor  100 are fine, and the results for 100
 with few connections get thrown out by long transaction times. With
 larger numbers of connections the wait problems seem to go away. Looks
 like Erik (and possibly Hot Standby in general) has an I/O problem,
 though from what is not yet determined. It could be just hardware, or
 might be hardware plus other factors.

 --
  Simon Riggs   www.2ndQuadrant.com





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Simon Riggs
On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
 It would be interesting if anyone repeated these simple tests and
 produced evidence that these are non-HS issues.
 
 (Unfortunately, I have at the moment not much time for more testing)

Would you be able to make those systems available for further testing?

First, I'd perform the same test with the systems swapped, so we know
more about the symmetry of the issue. After that, I'd like to look
more into the internals.

Is it possible to set up SystemTap and dtrace on these systems?

Thanks, either way.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers
On Tue, May 4, 2010 18:19, Simon Riggs wrote:
 On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
 It would be interesting if anyone repeated these simple tests and
 produced evidence that these are non-HS issues.

 (Unfortunately, I have at the moment not much time for more testing)

 Would you be able to make those systems available for further testing?

No. sorry.

 First, I'd perform the same test with the systems swapped, so we know
 more about the symmetry of the issue. After that, I'd like to look
 more into the internals.

You mean systems swapped, as in primary and standby?  Primary and standby
were on the same machine in these tests (even the same raid).

I can eventually move the standby (the 'slow' side, as it stands) to another, 
quite similar
machine.  Not in the coming days though...


 Is it possible to setup SytemTap and dtrace on these systems?

I did install systemtap last week. dtrace is not installed, I think (I've never 
used either).



Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Greg Smith

Erik Rijkers wrote:

  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)
  


To fill in from data you already mentioned upthread:
32 GB RAM
CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

Thanks for all the reporting you've done here, really helpful.  
Questions to make sure I'm trying to duplicate the right thing here:


Is your disk array all configured as one big RAID10 volume, so 
essentially a 6-disk stripe with redundancy, or something else?  In 
particular I want to know whether the WAL/database/archives are split onto 
separate volumes or all on one big one when you were testing. 


Is this on ext3 with standard mount parameters?

Also, can you confirm that every test you ran only had a single pgbench 
worker thread (-j 1 or not specified)?  That looked to be the case from 
the ones I saw where you posted the whole command used.  It would not 
surprise me to find that the CPU usage profile of a standby is just 
different enough from the primary that it results in the pgbench program 
not being scheduled enough time, due to the known Linux issues in that 
area.  Not going to assume that, of course, just one thing I want to 
check when trying to replicate what you've run into. 

I didn't see any glaring HS performance issues like you've been 
reporting on the last time I tried performance testing in this area, just a 
small percentage drop.  But I didn't specifically go looking for it 
either.  With your testing rig out of service, we're going to try and 
replicate that on a system here.  My home server is like a scaled down 
version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5 
disks instead of 12) and it's running the same CentOS version.  If the 
problem's really universal I should see it here too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Stefan Kaltenbrunner

Erik Rijkers wrote:

Hi Simon,

In another thread you mentioned you were lacking information from me:

On Tue, May 4, 2010 17:10, Simon Riggs wrote:

There is no evidence that Erik's strange performance has anything to do
with HS; it hasn't been seen elsewhere and he didn't respond to
questions about the test setup to provide background. The profile didn't
fit any software problem I can see.



I'm sorry if I missed requests for things that were not already mentioned.

Let me repeat:
  OS: Centos 5.4
  2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
  Areca 1280ML
  primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
Barracuda ES.2)

It goes without saying (I hope) that apart from the pgbench tests
and a few ssh sessions (myself), the machine was idle.

It would be interesting if anyone repeated these simple tests and produced
evidence that these are non-HS issues.

(Unfortunately, I have at the moment not much time for more testing)


FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks 
kinda related.


using -j 16 -c 16 -T 120 I get either ~100k tps and around 660k 
contextswitches per second or on some runs I end up with ~150k tps and 
around 1M contextswitches/s sustained. I mostly get the 100k result but 
once in a while I get the 150k one. And one can even anticipate the 
final transaction rate from watching vmstat 1...


I'm not sure yet what is causing that behaviour, but that is with 
9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a 
pure in-memory workload (scale = 20 with 48GB RAM).



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Erik Rijkers
On Tue, May 4, 2010 20:26, Greg Smith wrote:
 Erik Rijkers wrote:
   OS: Centos 5.4
   2 quadcores: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
   Areca 1280ML
   primary and standby db both on a 12 disk array (sata 7200rpm, Seagate 
  Barracuda ES.2)


 To fill in from data you already mentioned upthread:
 32 GB RAM
 CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

 Thanks for all the reporting you've done here, really helpful.
 Questions to make sure I'm trying to duplicate the right thing here:

 Is your disk array all configured as one big RAID10 volume, so
 essentially a 6-disk stripe with redundancy, or something else?  In
 particular I want to know whether the WAL/database/archives are split onto
 separate volumes or all on one big one when you were testing.

Everything together: the raid is what Areca call 'raid10(1E)'.
(to be honest I don't remember what that 1E exactly means -
extra flexibility in the number of disks, I think).

Btw, some of my emails contained the postgresql.conf of both instances.


 Is this on ext3 with standard mount parameters?

ext3 noatime

 Also, can you confirm that every test you ran only had a single pgbench
 worker thread (-j 1 or not specified)?  That looked to be the case from
 the ones I saw where you posted the whole command used.  It would not

yes; the literal cmd:
time /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin/pgbench -h /tmp 
-p 6565 -U rijkers
-n -S -c 20 -T 900 -j 1 replicas

To avoid wrapping in the emails I just removed '-h /tmp', '-U rijkers', and 
'replicas'.

(I may have run the primary's pgbench binary also against the slave - don't 
think
that should make any difference)

 surprise me to find that the CPU usage profile of a standby is just
 different enough from the primary that it results in the pgbench program
 not being scheduled enough time, due to the known Linux issues in that
 area.  Not going to assume that, of course, just one thing I want to
 check when trying to replicate what you've run into.

 I didn't see any glaring HS performance issues like you've been
 reporting on the last time I tried performance testing in this area, just a
 small percentage drop.  But I didn't specifically go looking for it

Here, it seems repeatable, but does not occur with all scales.

Hm, maybe I should just dump *all* of my results on the wiki for reference.  
(I'll look at that
later).

 either.  With your testing rig out of service, we're going to try and
 replicate that on a system here.  My home server is like a scaled down
 version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5
 disks instead of 12) and it's running the same CentOS version.  If the
 problem's really universal I should see it here too.


Thanks,


Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-05-04 Thread Simon Riggs
On Tue, 2010-05-04 at 21:34 +0200, Stefan Kaltenbrunner wrote:

 FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks 
 kinda related.
 
 using -j 16 -c 16 -T 120 I get either ~100k tps and around 660k 
 contextswitches per second or on some runs I end up with ~150k tps and 
 around 1M contextswitches/s sustained. I mostly get the 100k result but 
 once in a while I get the 150k one. And one can even anticipate the 
 final transaction rate from watching vmstat 1...
 
 I'm not sure yet what is causing that behaviour, but that is with 
 9.0B1 on a Dual Quadcore Nehalem box with 16 cpu threads (8+HT) on a 
 pure in-memory workload (scale = 20 with 48GB RAM).

Educated guess at a fix: please test this patch. It's good for
performance testing, but doesn't work correctly at failover, which would
obviously be addressed prior to any commit.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 158,163  static XLogRecPtr LastRec;
--- 158,164 
   * known, need to check the shared state.
   */
  static bool LocalRecoveryInProgress = true;
+ static bool KnownRecoveryState = false;
  
  /*
   * Local state for XLogInsertAllowed():
***
*** 6524,6537  CheckRecoveryConsistency(void)
  bool
  RecoveryInProgress(void)
  {
! 	/*
! 	 * We check shared state each time only until we leave recovery mode. We
! 	 * can't re-enter recovery, so there's no need to keep checking after the
! 	 * shared variable has once been seen false.
! 	 */
! 	if (!LocalRecoveryInProgress)
! 		return false;
! 	else
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
--- 6525,6533 
  bool
  RecoveryInProgress(void)
  {
! 	if (KnownRecoveryState)
! 		return LocalRecoveryInProgress;
! 	else		
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
***
*** 6541,6546  RecoveryInProgress(void)
--- 6537,6544 
  		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
  		SpinLockRelease(&xlogctl->info_lck);
  
+ 		KnownRecoveryState = true;
+ 
  		/*
  		 * Initialize TimeLineID and RedoRecPtr when we discover that recovery
  		 * is finished. InitPostgres() relies upon this behaviour to ensure

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-28 Thread Simon Riggs
On Tue, 2010-04-27 at 20:13 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-04-27 at 18:08 -0400, Tom Lane wrote:
  Huh?  How is a filter as coarse as an oldest-running-XID filter going
  to prevent that?  And aren't we initializing from trustworthy data in
  ProcArrayApplyRecoveryInfo, anyway?
  
  I still say it's useless.
 
  Quite possibly. You're looking at other code outside of this patch. I'm
  happy that you do so, but is it immediately related? I can have another
  look when we finish this.
 
 Well, it's nearby anyway.  I've committed the present patch (with a
 number of fixes).  While I was looking at it I came across several
 things in the existing code that I think are either wrong or at least
 inadequately documented --- the above complaint is just the tip of the
 iceberg.  I'm going to make another pass over it to see if I'm just
 missing things, and then report back.

Thank you for your input and changes.

You're welcome to share my iceberg.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 v3 attached

This patch changes KnownAssignedXidsRemove() so that failure to find
the target XID is elog(ERROR) (ie, a PANIC, since this is in the
startup process).  However, this comment is still there:

/*
 * We can fail to find an xid if the xid came from a subtransaction that
 * aborts, though the xid hadn't yet been reported and no WAL records have
 * been written using the subxid. In that case the abort record will
 * contain that subxid and we haven't seen it before.
 */

WTF?  Either the comment is wrong or this should not be an elog
condition.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 13:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  v3 attached
 
 This patch changes KnownAssignedXidsRemove() so that failure to find
 the target XID is elog(ERROR) (ie, a PANIC, since this is in the
 startup process).  

Not in all cases. The code is correct, as far as I am aware from
testing.

 However, this comment is still there:
   /*
* We can fail to find an xid if the xid came from a subtransaction that
* aborts, though the xid hadn't yet been reported and no WAL records have
* been written using the subxid. In that case the abort record will
* contain that subxid and we haven't seen it before.
*/
 
 WTF?  Either the comment is wrong or this should not be an elog
 condition.

That section of code has been rewritten many times. I think it is now
inaccurate and should be removed. I left it there because the
unfortunate history of the project has been the removal of comments and
then later rediscovery of the truth, sometimes more than once. I could
no longer reproduce that error; someone else may know differently.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Hmm ... there's another point here, which is that the array size creates
a hard maximum on the number of entries, whereas the hash table was a
bit more forgiving.  What is the proof that the array won't overflow?
The fact that the equivalent data structure on the master can't hold
more than this many entries doesn't seem to me to prove that, because
we will add intermediate not-observed XIDs to the array.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 14:53 -0400, Tom Lane wrote:
 Hmm ... there's another point here, which is that the array size
 creates
 a hard maximum on the number of entries, whereas the hash table was a
 bit more forgiving.  What is the proof that the array won't overflow?
 The fact that the equivalent data structure on the master can't hold
 more than this many entries doesn't seem to me to prove that, because
 we will add intermediate not-observed XIDs to the array.

We know that not-observed xids have actually been allocated on the
primary. We log an assignment record every 64 subtransactions, so that
the peak size of the array is 65 xids per connection.

It's possible for xids to stay in the array for longer, in the event of
a FATAL error that doesn't log an abort record. We clean those up every
checkpoint, if they exist. The potential number of them is unbounded, so
making special allowance for them doesn't remove the theoretical risk.
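
To put numbers on that bound, a small stand-alone sketch (illustrative names
and an example max_connections value, not the real constants or GUC handling):

#include <stdio.h>

#define SUBXIDS_PER_ASSIGNMENT	64	/* an assignment record is logged this often */

int
main(void)
{
	int		max_connections = 100;	/* example setting on the primary */
	int		per_backend = SUBXIDS_PER_ASSIGNMENT + 1;	/* 64 subxids + top-level xid */
	int		worst_case = per_backend * max_connections;

	printf("worst-case KnownAssignedXids entries: %d\n", worst_case);	/* 6500 */
	return 0;
}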

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-04-27 at 13:52 -0400, Tom Lane wrote:
 WTF?  Either the comment is wrong or this should not be an elog
 condition.

 That section of code has been rewritten many times. I think it is now
 inaccurate and should be removed. I left it there because the
 unfortunate history of the project has been the removal of comments and
 then later rediscovery of the truth, sometimes more than once. I could
 no longer reproduce that error; someone else may know differently.

I haven't tested this, but it appears to me that the failure would occur
in overflow situations.  If we have too many subxacts, we'll generate
XLOG_XACT_ASSIGNMENT, which will cause the subxids to be removed from
KnownAssignedXids[].  Then later when the top-level xact commits or
aborts we'll try to remove them again as a consequence of processing
the top-level's commit/abort record.  No?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 16:18 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-04-27 at 13:52 -0400, Tom Lane wrote:
  WTF?  Either the comment is wrong or this should not be an elog
  condition.
 
  That section of code has been rewritten many times. I think it is now
  inaccurate and should be removed. I left it there because the
  unfortunate history of the project has been the removal of comments and
  then later rediscovery of the truth, sometimes more than once. I could
  no longer reproduce that error; someone else may know differently.
 
 I haven't tested this, but it appears to me that the failure would occur
 in overflow situations.  If we have too many subxacts, we'll generate
 XLOG_XACT_ASSIGNMENT, which will cause the subxids to be removed from
 KnownAssignedXids[].  Then later when the top-level xact commits or
 aborts we'll try to remove them again as a consequence of processing
 the top-level's commit/abort record.  No?

Yes, thank you for clear thinking. Anyway, looks like the comment was
right after all and the new code to throw an error is wrong in some
cases. It was useful for testing, at least. The comment was slightly
misleading, which is a good reason to rewrite it.

It seems like it might be possible to identify which xids could cause an
error and which won't. Much harder than that. We still have the possible
case where we have 64 subtransactions allocated but many of them abort
and we are left with a final commit of 64 subtransactions.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Isn't the snapshotOldestActiveXid filter in
RecordKnownAssignedTransactionIds completely wrong/useless/bogus?

AFAICS, snapshotOldestActiveXid is only set once at the start of
recovery.  This means it will soon be too old to provide any useful
filtering.  But what's far worse is that the XID space will eventually
wrap around, and that test will start filtering *everything*.

I think we should just lose that test, as well as the variable.
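
For illustration, a stand-alone sketch of that failure mode using the usual
circular comparison (a simplified version in the spirit of
TransactionIdPrecedes, without the special-xid handling the real function
does):

#include <stdio.h>
#include <stdint.h>

/* xid1 "precedes" xid2 in circular, modulo-2^32 terms */
static int
xid_precedes(uint32_t id1, uint32_t id2)
{
	return (int32_t) (id1 - id2) < 0;
}

int
main(void)
{
	uint32_t	cutoff = 1000;				/* a cutoff captured once at startup */
	uint32_t	soon = 5000;				/* xid seen shortly afterwards */
	uint32_t	later = cutoff + 2200000000u;	/* xid seen ~2.2 billion xacts later */

	printf("%d\n", xid_precedes(soon, cutoff));		/* 0: not filtered */
	printf("%d\n", xid_precedes(later, cutoff));	/* 1: wrongly treated as old */
	return 0;
}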

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 17:24 -0400, Tom Lane wrote:

 Isn't the snapshotOldestActiveXid filter in
 RecordKnownAssignedTransactionIds completely wrong/useless/bogus?
 
 AFAICS, snapshotOldestActiveXid is only set once at the start of
 recovery.  This means it will soon be too old to provide any useful
 filtering.  But what's far worse is that the XID space will eventually
 wrap around, and that test will start filtering *everything*.
 
 I think we should just lose that test, as well as the variable.

Yes, though it looks like it is still necessary in creating a valid
initial state because otherwise we may have xids in KnownAssigned array
that are already complete. The comment there talks about wasting memory,
though it appears to be a correctness issue.

So perhaps a similar test is required in ProcArrayApplyRecoveryInfo()
but not in RecordKnownAssignedTransactionIds(). That way it is applied,
but only once at initialisation.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-04-27 at 17:24 -0400, Tom Lane wrote:
 I think we should just lose that test, as well as the variable.

 Yes, though it looks like it is still necessary in creating a valid
 initial state because otherwise we may have xids in KnownAssigned array
 that are already complete.

Huh?  How is a filter as coarse as an oldest-running-XID filter going
to prevent that?  And aren't we initializing from trustworthy data in
ProcArrayApplyRecoveryInfo, anyway?

I still say it's useless.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Simon Riggs
On Tue, 2010-04-27 at 18:08 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-04-27 at 17:24 -0400, Tom Lane wrote:
  I think we should just lose that test, as well as the variable.
 
  Yes, though it looks like it is still necessary in creating a valid
  initial state because otherwise we may have xids in KnownAssigned array
  that are already complete.
 
 Huh?  How is a filter as coarse as an oldest-running-XID filter going
 to prevent that?  And aren't we initializing from trustworthy data in
 ProcArrayApplyRecoveryInfo, anyway?
 
 I still say it's useless.

Quite possibly. You're looking at other code outside of this patch. I'm
happy that you do so, but is it immediately related? I can have another
look when we finish this.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-04-27 at 18:08 -0400, Tom Lane wrote:
 Huh?  How is a filter as coarse as an oldest-running-XID filter going
 to prevent that?  And aren't we initializing from trustworthy data in
 ProcArrayApplyRecoveryInfo, anyway?
 
 I still say it's useless.

 Quite possibly. You're looking at other code outside of this patch. I'm
 happy that you do so, but is it immediately related? I can have another
 look when we finish this.

Well, it's nearby anyway.  I've committed the present patch (with a
number of fixes).  While I was looking at it I came across several
things in the existing code that I think are either wrong or at least
inadequately documented --- the above complaint is just the tip of the
iceberg.  I'm going to make another pass over it to see if I'm just
missing things, and then report back.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Fujii Masao
On Mon, Apr 26, 2010 at 3:25 AM, Erik Rijkers e...@xs4all.nl wrote:
 FWIW, here are some more results from pgbench comparing
 primary and standby (both with Simon's patch).

Was there a difference in CPU  utilization between the primary
and standby?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Simon Riggs
On Sun, 2010-04-25 at 19:18 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  [ v2 patch ]
 
 I've been studying this some more while making notes for improved
 comments, and I've about come to the conclusion that having readers
 move the tail pointer (at the end of KnownAssignedXidsGetAndSetXmin)
 is overly tricky and probably not a performance improvement anyway.
 The code is in fact wrong as it stands: it's off-by-one about setting
 the new tail value.  And there's potential for contention with multiple
 readers all wanting to move the tail pointer at once.  

OK, since contention was my concern, I want to avoid that.

 And most
 importantly, KnownAssignedXidsSearch can't move the tail pointer so
 we might expend many inefficient searches while never moving the tail
 pointer.

 I think we should get rid of that and just have the two functions that
 can mark entries invalid (which they must do with exclusive lock)
 advance the tail pointer when they invalidate the current tail element.

OK

 Then we have the very simple rule that only the startup process ever
 changes this data structure.
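
For reference, a stand-alone sketch of that tail-advance rule, with plain
local arrays standing in for the shared-memory structure and the exclusive
lock assumed to be held by the caller that just invalidated an entry:

#include <stdio.h>
#include <stdbool.h>

static unsigned int xids[8]  = {100, 101, 103, 104};
static bool         valid[8] = {true, true, true, true};
static int          tail = 0, head = 4;

/* skip the tail forward over entries already marked invalid */
static void
advance_tail(void)
{
	while (tail < head && !valid[tail])
		tail++;
}

int
main(void)
{
	valid[0] = false;	/* xid 100 finishes */
	valid[1] = false;	/* xid 101 finishes */
	advance_tail();
	printf("tail=%d (next live xid %u)\n", tail, xids[tail]);	/* tail=2, xid 103 */
	return 0;
}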

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Simon Riggs
On Sun, 2010-04-25 at 23:52 +0200, Erik Rijkers wrote:

 I'll try to repeat this pattern on other hardware; although
 if my tests were run with faulty hardware I wouldn't know how/why
 that would give the above effect (such a 'regular aberration').

 testing is more difficult than I thought...

Thanks again for your help.

Please can you confirm:
* Are the standby tests run while the primary is completely quiet?
* What OS is this? Can we use dtrace scripts?

Can anyone else confirm these test results: large scale factor and small
number of sessions?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Erik Rijkers
On Mon, April 26, 2010 08:52, Fujii Masao wrote:
 On Mon, Apr 26, 2010 at 3:25 AM, Erik Rijkers e...@xs4all.nl wrote:
 FWIW, here are some more results from pgbench comparing
 primary and standby (both with Simon's patch).

 Was there a difference in CPU  utilization between the primary
 and standby?


I haven't monitored it..


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Erik Rijkers
On Mon, April 26, 2010 09:43, Simon Riggs wrote:
 On Sun, 2010-04-25 at 23:52 +0200, Erik Rijkers wrote:

 I'll try to repeat this pattern on other hardware; although
 if my tests were run with faulty hardware I wouldn't know how/why
 that would give the above effect (such a 'regular aberration').

 testing is more difficult than I thought...

 Thanks again for your help.

 Please can you confirm:
 * Are the standby tests run while the primary is completely quiet?

autovacuum was on, which is probably not a good idea - I'll try a few runs 
without it.

 * What OS is this? Can we use dtrace scripts?

Centos 5.4.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Simon Riggs
On Sun, 2010-04-25 at 13:51 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sun, 2010-04-25 at 13:33 -0400, Tom Lane wrote:
  If you like I'll have a go at rewriting the comments for this patch,
  because I am currently thinking that the problem is not so much with
  the code as with the poor explanation of what it's doing.  Sometimes
  the author is too close to the code to understand why other people
  have a hard time understanding it.
 
  That would help me, thank you.
 
 OK.  You said you were currently working some more on the patch, so
 I'll wait for v3 and then work on it.

v3 attached

Changes:
* Strange locking in KnownAssignedXidsAdd() moved to RecordKnown...
* KnownAssignedXidsAdd() reordered, assert-ish code added
* Tail movement during snapshots no longer possible
* Tail movement during xid removal added to KnownAssignedXidsSearch()
* Major comment hacking

Little bit rough, definitely needs a re-read of all comments, so good
time to send over.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1200,1205  StandbyTransactionIdIsPrepared(TransactionId xid)
--- 1200,1208 
  
  	Assert(TransactionIdIsValid(xid));
  
+ 	if (max_prepared_xacts <= 0)
+ 		return false;	/* nothing to do */
+ 
  	/* Read and validate file */
  	buf = ReadTwoPhaseFile(xid, false);
  	if (buf == NULL)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6454,6459  CheckRecoveryConsistency(void)
--- 6454,6465 
  	}
  }
  
+ bool
+ XLogConsistentState(void)
+ {
+ 	return reachedMinRecoveryPoint;
+ }
+ 
  /*
   * Is the system still in recovery?
   *
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 52,57 
--- 52,58 
  #include access/twophase.h
  #include miscadmin.h
  #include storage/procarray.h
+ #include storage/spin.h
  #include storage/standby.h
  #include utils/builtins.h
  #include utils/snapmgr.h
***
*** 64,73  typedef struct ProcArrayStruct
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	int			numKnownAssignedXids;	/* current number of known assigned
! 		 * xids */
! 	int			maxKnownAssignedXids;	/* allocated size of known assigned
! 		 * xids */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
--- 65,84 
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	/*
! 	 * Known assigned xids handling
! 	 */
! 	int			maxKnownAssignedXids;	/* allocated size */
! 
! 	/*
! 	 * Callers must hold either ProcArrayLock in Exclusive mode or
! 	 * ProcArrayLock in Shared mode *and* known_assigned_xids_lck
! 	 * to update these values.
! 	 */
! 	int			numKnownAssignedXids;	/* currrent # valid entries */
! 	int			tailKnownAssignedXids;	/* current tail */
! 	int			headKnownAssignedXids;	/* current head */
! 	slock_t		known_assigned_xids_lck;	/* shared protection lock */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
***
*** 87,93  static ProcArrayStruct *procArray;
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static HTAB *KnownAssignedXidsHash;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
--- 98,105 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static TransactionId *KnownAssignedXids;
! static bool *KnownAssignedXidsValid;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
***
*** 142,150  static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
--- 154,166 
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
! bool exclusive_lock);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid);
! static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
! 	  TransactionId *subxids);
! static void KnownAssignedXidsCompress(bool full);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
***
*** 204,228  

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-26 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 v3 attached

Thanks, will work on this tomorrow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Robert Haas
On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Both Heikki and I objected to that patch.

 Please explain your objection, based upon the patch and my explanations.

Well, we objected to the locking.  Having reread the patch a few times
though, I think I'm starting to wrap my head around it so, I don't
know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
exclusive mode instead of having a separate spinlock, to see how that
performs?

 And apparently it doesn't
 fix the problem, either.  So, -1 from me.

 There is an issue observed in Erik's later tests, but my interpretation
 of the results so far is that the sorted array patch successfully
 removes the initially reported loss of performance.

Is it possible the remaining spikes are due to fights over the spinlock?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
 On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Both Heikki and I objected to that patch.
 
  Please explain your objection, based upon the patch and my explanations.
 
 Well, we objected to the locking.  Having reread the patch a few times
 though, I think I'm starting to wrap my head around it so, I don't
 know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
 exclusive mode instead of having a separate spinlock, to see how that
 performs?

They both perform well. Heikki and I agree that spinlocks perform well.

The point of taking the ProcArrayLock in Shared rather than Exclusive
mode is the reduction in contention that gives. Adding a
KnownAssignedXid is exactly analogous to starting a transaction. We go
to some trouble to avoid taking a lock when we start a transaction and I
want to do a similar thing on the standby. I think it's even more
important we reduce contention on the standby than it is on the primary
because if the Startup process is delayed then it slows down recovery.
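
As a rough sketch of the discipline being described (a fragment using the
field names from the v3 patch, not the committed code; it is only meaningful
inside the backend, where these locks and arrays exist):

/*
 * Add one xid while holding ProcArrayLock only in shared mode.  This is
 * safe only because the startup process is the sole writer; the spinlock
 * protects the head/tail values that readers look at under shared lock.
 */
static void
KnownAssignedXidsAddOne(TransactionId xid)
{
	LWLockAcquire(ProcArrayLock, LW_SHARED);

	/* fill the slot just past the current head */
	KnownAssignedXids[procArray->headKnownAssignedXids] = xid;
	KnownAssignedXidsValid[procArray->headKnownAssignedXids] = true;

	/*
	 * Publish it by bumping the head under the spinlock.  (Whether the slot
	 * store is guaranteed visible before the head bump on weak-memory
	 * machines is exactly the ordering question discussed later in this
	 * thread.)
	 */
	SpinLockAcquire(&procArray->known_assigned_xids_lck);
	procArray->headKnownAssignedXids++;
	procArray->numKnownAssignedXids++;
	SpinLockRelease(&procArray->known_assigned_xids_lck);

	LWLockRelease(ProcArrayLock);
}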

  And apparently it doesn't
  fix the problem, either.  So, -1 from me.
 
  There is an issue observed in Erik's later tests, but my interpretation
  of the results so far is that the sorted array patch successfully
  removes the initially reported loss of performance.
 
 Is it possible the remaining spikes are due to fights over the spinlock?

That one specific spinlock? I considered it. I think it's doubtful.

The spikes all involved a jump from 5ms to much more than that, often
as long as a second. That looks like a pure I/O problem or an LWlock
held across a slow I/O or platform specific issues of some kind.

Erik's tests were with 1 backend, so that would mean that at most 2
processes might want the spinlock you mention. The spinlock is only
taken at snapshot time, so once per transaction in Erik's tests. So if
the spinlock were the problem then one or other of the processes
would need to grab and hold the spinlock for a very long time and then
release it sometime later. Since the spinlock code has been with us for
some time, I doubt there is anything wrong there.

I don't have any evidence as to whether the problem is solely on Erik's
system, nor whether it is an issue already in the codebase or introduced
by this patch. It seems unlikely to me that this patch introduces the
problem, and even more unlikely that the issue is caused by two
processes fighting over that particular spinlock. 

The patch is very effective at reducing overall cost of taking snapshots
and so I think it needs to be applied to allow more performance data to
be collected. I don't suppose this is the last performance tuning we
will need to do.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Robert Haas
On Sun, Apr 25, 2010 at 8:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
 On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Both Heikki and I objected to that patch.
 
  Please explain your objection, based upon the patch and my explanations.

 Well, we objected to the locking.  Having reread the patch a few times
 though, I think I'm starting to wrap my head around it so, I don't
 know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
 exclusive mode instead of having a separate spinlock, to see how that
 performs?

 They both perform well. Heikki and I agree that spinlocks perform well.

 The point of taking the ProcArrayLock in Shared rather than Exclusive
 mode is the reduction in contention that gives. Adding a
 KnownAssignedXid is exactly analogous to starting a transaction. We go
 to some trouble to avoid taking a lock when we start a transaction and I
 want to do a similar thing on the standby. I think it's even more
 important we reduce contention on the standby than it is on the primary
 because if the Startup process is delayed then it slows down recovery.

  And apparently it doesn't
  fix the problem, either.  So, -1 from me.
 
  There is an issue observed in Erik's later tests, but my interpretation
  of the results so far is that the sorted array patch successfully
  removes the initially reported loss of performance.

 Is it possible the remaining spikes are due to fights over the spinlock?

 That one specific spinlock? I considered it. I think it's doubtful.

 The spikes all involved a jump from 5ms to much more than that, often
 as long as a second. That looks like a pure I/O problem or an LWlock
 held across a slow I/O or platform specific issues of some kind.

 Erik's tests were with 1 backend, so that would mean that at most 2
 processes might want the spinlock you mention. The spinlock is only
 taken at snapshot time, so once per transaction in Erik's tests. So if
 the spinlock were the problem then one or other of the processes
 would need to grab and hold the spinlock for a very long time and then
 release it sometime later. Since the spinlock code has been with us for
 some time, I doubt there is anything wrong there.

 I don't have any evidence as to whether the problem is solely on Erik's
 system, nor whether it is an issue already in the codebase or introduced
 by this patch. It seems unlikely to me that this patch introduces the
 problem, and even more unlikely that the issue is caused by two
 processes fighting over that particular spinlock.

 The patch is very effective at reducing overall cost of taking snapshots
 and so I think it needs to be applied to allow more performance data to
 be collected. I don't suppose this is the last performance tuning we
 will need to do.

OK, fair enough.  I withdraw my objections.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Both Heikki and I objected to that patch.
 
 Please explain your objection, based upon the patch and my explanations.

 Well, we objected to the locking.

Not only is the locking overly complex, but it's outright wrong;
or at least the comments are.  KnownAssignedXidsAdd in particular
has a comment that is 100% misleading, and an actual behavior that
I find extremely likely to provoke bugs (eg, consider a caller calling
it twice under the illusion it still has only shared lock).  I'm not
even convinced that it works correctly, since it will happily write
into KnownAssignedXids[] while holding only shared ProcArrayLock.
What happens when two processes are doing that concurrently?

This needs a redesign before it can be considered committable.  I don't
really care whether it makes things faster; it's too complicated and too
poorly documented to be maintainable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 [ v2 patch ]

BTW, while I'm looking at this, why bother with the separate
KnownAssignedXidsValid[] array?  Wouldn't it be cheaper
(certainly so in storage, probably so in access/update times)
to have just the KnownAssignedXids[] array and store
InvalidTransactionId in unused entries?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 11:50 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Both Heikki and I objected to that patch.
  
  Please explain your objection, based upon the patch and my explanations.
 
  Well, we objected to the locking.
 
 Not only is the locking overly complex, but it's outright wrong;
 or at least the comments are.  KnownAssignedXidsAdd in particular
 has a comment that is 100% misleading, and an actual behavior that
 I find extremely likely to provoke bugs (eg, consider a caller calling
 it twice under the illusion it still has only shared lock).  

I will reword that comment.

 I'm not
 even convinced that it works correctly, since it will happily write
 into KnownAssignedXids[] while holding only shared ProcArrayLock.
 What happens when two processes are doing that concurrently?

Heikki and I discussed this upthread. Only the Startup process ever
calls KnownAssignedXids, so the situation you describe does not happen.
We currently rely on the fact that we have only a single writer in other
places in Hot Standby, so this is not a restriction we are imposing on
the system just to support this change.

Both you and Heikki have previously spoken against having recovery run
in parallel, so the likelihood of having multiple readers in the future
seems low.

If we were to allow multiple callers of the routine in future, the use
of spinlocks to protect the head of the queue means we can safely
reserve slots and so KnownAssignedXidsAdd can easily be made to accept
multiple writers in the future. I think it's possible to future-proof
that now without significant change or performance drop, so I'll do
that.

 This needs a redesign before it can be considered committable.  I don't
 really care whether it makes things faster; it's too complicated and too
 poorly documented to be maintainable.

It is sufficiently complex to achieve its objective, which is keeping
track of xids during recovery, using the same locking as we do during
normal running. This is the third re-write of the code, developed over
more than 20 months and specifically modularised by me to make future
changes drop-in replacements. Initially we had an insert-sorted array,
then a hash table and now this current proposal. Recently, Heikki and I
independently came up with almost exactly the same design, with the
algorithmic characteristics we need for performance. The call points,
frequency and workload of calls are documented and well understood by
multiple people.

The only remaining point of discussion has been the code that takes
ProcArrayLock in shared mode. The difference being discussed here is
whether we access the head and tail of the array with both Shared+spinlock
or just Exclusive. The patch would be almost identical with
that change, but would not meet its locking objectives. So making the
locking stricter isn't going to make it more maintainable by a
measurable amount.

There are more than 60 lines of header comment explaining in detail how
this works, with a full algorithmic analysis. The remaining code is
commented to project standards, with easily more than 100 lines of
comments. 

I will recheck every comment and see if other comments can be usefully
added, and further review the code to see if there's anything that can
be done to improve it further. I was going to do this anyway, since I
will be adding an agreed Assert() into the patch.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-04-25 at 11:50 -0400, Tom Lane wrote:
 This needs a redesign before it can be considered committable.  I don't
 really care whether it makes things faster; it's too complicated and too
 poorly documented to be maintainable.

 There are more than 60 lines of header comment explaining in detail how
 this works, with a full algorithmic analysis. The remaining code is
 commented to project standards, with easily more than 100 lines of
 comments. 

If the comments were correct, I wouldn't be complaining.  They're
misleading or outright wrong on many points.  In particular, I don't
think you actually understand the weak-memory-ordering issue, because
the comments about that are entirely wrong.  I don't think they are
adequate on other points either --- for example, I now understand why my
complaint of a few minutes ago about KnownAssignedXidsValid is wrong,
but the comments certainly didn't help me on that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 12:43 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  [ v2 patch ]
 
 BTW, while I'm looking at this, why bother with the separate
 KnownAssignedXidsValid[] array?  Wouldn't it be cheaper
 (certainly so in storage, probably so in access/update times)
 to have just the KnownAssignedXids[] array and store
 InvalidTransactionId in unused entries?

Well, that was exactly my first design.

Heikki came up with the additional array and I think it is an inspired
suggestion, because it allows the search() to use a simple binary
search. I attempted to write the binary-search-with-holes and it was
going to be very ugly code, so I went for another algorithm which was
also quite cool, but not as cool as Heikki's idea - which I was going to
give credit for.
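
A stand-alone sketch of that idea, using plain integer comparison instead of
circular xid comparison and a small fixed array in place of the shared-memory
one:

#include <stdio.h>
#include <stdbool.h>

static unsigned int xids[]  = {100, 103, 104, 107, 110};	/* kept sorted */
static bool         valid[] = {true, false, true, true, true};	/* 103 was removed */

static bool
xid_is_known_assigned(unsigned int xid, int head)
{
	int		lo = 0, hi = head - 1;

	while (lo <= hi)
	{
		int		mid = lo + (hi - lo) / 2;

		if (xids[mid] == xid)
			return valid[mid];	/* found the slot; has it been invalidated? */
		if (xids[mid] < xid)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return false;
}

int
main(void)
{
	printf("%d\n", xid_is_known_assigned(104, 5));	/* 1 */
	printf("%d\n", xid_is_known_assigned(103, 5));	/* 0: hole */
	return 0;
}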

The array adds 520*max_connections bytes of shared memory, but seems a
good tradeoff for the additional performance it allows. I guess we could
use a bit array, but that's slower to access. The bit array would be
9*max_connections bytes of shared memory, so a 40kB saving.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 12:51 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sun, 2010-04-25 at 11:50 -0400, Tom Lane wrote:
  This needs a redesign before it can be considered committable.  I don't
  really care whether it makes things faster; it's too complicated and too
  poorly documented to be maintainable.
 
  There are more than 60 lines of header comment explaining in detail how
  this works, with a full algorithmic analysis. The remaining code is
  commented to project standards, with easily more than 100 lines of
  comments. 
 
 If the comments were correct, I wouldn't be complaining.  They're
 misleading or outright wrong on many points.  In particular, I don't
 think you actually understand the weak-memory-ordering issue, because
 the comments about that are entirely wrong.  

The comment says: on CPUs with
+ * weak-memory ordering we can't reliably move pointers atomically, so the
+ * rule is that updates of head and tail of the array require ProcArrayLock
+ * in exclusive mode or (shared mode and known_assigned_xids_lck spinlock)

I will reword this, so it is clear that I'm talking about the head and
tail of the array, not pointers in general.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-04-25 at 12:51 -0400, Tom Lane wrote:
 If the comments were correct, I wouldn't be complaining.  They're
 misleading or outright wrong on many points.  In particular, I don't
 think you actually understand the weak-memory-ordering issue, because
 the comments about that are entirely wrong.  

 The comment says: on CPUs with
 + * weak-memory ordering we can't reliably move pointers atomically, so the
 + * rule is that updates of head and tail of the array require ProcArrayLock
 + * in exclusive mode or (shared mode and known_assigned_xids_lck spinlock)

 I will reword this, so it is clear that I'm talking about the head and
 tail of the array, not pointers in general.

It's not about whether the pointers can be assigned atomically; on most
hardware they can.  It's about whether other processors will see that
happen in the correct sequence relative to the changes in the array
elements.
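
To make that concrete, here is a deliberately simplified illustration
(made-up names, not the patch itself) of the two sides of the hazard:

/* Writer (startup process) publishing a new entry: */
static void
publish_sketch(TransactionId *xids, int *head, TransactionId xid)
{
    xids[*head] = xid;          /* (1) fill the slot              */
    *head = *head + 1;          /* (2) advance head to publish it */
}

/* Reader running on another CPU: */
static TransactionId
read_newest_sketch(const TransactionId *xids, const int *head)
{
    int     h = *head;          /* may already observe the new head ...  */

    return xids[h - 1];         /* ... yet read a stale slot value, since
                                 * nothing forces (1) to become visible
                                 * before (2) on a weak-memory machine;
                                 * the spinlock (or ProcArrayLock) taken
                                 * around both sides supplies that order. */
}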

If you like I'll have a go at rewriting the comments for this patch,
because I am currently thinking that the problem is not so much with
the code as with the poor explanation of what it's doing.  Sometimes
the author is too close to the code to understand why other people
have a hard time understanding it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 13:33 -0400, Tom Lane wrote:

 If you like I'll have a go at rewriting the comments for this patch,
 because I am currently thinking that the problem is not so much with
 the code as with the poor explanation of what it's doing.  Sometimes
 the author is too close to the code to understand why other people
 have a hard time understanding it.

That would help me, thank you.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-04-25 at 13:33 -0400, Tom Lane wrote:
 If you like I'll have a go at rewriting the comments for this patch,
 because I am currently thinking that the problem is not so much with
 the code as with the poor explanation of what it's doing.  Sometimes
 the author is too close to the code to understand why other people
 have a hard time understanding it.

 That would help me, thank you.

OK.  You said you were currently working some more on the patch, so
I'll wait for v3 and then work on it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Erik Rijkers
On Sat, April 24, 2010 01:17, Erik Rijkers wrote:
 On Sat, April 24, 2010 00:39, Simon Riggs wrote:
 On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.

 Wow.  Good detective work.

 While we haven't fully established the source of those problems, I am
 now happy that these test results don't present any reason to avoid
 committing the main patch tested by Erik (not the smaller additional one
 I sent). I expect to commit that on Sunday.


 yes, that (main) patch seems to have largely
 closed the gap between primary and standby; here
 are some results from a lower scale (10):

FWIW, here are some more results from pgbench comparing
primary and standby (both with Simon's patch).

Sorry if it's too much data, but to me at least it was illuminating;
I now understand the effects of the different parameters better.



pgbench results, all -S (SELECT-only).

  scale: 10, 100, 500, 1000
clients: 1, 10, 20, 40, 60, 90

for each: 4x primary, 4x standby, for 15 minutes:

(port 6565=primary, 6566=standby)

scale: 10  clients:  1  tps =  6404.768132  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6435.072983  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6458.958549  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6432.088099  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1

scale: 10  clients:  1  tps =  6442.519035  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6449.837931  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6419.879026  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 10  clients:  1  tps =  6428.290175  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1

---
scale: 10  clients: 10  tps = 27624.339871  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27604.261750  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28015.093466  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28422.561280  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1

scale: 10  clients: 10  tps = 27254.806526  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27686.470866  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28078.904035  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27101.622337  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1


scale: 10  clients: 20  tps = 23106.795587  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23101.681155  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22893.364004  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23038.577109  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1

scale: 10  clients: 20  tps = 22903.578552  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22970.691946  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22999.473318  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22884.854749  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1


scale: 10  clients: 40  tps = 23522.499429  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23611.319191  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23616.905302  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23572.213990  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1

scale: 10  clients: 40  tps = 23714.721220  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23711.781175  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.867023  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.699231  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1


scale: 10  clients: 60  tps = 21987.497095  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21950.344204  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22006.461447  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21824.071303  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1

scale: 10  clients: 60  tps = 22149.415231  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22211.064402  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22164.238081  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22174.585736  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1


scale: 10  clients: 90  tps = 18751.213002  pgbench -p 6565 -n 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Erik Rijkers e...@xs4all.nl writes:
 FWIW, here are some more results from pgbench comparing
 primary and standby (both with Simon's patch).

That seems weird.  Why do most of the runs show primary and standby
as having comparable speed, but a few show the standby as much slower?
The parameters for those runs don't seem obviously different from cases
where it's fast.  I think there might have been something else going on
on the standby during those runs.  Or do you think those represent
cases where the mystery slowdown event happened?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Simon Riggs
On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote:

 Sorry if it's too much data, but to me at least it was illuminating;
 I now understand the effects of the different parameters better.

That's great, many thanks.

A few observations

* Standby performance is actually slightly above normal running. This is
credible because of the way snapshots are now taken. We don't need to
scan the procarray looking for write transactions, since we know
everything is read only. So we scan just the KnownAssignedXids array,
which, if there is no activity from the primary, will be zero-length, so
snapshots actually get taken much faster in this case on the standby.
Snapshot performance on the standby is O(n) where n is the number of
write transactions currently on the primary (transfer delays blur the
word "currently"); a rough sketch of that scan follows these observations.

* The results for scale factor > 100 are fine, and the results for 100
with few connections get thrown out by long transaction times. With
larger numbers of connections the wait problems seem to go away. Looks
like Erik (and possibly Hot Standby in general) has an I/O problem,
though its source is not yet determined. It could be just hardware, or
might be hardware plus other factors.
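
As promised in the first observation, a rough sketch of the scan being
described (illustrative C with made-up names, not the patch text): the
standby snapshot just walks the known-assigned window, and with no write
activity on the primary that window is empty.

static int
snapshot_scan_sketch(const TransactionId *xids, const bool *valid,
                     int tail, int head, TransactionId *out)
{
    int     count = 0;
    int     i;

    for (i = tail; i < head; i++)
    {
        if (!valid[i])
            continue;           /* entry already invalidated        */
        out[count++] = xids[i]; /* write xid still running on primary */
    }
    return count;               /* O(head - tail) work overall      */
}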

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Erik Rijkers
On Sun, April 25, 2010 20:55, Tom Lane wrote:

 That seems weird.  Why do most of the runs show primary and standby
 as having comparable speed, but a few show the standby as much slower?
 The parameters for those runs don't seem obviously different from cases
 where it's fast.  I think there might have been something else going on
 on the standby during those runs.  Or do you think those represent
 cases where the mystery slowdown event happened?


The strange case is the scale 100 standby's slow start, followed by
a steady increase during -c 1, then -c 10, and finally getting up to
speed with -c 20 (and up).  These slow-but-growing standby series are
interspersed with normal (high-speed) primary series.

I'll try to repeat this pattern on other hardware, although if my tests
were run on faulty hardware I don't see how or why that would give the
above effect (such a 'regular' aberration).


testing is more difficult than I thought...


Erik Rijkers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 [ v2 patch ]

I've been studying this some more while making notes for improved
comments, and I've about come to the conclusion that having readers
move the tail pointer (at the end of KnownAssignedXidsGetAndSetXmin)
is overly tricky and probably not a performance improvement anyway.
The code is in fact wrong as it stands: it's off-by-one about setting
the new tail value.  And there's potential for contention with multiple
readers all wanting to move the tail pointer at once.  And most
importantly, KnownAssignedXidsSearch can't move the tail pointer so
we might expend many inefficient searches while never moving the tail
pointer.

I think we should get rid of that and just have the two functions that
can mark entries invalid (which they must do with exclusive lock)
advance the tail pointer when they invalidate the current tail element.
Then we have the very simple rule that only the startup process ever
changes this data structure.
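
Sketched out (illustrative names, not actual patch code), that rule would
look something like this in the two invalidation paths, which already
hold ProcArrayLock exclusively:

static void
invalidate_sketch(bool *valid, int *tail, int head, int i)
{
    valid[i] = false;                       /* invalidate the entry       */
    while (*tail < head && !valid[*tail])
        (*tail)++;                          /* reclaim dead leading slots */
}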

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-24 Thread Simon Riggs
On Fri, 2010-04-23 at 19:07 -0400, Robert Haas wrote:
 On Fri, Apr 23, 2010 at 6:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
  
   99% of transactions happen in similar times between primary and standby,
   everything dragged down by rare but severe spikes.
  
   We're looking for something that would delay something that normally
   takes 0.1ms into something that takes 100ms, yet does eventually
   return. That looks like a severe resource contention issue.
 
  Wow.  Good detective work.
 
  While we haven't fully established the source of those problems, I am
  now happy that these test results don't present any reason to avoid
   committing the main patch tested by Erik (not the smaller additional one
  I sent). I expect to commit that on Sunday.
 
 Both Heikki and I objected to that patch.  

Please explain your objection, based upon the patch and my explanations.

 And apparently it doesn't
 fix the problem, either.  So, -1 from me.

There is an issue observed in Erik's later tests, but my interpretation
of the results so far is that the sorted array patch successfully
removes the initially reported loss of performance.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Simon Riggs
On Thu, 2010-04-22 at 23:45 +0100, Simon Riggs wrote:
 On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
  On Sun, April 18, 2010 13:01, Simon Riggs wrote:
 
  any comment is welcome...
 
 Please can you re-run with -l and post me the file of times

Erik has sent me details of a test run. My analysis of that is:

I'm seeing the response time profile on the standby as
99%    < 110us
99.9%  < 639us
99.99% < 615ms

0.052% (52 samples) are > 5ms elapsed and account for 24 s, which is
about 45% of elapsed time.

Of the 52 samples > 5ms, 50 of them are > 100ms and 2 are > 1s.

99% of transactions happen in similar times between primary and standby,
everything dragged down by rare but severe spikes.

We're looking for something that would delay something that normally
takes 0.1ms into something that takes 100ms, yet does eventually
return. That looks like a severe resource contention issue.
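
For anyone who wants to reproduce this kind of profile, here is a small
standalone sketch (not part of any patch) that reads one latency value
per line, in microseconds, and prints a few percentiles. Pulling the
latency column out of the pgbench -l log beforehand (with cut or awk) is
assumed.

#include <stdio.h>
#include <stdlib.h>

static int
cmp_double(const void *a, const void *b)
{
    double  da = *(const double *) a;
    double  db = *(const double *) b;

    return (da > db) - (da < db);
}

int
main(void)
{
    double *v = NULL;
    size_t  n = 0,
            cap = 0;
    double  x;

    /* slurp one latency value per line from stdin */
    while (scanf("%lf", &x) == 1)
    {
        if (n == cap)
        {
            cap = cap ? cap * 2 : 1024;
            v = realloc(v, cap * sizeof(double));
            if (v == NULL)
                return 1;
        }
        v[n++] = x;
    }
    if (n == 0)
        return 1;

    qsort(v, n, sizeof(double), cmp_double);

    printf("99%%    <= %.0f us\n", v[(size_t) (0.99 * (n - 1))]);
    printf("99.9%%  <= %.0f us\n", v[(size_t) (0.999 * (n - 1))]);
    printf("99.99%% <= %.0f us\n", v[(size_t) (0.9999 * (n - 1))]);

    free(v);
    return 0;
}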

This effect happens when running just a single read-only session on
standby from pgbench. No confirmation as yet as to whether recovery is
active or dormant, and what other activitity if any occurs on standby
server at same time. So no other clues as yet as to what the contention
might be, except that we note the standby is writing data and the
database is large.

 Please also rebuild using --enable-profile so we can see what's
 happening.
 
 Can you also try the enclosed patch which implements prefetching during
 replay of btree delete records. (Need to set effective_io_concurrency)

As yet, no confirmation that the attached patch is even relevant. It was
just a wild guess at some tuning, while we wait for further info.

 Thanks for your further help.

Some kind of contention is best we can say at present.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 11:14 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-04-22 at 23:45 +0100, Simon Riggs wrote:
 On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
  On Sun, April 18, 2010 13:01, Simon Riggs wrote:

  any comment is welcome...

 Please can you re-run with -l and post me the file of times

 Erik has sent me details of a test run. My analysis of that is:

 I'm seeing the response time profile on the standby as
 99%    < 110us
 99.9%  < 639us
 99.99% < 615ms

 0.052% (52 samples) are > 5ms elapsed and account for 24 s, which is
 about 45% of elapsed time.

 Of the 52 samples > 5ms, 50 of them are > 100ms and 2 are > 1s.

 99% of transactions happen in similar times between primary and standby,
 everything dragged down by rare but severe spikes.

 We're looking for something that would delay something that normally
 takes 0.1ms into something that takes 100ms, yet does eventually
 return. That looks like a severe resource contention issue.

Wow.  Good detective work.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Marko Kreen
On 4/18/10, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2010-04-17 at 16:48 -0400, Tom Lane wrote:
   There are some places where we suppose that a *single* write into shared
   memory can safely be done without a lock, if we're not too concerned
   about how soon other transactions will see the effects.  But what you
   are proposing here requires more than one related write.
  
   I've been burnt by this myself:
   http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php


 W O W - thank you for sharing.

  What I'm not clear on is why you've used a spinlock everywhere when only
  weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
  macro that does does nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).

Um, you have been burned by exactly this on x86 also:

  http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 Um, you have been burned by exactly this on x86 also:
   http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php

Yeah, we never did figure out exactly how come you were observing that
failure on Intel-ish hardware.  I was under the impression that Intel
machines didn't have weak-memory-ordering behavior.

I wonder whether your compiler had rearranged the code in ProcArrayAdd
so that the increment happened before the array element store at the
machine-code level.  I think it would be entitled to do that under
standard C semantics, since that ProcArrayStruct pointer isn't marked
volatile.
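
For illustration only (structure and field names simplified, this is not
the actual ProcArrayAdd): with an ordinary pointer the compiler may emit
the counter increment ahead of the element store, whereas going through a
volatile-qualified pointer pins the order in which the two stores are
emitted (it still says nothing about hardware reordering).

static void
proc_array_add_sketch(ProcArrayStruct *arrayP, PGPROC *proc)
{
    volatile ProcArrayStruct *a = arrayP;

    a->procs[a->numProcs] = proc;   /* store the element first ...    */
    a->numProcs++;                  /* ... then publish the new count */
}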

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Marko Kreen
On 4/23/10, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
   Um, you have been burned by exactly this on x86 also:
 http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php


  Yeah, we never did figure out exactly how come you were observing that
  failure on Intel-ish hardware.  I was under the impression that Intel
  machines didn't have weak-memory-ordering behavior.

  I wonder whether your compiler had rearranged the code in ProcArrayAdd
  so that the increment happened before the array element store at the
  machine-code level.  I think it would be entitled to do that under
  standard C semantics, since that ProcArrayStruct pointer isn't marked
  volatile.

Sounds likely.

Which seems to hint it's better to handle all processors as weakly
ordered and work with explicit locks/memory barriers, than to sprinkle
the code with 'volatile' to suppress optimizations on Intel and then
still fail on non-Intel.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Simon Riggs
On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.
 
 Wow.  Good detective work.

While we haven't fully established the source of those problems, I am
now happy that these test results don't present any reason to avoid
commiting the main patch tested by Erik (not the smaller additional one
I sent). I expect to commit that on Sunday.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Robert Haas
On Fri, Apr 23, 2010 at 6:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.

 Wow.  Good detective work.

 While we haven't fully established the source of those problems, I am
 now happy that these test results don't present any reason to avoid
 committing the main patch tested by Erik (not the smaller additional one
 I sent). I expect to commit that on Sunday.

Both Heikki and I objected to that patch.  And apparently it doesn't
fix the problem, either.  So, -1 from me.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-23 Thread Erik Rijkers
On Sat, April 24, 2010 00:39, Simon Riggs wrote:
 On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
 
  99% of transactions happen in similar times between primary and standby,
  everything dragged down by rare but severe spikes.
 
  We're looking for something that would delay something that normally
  takes 0.1ms into something that takes 100ms, yet does eventually
  return. That looks like a severe resource contention issue.

 Wow.  Good detective work.

 While we haven't fully established the source of those problems, I am
 now happy that these test results don't present any reason to avoid
 committing the main patch tested by Erik (not the smaller additional one
 I sent). I expect to commit that on Sunday.


yes, that (main) patch seems to have largely
closed the gap between primary and standby; here
are some results from a lower scale (10):

  scale: 10
clients: 10, 20, 40, 60, 90
for each: 4x primary, 4x standby:  (6565=primary, 6566=standby)
-
scale: 10  clients: 10  tps = 27624.339871  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27604.261750  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28015.093466  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28422.561280  pgbench -p 6565 -n -S -c 10 -T 
900 -j 1

scale: 10  clients: 10  tps = 27254.806526  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27686.470866  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 28078.904035  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1
scale: 10  clients: 10  tps = 27101.622337  pgbench -p 6566 -n -S -c 10 -T 
900 -j 1

-
scale: 10  clients: 20  tps = 23106.795587  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23101.681155  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22893.364004  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 23038.577109  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1

scale: 10  clients: 20  tps = 22903.578552  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22970.691946  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22999.473318  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 10  clients: 20  tps = 22884.854749  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1

-
scale: 10  clients: 40  tps = 23522.499429  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23611.319191  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23616.905302  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23572.213990  pgbench -p 6565 -n -S -c 40 -T 
900 -j 1

scale: 10  clients: 40  tps = 23714.721220  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23711.781175  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.867023  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1
scale: 10  clients: 40  tps = 23691.699231  pgbench -p 6566 -n -S -c 40 -T 
900 -j 1

-
scale: 10  clients: 60  tps = 21987.497095  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21950.344204  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22006.461447  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 21824.071303  pgbench -p 6565 -n -S -c 60 -T 
900 -j 1

scale: 10  clients: 60  tps = 22149.415231  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22211.064402  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22164.238081  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1
scale: 10  clients: 60  tps = 22174.585736  pgbench -p 6566 -n -S -c 60 -T 
900 -j 1

-
scale: 10  clients: 90  tps = 18751.213002  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18757.115811  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18692.942329  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18765.390154  pgbench -p 6565 -n -S -c 90 -T 
900 -j 1

scale: 10  clients: 90  tps = 18929.462104  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18999.851184  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18972.321607  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1
scale: 10  clients: 90  tps = 18924.058827  pgbench -p 6566 -n -S -c 90 -T 
900 -j 1


The higher scales still have that other standby slowness.  It may be a
caching effect (as Mark Kirkwood suggested): the idea being that the
primary's data is pre-cached because of the initial create, while the
standby's data needs to be read from disk for the first time.

Does that make sense?

I will try to confirm this.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 08:57 +0300, Heikki Linnakangas wrote:
  
  I think the assert is a good idea.  If there's no real problem here,
  the assert won't trip.  It's just a safety precaution.
 
 Right. And assertions also act as documentation, they are a precise and
 compact way to document invariants we assume to hold. A comment
 explaining why the cyclic nature of XIDs is not a problem would be nice
 too, in addition or instead of the assertions.

Agreed. I was going to reply just that earlier but have been distracted
on other things.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Erik Rijkers
On Sun, April 18, 2010 13:01, Simon Riggs wrote:

 OK, I'll put a spinlock around access to the head of the array.

 v2 patch attached


knownassigned_sortedarray.v2.diff applied to cvs HEAD (2010.04.21 22:36)

I have done a few smaller tests (scale 500, clients 1, 20):

init:
  pgbench -h /tmp -p 6565 -U rijkers -i -s 500 replicas


4x primary, clients 1:
scale: 500 clients:  1  tps = 11496.372655  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps = 11580.141685  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps = 11478.294747  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps = 11741.432016  pgbench -p 6565 -n -S -c 1 -T 
900 -j 1

4x standby, clients 1:
scale: 500 clients:  1  tps =   727.217672  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps =   785.431011  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps =   825.291817  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1
scale: 500 clients:  1  tps =   868.107638  pgbench -p 6566 -n -S -c 1 -T 
900 -j 1


4x primary, clients 20:
scale: 500 clients: 20  tps = 34963.054102  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps = 34818.985407  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps = 34964.545013  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps = 34959.210687  pgbench -p 6565 -n -S -c 20 -T 
900 -j 1

4x standby, clients 20:
scale: 500 clients: 20  tps =  1099.808192  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps =   905.926703  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps =   943.531989  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1
scale: 500 clients: 20  tps =  1082.215913  pgbench -p 6566 -n -S -c 20 -T 
900 -j 1


This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).

I have no idea what could cause this; is no one else seeing this?

(if I have time I'll repeat this on other hardware over the weekend)

any comment is welcome...


Erik Rijkers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Greg Smith

Erik Rijkers wrote:

This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).
  


Any possibility the standby is built with assertions turned on?  That's
often the cause of this type of difference between pgbench results on
two systems, which is easy to introduce when everyone is building from
source.  You should try this on both systems:


psql -c "show debug_assertions"


just to rule that out.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Mark Kirkwood

Greg Smith wrote:

Erik Rijkers wrote:
This is the same behaviour (i.e. extreme slow standby) that I saw 
earlier (and which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many

hours maybe even days (without bouncing either primary or standby).
  


Any possibility the standby is built with assertions turned on?
That's often the cause of this type of difference between pgbench
results on two systems, which is easy to introduce when everyone is
building from source.  You should try this on both systems:


psql -c "show debug_assertions"




Or even:

pg_config --configure

on both systems might be worth checking.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Erik Rijkers
On Thu, April 22, 2010 23:54, Mark Kirkwood wrote:
 Greg Smith wrote:
 Erik Rijkers wrote:
 This is the same behaviour (i.e. extreme slow standby) that I saw
 earlier (and which caused the
 original post, btw).  In that earlier instance, the extreme slowness
 disappeared later, after many
 hours maybe even days (without bouncing either primary or standby).


 Any possibility the standby is built with assertions turned on?
 That's often the cause of this type of difference between pgbench
 results on two systems, which is easy to introduce when everyone is
 building from source.  You should try this on both systems:

 psql -c "show debug_assertions"



 Or even:

 pg_config --configure

 on both systems might be worth checking.

(these instances are on a single server, btw)

primary:

$ pg_config
BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin
DOCDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
HTMLDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
INCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
PKGINCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
INCLUDEDIR-SERVER = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include/server
LIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
PKGLIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
LOCALEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/locale
MANDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/man
SHAREDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share
SYSCONFDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/etc
PGXS = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary' 
'--with-pgport=6565'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml' 
'--with-libxslt'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.0devel-sr_primary
[data:port:db   PGPORT=6565   
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data  
PGDATABASE=replicas]
2010.04.22 20:55:28 rijk...@denkraam:~/src/perl/85devel [0]
$ time ./run_test_suite.sh
[data:port:db   PGPORT=6565   
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data  
PGDATABASE=replicas]
2010.04.22 21:00:26 rijk...@denkraam:~/src/perl/85devel [1]

standby:

$ pg_config
BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin
DOCDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
HTMLDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
INCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
PKGINCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
INCLUDEDIR-SERVER = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include/server
LIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
PKGLIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
LOCALEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/locale
MANDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/man
SHAREDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share
SYSCONFDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/etc
PGXS = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary' 
'--with-pgport=6565'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml' 
'--with-libxslt'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.0devel-sr_primary



$ grep -Ev '(^[[:space:]]*#)|(^$)' pgsql.sr_*ry/data/postgresql.conf
pgsql.sr_primary/data/postgresql.conf:data_directory =
'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data'
pgsql.sr_primary/data/postgresql.conf:port = 6565
pgsql.sr_primary/data/postgresql.conf:max_connections = 100
pgsql.sr_primary/data/postgresql.conf:shared_buffers = 256MB
pgsql.sr_primary/data/postgresql.conf:checkpoint_segments = 50
pgsql.sr_primary/data/postgresql.conf:archive_mode = 'on'
pgsql.sr_primary/data/postgresql.conf:archive_command= 'cp %p
/var/data1/pg_stuff/dump/replication_archive/%f'

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
 On Sun, April 18, 2010 13:01, Simon Riggs wrote:

 any comment is welcome...

Please can you re-run with -l and post me the file of times

Please also rebuild using --enable-profile so we can see what's
happening.

Can you also try the enclosed patch which implements prefetching during
replay of btree delete records. (Need to set effective_io_concurrency)

Thanks for your further help.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index f4c7bf4..9918688 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -578,6 +578,8 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	OffsetNumber 	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	TransactionId	htupxid = InvalidTransactionId;
+	TransactionId	oldestxmin = GetCurrentOldestXmin(true, true);
+	TransactionId	latestCompletedXid;
 	int i;
 
 	/*
@@ -586,8 +588,12 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	 * That returns InvalidTransactionId, and so will conflict with
 	 * users, but since we just worked out that's zero people, its OK.
 	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
+	if (!TransactionIdIsValid(oldestxmin))
+		return oldestxmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
+	LWLockRelease(ProcArrayLock);
 
 	/*
 	 * Get index page
@@ -603,6 +609,27 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	 */
 	unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
 
+	/*
+	 * Prefetch the heap buffers.
+	 */
+	for (i = 0; i < xlrec->nitems; i++)
+	{
+		/*
+		 * Identify the index tuple about to be deleted
+		 */
+		iitemid = PageGetItemId(ipage, unused[i]);
+		itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+		/*
+		 * Locate the heap page that the index tuple points at
+		 */
+		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
+		XLogPrefetchBuffer(xlrec->hnode, MAIN_FORKNUM, hblkno);
+	}
+
+	/*
+	 * Read through the heap tids
+	 */
	for (i = 0; i < xlrec->nitems; i++)
 	{
 		/*
@@ -659,6 +686,16 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 latestRemovedXid = htupxid;
 
 			htupxid = HeapTupleHeaderGetXmax(htuphdr);
+
+			/*
+			 * Stop searching when we've found a recent xid
+			 */
+			if (TransactionIdFollowsOrEquals(htupxid,latestCompletedXid))
+			{
+				UnlockReleaseBuffer(hbuffer);
+				break;
+			}
+
 			if (TransactionIdFollows(htupxid, latestRemovedXid))
 latestRemovedXid = htupxid;
 		}
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 9ee2036..3ea3a40 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -342,6 +342,16 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+void
+XLogPrefetchBuffer(RelFileNode rnode, ForkNumber forknum,
+	BlockNumber blkno)
+{
+	Relation reln = CreateFakeRelcacheEntry(rnode);
+
+	reln->rd_istemp = false;
+
+	PrefetchBuffer(reln, forknum, blkno);
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5a214c8..bb23c16 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -933,6 +933,21 @@ TransactionIdIsActive(TransactionId xid)
 TransactionId
 GetOldestXmin(bool allDbs, bool ignoreVacuum)
 {
+	TransactionId result = GetCurrentOldestXmin(allDbs, ignoreVacuum);
+
+	/*
+	 * Compute the cutoff XID, being careful not to generate a permanent XID
+	 */
+	result -= vacuum_defer_cleanup_age;
+	if (!TransactionIdIsNormal(result))
+		result = FirstNormalTransactionId;
+
+	return result;
+}
+
+TransactionId
+GetCurrentOldestXmin(bool allDbs, bool ignoreVacuum)
+{
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId result;
 	int			index;
@@ -985,13 +1000,6 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 
 	LWLockRelease(ProcArrayLock);
 
-	/*
-	 * Compute the cutoff XID, being careful not to generate a permanent XID
-	 */
-	result -= vacuum_defer_cleanup_age;
-	if (!TransactionIdIsNormal(result))
-		result = FirstNormalTransactionId;
-
 	return result;
 }
 
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 8477f88..caa8aa3 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -28,6 +28,9 @@ extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
 extern Buffer XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init);
 extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	   BlockNumber blkno, ReadBufferMode mode);
+extern void XLogPrefetchBuffer(RelFileNode rnode, ForkNumber forknum,
+		BlockNumber blkno);
+
 
 extern Relation 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Mark Kirkwood

Erik Rijkers wrote:


This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).

I have no idea what could cause this; is no one else seeing this?

(if I have time I'll repeat this on other hardware over the weekend)

any comment is welcome...


  


I wonder if what you are seeing is perhaps due to the tables on the 
primary being almost completely cached (from the initial create) and 
those on the standby being at best partially so. That would explain why 
the standby performance catches up after a while ( when its tables are 
equivalently cached).


One way to test this is to 'pre-cache' the standby by selecting every 
row from its tables before running the pgbench test.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 15:09 +1200, Mark Kirkwood wrote:

 I did some testing of this patch (v2). Unfortunately I don't have access 
 to hardware capable of doing tests at the same scale as Erik used. 
 However I was still able to show a consistent difference (I think) 
 between standby performance with and without the patch applied.

...

 Overall looks like the patch gets standby read only performance close to 
 the master - at least in the case where there are minimal master 
 transactions being tracked by the standby (I had to leave the master 
 idle whilst running the standby case, as they shared the machine). Hope 
 this info is useful.

Thanks very much for the report; always good to get confirmation.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
 On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere when only
 weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
 macro that does nada when the hardware already protects us? (i.e. a
 spinlock only for the hardware that needs it).
 Well, we could certainly consider that, if we had enough places where
 there was a demonstrable benefit from it.  I couldn't measure any real
 slowdown from adding a spinlock in that sinval code, so I didn't propose
 doing so at the time --- and I'm pretty dubious that this code is
 sufficiently performance-critical to justify the work, either.
 OK, I'll put a spinlock around access to the head of the array.
 
 v2 patch attached

The locking seems overly complex to me. Looking at
KnownAssignedXidsAdd(), isn't it possible for two processes to call it
concurrently in exclusive_lock==false mode and get the same 'head'
value, and step on each other's toes? I guess KnownAssignedXidsAdd() is
only called by the startup process, but it seems like an accident
waiting to happen.

Spinlocks are fast, if you have to add an if-branch to decide whether to
acquire it, I suspect you've lost any performance gain to be had
already. Let's keep it simple. And acquiring ProcArrayLock in exclusive
mode while adding entries to the array seems OK to me as well. It only
needs to be held very briefly, and getting this correct and keeping it
simple is much more important at this point than squeezing out every
possible CPU cycle from the system.

Just require that the caller holds ProcArrayLock in exclusive mode when
calling an operation that modifies the array, and in shared mode when
reading it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
 On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere when only
 weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
 macro that does nada when the hardware already protects us? (i.e. a
 spinlock only for the hardware that needs it).
 Well, we could certainly consider that, if we had enough places where
 there was a demonstrable benefit from it.  I couldn't measure any real
 slowdown from adding a spinlock in that sinval code, so I didn't propose
 doing so at the time --- and I'm pretty dubious that this code is
 sufficiently performance-critical to justify the work, either.
 OK, I'll put a spinlock around access to the head of the array.
 
 v2 patch attached

Given the discussion about the cyclic nature of XIDs, it would be good
to add an assertion that when a new XID is added to the array, it is

a) larger than the biggest value already in the array
(TransactionIdFollows(new, head)), and
b) not too far from the smallest value in the array to confuse binary
search (TransactionIdFollows(new, tail)).
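
In code, that amounts to something like the following, with head_xid and
tail_xid standing for the newest and oldest entries currently in the
array (illustrative names, not from the patch):

Assert(TransactionIdFollows(new_xid, head_xid));    /* (a) newer than anything present    */
Assert(TransactionIdFollows(new_xid, tail_xid));    /* (b) not wrapped past the oldest xid */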

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 15:20 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
  On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
  What I'm not clear on is why you've used a spinlock everywhere when only
  weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
   macro that does nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).
  Well, we could certainly consider that, if we had enough places where
  there was a demonstrable benefit from it.  I couldn't measure any real
  slowdown from adding a spinlock in that sinval code, so I didn't propose
  doing so at the time --- and I'm pretty dubious that this code is
  sufficiently performance-critical to justify the work, either.
  OK, I'll put a spinlock around access to the head of the array.
  
  v2 patch attached
 
 The locking seems overly complex to me. 

 Looking at
 KnownAssignedXidsAdd(), isn't it possible for two processes to call it
 concurrently in exclusive_lock==false mode and get the same 'head'
 value, and step on each other's toes? I guess KnownAssignedXidsAdd() is
 only called by the startup process, but it seems like an accident
 waiting to happen.

Not at all. That assumption is also used elsewhere, so it is safe to use
here.

 Spinlocks are fast, if you have to add an if-branch to decide whether to
 acquire it, I suspect you've lost any performance gain to be had
 already.

I think you're misreading the code. One caller already holds exclusive
lock, one does not. The if test is to determine whether to acquire the
lock or not.

 Let's keep it simple. And acquiring ProcArrayLock in exclusive
 mode while adding entries to the array seems OK to me as well. It only
 needs to be held very briefly, and getting this correct and keeping it
 simple is much more important at this point than squeezing out every
 possible CPU cycle from the system.

I don't understand what you're saying: you say I'm wasting a few cycles
on an if test and should change that, but at the same time you say I
shouldn't worry about a few cycles.

 Just require that the caller holds ProcArrayLock in exclusive mode when
 calling an operation that modifies the array, and in shared mode when
 reading it.

The point of the code you're discussing is to remove the exclusive lock
requests, not to save a few cycles. Spinlocks are fast, as you say.

Exclusive lock requests block under a heavy load of shared lock holders,
we know that already. It is worth removing the contention that can occur
by minimising the number of exclusive locks required. This patch shows
how and I don't see any reason from what you have said to avoid
committing it. I'm willing to hear some sound reasons, if any exist.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 15:27 +0300, Heikki Linnakangas wrote:

 Given the discussion about the cyclic nature of XIDs, it would be good
 to add an assertion that when a new XID is added to the array, it is
 
 a) larger than the biggest value already in the array
 (TransactionIdFollows(new, head)), and
 b) not too far from the smallest value in the array to confuse binary
 search (TransactionIdFollows(new, tail)).

We discussed this in November. You convinced me it isn't possible for
older xids to stay in the standby because anti-wraparound vacuums would
conflict and kick them out. The primary can't run with wrapped xids and
neither can the standby. I think that is correct.

Adding an assertion isn't going to do much because it's unlikely anybody
is going to be running for 2^31 transactions with asserts enabled.

Worrying about things like this seems strange when real and negative
behaviours are right in our faces elsewhere. Performance and scalability
are real world concerns.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Robert Haas
On Wed, Apr 21, 2010 at 9:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 15:27 +0300, Heikki Linnakangas wrote:

 Given the discussion about the cyclic nature of XIDs, it would be good
 to add an assertion that when a new XID is added to the array, it is

 a) larger than the biggest value already in the array
 (TransactionIdFollows(new, head)), and
 b) not too far from the smallest value in the array to confuse binary
 search (TransactionIdFollows(new, tail)).

 We discussed this in November. You convinced me it isn't possible for
 older xids to stay in the standby because anti-wraparound vacuums would
 conflict and kick them out. The primary can't run with wrapped xids and
 neither can the standby. I think that is correct.

 Adding an assertion isn't going to do much because it's unlikely anybody
 is going to be running for 2^31 transactions with asserts enabled.

 Worrying about things like this seems strange when real and negative
 behaviours are right in our faces elsewhere. Performance and scalability
 are real world concerns.

I think the assert is a good idea.  If there's no real problem here,
the assert won't trip.  It's just a safety precaution.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Robert Haas
On Wed, Apr 21, 2010 at 8:20 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The locking seems overly complex to me.

I tend to agree.

!   /*
!* Callers must hold either ProcArrayLock in Exclusive mode or
!* ProcArrayLock in Shared mode *and* known_assigned_xids_lck
!* to update these values.
!*/

I'm not convinced that this is either (a) correct or (b) performant.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
 
  Adding an assertion isn't going to do much because it's unlikely anybody
  is going to be running for 2^31 transactions with asserts enabled.
 

 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

If you believe that, then I think you should add this to all the other
places in the current server where that assumption is made without
assertion being added. As a safety precaution.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread marcin mank
On Wed, Apr 21, 2010 at 4:12 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
 
  Adding an assertion isn't going to do much because it's unlikely anybody
  is going to be running for 2^31 transactions with asserts enabled.
 

 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

 If you believe that, then I think you should add this to all the other
 places in the current server where that assumption is made without
 assertion being added. As a safety precaution.


Is it not a good idea that (at least for dev builds, like with
--enable-cassert) the xid counter start at something like 2^31 - 1000?
It could help catch some bugs.

Greetings
Marcin Mańk

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Simon Riggs
On Wed, 2010-04-21 at 16:22 +0200, marcin mank wrote:

 Is it not a good idea that (at least for dev builds, like with
 --enable-cassert) the xid counter start at something like 2^31 - 1000?
 It could help catch some bugs.

It is a good idea, I'm sure that would help catch bugs.

It wouldn't help here because the case in doubt is whether it's possible
to have an xid still showing in memory arrays from the last time the
cycle wrapped. It isn't. These things aren't random. These numbers are
extracted directly from activity that was occurring on the primary and
regularly checked and cleaned as the standby runs.

So you'll need to do 2^31 transactions to prove this isn't true, which
isn't ever going to happen in testing with an assert build and nobody
with that many transactions would run an assert build anyway.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Robert Haas
On Wed, Apr 21, 2010 at 10:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
 
  Adding an assertion isn't going to do much because it's unlikely anybody
  is going to be running for 2^31 transactions with asserts enabled.
 

 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

 If you believe that, then I think you should add this to all the other
 places in the current server where that assumption is made without
 assertion being added. As a safety precaution.

I feel like this conversation is getting a little heated.  We are just
trying to solve a technical problem here.  Perhaps I am misreading -
tone doesn't come through very well in email.

I think the assumptions that are being made in this particular case
are different from the ones made elsewhere in the server.  Generally,
we don't assume transaction IDs are arriving in ascending order - in
fact, we usually explicitly have to deal with the fact that they might
not be.   So if we have a situation where we ARE relying on them
arriving in order, because we have extrinsic reasons why we know it
has to happen that way, adding an assertion to make sure that things
are happening the way we expect doesn't seem out of line.  This code
is fairly complex.

There is arguably less value in asserting that the newly added xid
follows the tail as well as the head, but I still like the idea.  Not
sure whether that's rational or not.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Florian Pflug
On Apr 21, 2010, at 16:49 , Simon Riggs wrote:
 On Wed, 2010-04-21 at 16:22 +0200, marcin mank wrote:
 
 Is that not a good idea that (at least for dev-builds, like with
 enable-cassert) the xid counter start at like 2^31 - 1000 ? It could
 help catch some bugs.
 
 It is a good idea, I'm sure that would help catch bugs.
 
 It wouldn't help here because the case in doubt is whether it's possible
 to have an xid still showing in memory arrays from the last time the
 cycle wrapped. It isn't. These things aren't random. These numbers are
 extracted directly from activity that was occurring on the primary and
 regularly checked and cleaned as the standby runs.
 
 So you'll need to do 2^31 transactions to prove this isn't true, which
 isn't ever going to happen in testing with an assert build and nobody
 with that many transactions would run an assert build anyway.


ISTM that there's no need to actually execute 2^31 transactions to trigger this
bug (if it actually exists); it'd be sufficient to increment the xid counter by
more than one each time an xid is assigned, no?

Or would that trip snapshot creation on the standby?

best regards,
Florian Pflug





Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-21 Thread Heikki Linnakangas
Robert Haas wrote:
 On Wed, Apr 21, 2010 at 9:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-04-21 at 15:27 +0300, Heikki Linnakangas wrote:

 Given the discussion about the cyclic nature of XIDs, it would be good
 to add an assertion that when a new XID is added to the array, it is

 a) larger than the biggest value already in the array
 (TransactionIdFollows(new, head)), and
 b) not too far from the smallest value in the array to confuse binary
 search (TransactionIdFollows(new, tail)).
 We discussed this in November. You convinced me it isn't possible for
 older xids to stay in the standby because anti-wraparound vacuums would
 conflict and kick them out. The primary can't run with wrapped xids and
 neither can the standby. I think that is correct.

 Adding an assertion isn't going to do much because it's unlikely anybody
 is going to be running for 2^31 transactions with asserts enabled.

 Worrying about things like this seems strange when real and negative
 behaviours are right in our faces elsewhere. Performance and scalability
 are real world concerns.
 
 I think the assert is a good idea.  If there's no real problem here,
 the assert won't trip.  It's just a safety precaution.

Right. And assertions also act as documentation, they are a precise and
compact way to document invariants we assume to hold. A comment
explaining why the cyclic nature of XIDs is not a problem would be nice
too, in addition or instead of the assertions.
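
For concreteness, a minimal sketch of the two checks being proposed, using
illustrative names for the array and its head/tail offsets (not the actual
fields in the patch under review):

    /*
     * Hypothetical sanity checks when appending new_xid at the head of the
     * known-assigned array: (a) entries stay in ascending xid order;
     * (b) the live span from tail to the new head stays under 2^31 xids,
     * so a binary search cannot be confused by wraparound.
     */
    Assert(TransactionIdFollows(new_xid, KnownAssignedXids[head - 1]));  /* (a) */
    Assert(TransactionIdFollows(new_xid, KnownAssignedXids[tail]));      /* (b) */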

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-20 Thread Robert Haas
On Sun, Apr 18, 2010 at 4:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sun, 2010-04-18 at 13:16 -0700, David Fetter wrote:
 On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
  On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
   On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere
 when only weak-memory thang CPUs are a problem. Why not have a
 weak-memory-protect macro that does does nada when the
 hardware already protects us? (i.e. a spinlock only for the
 hardware that needs it).
   
Well, we could certainly consider that, if we had enough places
where there was a demonstrable benefit from it.  I couldn't
measure any real slowdown from adding a spinlock in that sinval
code, so I didn't propose doing so at the time --- and I'm
pretty dubious that this code is sufficiently
performance-critical to justify the work, either.
  
   OK, I'll put a spinlock around access to the head of the array.
 
  v2 patch attached

 If you've committed this, or any other patch you've sent here,
 *please* mention so on the same thread.

 I haven't yet. I've written two patches - this is a major module rewrite
 and is still under discussion. The other patch has nothing to do with
 this (though I did accidentally include a couple of changes from this
 patch and immediately revoked them).

 I will wait awhile to see if anybody has some independent test results.

So, does anyone have a few cycles to test this out?  We are down to
handful of remaining open items, so getting this tested and committed
sooner = beta sooner.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-20 Thread Mark Kirkwood

Robert Haas wrote:


So, does anyone have a few cycles to test this out?  We are down to
handful of remaining open items, so getting this tested and committed
sooner = beta sooner.


  


I did some testing of this patch (v2). Unfortunately I don't have access 
to hardware capable of doing tests at the same scale as Erik used. 
However I was still able to show a consistent difference (I think) 
between standby performance with and without the patch applied.


Setup:

host: 2.7 Ghz dual core amd64 with 4G ram and 1 sata drive,
code: cvs head from 2010-04-14.
pgbench:  scale=100, 4 clients, 1 (select) transactions each.

Results:

Master performance (with and without patch applied ):
tps = 10903.612340 - 14070.109951 (including connections establishing)

Standby performance without patch:
tps = 8288.119913 - 9722.245178 (including connections establishing)

Standby performance with patch applied:
tps = 11592.922637 - 14065.553214 (including connections establishing)

I performed 8 runs of each, and results would start at the low range and 
climb up to the high one, where they would stabilize. In between runs I 
cleared the os buffer cache and (partially) reloaded it by selecting 
counts from the pgbench tables (i.e I was trying to ensure each run had 
the same or similar os cache contents).
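
For anyone wanting to reproduce this, the runs above correspond roughly to
something like the following; the per-client transaction count did not
survive in the archive, so treat it as a placeholder:

    pgbench -i -s 100 pgbench                  # initialize scale-100 tables
    pgbench -S -c 4 -t 10000 -p 5432 pgbench   # select-only run, 4 clients
    # point -p at the primary or the standby port for the two cases;
    # one common way to clear the OS cache between runs on Linux:
    #   sync; echo 3 > /proc/sys/vm/drop_caches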


Overall looks like the patch gets standby read only performance close to 
the master - at least in the case where there are minimal master 
transactions being tracked by the standby (I had to leave the master 
idle whilst running the standby case, as they shared the machine). Hope 
this info is useful.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-20 Thread Robert Haas
On Tue, Apr 20, 2010 at 11:09 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 Robert Haas wrote:

 So, does anyone have a few cycles to test this out?  We are down to
 handful of remaining open items, so getting this tested and committed
 sooner = beta sooner.




 I did some testing of this patch (v2). Unfortunately I don't have access to
 hardware capable of doing tests at the same scale as Erik used. However I
 was still able to show a consistent difference (I think) between standby
 performance with and without the patch applied.

 Setup:

 host: 2.7 Ghz dual core amd64 with 4G ram and 1 sata drive,
 code: cvs head from 2010-04-14.
 pgbench:  scale=100, 4 clients, 1 (select) transactions each.

 Results:

 Master performance (with and without patch applied ):
 tps = 10903.612340 - 14070.109951 (including connections establishing)

 Standby performance without patch:
 tps = 8288.119913 - 9722.245178 (including connections establishing)

 Standby performance with patch applied:
 tps = 11592.922637 - 14065.553214 (including connections establishing)

 I performed 8 runs of each, and results would start at the low range and
 climb up to the high one, where they would stabilize. In between runs I
 cleared the os buffer cache and (partially) reloaded it by selecting counts
 from the pgbench tables (i.e I was trying to ensure each run had the same or
 similar os cache contents).

 Overall looks like the patch gets standby read only performance close to the
 master - at least in the case where there are minimal master transactions
 being tracked by the standby (I had to leave the master idle whilst running
 the standby case, as they shared the machine). Hope this info is useful.

Thanks, that sounds promising.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread Simon Riggs
On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  What I'm not clear on is why you've used a spinlock everywhere when only
  weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
  macro that does does nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).
 
 Well, we could certainly consider that, if we had enough places where
 there was a demonstrable benefit from it.  I couldn't measure any real
 slowdown from adding a spinlock in that sinval code, so I didn't propose
 doing so at the time --- and I'm pretty dubious that this code is
 sufficiently performance-critical to justify the work, either.

OK, I'll put a spinlock around access to the head of the array.

Thanks for your input.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread Simon Riggs
On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
 On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
   What I'm not clear on is why you've used a spinlock everywhere when only
   weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
   macro that does does nada when the hardware already protects us? (i.e. a
   spinlock only for the hardware that needs it).
  
  Well, we could certainly consider that, if we had enough places where
  there was a demonstrable benefit from it.  I couldn't measure any real
  slowdown from adding a spinlock in that sinval code, so I didn't propose
  doing so at the time --- and I'm pretty dubious that this code is
  sufficiently performance-critical to justify the work, either.
 
 OK, I'll put a spinlock around access to the head of the array.

v2 patch attached

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1200,1205  StandbyTransactionIdIsPrepared(TransactionId xid)
--- 1200,1208 
  
  	Assert(TransactionIdIsValid(xid));
  
+ 	if (max_prepared_xacts <= 0)
+ 		return false;	/* nothing to do */
+ 
  	/* Read and validate file */
  	buf = ReadTwoPhaseFile(xid, false);
  	if (buf == NULL)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6453,6458  CheckRecoveryConsistency(void)
--- 6453,6464 
  	}
  }
  
+ bool
+ XLogConsistentState(void)
+ {
+ 	return reachedMinRecoveryPoint;
+ }
+ 
  /*
   * Is the system still in recovery?
   *
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 52,57 
--- 52,58 
  #include access/twophase.h
  #include miscadmin.h
  #include storage/procarray.h
+ #include storage/spin.h
  #include storage/standby.h
  #include utils/builtins.h
  #include utils/snapmgr.h
***
*** 64,73  typedef struct ProcArrayStruct
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	int			numKnownAssignedXids;	/* current number of known assigned
! 		 * xids */
! 	int			maxKnownAssignedXids;	/* allocated size of known assigned
! 		 * xids */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
--- 65,84 
  	int			numProcs;		/* number of valid procs entries */
  	int			maxProcs;		/* allocated size of procs array */
  
! 	/*
! 	 * Known assigned xids handling
! 	 */
! 	int			maxKnownAssignedXids;	/* allocated size */
! 
! 	/*
! 	 * Callers must hold either ProcArrayLock in Exclusive mode or
! 	 * ProcArrayLock in Shared mode *and* known_assigned_xids_lck
! 	 * to update these values.
! 	 */
! 	int			numKnownAssignedXids;	/* current # valid entries */
! 	int			tailKnownAssignedXids;	/* current tail */
! 	int			headKnownAssignedXids;	/* current head */
! 	slock_t		known_assigned_xids_lck;	/* shared protection lock */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
***
*** 87,93  static ProcArrayStruct *procArray;
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static HTAB *KnownAssignedXidsHash;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
--- 98,105 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static TransactionId *KnownAssignedXids;
! static bool *KnownAssignedXidsValid;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
***
*** 142,150  static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
--- 154,166 
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
! bool exclusive_lock);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid);
! static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
! 	  TransactionId *subxids);
! static void KnownAssignedXidsCompress(bool full);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
***
*** 204,228  CreateSharedProcArray(void)
		procArray->numKnownAssignedXids = 0;
		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
  		

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread David Fetter
On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
 On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
  On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
   Simon Riggs si...@2ndquadrant.com writes:
What I'm not clear on is why you've used a spinlock everywhere
when only weak-memory thang CPUs are a problem. Why not have a
weak-memory-protect macro that does does nada when the
hardware already protects us? (i.e. a spinlock only for the
hardware that needs it).
   
   Well, we could certainly consider that, if we had enough places
   where there was a demonstrable benefit from it.  I couldn't
   measure any real slowdown from adding a spinlock in that sinval
   code, so I didn't propose doing so at the time --- and I'm
   pretty dubious that this code is sufficiently
   performance-critical to justify the work, either.
  
  OK, I'll put a spinlock around access to the head of the array.
 
 v2 patch attached

If you've committed this, or any other patch you've sent here,
*please* mention so on the same thread.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread Simon Riggs
On Sun, 2010-04-18 at 13:16 -0700, David Fetter wrote:
 On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
  On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
   On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere
 when only weak-memory thang CPUs are a problem. Why not have a
 weak-memory-protect macro that does does nada when the
 hardware already protects us? (i.e. a spinlock only for the
 hardware that needs it).

Well, we could certainly consider that, if we had enough places
where there was a demonstrable benefit from it.  I couldn't
measure any real slowdown from adding a spinlock in that sinval
code, so I didn't propose doing so at the time --- and I'm
pretty dubious that this code is sufficiently
performance-critical to justify the work, either.
   
   OK, I'll put a spinlock around access to the head of the array.
  
  v2 patch attached
 
 If you've committed this, or any other patch you've sent here,
 *please* mention so on the same thread.

I haven't yet. I've written two patches - this is a major module rewrite
and is still under discussion. The other patch has nothing to do with
this (though I did accidentally include a couple of changes from this
patch and immediately revoked them).

I will wait awhile to see if anybody has some independent test results.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-18 Thread David Fetter
On Sun, Apr 18, 2010 at 09:22:21PM +0100, Simon Riggs wrote:
 On Sun, 2010-04-18 at 13:16 -0700, David Fetter wrote:
  On Sun, Apr 18, 2010 at 12:01:05PM +0100, Simon Riggs wrote:
   On Sun, 2010-04-18 at 08:24 +0100, Simon Riggs wrote:
On Sat, 2010-04-17 at 18:52 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  What I'm not clear on is why you've used a spinlock
  everywhere when only weak-memory thang CPUs are a problem.
  Why not have a weak-memory-protect macro that does does
  nada when the hardware already protects us? (i.e. a
  spinlock only for the hardware that needs it).
 
 Well, we could certainly consider that, if we had enough
 places where there was a demonstrable benefit from it.  I
 couldn't measure any real slowdown from adding a spinlock in
 that sinval code, so I didn't propose doing so at the time
 --- and I'm pretty dubious that this code is sufficiently
 performance-critical to justify the work, either.

OK, I'll put a spinlock around access to the head of the
array.
   
   v2 patch attached
  
  If you've committed this, or any other patch you've sent here,
  *please* mention so on the same thread.
 
 I haven't yet. I've written two patches - this is a major module
 rewrite and is still under discussion. The other patch has nothing
 to do with this (though I did accidentally include a couple of
 changes from this patch and immediately revoked them).
 
 I will wait awhile to see if anybody has some independent test
 results.

Thanks :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Fri, 2010-04-16 at 11:10 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2010-04-16 at 10:39 -0400, Tom Lane wrote:
  I think you're outsmarting yourself there.  A binary search will in fact
  *not work* with circular xid comparison (this is exactly why there's no
  btree opclass for XID).
 
  I don't understand the exact, please explain more.
  I'm not using bsearch() just a quick chop based upon xid comparison,
  which looks to me like it will work.
 
 Implementing it yourself doesn't get you out of the fact that it won't
 work.  Consider
 
 	1 2 3 ... 3000000000 ... 3999999999
 
 and suppose that 3000000000 is the array's middle element.  If you
 search for 100, your first probe will conclude that it is to the right
 of 3000000000, which is the wrong direction.  Binary search, or indeed
 the entire concept that the array is sorted in the first place,
 depends on a transitive comparison operator.  TransactionIdFollows does
 not satisfy the transitive law.

AFAICS the example you give isn't correct.

We would lay out the values like this

W-3 W-2 W-1 W 3 4 5

where W is the wrap value and in this example we have 7 values in the
current array, with tail at W-3 and head at 5. Note the gap between W
and 3 where we would skip the values 1 and 2 because they are special.
Each element's xid is TransactionIdAdvanced(previous element).
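
(The gap arises because xid advancement never hands out the special ids
below FirstNormalTransactionId; a sketch of what the TransactionIdAdvance
macro in access/transam.h does, for illustration:

    xid++;
    if (xid < FirstNormalTransactionId)		/* ids 0, 1, 2 are reserved */
        xid = FirstNormalTransactionId;		/* i.e. skip straight to 3 */

so the element after W is 3, never 0, 1 or 2.)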

So when we search for value 3 we would start from W, then decide it is
to the right, which is correct and continue from there.

The values are laid out in TransactionIdFollows order, not in numeric
order, hence we need to use TransactionIdFollows to decide which way to
branch.

As long as it works I'm not worried if the array is not technically
sorted.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 AFAICS the example you give isn't correct.

 We would lay out the values like this

 W-3 W-2 W-1 W 3 4 5

 where W is the wrap value

Stop right there, you're already failing to think clearly.  There is no
unique wrap value, all values act the same in circular XID space.

The fundamental problem here is that if the set of XIDs involved spans a
sufficiently long range, your array will have a[0] < a[1] < ... < a[n]
but it will fail to be true that a[0] < a[n].  If that's also true for,
say, a[0] vs the midpoint element, then a binary search for a[0] will
fail because it will make the wrong decision while probing the midpoint.

 The values are laid out in TransactionIdFollows order,

They *cannot* be laid out in TransactionIdFollows order.  It's not
possible, because that relationship isn't a total ordering.

Now it might be the case that this is OK for HS purposes because the set
of XIDs that are relevant at any one instant should never span more than
half of the XID space.  But I'd just as soon not use that assumption
here if it's unnecessary.  It'd be cheaper anyway to sort and search the
array using plain <, so why are you so eager to use
TransactionIdFollows?
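
To see why TransactionIdFollows cannot totally order an array whose entries
span more than half the xid space, here is a small self-contained
illustration of the underlying comparison (the same signed-difference test
that TransactionIdPrecedes/Follows apply to normal xids); it is sketch code,
not PostgreSQL source:

    #include <stdint.h>
    #include <stdio.h>

    /* "a precedes b" iff the signed 32-bit difference is negative */
    static int xid_precedes(uint32_t a, uint32_t b)
    {
        return (int32_t) (a - b) < 0;
    }

    int main(void)
    {
        /* not transitive: 1 precedes 1500000000, 1500000000 precedes
         * 3000000000, yet 1 does not precede 3000000000 */
        printf("%d %d %d\n",
               xid_precedes(1u, 1500000000u),
               xid_precedes(1500000000u, 3000000000u),
               xid_precedes(1u, 3000000000u));		/* prints 1 1 0 */

        /* the wrong-direction probe from the example upthread: 100
         * appears to *follow* 3000000000, so the search walks right */
        printf("%d\n", xid_precedes(100u, 3000000000u));	/* prints 0 */
        return 0;
    }

Plain <, by contrast, is transitive, which is what makes an ordinary sorted
array and binary search valid.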

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Robert Haas
On Sat, Apr 17, 2010 at 11:13 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 AFAICS the example you give isn't correct.

 We would lay out the values like this

 W-3 W-2 W-1 W 3 4 5

 where W is the wrap value

 Stop right there, you're already failing to think clearly.  There is no
 unique wrap value, all values act the same in circular XID space.

Or to put it a different way, what happens when the wrap value
changes?  An array that was in order under one wrap value can cease to
be in order under another.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Sat, 2010-04-17 at 11:13 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  AFAICS the example you give isn't correct.
 
  We would lay out the values like this
 
  W-3 W-2 W-1 W 3 4 5
 
  where W is the wrap value
 
 Stop right there, you're already failing to think clearly.  There is no
 unique wrap value, all values act the same in circular XID space.
 
 The fundamental problem here is that if the set of XIDs involved spans a
 sufficiently long range, your array will have a[0] < a[1] < ... < a[n]
 but it will fail to be true that a[0] < a[n].  If that's also true for,
 say, a[0] vs the midpoint element, then a binary search for a[0] will
 fail because it will make the wrong decision while probing the midpoint.

I understand that the xids are circular, but there is only one point
where the next xid has a lower integer value but is still later from
the perspective of TransactionIdFollows. Apart from that single
discontinuity all other values are monotonic. From your insistence I
presume I must be missing something, but I just don't see it. Perhaps we
are just misunderstanding each other about the sufficiently long
range.

  The values are laid out in TransactionIdFollows order,
 
 They *cannot* be laid out in TransactionIdFollows order.  It's not
 possible, because that relationship isn't a total ordering.
 
 Now it might be the case that this is OK for HS purposes because the set
 of XIDs that are relevant at any one instant should never span more than
 half of the XID space.

Agree that is true.

  But I'd just as soon not use that assumption
 here if it's unnecessary. 

I understand what your saying. I think it is necessary here.

  It'd be cheaper anyway to sort and search the
 array using plain <, so why are you so eager to use
 TransactionIdFollows?

The array grows to the right and is laid out one xid per element,
resulting in a sequence of values that are transactionid-monotonic. As
the values increase there will eventually reach the discontinuity where
they cease being normally monotonic. Doing it this way means that we can
add rows past the head of the array and then move the head atomically,
so that we can make adding xids lock-free.

If we try to actually sort the values then the algorithm is both more
complex and requires locking. It would be easier to just remember where
the discontinuity is and act accordingly.

So I'm not eager to use either way, but I only currently see one way
that would work.

If there's a different way, that gives the same or better algorithmic
characteristics, I will be happy to code it.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sat, 2010-04-17 at 11:13 -0400, Tom Lane wrote:
 It'd be cheaper anyway to sort and search the
 array using plain <, so why are you so eager to use
 TransactionIdFollows?

 The array grows to the right and is laid out one xid per element,
 resulting in a sequence of values that are transactionid-monotonic.

How do you know that just adding items at the right will produce a
sorted array?  It seems quite unlikely to me that transactions can be
guaranteed to arrive at this code in XID order.  I think you need to do
an explicitly sorted insertion.

 ... Doing it this way means that we can
 add rows past the head of the array and then move the head atomically,
 so that we can make adding xids lock-free.

... and even without that issue, this seems like utter fantasy.  How
are you going to do that atomically?  Have you considered what will
happen on weak-memory-ordering machines like PPC, in particular?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Sat, 2010-04-17 at 15:45 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sat, 2010-04-17 at 11:13 -0400, Tom Lane wrote:
  It'd be cheaper anyway to sort and search the
  array using plain <, so why are you so eager to use
  TransactionIdFollows?
 
  The array grows to the right and is laid out one xid per element,
  resulting in a sequence of values that are transactionid-monotonic.
 
 How do you know that just adding items at the right will produce a
 sorted array?  It seems quite unlikely to me that transactions can be
 guaranteed to arrive at this code in XID order.  I think you need to do
 an explicitly sorted insertion.

Xids don't arrive in sequence, but known assigned xids are added in
sequence because we infer the existence of the intermediate xids and
assuming they are running for the snapshot.

  ... Doing it this way means that we can
  add rows past the head of the array and then move the head atomically,
  so that we can make adding xids lock-free.
 
 ... and even without that issue, this seems like utter fantasy.  How
 are you going to do that atomically?  Have you considered what will
 happen on weak-memory-ordering machines like PPC, in particular?

We search the array between tail and head. If the head moves by integer
overwrite just as already happens for xid assignment, then we would use
the new head for the search. The code is careful to fetch only once.

I would freely admit I know absolutely nothing about details of
weak-memory-ordering machines and have not considered them at all. How
would what I have proposed fail to work, yet what we already rely on
work correctly? Do the circumstances differ?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sat, 2010-04-17 at 15:45 -0400, Tom Lane wrote:
 How do you know that just adding items at the right will produce a
 sorted array?

 Xids don't arrive in sequence, but known assigned xids are added in
 sequence because we infer the existence of the intermediate xids and
 assuming they are running for the snapshot.

Hm.  Okay, maybe that will work.

 ... and even without that issue, this seems like utter fantasy.  How
 are you going to do that atomically?  Have you considered what will
 happen on weak-memory-ordering machines like PPC, in particular?

 We search the array between tail and head. If the head moves by integer
 overwrite just as already happens for xid assignment, then we would use
 the new head for the search. The code is careful to fetch only once.

... but this will not.  You need to use a lock, because there is
otherwise no guarantee that other processors see the write into the
array element before they see the change in the head pointer.

 I would freely admit I know absolutely nothing about details of
 weak-memory-ordering machines and have not considered them at all. How
 would what I have proposed fail to work, yet what we already rely on
 work correctly? Do the circumstances differ?

Yes.  We have memory ordering instructions inserted in the lock
acquisition/release code.  Trying to access and modify a shared-memory
data structure without any locking will not work.

There are some places where we suppose that a *single* write into shared
memory can safely be done without a lock, if we're not too concerned
about how soon other transactions will see the effects.  But what you
are proposing here requires more than one related write.

I've been burnt by this myself:
http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php
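
As a rough sketch of the two-related-writes problem being described (the
field names echo the patch excerpts upthread, but the code below is
illustrative, not the actual patch):

    /* writer side (startup process) */
    KnownAssignedXids[head] = xid;			/* write 1: fill the new slot */
    SpinLockAcquire(&procArray->known_assigned_xids_lck);
    procArray->headKnownAssignedXids = head + 1;	/* write 2: publish it */
    SpinLockRelease(&procArray->known_assigned_xids_lck);

    /*
     * Readers take the same spinlock (or ProcArrayLock exclusively) when
     * fetching the head pointer; the lock's acquire/release acts as the
     * memory barrier that guarantees any slot below head already holds
     * the value written above, even on weak-memory-ordering hardware.
     */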

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Simon Riggs
On Sat, 2010-04-17 at 16:48 -0400, Tom Lane wrote:

  We search the array between tail and head. If the head moves by integer
  overwrite just as already happens for xid assignment, then we would use
  the new head for the search. The code is careful to fetch only once.
 
 ... but this will not.  You need to use a lock, because there is
 otherwise no guarantee that other processors see the write into the
 array element before they see the change in the head pointer.
 
  I would freely admit I know absolutely nothing about details of
  weak-memory-ordering machines and have not considered them at all. How
  would what I have proposed fail to work, yet what we already rely on
  work correctly? Do the circumstances differ?
 
 Yes.  We have memory ordering instructions inserted in the lock
 acquisition/release code.  Trying to access and modify a shared-memory
 data structure without any locking will not work.
 
 There are some places where we suppose that a *single* write into shared
 memory can safely be done without a lock, if we're not too concerned
 about how soon other transactions will see the effects.  But what you
 are proposing here requires more than one related write.
 
 I've been burnt by this myself:
 http://archives.postgresql.org/pgsql-committers/2008-06/msg00228.php

W O W - thank you for sharing.

What I'm not clear on is why you've used a spinlock everywhere when only
weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
macro that does does nada when the hardware already protects us? (i.e. a
spinlock only for the hardware that needs it).

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 What I'm not clear on is why you've used a spinlock everywhere when only
 weak-memory thang CPUs are a problem. Why not have a weak-memory-protect
 macro that does does nada when the hardware already protects us? (i.e. a
 spinlock only for the hardware that needs it).

Well, we could certainly consider that, if we had enough places where
there was a demonstrable benefit from it.  I couldn't measure any real
slowdown from adding a spinlock in that sinval code, so I didn't propose
doing so at the time --- and I'm pretty dubious that this code is
sufficiently performance-critical to justify the work, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
 A quick fix would be to check if there's any entries in the hash table
 before scanning it. That would eliminate the overhead when there's no
 in-progress transactions in the master. But as soon as there's even one,
 the overhead comes back.
 
 Any fix should be fairly quick because of the way its modularised - with
 something like this in mind.
 
 I'll try a circular buffer implementation, with fastpath.

I started experimenting with a sorted array based implementation on
Tuesday but got carried away with other stuff. I now got back to that
and cleaned it up.

How does the attached patch look like? It's probably similar to what you
had in mind.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 88169b4..4a04051 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -64,8 +64,10 @@ typedef struct ProcArrayStruct
 	int			numProcs;		/* number of valid procs entries */
 	int			maxProcs;		/* allocated size of procs array */
 
-	int			numKnownAssignedXids;	/* current number of known assigned
-		 * xids */
+	int			firstKnownAssigned;		/* location of first valid known
+		 * assigned xid in the array */
+	int			lastKnownAssigned;		/* location of last valid known
+		 * assigned xid in the array */
 	int			maxKnownAssignedXids;	/* allocated size of known assigned
 		 * xids */
 
@@ -87,7 +89,8 @@ static ProcArrayStruct *procArray;
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
-static HTAB *KnownAssignedXidsHash;
+static TransactionId *KnownAssignedXidsArray;
+static bool *KnownAssignedXidsValidArray;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
 /*
@@ -201,28 +204,33 @@ CreateSharedProcArray(void)
 		/* Normal processing */
 		procArray->numProcs = 0;
 		procArray->maxProcs = PROCARRAY_MAXPROCS;
-		procArray->numKnownAssignedXids = 0;
 		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
+		procArray->firstKnownAssignedXid = 0;
+		procArray->lastKnownAssignedXid = 0;
 		procArray->lastOverflowedXid = InvalidTransactionId;
 	}
 
 	if (XLogRequestRecoveryConnections)
 	{
-		/* Create or attach to the KnownAssignedXids hash table */
-		HASHCTL		info;
-
-		MemSet(&info, 0, sizeof(info));
-		info.keysize = sizeof(TransactionId);
-		info.entrysize = sizeof(TransactionId);
-		info.hash = tag_hash;
-
-		KnownAssignedXidsHash = ShmemInitHash("KnownAssignedXids Hash",
-			  TOTAL_MAX_CACHED_SUBXIDS,
-			  TOTAL_MAX_CACHED_SUBXIDS,
-			  &info,
-			  HASH_ELEM | HASH_FUNCTION);
-		if (!KnownAssignedXidsHash)
-			elog(FATAL, "could not initialize known assigned xids hash table");
+		Size size;
+		/* Create or attach to the KnownAssignedXids arrays */
+		size = mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS);
+		KnownAssignedXidsArray = ShmemInitStruct("KnownAssignedXidsArray",
+ size,
+ &found);
+		if (!KnownAssignedXidsArray)
+			elog(FATAL, "could not initialize known assigned xids array");
+		if (!found)
+			MemSet(KnownAssignedXidsArray, 0, size);
+
+		size = mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS);
+		KnownAssignedXidsValidArray = ShmemInitStruct("KnownAssignedXidsValidArray",
+	  size,
+	  &found);
+		if (!KnownAssignedXidsValidArray)
+			elog(FATAL, "could not initialize known assigned xids used array");
+		if (!found)
+			MemSet(KnownAssignedXidsValidArray, 0, size);
 	}
 }
 
@@ -2162,7 +2170,7 @@ DisplayXidCache(void)
  *
  * During recovery we do not fret too much about the distinction between
  * top-level xids and subtransaction xids. We hold both together in
- * a hash table called KnownAssignedXids. In backends, this is copied into
+ * an array called KnownAssignedXids. In backends, this is copied into
  * snapshots in GetSnapshotData(), taking advantage
  * of the fact that XidInMVCCSnapshot() doesn't care about the distinction
  * either. Subtransaction xids are effectively treated as top-level xids
@@ -2207,7 +2215,7 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
 	 * We can see WAL records before the running-xacts snapshot that contain
 	 * XIDs that are not in the running-xacts snapshot, but that we know to
 	 * have finished before the running-xacts snapshot was taken. Don't waste
-	 * precious shared memory by keeping them in the hash table.
+	 * precious shared memory by keeping them in the array.
 	 *
 	 * We can also see WAL records before the running-xacts snapshot that
 	 * contain XIDs that are not in the running-xacts snapshot for a different
@@ -2330,24 +2338,30 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 /*
  * Private module functions to manipulate KnownAssignedXids
  *
- * There are 3 main users of the KnownAssignedXids data structure:
+ * We use a fixed-size sorted array in 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
  A quick fix would be to check if there's any entries in the hash table
  before scanning it. That would eliminate the overhead when there's no
  in-progress transactions in the master. But as soon as there's even one,
  the overhead comes back.
  
  Any fix should be fairly quick because of the way its modularised - with
  something like this in mind.
  
  I'll try a circular buffer implementation, with fastpath.
 
 I started experimenting with a sorted array based implementation on
 Tuesday but got carried away with other stuff. I now got back to that
 and cleaned it up.
 
 How does the attached patch look like? It's probably similar to what you
 had in mind.

It looks like a second version of what I'm working on and about to
publish. I'll take that as a compliment!

My patch is attached here also, for discussion.

The two patches look same in their main parts, though I have quite a few
extra tweaks in there, which you can read about in comments. One tweak I
don't have is the use of the presence array that allows a sensible
bsearch, so I'll alter my patch to use that idea but keep the rest of
my code.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 3cc9bdd..d432c9d 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1200,6 +1200,9 @@ StandbyTransactionIdIsPrepared(TransactionId xid)
 
 	Assert(TransactionIdIsValid(xid));
 
+	if (max_prepared_xacts <= 0)
+		return false;	/* nothing to do */
+
 	/* Read and validate file */
 	buf = ReadTwoPhaseFile(xid, false);
 	if (buf == NULL)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3adbee2..9245727 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6453,6 +6453,12 @@ CheckRecoveryConsistency(void)
 	}
 }
 
+bool
+XLogConsistentState(void)
+{
+	return reachedMinRecoveryPoint;
+}
+
 /*
  * Is the system still in recovery?
  *
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 88169b4..050c547 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -68,6 +68,10 @@ typedef struct ProcArrayStruct
 		 * xids */
 	int			maxKnownAssignedXids;	/* allocated size of known assigned
 		 * xids */
+	int			tailKnownAssignedXids;	/* current tail of known assigned
+		 * xids */
+	int			headKnownAssignedXids;	/* current head of known assigned
+		 * xids */
 
 	/*
 	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
@@ -87,7 +91,7 @@ static ProcArrayStruct *procArray;
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
-static HTAB *KnownAssignedXidsHash;
+static TransactionId *KnownAssignedXids;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
 /*
@@ -142,9 +146,12 @@ static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
 static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
 			   TransactionId xmax);
 static bool KnownAssignedXidsExist(TransactionId xid);
-static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
-static void KnownAssignedXidsRemove(TransactionId xid);
-static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
+static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid, bool have_lock);
+static void KnownAssignedXidsRemove(TransactionId xid, int *index);
+static void KnownAssignedXidsRemoveMany(TransactionId xid);
+static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
+	  TransactionId *subxids);
+static void KnownAssignedXidsCompress(bool full);
 static void KnownAssignedXidsDisplay(int trace_level);
 
 /*
@@ -204,10 +211,13 @@ CreateSharedProcArray(void)
 		procArray->numKnownAssignedXids = 0;
 		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
 		procArray->lastOverflowedXid = InvalidTransactionId;
+		procArray->tailKnownAssignedXids = 0;
+		procArray->headKnownAssignedXids = 0;
 	}
 
 	if (XLogRequestRecoveryConnections)
 	{
+#ifdef KNOWNASSIGNED_USE_HASH
 		/* Create or attach to the KnownAssignedXids hash table */
 		HASHCTL		info;
 
@@ -223,6 +233,12 @@ CreateSharedProcArray(void)
 			  HASH_ELEM | HASH_FUNCTION);
 		if (!KnownAssignedXidsHash)
 			elog(FATAL, "could not initialize known assigned xids hash table");
+#else
+		KnownAssignedXids = (TransactionId *)
+				ShmemInitStruct("KnownAssignedXids",
+		mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS),
+		&found);
+#endif
 	}
 }
 
@@ -417,6 +433,8 @@ void
 ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
 {
 	snapshotOldestActiveXid = oldestActiveXid;
+
+//	KnownAssignedXidsTest();
 }
 
 /*
@@ 

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
 How does the attached patch look like? It's probably similar to what you
 had in mind.
 
 It looks like a second version of what I'm working on and about to
 publish. I'll take that as a compliment!
 
 My patch is attached here also, for discussion.
 
 The two patches look same in their main parts, though I have quite a few
 extra tweaks in there, which you can read about in comments.

Yeah. Yours seems a lot more complex with all those extra tweaks, I
would suggest to keep it simple. I did realize one bug in my patch: I
didn't handle xid wraparound correctly in the binary search, need to use
TransactionIdFollows instead of plain <.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 14:47 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
  How does the attached patch look like? It's probably similar to what you
  had in mind.
  
  It looks like a second version of what I'm working on and about to
  publish. I'll take that as a compliment!
  
  My patch is attached here also, for discussion.
  
  The two patches look same in their main parts, though I have quite a few
  extra tweaks in there, which you can read about in comments.
 
 Yeah. Yours seems a lot more complex with all those extra tweaks, I
 would suggest to keep it simple. I did realize one bug in my patch: I
 didn't handle xid wraparound correctly in the binary search, need to use
 TransactionIdFollows instead of plain <.

Almost done, yes, much simpler. I wrote a lot of that in the wee small
hours last night, so the difference is amusing.

And I spotted that bug, plus the off by one error also. Just rewritten
all other parts, so no worries.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I didn't handle xid wraparound correctly in the binary search, need to use
 TransactionIdFollows instead of plain <.

I think you're outsmarting yourself there.  A binary search will in fact
*not work* with circular xid comparison (this is exactly why there's no
btree opclass for XID).  You need to use plain <, and make sure the
array you're searching is ordered that way too.  The other way might
accidentally fail to malfunction if you only tested ranges of XIDs that
weren't long enough to wrap around, but that doesn't make it right.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 10:39 -0400, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  I didn't handle xid wraparound correctly in the binary search, need to use
  TransactionIdFollows instead of plain <.
 
 I think you're outsmarting yourself there.  A binary search will in fact
 *not work* with circular xid comparison (this is exactly why there's no
 btree opclass for XID).  You need to use plain <, and make sure the
 array you're searching is ordered that way too.  The other way might
 accidentally fail to malfunction if you only tested ranges of XIDs that
 weren't long enough to wrap around, but that doesn't make it right.

I don't understand the exact, please explain more.

I'm not using bsearch() just a quick chop based upon xid comparison,
which looks to me like it will work.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2010-04-16 at 10:39 -0400, Tom Lane wrote:
 I think you're outsmarting yourself there.  A binary search will in fact
 *not work* with circular xid comparison (this is exactly why there's no
 btree opclass for XID).

 I don't understand the exact, please explain more.
 I'm not using bsearch() just a quick chop based upon xid comparison,
 which looks to me like it will work.

Implementing it yourself doesn't get you out of the fact that it won't
work.  Consider

	1 2 3 ... 3000000000 ... 3999999999

and suppose that 3000000000 is the array's middle element.  If you
search for 100, your first probe will conclude that it is to the right
of 3000000000, which is the wrong direction.  Binary search, or indeed
the entire concept that the array is sorted in the first place,
depends on a transitive comparison operator.  TransactionIdFollows does
not satisfy the transitive law.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-16 Thread Simon Riggs
On Fri, 2010-04-16 at 13:00 +0100, Simon Riggs wrote:
 Almost done

Here's the finished patch.

Internal changes only, no functional or user visible changes, so no
docs, just detailed explanatory comments.

Expectation is
* performance no longer depends upon max_connections
* recovery will be about same or slightly faster
* snapshots should be about equivalent primary/standby

Main changes are
* lock free add to sorted array (equivalent to GetNewTransactionId)
* bsearch for xid removal at commit/abort/TransactionIdIsInProgress
* faster, more compact approach to snapshots, with self-trimming
* some minor API changes to facilitate above
* new code to ignore deletion failures only when !consistent

Erik,

Could you retest please? 

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 1200,1205  StandbyTransactionIdIsPrepared(TransactionId xid)
--- 1200,1208 
  
  	Assert(TransactionIdIsValid(xid));
  
+ 	if (max_prepared_xacts <= 0)
+ 		return false;	/* nothing to do */
+ 
  	/* Read and validate file */
  	buf = ReadTwoPhaseFile(xid, false);
  	if (buf == NULL)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6453,6458  CheckRecoveryConsistency(void)
--- 6453,6464 
  	}
  }
  
+ bool
+ XLogConsistentState(void)
+ {
+ 	return reachedMinRecoveryPoint;
+ }
+ 
  /*
   * Is the system still in recovery?
   *
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 68,73  typedef struct ProcArrayStruct
--- 68,77 
  		 * xids */
  	int			maxKnownAssignedXids;	/* allocated size of known assigned
  		 * xids */
+ 	int			tailKnownAssignedXids;	/* current tail of known assigned
+ 		 * xids */
+ 	int			headKnownAssignedXids;	/* current head of known assigned
+ 		 * xids */
  
  	/*
  	 * Highest subxid that overflowed KnownAssignedXids array. Similar to
***
*** 87,93  static ProcArrayStruct *procArray;
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static HTAB *KnownAssignedXidsHash;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
--- 91,98 
  /*
   * Bookkeeping for tracking emulated transactions in recovery
   */
! static TransactionId *KnownAssignedXids;
! static bool *KnownAssignedXidsValid;
  static TransactionId latestObservedXid = InvalidTransactionId;
  
  /*
***
*** 142,150  static int	KnownAssignedXidsGet(TransactionId *xarray, TransactionId xmax);
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId *xids, int nxids);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
--- 147,158 
  static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray, TransactionId *xmin,
  			   TransactionId xmax);
  static bool KnownAssignedXidsExist(TransactionId xid);
! static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid, bool have_lock);
  static void KnownAssignedXidsRemove(TransactionId xid);
! static void KnownAssignedXidsRemoveMany(TransactionId xid);
! static void KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
! 	  TransactionId *subxids);
! static void KnownAssignedXidsCompress(bool full);
  static void KnownAssignedXidsDisplay(int trace_level);
  
  /*
***
*** 204,228  CreateSharedProcArray(void)
  		procArray->numKnownAssignedXids = 0;
  		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
  		procArray->lastOverflowedXid = InvalidTransactionId;
  	}
  
  	if (XLogRequestRecoveryConnections)
  	{
! 		/* Create or attach to the KnownAssignedXids hash table */
! 		HASHCTL		info;
! 
! 		MemSet(&info, 0, sizeof(info));
! 		info.keysize = sizeof(TransactionId);
! 		info.entrysize = sizeof(TransactionId);
! 		info.hash = tag_hash;
! 
! 		KnownAssignedXidsHash = ShmemInitHash("KnownAssignedXids Hash",
! 			  TOTAL_MAX_CACHED_SUBXIDS,
! 			  TOTAL_MAX_CACHED_SUBXIDS,
! 			  &info,
! 			  HASH_ELEM | HASH_FUNCTION);
! 		if (!KnownAssignedXidsHash)
! 			elog(FATAL, "could not initialize known assigned xids hash table");
  	}
  }
  
--- 212,231 
  		procArray->numKnownAssignedXids = 0;
  		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
  		procArray->lastOverflowedXid = InvalidTransactionId;
+ 		procArray->tailKnownAssignedXids = 0;
+ 		procArray->headKnownAssignedXids = 0;
  	}
  
  	if (XLogRequestRecoveryConnections)
  	{
! 		KnownAssignedXids = (TransactionId *)
! 				ShmemInitStruct("KnownAssignedXids",
! 		

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-14 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  Changing the KnownAssignedXids data structure from
 hash table into something that's quicker to scan. Preferably something
 with O(N), where N is the number of entries in the data structure, not
 the maximum number of entries it can hold as it is with the hash table
 currently.

So that's pretty good news that Red-Black Trees made it in 9.0, isn't it? :)

 A quick fix would be to check if there's any entries in the hash table
 before scanning it. That would eliminate the overhead when there's no
 in-progress transactions in the master. But as soon as there's even one,
 the overhead comes back.

Does not sound typical, does it?
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-14 Thread Simon Riggs
On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  I could reproduce this on my laptop, standby is about 20% slower. I ran
  oprofile, and what stands out as the difference between the master and
  standby is that on standby about 20% of the CPU time is spent in
  hash_seq_search(). The callpath is GetSnapshotData() ->
  KnownAssignedXidsGetAndSetXmin() -> hash_seq_search(). That explains the
  difference in performance.
 
 The slowdown is proportional to the max_connections setting in the
 standby. 20% slowdown might still be acceptable, but if you increase
 max_connections to say 1000, things get really slow. I wouldn't
 recommend max_connections=1000, of course, but I think we need to do
 something about this. Changing the KnownAssignedXids data structure from
 hash table into something that's quicker to scan. Preferably something
 with O(N), where N is the number of entries in the data structure, not
 the maximum number of entries it can hold as it is with the hash table
 currently.

There's a tradeoff here to consider. KnownAssignedXids faces two
workloads: one for each WAL record where we check if the xid is already
known assigned, one for snapshots. The current implementation is
optimised towards recovery performance, not snapshot performance.

 A quick fix would be to check if there's any entries in the hash table
 before scanning it. That would eliminate the overhead when there's no
 in-progress transactions in the master. But as soon as there's even one,
 the overhead comes back.
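
For reference, the quick fix quoted above would amount to a fastpath in front
of the existing sequential scan. A hedged sketch against the pre-patch hash
table, using the existing dynahash calls and abbreviating the rest of
KnownAssignedXidsGetAndSetXmin():

	HASH_SEQ_STATUS status;
	TransactionId *knownXid;
	int			count = 0;

	/* Fastpath: skip the scan entirely when nothing is in progress. */
	if (hash_get_num_entries(KnownAssignedXidsHash) == 0)
		return 0;

	hash_seq_init(&status, KnownAssignedXidsHash);
	while ((knownXid = (TransactionId *) hash_seq_search(&status)) != NULL)
	{
		/* ... copy *knownXid into the caller's xid array as before ... */
		count++;
	}

	return count;

As noted, that only helps while the table is empty; once a single xid is
present, hash_seq_search() still walks a structure whose size is governed by
max_connections, which is what the sorted-array/circular-buffer approach
avoids.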

Any fix should be fairly quick because of the way it's modularised - with
something like this in mind.

I'll try a circular buffer implementation, with fastpath.

Have something in a few hours.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

