Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-12 Thread Maxime Henrion
Mark Kirkwood wrote:
 Kris Kennaway wrote:
 If so, then your task is the following:
 
 Make SYSV semaphores less dumb about process wakeups.  Currently
 whenever the semaphore state changes, all processes sleeping on the
 semaphore are woken, even if we only have released enough resources
 for one waiting process to claim.  i.e. there is a thundering herd
 wakeup situation which destroys performance at high loads.  Fixing
 this will involve replacing the wakeup() calls with appropriate
 amounts of wakeup_one().
 
 I'm forwarding this to the pgsql-hackers list so that folks more
 qualified than I can comment, but as I understand the way postgres
 implements locking, each process has its *own* semaphore that it waits
 on, and who is waiting for what is controlled by an in-memory (shared)
 hash of lock structs (access to these is controlled via
 platform-dependent spinlock code). So a given semaphore state change
 should only involve one process wakeup.

Yes, but there are still a lot of wakeups to be avoided in the current
System V semaphore code.  More specifically, not only do we wake up all
the processes waiting on a single semaphore every time something changes,
but we also wake up all processes waiting on *any* of the semaphores in
the semaphore *set*, whatever the reason we're sleeping.
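
[Editor's sketch, not part of the original message.] The difference
between set-wide and per-semaphore wakeups can be illustrated with a toy
Python model. Nothing here is FreeBSD kernel code: the class name, the
method names and the "granularity" switch are hypothetical, and the model
only counts how many sleepers are woken by a single post, assuming one
sleeper per semaphore.

```python
from collections import defaultdict


class SemaphoreSet:
    """Toy model of one SysV semaphore set (not kernel code)."""

    def __init__(self):
        # sleepers[semnum] -> pids currently sleeping on that semaphore
        self.sleepers = defaultdict(list)

    def sleep_on(self, semnum, pid):
        self.sleepers[semnum].append(pid)

    def post(self, semnum, granularity):
        """Post semaphore `semnum` once; return the pids that get woken."""
        if granularity == "set":
            # Everyone sleeping anywhere in the set is woken to recheck.
            woken = [pid for q in self.sleepers.values() for pid in q]
        elif granularity == "semaphore":
            # Only sleepers on the posted semaphore are woken.
            woken = list(self.sleepers.get(semnum, []))
        else:
            raise ValueError(f"unknown granularity: {granularity!r}")
        # Sleepers on the posted semaphore proceed; the rest sleep again.
        self.sleepers.pop(semnum, None)
        return woken


def count_wakeups(nprocs, granularity):
    """One sleeper per semaphore, then a single post on semaphore 0."""
    semset = SemaphoreSet()
    for pid in range(nprocs):
        semset.sleep_on(semnum=pid, pid=pid)
    return len(semset.post(0, granularity))
```

With 20 sleepers, count_wakeups(20, "set") reports 20 wakeups for one
post, while count_wakeups(20, "semaphore") reports 1, which is the gap a
finer wakeup granularity would be expected to close in this simplified
picture.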

I came up with a quick patch so that Kris could do some testing with it,
and it appears to have helped, but only very slightly; apparently some
contention within the netisr code caused problems, so that in some cases
the patch helped slightly, and in others it didn't.

The semaphore code needs a clean rewrite and I hope to take care of this
soon, as time permits, since we are heavy consumers of PostgreSQL under
FreeBSD at my company.

Cheers,
Maxime

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Mark Kirkwood

Kris Kennaway wrote:

If so, then your task is the following:

Make SYSV semaphores less dumb about process wakeups.  Currently
whenever the semaphore state changes, all processes sleeping on the
semaphore are woken, even if we only have released enough resources
for one waiting process to claim.  i.e. there is a thundering herd
wakeup situation which destroys performance at high loads.  Fixing
this will involve replacing the wakeup() calls with appropriate
amounts of wakeup_one().


I'm forwarding this to the pgsql-hackers list so that folks more
qualified than I can comment, but as I understand the way postgres
implements locking, each process has its *own* semaphore that it waits
on, and who is waiting for what is controlled by an in-memory (shared)
hash of lock structs (access to these is controlled via
platform-dependent spinlock code). So a given semaphore state change
should only involve one process wakeup.
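
[Editor's sketch, not part of the original message.] The scheme described
above, one private semaphore per process plus a shared table recording who
waits for what, can be modelled roughly as follows. This is a Python toy
under stated assumptions, not PostgreSQL source: `Backend` and `QueuedLock`
are hypothetical names, `threading.Lock` stands in for the spinlock, and
the FIFO hand-off policy is an assumption made for illustration.

```python
import threading
from collections import deque


class Backend:
    """One server process with its own private semaphore."""

    def __init__(self, pid):
        self.pid = pid
        self.sem = threading.Semaphore(0)  # starts unavailable


class QueuedLock:
    """Hypothetical lock struct: a holder plus a FIFO of waiting backends."""

    def __init__(self):
        self.mutex = threading.Lock()  # stands in for the spinlock
        self.holder = None
        self.waiters = deque()

    def acquire(self, backend):
        """Return True if granted at once; otherwise queue the backend."""
        with self.mutex:
            if self.holder is None:
                self.holder = backend
                return True
            self.waiters.append(backend)
        return False  # the caller would now block on backend.sem

    def release(self):
        """Hand the lock to the next waiter; return the backends posted."""
        with self.mutex:
            if not self.waiters:
                self.holder = None
                return []
            next_holder = self.waiters.popleft()
            self.holder = next_holder
        next_holder.sem.release()  # exactly one post, on one semaphore
        return [next_holder]
```

In this model a release touches only the semaphore of the backend at the
head of the queue, so the release itself asks for a single wakeup no
matter how many backends are queued.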


Cheers

Mark


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Mark Kirkwood writes:
 Kris Kennaway wrote:
 If so, then your task is the following:
 
 Make SYSV semaphores less dumb about process wakeups.  Currently
 whenever the semaphore state changes, all processes sleeping on the
 semaphore are woken, even if we only have released enough resources
 for one waiting process to claim.  i.e. there is a thundering herd
 wakeup situation which destroys performance at high loads.  Fixing
 this will involve replacing the wakeup() calls with appropriate
 amounts of wakeup_one().

 I'm forwarding this to the pgsql-hackers list so that folks more
 qualified than I can comment, but as I understand the way postgres
 implements locking, each process has its *own* semaphore that it waits
 on, and who is waiting for what is controlled by an in-memory (shared)
 hash of lock structs (access to these is controlled via
 platform-dependent spinlock code). So a given semaphore state change
 should only involve one process wakeup.

Correct.  The behavior Kris describes is surely bad, but it's not
relevant to Postgres' usage of SysV semaphores.

regards, tom lane


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway writes:
 Make SYSV semaphores less dumb about process wakeups.  Currently
 whenever the semaphore state changes, all processes sleeping on the
 semaphore are woken, even if we only have released enough resources
 for one waiting process to claim.

 Correct.  The behavior Kris describes is surely bad, but it's not
 relevant to Postgres' usage of SysV semaphores.

 Sorry, but the behaviour is real.

Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
that Postgres never has more than one process waiting on any particular
SysV semaphore, and so the problem doesn't really affect us.

Or do you mean that the kernel wakes all processes sleeping on *any*
SysV semaphore?  That would be nasty :-(

regards, tom lane


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 10:41:04PM +1200, Mark Kirkwood wrote:
 Kris Kennaway wrote:
 If so, then your task is the following:
 
 Make SYSV semaphores less dumb about process wakeups.  Currently
 whenever the semaphore state changes, all processes sleeping on the
 semaphore are woken, even if we only have released enough resources
 for one waiting process to claim.  i.e. there is a thundering herd
 wakeup situation which destroys performance at high loads.  Fixing
 this will involve replacing the wakeup() calls with appropriate
 amounts of wakeup_one().
 
 I'm forwarding this to the pgsql-hackers list so that folks more
 qualified than I can comment, but as I understand the way postgres
 implements locking, each process has its *own* semaphore that it waits
 on, and who is waiting for what is controlled by an in-memory (shared)
 hash of lock structs (access to these is controlled via
 platform-dependent spinlock code). So a given semaphore state change
 should only involve one process wakeup.

I have not studied the exact code path, but there are indeed multiple
wakeups happening from the semaphore code (as many as the number of
active postgresql processes).  It is easy to instrument
sleepq_broadcast() and log them when they happen.
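
[Editor's sketch, not part of the original message.] The real
sleepq_broadcast() is kernel C and is not reproduced here. The idea of
logging only the broadcasts that wake more than one sleeper can be shown
with a toy Python tracer; `BroadcastTracer`, its `broadcast` method and the
channel names are all hypothetical.

```python
import logging

log = logging.getLogger("wakeup-trace")


class BroadcastTracer:
    """Toy stand-in for an instrumented broadcast-wakeup routine."""

    def __init__(self):
        # (channel, number woken) for each broadcast that woke > 1 sleeper
        self.events = []

    def broadcast(self, channel, sleepers):
        """Wake every sleeper on `channel`; record multi-wakeup events."""
        woken = list(sleepers)
        if len(woken) > 1:
            self.events.append((channel, len(woken)))
            log.info("broadcast on %s woke %d sleepers", channel, len(woken))
        return woken
```

Feeding the tracer one broadcast with a single sleeper and one with 20
sleepers leaves only the second in `events`, which is the kind of record
that would separate a thundering herd from ordinary one-at-a-time wakeups.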

Anyway, mux@ fixed this some time ago, which indeed helped scaling for
traffic over a local domain socket (particularly at higher loads), but
I saw some anomalous results when using loopback TCP traffic.  I think
this is unrelated: in this situation TCP is highly contended, and
fixing one bottleneck can often make a highly contended situation
perform worse, because the bottleneck was effectively serializing
things a bit and so reducing the non-linear behaviour.  I am still
investigating, though, so the patch has not yet been committed.

Kris


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 10:23:42AM -0400, Tom Lane wrote:
 Mark Kirkwood writes:
  Kris Kennaway wrote:
  If so, then your task is the following:
  
  Make SYSV semaphores less dumb about process wakeups.  Currently
  whenever the semaphore state changes, all processes sleeping on the
  semaphore are woken, even if we only have released enough resources
  for one waiting process to claim.  i.e. there is a thundering herd
  wakeup situation which destroys performance at high loads.  Fixing
  this will involve replacing the wakeup() calls with appropriate
  amounts of wakeup_one().
 
  I'm forwarding this to the pgsql-hackers list so that folks more
  qualified than I can comment, but as I understand the way postgres
  implements locking, each process has its *own* semaphore that it waits
  on, and who is waiting for what is controlled by an in-memory (shared)
  hash of lock structs (access to these is controlled via
  platform-dependent spinlock code). So a given semaphore state change
  should only involve one process wakeup.
 
 Correct.  The behavior Kris describes is surely bad, but it's not
 relevant to Postgres' usage of SysV semaphores.

Sorry, but the behaviour is real.

Kris


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway writes:
 On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote:
 Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
 that Postgres never has more than one process waiting on any particular
 SysV semaphore, and so the problem doesn't really affect us.

 To be clear, some behaviour that postgresql does with sysv semaphores
 causes wakeups of many processes at once.  i.e. if you have 20
 clients, you will get up to 20 wakeups.  I haven't studied the precise
 cause of this, but it is empirically true.  This is the scaling
 problem I described, and it's what mux's patch addresses.

[ shrug... ]  To the extent that that happens, it's Postgres' own issue,
and no amount of kernel rejiggering will change it.  But I certainly
have no objection to a patch that fixes the kernel behavior ...

regards, tom lane


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway writes:
 I have not studied the exact code path, but there are indeed multiple
 wakeups happening from the semaphore code (as many as the number of
 active postgresql processes).  It is easy to instrument
 sleepq_broadcast() and log them when they happen.

There are certainly cases where Postgres will wake up a number of
processes in quick succession, but that should happen from a separate
semop() kernel call, on a different semaphore, for each such process.
If there's really multiple processes being released by the same semop()
then there's a bug we need to look into (or maybe it's a kernel bug?).
Anyway I'd be interested to know what the test case is, and which PG
version you were testing.

regards, tom lane


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Andrew - Supernews
On 2007-04-10, Tom Lane wrote:
 Kris Kennaway writes:
 I have not studied the exact code path, but there are indeed multiple
 wakeups happening from the semaphore code (as many as the number of
 active postgresql processes).  It is easy to instrument
 sleepq_broadcast() and log them when they happen.

 There are certainly cases where Postgres will wake up a number of
 processes in quick succession, but that should happen from a separate
 semop() kernel call, on a different semaphore, for each such process.
 If there's really multiple processes being released by the same semop()
 then there's a bug we need to look into (or maybe it's a kernel bug?).
 Anyway I'd be interested to know what the test case is, and which PG
 version you were testing.

This is a problem in FreeBSD, not specifically to do with postgres - the
granularity for SysV semaphore wakeups in FreeBSD-6.x and earlier is the
entire semaphore set, not just one specific semaphore within the set. I
explained that to Kris some weeks ago, and someone (mux) did a patch (to
FreeBSD, not pg) which was already mentioned in this discussion.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway writes:
 On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote:
 Anyway I'd be interested to know what the test case is, and which PG
 version you were testing.

 I used 8.2 (and some older version when I first noticed it a year ago)
 and either sysbench or supersmack will show it - presumably anything
 that makes simultaneous queries.  Just instrument sleepq_broadcast()
 to e.g. log a KTR event when it wakes more than 1 process and you'll
 see it happening.

Sorry, I'm not much of a BSD kernel hacker ... but sleepq_broadcast
seems a rather generic name.  Is that called *only* from semop?
I'm wondering if you are seeing simultaneous wakeup from some other
cause --- sleep timeout being the obvious possibility.  We are aware
of behaviors (search the PG lists for "context swap storm") where a
number of backends will all fail to get a spinlock and do short usleep
or select-timeout waits.  In this situation they'd all wake up at the
next scheduler clock tick ...
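
[Editor's sketch, not part of the original message.] The clustering
described above follows from simple rounding if short timeouts can only
expire on a scheduler tick boundary. The Python below illustrates that
arithmetic; the function names are hypothetical and the 10 ms tick is an
assumed value chosen for illustration, not a statement about any
particular kernel configuration.

```python
def wake_tick(start_us, sleep_us, tick_us=10_000):
    """Index of the first tick at or after the sleeper's deadline."""
    deadline = start_us + sleep_us
    return (deadline + tick_us - 1) // tick_us  # integer ceiling


def wake_groups(starts_us, sleep_us, tick_us=10_000):
    """Group sleepers (by list position) under the tick that wakes them."""
    groups = {}
    for pid, start in enumerate(starts_us):
        groups.setdefault(wake_tick(start, sleep_us, tick_us), []).append(pid)
    return groups
```

For example, five backends that start a 1 ms sleep at 100, 900, 2500, 4200
and 7000 microseconds all have deadlines inside the first 10 ms tick, so
wake_groups places every one of them under tick 1 and they would become
runnable together.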

regards, tom lane


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 03:52:00PM -0400, Tom Lane wrote:
 Kris Kennaway writes:
  On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote:
  Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
  that Postgres never has more than one process waiting on any particular
  SysV semaphore, and so the problem doesn't really affect us.
 
  To be clear, some behaviour that postgresql does with sysv semaphores
  causes wakeups of many processes at once.  i.e. if you have 20
  clients, you will get up to 20 wakeups.  I haven't studied the precise
  cause of this, but it is empirically true.  This is the scaling
  problem I described, and it's what mux's patch addresses.
 
 [ shrug... ]  To the extent that that happens, it's Postgres' own issue,
 and no amount of kernel rejiggering will change it.  But I certainly
 have no objection to a patch that fixes the kernel behavior ...

As we've discussed before, by far the bigger issue with postgresql
performance on FreeBSD is the default setting of
update_process_title=on.

Kris


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 06:26:37PM -0400, Tom Lane wrote:
 Kris Kennaway writes:
  On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote:
  Anyway I'd be interested to know what the test case is, and which PG
  version you were testing.
 
  I used 8.2 (and some older version when I first noticed it a year ago)
  and either sysbench or supersmack will show it - presumably anything
  that makes simultaneous queries.  Just instrument sleepq_broadcast()
  to e.g. log a KTR event when it wakes more than 1 process and you'll
  see it happening.
 
 Sorry, I'm not much of a BSD kernel hacker ... but sleepq_broadcast
 seems a rather generic name.  Is that called *only* from semop?

It's part of how wakeup() is implemented.

 I'm wondering if you are seeing simultaneous wakeup from some other
 cause --- sleep timeout being the obvious possibility.  We are aware
 of behaviors (search the PG lists for "context swap storm") where a
 number of backends will all fail to get a spinlock and do short usleep
 or select-timeout waits.  In this situation they'd all wake up at the
 next scheduler clock tick ...

Nope, it's not this.

Kris


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote:
 Kris Kennaway writes:
  Make SYSV semaphores less dumb about process wakeups.  Currently
  whenever the semaphore state changes, all processes sleeping on the
  semaphore are woken, even if we only have released enough resources
  for one waiting process to claim.
 
  Correct.  The behavior Kris describes is surely bad, but it's not
  relevant to Postgres' usage of SysV semaphores.
 
  Sorry, but the behaviour is real.
 
 Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
 that Postgres never has more than one process waiting on any particular
 SysV semaphore, and so the problem doesn't really affect us.
 
 Or do you mean that the kernel wakes all processes sleeping on *any*
 SysV semaphore?  That would be nasty :-(

To be clear, some behaviour that postgresql does with sysv semaphores
causes wakeups of many processes at once.  i.e. if you have 20
clients, you will get up to 20 wakeups.  I haven't studied the precise
cause of this, but it is empirically true.  This is the scaling
problem I described, and it's what mux's patch addresses.

Kris


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote:
 Kris Kennaway writes:
  I have not studied the exact code path, but there are indeed multiple
  wakeups happening from the semaphore code (as many as the number of
  active postgresql processes).  It is easy to instrument
  sleepq_broadcast() and log them when they happen.
 
 There are certainly cases where Postgres will wake up a number of
 processes in quick succession, but that should happen from a separate
 semop() kernel call, on a different semaphore, for each such process.
 If there's really multiple processes being released by the same semop()
 then there's a bug we need to look into (or maybe it's a kernel bug?).
 Anyway I'd be interested to know what the test case is, and which PG
 version you were testing.

I used 8.2 (and some older version when I first noticed it a year ago)
and either sysbench or supersmack will show it - presumably anything
that makes simultaneous queries.  Just instrument sleepq_broadcast()
to e.g. log a KTR event when it wakes more than 1 process and you'll
see it happening.

Kris

