Guys,
Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign
that causes less load on the BufMgrLock.
FWIW, we've been pursuing two routes of quick patch fixes.
1) Dave Cramer and I have been testing setting varying rates of spin_delay in
an effort to find a sweet spot
Josh Berkus writes:
I'm really curious, BTW, about how all of Jan's changes to buffer
usage in 7.5 affect this issue. Has anyone tested it on a recent
snapshot?
Won't help.
(1) Theoretical argument: the problem case is select-only and touches
few enough buffers that it need
Did we ever come to a conclusion about excessive SMP context switching
under load?
---
Dave Cramer wrote:
Robert,
The real question is: does it help under real-life circumstances?
Did you do the tests with Tom's SQL?
When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
Bruce Momjian confessed:
Did we ever come to a conclusion about excessive SMP context switching
under load?
I just figured out what was causing the problem on my system Monday. I'm using
the pg_autovacuum
Bruce Momjian writes:
Did we ever come to a conclusion about excessive SMP context switching
under load?
Yeah: it's bad.
Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign
that causes less load on the BufMgrLock. However, the traditional
solution to
Robert Creager writes:
I just figured out what was causing the problem on my system Monday.
I'm using the pg_autovacuum daemon, and it was not vacuuming my db.
Do you have the post-7.4.2 datatype fixes for pg_autovacuum?
regards, tom lane
When grilled further on (Wed, 19 May 2004 22:42:26 -0400),
Tom Lane confessed:
Robert Creager writes:
I just figured out what was causing the problem on my system Monday.
I'm using the pg_autovacuum daemon, and it was not vacuuming my db.
Do you have
Robert Creager writes:
Tom Lane confessed:
Do you have the post-7.4.2 datatype fixes for pg_autovacuum?
No. I'm still running 7.4.1 w/associated contrib. I guess an upgrade is in
order then. I'm currently downloading 7.4.2 to see what the change is that
Bruce Momjian writes:
Tom Lane wrote:
... The SMP issue seems to be not with whether there is
instantaneous contention for the locked data structure, but with the cost
of making it possible for processor B to acquire a lock recently held by
processor A.
I see. I don't
OK, added to TODO:
* Investigate SMP context switching issues
---
Tom Lane wrote:
Bruce Momjian writes:
Tom Lane wrote:
... The SMP issue seems to be not with whether there is
On Wed, 2004-05-19 at 21:59, Robert Creager wrote:
When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
Bruce Momjian confessed:
Did we ever come to a conclusion about excessive SMP context switching
under load?
I just figured out what was causing the
In an attempt to throw the authorities off his trail, Tom Lane
transmitted:
ObQuote: Research is what I am doing when I don't know what I am
doing. - attributed to Werner von Braun, but has anyone got a
definitive reference?
Found some co-workers at work yesterday to load up my library...
The sample period is 5 minutes long (vs 2 minutes previously):
Context switches (avg / max):
Default 7.4.1 code : 48784 / 107354
Default patch - 10 : 20400 /  28160
patch at 100       : 38574 /  85372
patch
No, don't go away and be quiet. Keep testing; it may be that under
normal operation the context switching goes up, but under the conditions
where you were seeing the high CS it may not be as bad.
As others have mentioned the real solution to this is to rewrite the
buffer management so that the lock
When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
Josh Berkus confessed:
spins_per_delay was not beneficial. Instead, try increasing them, one step
at a time:
(take baseline measurement at 100)
250
500
1000
1500
2000
3000
5000
... until you find an
Subject: Re: [PERFORM] Wierd context-switching issue
Rob,
I would like to see the same, as I have a system that exhibits the same
behavior
on a production db that's running 7.4.1.
If you checked the thread follow-ups, you'd see that *decreasing*
spins_per_delay was not beneficial. Instead, try increasing them, one step
at a time:
(take
When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
Josh Berkus confessed:
Dave,
After some testing: if you use the current HEAD code for s_lock.c, which
has some mods in it to alleviate this situation, and change
SPINS_PER_DELAY to 10, you can drastically reduce
Dave,
Are you testing this with Tom's code? You need to do a baseline
measurement with 10 and then increase it; you will still get lots of cs,
but it will be less.
No, that was just a test of 1000 straight up. Tom outlined a method, but I
didn't see any code that would help me find a
Josh,
I think you can safely increase by orders of magnitude here, instead of
by +100. My wild-ass guess is that the sweet spot is where the spin time
is approximately the time it takes to consume the resource. So if
you have a really fast machine then the spin count should be higher.
Also you
Dave,
But... you need a baseline first.
A baseline on CS? I have that
--
-Josh Berkus
Aglio Database Solutions
San Francisco
On Wed, Apr 21, 2004 at 02:51:31PM -0400, Tom Lane wrote:
The context swap storm is happening because of contention at the next
level up (LWLocks rather than spinlocks). It could be an independent
issue that just happens to be triggered by the same sort of access
pattern. I put forward a
Magus,
It would be interesting to see what a locking implementation a la futex
style would give on a 2.6 kernel; as I understood it, that would work
cross-process with some work.
I'm working on testing a FUTEX patch, but am having some trouble with it.
Will let you know the results
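For readers unfamiliar with the futex idea, here is a rough, self-contained
sketch of what a futex-style lock looks like on a Linux 2.6 kernel, written
with C11 atomics for brevity (illustrative code only, not the patch being
tested): take the lock with an atomic compare-and-swap in user space, and
only call into the kernel to sleep when it is contended.

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdatomic.h>

    /* 0 = free, 1 = held; simplified two-state futex lock */
    static void futex_lock(atomic_int *lock)
    {
        int expected = 0;
        while (!atomic_compare_exchange_weak(lock, &expected, 1))
        {
            /* sleep in the kernel while *lock is still 1; no busy spinning */
            syscall(SYS_futex, lock, FUTEX_WAIT, 1, NULL, NULL, 0);
            expected = 0;
        }
    }

    static void futex_unlock(atomic_int *lock)
    {
        atomic_store(lock, 0);
        /* wake one waiter, if any */
        syscall(SYS_futex, lock, FUTEX_WAKE, 1, NULL, NULL, 0);
    }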
Dave,
Yeah, I did some more testing myself, and actually get better numbers
with increasing spins per delay to 1000, but my suspicion is that it is
highly dependent on finding the right delay for the processor you are
on.
Well, it certainly didn't help here:
Are you testing this with Tom's code? You need to do a baseline
measurement with 10 and then increase it; you will still get lots of cs,
but it will be less.
Dave
On Mon, 2004-04-26 at 20:03, Josh Berkus wrote:
Dave,
Yeah, I did some more testing myself, and actually get better numbers
On Thu, 2004-04-22 at 10:37 -0700, Josh Berkus wrote:
Tom,
The tricky
part is that a slow adaptation rate means we can't have every backend
figuring this out for itself --- the right value would have to be
maintained globally, and I'm not sure how to do that without adding a
lot of
Yeah, I did some more testing myself, and actually get better numbers
with increasing spins per delay to 1000, but my suspicion is that it is
highly dependent on finding the right delay for the processor you are
on.
My hypothesis is that if you spin approximately the same or more time
than the
More data
On a dual xeon with HTT enabled:
I tried increasing the NUM_SPINS to 1000 and it works better.
NUM_SPINLOCKS   CS     ID    pgbench
100             250K   59%   230 TPS
1000            125K   55%   228 TPS
This is certainly heading in the right direction? Although it
Paul Tuckfield writes:
I used the taskset command:
taskset 01 -p pid for backend of test_run.sql 1
taskset 01 -p pid for backend of test_run.sql 1
I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on
the first Xeon processor in the box.
AFAICT, what
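If you would rather pin the processes from inside a test program than with
taskset, a minimal Linux/glibc sketch looks like this (pin_to_cpu is a
made-up helper, not part of the test scripts):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    /* In-process equivalent of "taskset 01 -p <pid>": restrict the calling
       process to a single logical CPU. */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0)  /* pid 0 = self */
        {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }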
Dave Cramer writes:
My hypothesis is that if you spin approximately the same or more time
than the average time it takes to get finished with the shared resource
then this should reduce cs.
The only thing we use spinlocks for nowadays is to protect LWLocks, so
the average
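To make the layering concrete, here is a toy, self-contained version of the
arrangement being described, written with C11 atomics (the real LWLock code
differs and these names are made up): the spinlock is held only for the
handful of instructions needed to inspect and update the outer lock's
shared state.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct
    {
        atomic_flag mutex;        /* spinlock guarding the fields below */
        bool        exclusive;    /* outer lock held exclusively? */
        int         shared_count; /* number of shared holders */
    } toy_lwlock;

    static bool toy_lwlock_cond_acquire_shared(toy_lwlock *lock)
    {
        bool ok = false;

        /* the spinlock protects only this short critical section */
        while (atomic_flag_test_and_set_explicit(&lock->mutex,
                                                 memory_order_acquire))
            ;                     /* busy-wait */

        if (!lock->exclusive)
        {
            lock->shared_count++;
            ok = true;
        }

        atomic_flag_clear_explicit(&lock->mutex, memory_order_release);
        return ok;                /* caller would queue or retry on false */
    }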
Josh Berkus wrote:
Tom,
Having to recompile to run on single- vs dual-processor machines doesn't
seem like it would fly.
Oh, I don't know. Many applications require compiling for a target
architecture; SQL Server, for example, won't use a 2nd processor without
re-installation. I'm
Tom,
Having to recompile to run on single- vs dual-processor machines doesn't
seem like it would fly.
Oh, I don't know. Many applications require compiling for a target
architecture; SQL Server, for example, won't use a 2nd processor without
re-installation. I'm not sure about Oracle.
It
Tom,
The tricky
part is that a slow adaptation rate means we can't have every backend
figuring this out for itself --- the right value would have to be
maintained globally, and I'm not sure how to do that without adding a
lot of overhead.
This may be a moot point, since you've stated that
On Thu, 2004-04-22 at 13:55, Tom Lane wrote:
Josh Berkus writes:
This may be a moot point, since you've stated that changing the loop timing
won't solve the problem, but what about making the test part of make? I
don't think too many systems are going to change
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
Anjan,
Quad 2.0GHz XEON with highest load we have seen on the applications, DB
performing great -
Can you run Tom's test? It takes
Hi,
Dual Xeon P4 2.8
linux RedHat AS 3
kernel 2.4.21-4-EL-smp
2 GB ram
I can see the same problem:
procs                 memory              swap         io     system        cpu
 r  b   swpd   free   buff   cache   si   so    bi   bo   in   cs  us sy id wa
 1  0      0  96212  61056  172024    0    0
context-switching issue on Xeon
Here is a test case. To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script. (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.) Check
After some testing: if you use the current HEAD code for s_lock.c, which
has some mods in it to alleviate this situation, and change
SPINS_PER_DELAY to 10, you can drastically reduce the cs with Tom's test.
I am seeing a slight degradation in throughput using pgbench -c 10 -t
1000 but it might be
Dave,
After some testing: if you use the current HEAD code for s_lock.c, which
has some mods in it to alleviate this situation, and change
SPINS_PER_DELAY to 10, you can drastically reduce the cs with Tom's test.
I am seeing a slight degradation in throughput using pgbench -c 10 -t
1000 but it
Dave:
Why would test and set increase context switches:
Note that it *does not increase* context switches when the two threads
are on the two cores of a single Xeon processor. (use taskset to force
affinity on linux)
Scenario:
If the two test and set processes are testing and setting the same
Paul Tuckfield writes:
I wonder: do the threads stall so badly when pinging cache lines back
and forth that the kernel sees it as an opportunity to put the
process to sleep? Or do these worst-case misses cause an interrupt?
No; AFAICS the kernel could not even be aware of
FYI,
I am doing my testing on non-hyperthreading dual Athlons.
Also, the test and set is attempting to set the same resource, and not
simply a bit. It's really a lock;xchg in assembler.
Also we are using the PAUSE mnemonic, so we should not be seeing any
cache coherency issues, as the cache
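Roughly what that test-and-set looks like in GCC inline assembly on x86 (an
illustrative sketch, not the actual s_lock.h code): the xchg carries an
implicit LOCK, and PAUSE only calms the spin loop; it does not keep the
lock's cache line from bouncing between processors when the xchg is retried.

    /* Illustrative x86/GCC test-and-set: returns nonzero if the lock was
       already held. xchg is implicitly locked on x86. */
    static inline int tas(volatile int *lock)
    {
        int old = 1;
        __asm__ __volatile__("xchgl %0, %1"
                             : "+r"(old), "+m"(*lock)
                             :
                             : "memory");
        return old;
    }

    static void spin_acquire(volatile int *lock)
    {
        while (tas(lock))
        {
            /* PAUSE (rep; nop): hint that this is a spin-wait loop */
            __asm__ __volatile__("rep; nop" ::: "memory");
        }
    }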
attached.
--
Dave Cramer
519 939 0336
ICQ # 14675561
Index: backend/storage/lmgr/s_lock.c
===
RCS file: /usr/local/cvs/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.16
diff -c -r1.16 s_lock.c
***
Kenneth Marshall writes:
If the context swap storm derives from LWLock contention, maybe using
a random order to assign buffer locks in buf_init.c would prevent
simple adjacency of buffer allocation from causing the storm.
Good try, but no cigar ;-). The test cases I've been
Dave Cramer writes:
diff -c -r1.16 s_lock.c
*** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 - 1.16
--- backend/storage/lmgr/s_lock.c 21 Apr 2004 20:27:34 -
***
*** 76,82
* The select() delays are measured in centiseconds
Bruce Momjian writes:
For BSDOS it has:
#if (CLIENT_OS == OS_FREEBSD) || (CLIENT_OS == OS_BSDOS) || \
(CLIENT_OS == OS_OPENBSD) || (CLIENT_OS == OS_NETBSD)
{ /* comment out if inappropriate for your *bsd - cyp (25/may/1999) */
int ncpus; size_t len =
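The truncated snippet is presumably heading toward the usual *BSD sysctl
probe for the CPU count; a minimal sketch of that call (not the quoted
source) is:

    #include <sys/types.h>
    #include <sys/sysctl.h>

    /* Ask the kernel for the number of CPUs via hw.ncpu. */
    static int get_ncpus(void)
    {
        int    ncpus;
        size_t len = sizeof(ncpus);
        int    mib[2] = { CTL_HW, HW_NCPU };

        if (sysctl(mib, 2, &ncpus, &len, NULL, 0) != 0)
            return 1;             /* assume one CPU if the query fails */
        return ncpus;
    }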
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
I wrote:
Here is a test case.
Hmmm ... I've been able to reproduce the CS storm on a dual Athlon
Dual Athlon
With one process running 30 cs/second
with two processes running 15000 cs/second
Dave
On Tue, 2004-04-20 at 08:46, Jeff wrote:
On Apr 19, 2004, at 8:01 PM, Tom Lane wrote:
[test case]
Quad P3-700Mhz, ServerWorks, pg 7.4.2 - 1 process: 10-30 cs / second
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
Here is a test case. To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script. (For those of
you with more than two CPUs, see whether
Sent: Sunday, April 18, 2004 11:47 PM
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
After some further digging I think I'm starting to understand what's up
here, and the really fundamental answer is that a multi-CPU Xeon MP box
sucks for running Postgres.
I did
Dirk Lutzebaeck wrote:
c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)
performs well and I could not observe context switch peaks here (one
user active), almost no extra semop calls
Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to
300k CS. Bummer..
I tried to test how this is related to cache coherency, by forcing
affinity of the two test_run.sql processes to the two cores (pipelines?
threads) of a single hyperthreaded xeon processor in an smp xeon box.
When the processes are allowed to run on distinct chips in the smp box,
the CS storm
Ooops, what I meant to say was that 2 threads bound to one
(hyperthreaded) cpu does *NOT* cause the storm, even on an smp xeon.
Therefore, the context switches may be a result of cache coherency
related delays. (2 threads on one hyperthreaded cpu presumably have
tightly coupled L1/L2 caches.)
Dirk, Tom,
OK, off IRC, I have the following reports:
Linux 2.4.21 or 2.4.20 on dual Pentium III : problem verified
Linux 2.4.21 or 2.4.20 on dual Pentium II : problem cannot be reproduced
Solaris 2.6 on 6 cpu e4500 (using 8 processes) : problem not reproduced
--
-Josh Berkus
Aglio Database
I verified the problem on a dual Opteron server. I temporarily killed the
normal load, so the server was largely idle when the test was run.
Hardware:
2x Opteron 242
Rioworks HDAMA server board
4Gb RAM
OS Kernel:
RedHat9 + XFS
1 proc: 10-15 cs/sec
2 proc: 400,000-420,000 cs/sec
j. andrew
To: Tom Lane; Josh Berkus
Cc: Neil Conway
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon
Dirk Lutzebaeck wrote:
c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)
performs well and I could not observe context switch peaks here (one
user
Dirk Lutzebäck wrote:
Dirk Lutzebaeck wrote:
c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)
performs well and I could not observe context switch peaks here (one
user active), almost no extra semop calls
Did Tom's test here: with 2 processes I'll reach 200k+ CS
Anjan,
Quad 2.0GHz XEON with highest load we have seen on the applications, DB
performing great -
Can you run Tom's test? It takes a particular pattern of data access to
reproduce the issue.
--
Josh Berkus
Aglio Database Solutions
San Francisco
I modified the code in s_lock.c to remove the spins
#define SPINS_PER_DELAY 1
and it doesn't exhibit the behaviour
This effectively changes the code to:
    while (TAS(lock))
        select(1);    /* ~10 ms delay */
Can anyone explain why executing TAS 100 times would increase context
switches?
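For readers following along, a self-contained sketch of the spin-then-sleep
scheme being tuned here, written with C11 atomics (toy code; the real
s_lock.c adds PAUSE, backoff and a timeout). With SPINS_PER_DELAY at 1 it
degenerates to the "TAS once, then sleep ~10 ms" form shown above.

    #include <stdatomic.h>
    #include <time.h>

    #define SPINS_PER_DELAY 100   /* the value being tuned: 1, 10, 100, 1000, ... */

    static void toy_s_lock(atomic_flag *lock)
    {
        int spins = 0;

        while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        {
            if (++spins >= SPINS_PER_DELAY)
            {
                /* give up the CPU for ~10 ms instead of burning it */
                struct timespec ts = { 0, 10 * 1000 * 1000 };
                nanosleep(&ts, NULL);
                spins = 0;
            }
        }
    }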
Joe Conway wrote:
In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up. On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a
Josh, I cannot reproduce the excessive semop() on a Dual XEON DP on a
non-bigmem kernel, HT on. Interesting to know if the problem is related
to XEON MP (as Tom wrote) or bigmem.
Josh Berkus wrote:
Dirk,
I'm not sure if this semop() problem is still an issue but the database
behaves a bit
was mentioned...
Thanks,
Anjan
-----Original Message-----
From: Greg Stark
Sent: Sun 4/18/2004 8:40 PM
To: Tom Lane
Cc: Josh Berkus; Neil Conway
Subject: Re: [PERFORM] Wierd context
I decided to check the context-switching behavior here for baseline
since we have a rather diverse set of postgres server hardware, though
nothing using Xeon MP that is also running a postgres instance, and
everything looks normal under load. Some platforms are better than
others, but nothing is
Josh Berkus writes:
The other thing I'd like your comment on, Tom, is that Dirk appears to have
reported that when he installed a non-bigmem kernel, the issue went away.
Dirk, is this correct?
I'd be really surprised if that had anything to do with it. AFAIR
Dirk's test
scott.marlowe wrote:
On Mon, 19 Apr 2004, Bruce Momjian wrote:
I have BSD on a SuperMicro dual Xeon, so if folks want another
hardware/OS combination to test, I can give out logins to my machine.
I can probably do some nighttime testing on a dual 2800MHz non-MP Xeon
machine as well. It's a Dell
Joe,
I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does
anyone have a test set that can reliably reproduce the problem?
Unfortunately we can't seem to come up with one. So far we have 2 machines
that exhibit the issue, and their databases are highly confidential (State
Josh Berkus writes:
I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does
anyone have a test set that can reliably reproduce the problem?
Unfortunately we can't seem to come up with one.
It does seem to require a database which is in the many GB (> 10GB), and
Here is a test case. To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script. (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.) Check that you get a
I wrote:
Here is a test case.
Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
which seems to pretty much let the Xeon per se off the hook. Anybody
got a multiple Opteron to try? Totally non-Intel CPUs?
It would be interesting to see results with non-Linux kernels, too.
Tom Lane wrote:
Here is a test case. To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script. (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.) Check that you get a
When grilled further on (Mon, 19 Apr 2004 20:53:09 -0400),
Tom Lane confessed:
I wrote:
Here is a test case.
Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
which seems to pretty much let the Xeon per se off the hook. Anybody
got a multiple Opteron
Same problem with dual 1GHz P3's running Postgres 7.4.2, linux 2.4.x, and
2GB ram, under load, with long transactions (i.e. >1 'cannot serialize'
rollback per minute). 200K was the worst observed with vmstat.
Finally moved DB to a single xeon box.
After some further digging I think I'm starting to understand what's up
here, and the really fundamental answer is that a multi-CPU Xeon MP box
sucks for running Postgres.
I did a bunch of oprofile measurements on a machine belonging to one of
Josh's clients, using a test case that involved heavy
So the kernel/OS is irrelevant here? This happens on any dual Xeon?
What about hyperthreading? Does it still happen if HTT is turned off?
Dave
On Sun, 2004-04-18 at 17:47, Tom Lane wrote:
After some further digging I think I'm starting to understand what's up
here, and the really
Tom Lane writes:
So in the short term I think we have to tell people that Xeon MP is not
the most desirable SMP platform to run Postgres on. (Josh thinks that
the specific motherboard chipset being used in these machines might
share some of the blame too. I don't have any
Dave Cramer writes:
So the kernel/OS is irrelevant here? This happens on any dual Xeon?
I believe so. The context-switch behavior might possibly be a little
more pleasant on other kernels, but the underlying spinlock problem is
not dependent on the kernel.
What about
Greg Stark writes:
There's nothing about the way Postgres spinlocks are coded that affects this?
No. AFAICS our spinlock sequences are pretty much equivalent to the way
the Linux kernel codes its spinlocks, so there's no deep dark knowledge
to be mined there.
We could
What about hyperthreading? Does it still happen if HTT is turned off?
The problem comes from keeping the caches synchronized between multiple
physical CPUs. AFAICS enabling HTT wouldn't make it worse, because a
hyperthreaded processor still only has one cache.
Also, I forgot to say that
Tom, Josh,
I think we have the problem resolved after I found the following note
from Tom:
A large number of semops may mean that you have excessive contention
on some lockable
resource, but I don't have enough info to guess what resource.
This was the key to look at: we were missing all
Dirk Lutzebäck writes:
This was the key to look at: we were missing all indices on a table which
is used heavily and does lots of locking. After recreating the missing
indices the production system performed normally. No more excessive
semop() calls, load
Tom,
Strictly a WAG ... but what this sounds like to me is disastrously bad
behavior of the spinlock code under heavy contention. We thought we'd
fixed the spinlock code for SMP machines awhile ago, but maybe
hyperthreading opens some new vistas for misbehavior ...
Yeah, I thought of that