Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-20 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: > I'm really curious, BTW, about how all of Jan's changes to buffer > usage in 7.5 affect this issue. Has anyone tested it on a recent > snapshot? Won't help. (1) Theoretical argument: the problem case is select-only and touches few enough buffers that it

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-20 Thread Josh Berkus
Guys, > Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign > that causes less load on the BufMgrLock. FWIW, we've been pursuing two routes of quick patch fixes. 1) Dave Cramer and I have been testing setting varying rates of spin_delay in an effort to find a "sweet spot"

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Tom Lane) transmitted: > ObQuote: "Research is what I am doing when I don't know what I am > doing." - attributed to Wernher von Braun, but has anyone got a > definitive reference?

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Matthew T. O'Connor
On Wed, 2004-05-19 at 21:59, Robert Creager wrote: > When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)), > Bruce Momjian <[EMAIL PROTECTED]> confessed: > > > > > Did we ever come to a conclusion about excessive SMP context switching > > under load? > > > > I just figured out what w

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Bruce Momjian
OK, added to TODO: * Investigate SMP context switching issues --- Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> ... The SMP issue seems to be not with whether there is > >> i

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Bruce Momjian
Tom Lane wrote: > Robert Creager <[EMAIL PROTECTED]> writes: > > Tom Lane <[EMAIL PROTECTED]> confessed: > >> Do you have the post-7.4.2 datatype fixes for pg_autovacuum? > > > No. I'm still running 7.4.1 w/associated contrib. I guess an upgrade is in > > order then. I'm currently downloading 7

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> ... The SMP issue seems to be not with whether there is >> instantaneous contention for the locked datastructure, but with the cost >> of making it possible for processor B to acquire a lock recently held by >> processor A. > I see.

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Robert Creager <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> confessed: >> Do you have the post-7.4.2 datatype fixes for pg_autovacuum? > No. I'm still running 7.4.1 w/associated contrib. I guess an upgrade is in > order then. I'm currently downloading 7.4.2 to see what the change

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Robert Creager
When grilled further on (Wed, 19 May 2004 22:42:26 -0400), Tom Lane <[EMAIL PROTECTED]> confessed: > Robert Creager <[EMAIL PROTECTED]> writes: > > I just figured out what was causing the problem on my system Monday. > > I'm using the pg_autovacuum daemon, and it was not vacuuming my db. > > Do y

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Did we ever come to a conclusion about excessive SMP context switching > > under load? > > Yeah: it's bad. > > Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign > that causes less load on the BufMgrLock. Howev

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Robert Creager <[EMAIL PROTECTED]> writes: > I just figured out what was causing the problem on my system Monday. > I'm using the pg_autovacuum daemon, and it was not vacuuming my db. Do you have the post-7.4.2 datatype fixes for pg_autovacuum? regards, tom lane -

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Did we ever come to a conclusion about excessive SMP context switching > under load? Yeah: it's bad. Oh, you wanted a fix? That seems harder :-(. AFAICS we need a redesign that causes less load on the BufMgrLock. However, the traditional solution to

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Robert Creager
When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)), Bruce Momjian <[EMAIL PROTECTED]> confessed: > > Did we ever come to a conclusion about excessive SMP context switching > under load? > I just figured out what was causing the problem on my system Monday. I'm using the pg_autovac

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Bruce Momjian
Did we ever come to a conclusion about excessive SMP context switching under load? --- Dave Cramer wrote: > Robert, > > The real question is does it help under real life circumstances ? > > Did you do the tests with Tom's

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-02 Thread Robert Creager
When grilled further on (Sun, 02 May 2004 11:39:22 -0400), Dave Cramer <[EMAIL PROTECTED]> confessed: > Robert, > > The real question is does it help under real life circumstances ? I'm not yet at the point where the CS's are causing appreciable delays. I should get there early this week and w

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-02 Thread Dave Cramer
Robert, The real question is: does it help under real-life circumstances? Did you do the tests with Tom's SQL code that is designed to create high context switches? Dave On Sun, 2004-05-02 at 11:20, Robert Creager wrote: > Found some co-workers at work yesterday to load up my library... > > Th

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-02 Thread Robert Creager
Found some co-workers at work yesterday to load up my library... The sample period is 5 minutes long (vs 2 minutes previously):

Context switches -     avg      max
Default 7.4.1 code  :  48784    107354
Default patch - 10  :  20400    28160
patch at 100        :  38574    85372
patch at

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-01 Thread Dave Cramer
No, don't go away and be quiet. Keep testing, it may be that under normal operation the context switching goes up but under the conditions that you were seeing the high CS it may not be as bad. As others have mentioned the real solution to this is to rewrite the buffer management so that the lock

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-30 Thread Robert Creager
When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700), Josh Berkus <[EMAIL PROTECTED]> confessed: > spins_per_delay was not beneficial. Instead, try increasing them, one step > at a time: > > (take baseline measurement at 100) > 250 > 500 > 1000 > 1500 > 2000 > 3000 > 5000 > > ... until y

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-29 Thread Josh Berkus
Rob, > I would like to see the same, as I have a system that exhibits the same behavior > on a production db that's running 7.4.1. If you checked the thread follow-ups, you'd see that *decreasing* spins_per_delay was not beneficial. Instead, try increasing them, one step at a time: (take b

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-29 Thread ohp
> When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700), > Josh Berkus <[EMAIL PROTECTED]> confessed: > > > Dave, > > > > > After s

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-28 Thread Robert Creager
When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700), Josh Berkus <[EMAIL PROTECTED]> confessed: > Dave, > > > After some testing if you use the current head code for s_lock.c which > > has some mods in it to alleviate this situation, and change > > SPINS_PER_DELAY to 10 you can drastically

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-27 Thread Josh Berkus
Dave, > But... you need a baseline first. A baseline on CS? I have that -- -Josh Berkus Aglio Database Solutions San Francisco

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-27 Thread Dave Cramer
Josh, I think you can safely increase by orders of magnitude here instead of by +100. My wild-ass guess is that the sweet spot is a spin time approximately equal to the time it takes to consume the resource. So if you have a really fast machine, the spin count should be higher. Also you

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-27 Thread Josh Berkus
Dave, > Are you testing this with Tom's code, you need to do a baseline > measurement with 10 and then increase it, you will still get lots of cs, > but it will be less. No, that was just a test of 1000 straight up. Tom outlined a method, but I didn't see any code that would help me find a be

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-26 Thread Dave Cramer
Are you testing this with Tom's code? You need to do a baseline measurement with 10 and then increase it; you will still get lots of cs, but it will be less. Dave On Mon, 2004-04-26 at 20:03, Josh Berkus wrote: > Dave, > > > Yeah, I did some more testing myself, and actually get better numbers >

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-26 Thread Josh Berkus
Dave, > Yeah, I did some more testing myself, and actually get better numbers > with increasing spins per delay to 1000, but my suspicion is that it is > highly dependent on finding the right delay for the processor you are > on. Well, it certainly didn't help here: procs me

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-26 Thread Josh Berkus
Magnus, > It would be interesting to see what a locking implementation ala FUTEX > style would give on a 2.6 kernel, as i understood it that would work > cross process with some work. I'm working on testing a FUTEX patch, but am having some trouble with it. Will let you know the results

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-26 Thread Kenneth Marshall
On Wed, Apr 21, 2004 at 02:51:31PM -0400, Tom Lane wrote: > The context swap storm is happening because of contention at the next > level up (LWLocks rather than spinlocks). It could be an independent > issue that just happens to be triggered by the same sort of access > pattern. I put forward a

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-26 Thread Magnus Naeslund(t)
Tom Lane wrote: Hmmm ... I've been able to reproduce the CS storm on a dual Athlon, which seems to pretty much let the Xeon per se off the hook. Anybody got a multiple Opteron to try? Totally non-Intel CPUs? It would be interesting to see results with non-Linux kernels, too. regards, tom lane

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-25 Thread Andrew McMillan
On Thu, 2004-04-22 at 10:37 -0700, Josh Berkus wrote: > Tom, > > > The tricky > > part is that a slow adaptation rate means we can't have every backend > > figuring this out for itself --- the right value would have to be > > maintained globally, and I'm not sure how to do that without adding a >

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-22 Thread Anjan Dave
Anjan, > Quad 2.0GHz XEON with highest load we have seen on the applications, DB > performing great - Can you run Tom'

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Rod Taylor
On Thu, 2004-04-22 at 13:55, Tom Lane wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > > This may be a moot point, since you've stated that changing the loop timing > > won't solve the problem, but what about making the test part of make? I > > don't think too many systems are going to change

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Magnus Hagander
>> Having to recompile to run on single- vs dual-processor >machines doesn't >> seem like it would fly. > >Oh, I don't know. Many applications require compiling for a target >architecture; SQL Server, for example, won't use a 2nd >processor without >re-installation. I'm not sure about Oracle

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Josh Berkus
Tom, > The tricky > part is that a slow adaptation rate means we can't have every backend > figuring this out for itself --- the right value would have to be > maintained globally, and I'm not sure how to do that without adding a > lot of overhead. This may be a moot point, since you've stated th

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: > This may be a moot point, since you've stated that changing the loop timing > won't solve the problem, but what about making the test part of make? I > don't think too many systems are going to change processor architectures once > in production, and th

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Josh Berkus
Tom, > Having to recompile to run on single- vs dual-processor machines doesn't > seem like it would fly. Oh, I don't know. Many applications require compiling for a target architecture; SQL Server, for example, won't use a 2nd processor without re-installation. I'm not sure about Oracle. I

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Bruce Momjian
Josh Berkus wrote: > Tom, > > > Having to recompile to run on single- vs dual-processor machines doesn't > > seem like it would fly. > > Oh, I don't know. Many applications require compiling for a target > architecture; SQL Server, for example, won't use a 2nd processor without > re-installati

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Tom Lane
Dave Cramer <[EMAIL PROTECTED]> writes: > My hypothesis is that if you spin approximately the same or more time > than the average time it takes to get finished with the shared resource > then this should reduce cs. The only thing we use spinlocks for nowadays is to protect LWLocks, so the "averag

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-22 Thread Tom Lane
Paul Tuckfield <[EMAIL PROTECTED]> writes: >> I used the taskset command: >> taskset 01 -p >> taskset 01 -p >> >> I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on >> the first Xeon processor in the box. AFAICT, what you've actually done here is to bind both backends to the

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Dave Cramer
More data On a dual xeon with HTT enabled: I tried increasing the NUM_SPINS to 1000 and it works better.

NUM_SPINLOCKS   CS     ID    pgbench
100             250K   59%   230 TPS
1000            125K   55%   228 TPS

This is certainly heading in the right direction? Although it lo

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Dave Cramer
Yeah, I did some more testing myself, and actually get better numbers with increasing spins per delay to 1000, but my suspicion is that it is highly dependent on finding the right delay for the processor you are on. My hypothesis is that if you spin approximately the same or more time than the ave

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > For BSDOS it has: > #if (CLIENT_OS == OS_FREEBSD) || (CLIENT_OS == OS_BSDOS) || \ > (CLIENT_OS == OS_OPENBSD) || (CLIENT_OS == OS_NETBSD) > { /* comment out if inappropriate for your *bsd - cyp (25/may/1999) */ > int ncpus; size_t l

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Bruce Momjian
Tom Lane wrote: > Dave Cramer <[EMAIL PROTECTED]> writes: > > I tried increasing the NUM_SPINS to 1000 and it works better. > > Doesn't surprise me. The value of 100 is about right on the assumption > that the spinlock instruction per se is not too much more expensive than > any other instruction

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Christopher Kings-Lynne
Yeah. I don't know a reasonable way to tune this number automatically for particular systems ... but at the very least we'd need to find a way to distinguish uniprocessor from multiprocessor, because on a uniprocessor the optimal value is surely 1. From TODO: * Add code to detect an SMP machine a
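The uniprocessor/multiprocessor distinction raised above can be sketched roughly as follows. This is only an illustration, not anything from the PostgreSQL tree: demo_online_cpus() and demo_default_spins() are hypothetical names, _SC_NPROCESSORS_ONLN is a widely available sysconf() extension rather than something every platform guarantees, and 100 is simply the default spin count quoted in this thread.

```c
#include <unistd.h>

/*
 * Hypothetical helper: number of CPUs currently online, or 1 when the
 * platform cannot tell us. _SC_NPROCESSORS_ONLN is a common extension
 * (Linux, the BSDs, Solaris), so its presence is checked at compile time.
 */
static long demo_online_cpus(void)
{
    long n = -1;
#ifdef _SC_NPROCESSORS_ONLN
    n = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    return n < 1 ? 1 : n;
}

/*
 * Hypothetical policy: on a uniprocessor, spinning cannot help because
 * the lock holder is not running, so try once and then sleep. Otherwise
 * fall back to the default of 100 mentioned in the thread.
 */
static int demo_default_spins(long ncpus)
{
    return ncpus <= 1 ? 1 : 100;
}
```

A caller would compute demo_default_spins(demo_online_cpus()) once at startup, which avoids the recompile-per-machine concern raised elsewhere in the thread; whether a run-time probe like this is acceptable on every supported platform is an open assumption.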

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Tom Lane
Dave Cramer <[EMAIL PROTECTED]> writes: > I tried increasing the NUM_SPINS to 1000 and it works better. Doesn't surprise me. The value of 100 is about right on the assumption that the spinlock instruction per se is not too much more expensive than any other instruction. What I was seeing from op

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Tom Lane
Dave Cramer <[EMAIL PROTECTED]> writes: > diff -c -r1.16 s_lock.c > *** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 - 1.16 > --- backend/storage/lmgr/s_lock.c 21 Apr 2004 20:27:34 - > *** > *** 76,82 >* The select() delays are measured in centise

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Tom Lane
Kenneth Marshall <[EMAIL PROTECTED]> writes: > If the context swap storm derives from LWLock contention, maybe using > a random order to assign buffer locks in buf_init.c would prevent > simple adjacency of buffer allocation to cause the storm. Good try, but no cigar ;-). The test cases I've been

Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Dave Cramer
attached. -- Dave Cramer 519 939 0336 ICQ # 14675561 Index: backend/storage/lmgr/s_lock.c === RCS file: /usr/local/cvs/pgsql-server/src/backend/storage/lmgr/s_lock.c,v retrieving revision 1.16 diff -c -r1.16 s_lock.c *** backend/stora

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Dave Cramer
FYI, I am doing my testing on non-hyperthreading dual athlons. Also, the test and set is attempting to set the same resource, and not simply a bit. It's really a lock;xchg in assembler. Also we are using the PAUSE mnemonic, so we should not be seeing any cache coherency issues, as the cache i

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Tom Lane
Paul Tuckfield <[EMAIL PROTECTED]> writes: > I wonder do the threads stall so badly when pinging cache lines back > and forth, that the kernel sees it as an opportunity to put the > process to sleep? or do these worst case misses cause an interrupt? No; AFAICS the kernel could not even be aware

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Paul Tuckfield
Dave: Why would test and set increase context switches: Note that it *does not increase* context switches when the two threads are on the two cores of a single Xeon processor. (use taskset to force affinity on linux) Scenario: If the two test and set processes are testing and setting the same bi

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Josh Berkus
Dave, > After some testing if you use the current head code for s_lock.c which > has some mods in it to alleviate this situation, and change > SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test. > I am seeing a slight degradation in throughput using pgbench -c 10 -t > 1000 but

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Dave Cramer
After some testing if you use the current head code for s_lock.c which has some mods in it to alleviate this situation, and change SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test. I am seeing a slight degradation in throughput using pgbench -c 10 -t 1000 but it might be live

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Dirk Lutzebäck
It is intended to run indefinitely. Dirk [EMAIL PROTECTED] wrote: How long is this test supposed to run? I've launched just 1 for testing, the plan seems horrible; the test is cpu bound and hasn't finished yet after 17:02 min of CPU time, dual XEON 2.6G Unixware 713 The machine is a Fujitsu-Sie

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread ohp
> Here is a test case. To set up, run the "test_setup.sql" script once; > then launch two copies of the "test_run.sql" script. (For those of > you with mor

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread pginfo
Hi, Dual Xeon P4 2.8 linux RedHat AS 3 kernel 2.4.21-4-EL-smp 2 GB ram I can see the same problem:

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 96212 61056 172024000

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Joe Conway
Joe Conway wrote: In isolation, test_run.sql should do essentially no syscalls at all once it's past the initial ramp-up. On a machine that's functioning per expectations, multiple copies of test_run show a relatively low rate of semop() calls --- a few per second, at most --- and maybe a delaying

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dave Cramer
I modified the code in s_lock.c to remove the spins

#define SPINS_PER_DELAY 1

and it doesn't exhibit the behaviour. This effectively changes the code to

while(TAS(lock)) select(1); // 10ms

Can anyone explain why executing TAS 100 times would increase context switches? Da
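For readers who want to experiment with the spin-then-sleep pattern discussed here, the following is a minimal sketch and not the actual s_lock.c code. It assumes C11 atomics as a stand-in for the platform TAS instruction; demo_lock, demo_tas, demo_delay_10ms, demo_lock_acquire and demo_lock_release are all hypothetical names, and the 10 ms sleep is taken from the comment in the message above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical stand-in for the backend's lock type; not the real one. */
typedef struct {
    atomic_flag held;
} demo_lock;

#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Test-and-set: returns nonzero if the lock was already held. */
static int demo_tas(demo_lock *lock)
{
    return atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire);
}

/* Sleep roughly 10 ms by calling select() with no descriptors. */
static void demo_delay_10ms(void)
{
    struct timeval tv;

    tv.tv_sec = 0;
    tv.tv_usec = 10000;
    (void) select(0, NULL, NULL, NULL, &tv);
}

/*
 * Acquire the lock: retry TAS up to spins_per_delay times, then sleep,
 * then start spinning again. Returns the number of sleeps taken.
 * With spins_per_delay == 1 this degenerates to the loop quoted above:
 * one TAS attempt, then a sleep, repeated until the lock is free.
 */
static unsigned long demo_lock_acquire(demo_lock *lock, int spins_per_delay)
{
    unsigned long delays = 0;
    int spins = 0;

    assert(spins_per_delay > 0);
    while (demo_tas(lock)) {
        if (++spins >= spins_per_delay) {
            demo_delay_10ms();
            delays++;
            spins = 0;
        }
    }
    return delays;
}

/* Release the lock so that another caller's TAS can succeed. */
static void demo_lock_release(demo_lock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

Declaring `demo_lock l = DEMO_LOCK_INIT;` and calling demo_lock_acquire(&l, 100) versus demo_lock_acquire(&l, 1) from two processes sharing the lock is one way to compare the two settings; the returned sleep count gives a rough proxy for how often the caller gave up the CPU.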

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Josh Berkus
Anjan, > Quad 2.0GHz XEON with highest load we have seen on the applications, DB > performing great - Can you run Tom's test? It takes a particular pattern of data access to reproduce the issue. -- Josh Berkus Aglio Database Solutions San Francisco

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Bruce Momjian
Dirk Lutzebäck wrote: > Dirk Lutzebaeck wrote: > > > c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) > > > > performs well and I could not observe context switch peaks here (one > > user active), almost no extra semop calls > > Did Tom's test here: with 2 processes I'll reac

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Anjan Dave
Dirk Lutzebaeck wrote: > c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) > > performs well and I could not observe context switch peaks here (on

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread J. Andrew Rogers
I verified problem on a Dual Opteron server. I temporarily killed the normal load, so the server was largely idle when the test was run. Hardware: 2x Opteron 242 Rioworks HDAMA server board 4Gb RAM OS Kernel: RedHat9 + XFS 1 proc: 10-15 cs/sec 2 proc: 400,000-420,000 cs/sec j. andrew rogers

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Josh Berkus
Dirk, Tom, OK, off IRC, I have the following reports: Linux 2.4.21 or 2.4.20 on dual Pentium III : problem verified Linux 2.4.21 or 2.4.20 on dual Pentium II : problem cannot be reproduced Solaris 2.6 on 6 cpu e4500 (using 8 processes) : problem not reproduced -- -Josh Berkus Aglio Database So

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Paul Tuckfield
Ooops, what I meant to say was that 2 threads bound to one (hyperthreaded) cpu does *NOT* cause the storm, even on an smp xeon. Therefore, the context switches may be a result of cache coherency related delays. (2 threads on one hyperthreaded cpu presumably have tightly coupled L1/L2 cache.) O

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Rod Taylor
> It would be interesting to see results with non-Linux kernels, too. Dual Celeron 500Mhz (Abit BP6 mobo) - client & server on same machine 2 processes FreeBSD (5.2.1): 1800cs 3 processes FreeBSD: 14000cs 4 processes FreeBSD: 14500cs 2 processes Linux (2.4.18 kernel): 52000cs 3 processes Linux:

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Paul Tuckfield
I tried to test how this is related to cache coherency, by forcing affinity of the two test_run.sql processes to the two cores (pipelines? threads) of a single hyperthreaded xeon processor in an smp xeon box. When the processes are allowed to run on distinct chips in the smp box, the CS storm h

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dirk Lutzebäck
Dirk Lutzebaeck wrote: c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) performs well and I could not observe context switch peaks here (one user active), almost no extra semop calls Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 300k CS. Bummer.. Jo

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dirk Lutzebaeck
I would agree with Tom that too many parameters are involved to blame bigmem. I have access to the following machines where the same application operates: a) Dual (4way) XEON MP, bigmem, HT off, ServerWorks chipset (a Fujitsu-Siemens Primergy) performs ok now because missing indexes were added

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Sven Geisler
us" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Neil Conway" <[EMAIL PROTECTED]> Sent: Sunday, April 18, 2004 11:47 PM Subject: Re: [PERFORM] Wierd context-switching issue on Xeon > After some further digging I think I'm starting to understand what's up >

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Matt Clark
> Here is a test case. To set up, run the "test_setup.sql" script once; > then launch two copies of the "

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dave Cramer
Dual Athlon With one process running 30 cs/second with two process running 15000 cs/second Dave On Tue, 2004-04-20 at 08:46, Jeff wrote: > On Apr 19, 2004, at 8:01 PM, Tom Lane wrote: > [test case] > > Quad P3-700Mhz, ServerWorks, pg 7.4.2 - 1 process: 10-30 cs / second >

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Jeff
On Apr 19, 2004, at 8:01 PM, Tom Lane wrote: [test case] Quad P3-700Mhz, ServerWorks, pg 7.4.2 - 1 process: 10-30 cs / second 2 process: 100k cs / sec 3 pro

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread ohp
<[EMAIL PROTECTED]>, scott.marlowe <[EMAIL PROTECTED]>, > Bruce Momjian <[EMAIL PROTECTED]>, [EMAIL PROTECTED], > [EMAIL PROTECTED], Neil Conway <[EMAIL PROTECTED]> > Subject: Re: [PERFORM] Wierd context-switching issue on Xeon > > I wrote: > > Here

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread jelle
Same problem with dual 1Ghz P3's running Postgres 7.4.2, linux 2.4.x, and 2GB ram, under load, with long transactions (i.e. 1 "cannot serialize" rollback per minute). 200K was the worst observed with vmstat. Finally moved DB to a single xeon box.

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Robert Creager
When grilled further on (Mon, 19 Apr 2004 20:53:09 -0400), Tom Lane <[EMAIL PROTECTED]> confessed: > I wrote: > > Here is a test case. > > Hmmm ... I've been able to reproduce the CS storm on a dual Athlon, > which seems to pretty much let the Xeon per se off the hook. Anybody > got a multiple O

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Joe Conway
Tom Lane wrote: Here is a test case. To set up, run the "test_setup.sql" script once; then launch two copies of the "test_run.sql" script. (For those of you with more than two CPUs, see whether you need one per CPU to make trouble, or whether two test_runs are enough.) Check that you get a nestl

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
I wrote: > Here is a test case. Hmmm ... I've been able to reproduce the CS storm on a dual Athlon, which seems to pretty much let the Xeon per se off the hook. Anybody got a multiple Opteron to try? Totally non-Intel CPUs? It would be interesting to see results with non-Linux kernels, too.

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
Here is a test case. To set up, run the "test_setup.sql" script once; then launch two copies of the "test_run.sql" script. (For those of you with more than two CPUs, see whether you need one per CPU to make trouble, or whether two test_runs are enough.) Check that you get a nestloops-with-index-

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: >> I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does >> anyone have a test set that can reliably reproduce the problem? > Unfortunately we can't seem to come up with one. > It does seem to require a database which is in the many GB (> 10

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Josh Berkus
Joe, > I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does > anyone have a test set that can reliably reproduce the problem? Unfortunately we can't seem to come up with one. So far we have 2 machines that exhibit the issue, and their databases are highly confidential (State

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Joe Conway
scott.marlowe wrote: On Mon, 19 Apr 2004, Bruce Momjian wrote: I have BSD on a SuperMicro dual Xeon, so if folks want another hardware/OS combination to test, I can give out logins to my machine. I can probably do some nighttime testing on a dual 2800MHz non-MP Xeon machine as well. It's a Dell 2

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread scott.marlowe
On Mon, 19 Apr 2004, Bruce Momjian wrote: > Josh Berkus wrote: > > Tom, > > > > > So in the short term I think we have to tell people that Xeon MP is not > > > the most desirable SMP platform to run Postgres on. (Josh thinks that > > > the specific motherboard chipset being used in these machine

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Josh Berkus
Tom, > So in the short term I think we have to tell people that Xeon MP is not > the most desirable SMP platform to run Postgres on. (Josh thinks that > the specific motherboard chipset being used in these machines might > share some of the blame too. I don't have any evidence for or against > t

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Bruce Momjian
Josh Berkus wrote: > Tom, > > > So in the short term I think we have to tell people that Xeon MP is not > > the most desirable SMP platform to run Postgres on. (Josh thinks that > > the specific motherboard chipset being used in these machines might > > share some of the blame too. I don't have

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: > The other thing I'd like your comment on, Tom, is that Dirk appears to have > reported that when he installed a non-bigmem kernel, the issue went away. > Dirk, is this correct? I'd be really surprised if that had anything to do with it. AFAIR Dirk's t

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread J. Andrew Rogers
I decided to check the context-switching behavior here for baseline since we have a rather diverse set of postgres server hardware, though nothing using Xeon MP that is also running a postgres instance, and everything looks normal under load. Some platforms are better than others, but nothing is

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Anjan Dave
Tom Lane <[EMAIL PROTECTED]> writes: > So in the short term I think we have to tell people that Xeon MP is not > the most desirable SMP platform to run Postgres on. (Josh thinks that > the specific moth

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Dave Cramer
Here's an interesting link that suggests that hyperthreading would be much worse. http://groups.google.com/groups?q=hyperthreading+dual+xeon+idle&start=10&hl=en&lr=&ie=UTF-8&c2coff=1&selm=aukkonen-FE5275.21093624062003%40shawnews.gv.shawcable.net&rnum=16 another which has some hints as to how it

Re: RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Dirk Lutzebäck
Josh, I cannot reproduce the excessive semop() on a Dual XEON DP on a non-bigmem kernel, HT on. Interesting to know if the problem is related to XEON MP (as Tom wrote) or bigmem. Josh Berkus wrote: Dirk, I'm not sure if this semop() problem is still an issue but the database behaves a bit

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
>> What about hyperthreading: does it still happen if HTT is turned off? > The problem comes from keeping the caches synchronized between multiple > physical CPUs. AFAICS enabling HTT wouldn't make it worse, because a > hyperthreaded processor still only has one cache. Also, I forgot to say tha

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > There's nothing about the way Postgres spinlocks are coded that affects this? No. AFAICS our spinlock sequences are pretty much equivalent to the way the Linux kernel codes its spinlocks, so there's no deep dark knowledge to be mined there. We could possi

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
Dave Cramer <[EMAIL PROTECTED]> writes: > So the kernel/OS is irrelevant here? This happens on any dual Xeon? I believe so. The context-switch behavior might possibly be a little more pleasant on other kernels, but the underlying spinlock problem is not dependent on the kernel. > What about

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > So in the short term I think we have to tell people that Xeon MP is not > the most desirable SMP platform to run Postgres on. (Josh thinks that > the specific motherboard chipset being used in these machines might > share some of the blame too. I don't hav

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Dave Cramer
So the kernel/OS is irrelevant here? This happens on any dual Xeon? What about hyperthreading: does it still happen if HTT is turned off? Dave On Sun, 2004-04-18 at 17:47, Tom Lane wrote: > After some further digging I think I'm starting to understand what's up > here, and the really fundam

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
After some further digging I think I'm starting to understand what's up here, and the really fundamental answer is that a multi-CPU Xeon MP box sucks for running Postgres. I did a bunch of oprofile measurements on a machine belonging to one of Josh's clients, using a test case that involved heavy

Re: RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-16 Thread Josh Berkus
Dirk, > I'm not sure if this semop() problem is still an issue but the database > behaves a bit out of bounds in this situation, i.e. consuming system > resources with semop() calls 95% while tables are locked very often and > longer. It would be helpful to us if you could test this with the i

Re: RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-16 Thread Tom Lane
Dirk Lutzebäck <[EMAIL PROTECTED]> writes: > This was the key to look at: we were missing all indices on table which > is used heavily and does lots of locking. After recreating the missing > indices the production system performed normal. No, more excessive > semop() calls, l

RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-16 Thread Dirk Lutzebäck
Tom, Josh, I think we have the problem resolved after I found the following note from Tom: > A large number of semops may mean that you have excessive contention on some lockable > resource, but I don't have enough info to guess what resource. This was the key to look at: we were missing all i

Re: [PERFORM] Wierd context-switching issue on Xeon

2003-11-25 Thread Josh Berkus
Tom, > Strictly a WAG ... but what this sounds like to me is disastrously bad > behavior of the spinlock code under heavy contention. We thought we'd > fixed the spinlock code for SMP machines awhile ago, but maybe > hyperthreading opens some new vistas for misbehavior ... Yeah, I thought of tha
