Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-20 Thread Josh Berkus
Guys,
 
 Oh, you wanted a fix?  That seems harder :-(.  AFAICS we need a redesign
 that causes less load on the BufMgrLock.

FWIW, we've been pursuing two routes of quick patch fixes.

1) Dave Cramer and I have been testing setting varying rates of spin_delay in 
an effort to find a sweet spot that the individual system seems to like.   
This has been somewhat delayed by my illness.

2) The OSDL folks have been trying various patches to use Linux 2.6 Futexes in 
place of semops (if I have that right) which, if successful, would produce a 
linux-specific fix.   However, they haven't yet come up with a version of 
the patch which is stable.

I'm really curious, BTW, about how all of Jan's changes to buffer usage in 7.5 
affect this issue.   Has anyone tested it on a recent snapshot?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-20 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I'm really curious, BTW, about how all of Jan's changes to buffer
 usage in 7.5 affect this issue.  Has anyone tested it on a recent
 snapshot?

Won't help.

(1) Theoretical argument: the problem case is select-only and touches
few enough buffers that it need never visit the kernel.  The buffer
management algorithm is thus irrelevant since there are never any
decisions for it to make.  If anything CVS tip will have a worse problem
because its more complicated management algorithm needs to spend longer
holding the BufMgrLock.

(2) Experimental argument: I believe that I did check the self-contained
test case we eventually developed against CVS tip on one of Red Hat's
SMP machines, and indeed it was unhappy.

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Bruce Momjian

Did we ever come to a conclusion about excessive SMP context switching
under load?

---

Dave Cramer wrote:
 Robert,
 
 The real question is: does it help under real-life circumstances?
 
 Did you do the tests with Tom's SQL code that is designed to create high
 context switches?
 
 Dave
 On Sun, 2004-05-02 at 11:20, Robert Creager wrote:
  Found some co-workers at work yesterday to load up my library...
  
  The sample period is 5 minutes long (vs 2 minutes previously):
  
  Context switches -       avg    max
  
  Default 7.4.1 code :   48784 107354
  Default patch - 10 :   20400  28160
  patch at 100   :   38574  85372
  patch at 1000  :   41188 106569
  
  The reading at 1000 was not produced under the same circumstances as the prior
  readings as I had to replace my device under test with a simulated one.  The
  real one died.
  
  The previous run with smaller database and 120 second averages:
  
  Context switches -       avg    max
  
  Default 7.4.1 code :   10665  69470
  Default patch - 10 :   17297  21929
  patch at 100   :   26825  87073
  patch at 1000  :   37580 110849
 -- 
 Dave Cramer
 519 939 0336
 ICQ # 14675561
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Robert Creager
When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
Bruce Momjian [EMAIL PROTECTED] confessed:

 
 Did we ever come to a conclusion about excessive SMP context switching
 under load?
 

I just figured out what was causing the problem on my system Monday.  I'm using
the pg_autovacuum daemon, and it was not vacuuming my db.  I've no idea why and
didn't get a chance to investigate.

This lack of vacuuming was causing a huge number of context switches and query
delays.  Queries that normally take 0.1 seconds were taking 11 seconds, and
the context switches were averaging 160k/s, peaking at 190k/s.

Unfortunately, I was under pressure to fix the db at the time so I didn't get a
chance to play with the patch.

I restarted the vacuum daemon, and will keep an eye on it to see if it behaves.

If the problem recurs, is it worthwhile to attempt the different patch
delay settings?

Cheers,
Rob

-- 
 19:45:40 up 21 days,  2:30,  4 users,  load average: 2.03, 2.09, 2.06
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Did we ever come to a conclusion about excessive SMP context switching
 under load?

Yeah: it's bad.

Oh, you wanted a fix?  That seems harder :-(.  AFAICS we need a redesign
that causes less load on the BufMgrLock.  However, the traditional
solution to too-much-contention-for-a-lock is to break up the locked
data structure into finer-grained units, which means *more* lock
operations in total.  Normally you expect that the finer-grained lock
units will mean less contention.  But given that the issue here seems to
be trading physical ownership of the lock's cache line back and forth,
I'm afraid that the traditional approach would actually make things
worse.  The SMP issue seems to be not with whether there is
instantaneous contention for the locked datastructure, but with the cost
of making it possible for processor B to acquire a lock recently held by
processor A.

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Robert Creager [EMAIL PROTECTED] writes:
 I just figured out what was causing the problem on my system Monday.
 I'm using the pg_autovacuum daemon, and it was not vacuuming my db.

Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Robert Creager
When grilled further on (Wed, 19 May 2004 22:42:26 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 Robert Creager [EMAIL PROTECTED] writes:
  I just figured out what was causing the problem on my system Monday.
  I'm using the pg_autovacuum daemon, and it was not vacuuming my db.
 
 Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

No.  I'm still running 7.4.1 w/associated contrib.  I guess an upgrade is in
order then.  I'm currently downloading 7.4.2 to see what the change is that I
need.  Is it just the 7.4.2 pg_autovacuum that is needed here?

I've caught a whiff that 7.4.3 is nearing release?  Any idea when?

Thanks,
Rob

-- 
 20:45:52 up 21 days,  3:30,  4 users,  load average: 2.02, 2.05, 2.05
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Robert Creager [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] confessed:
 Do you have the post-7.4.2 datatype fixes for pg_autovacuum?

 No.  I'm still running 7.4.1 w/associated contrib.  I guess an upgrade is in
 order then.  I'm currently downloading 7.4.2 to see what the change is that I
 need.  Is it just the 7.4.2 pg_autovacuum that is needed here?

Nope, the fixes I was thinking about just missed the 7.4.2 release.
I think you can only get them from CVS.  (Maybe we should offer a
nightly build of the latest stable release branch, not only development
tip...)

 I've caught a whiff that 7.4.3 is nearing release?  Any idea when?

Not scheduled yet, but there was talk of pushing one out before 7.5 goes
into feature freeze.

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ...  The SMP issue seems to be not with whether there is
 instantaneous contention for the locked datastructure, but with the cost
 of making it possible for processor B to acquire a lock recently held by
 processor A.

 I see.  I don't even see a TODO in there.  :-(

Nothing more specific than investigate SMP context switching issues,
anyway.  We are definitely in a research mode here, rather than an
engineering mode.

ObQuote: Research is what I am doing when I don't know what I am
doing. - attributed to Werner von Braun, but has anyone got a
definitive reference?

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Bruce Momjian

OK, added to TODO:

* Investigate SMP context switching issues


---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  ...  The SMP issue seems to be not with whether there is
  instantaneous contention for the locked datastructure, but with the cost
  of making it possible for processor B to acquire a lock recently held by
  processor A.
 
  I see.  I don't even see a TODO in there.  :-(
 
 Nothing more specific than investigate SMP context switching issues,
 anyway.  We are definitely in a research mode here, rather than an
 engineering mode.
 
 ObQuote: Research is what I am doing when I don't know what I am
 doing. - attributed to Werner von Braun, but has anyone got a
 definitive reference?
 
   regards, tom lane
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Matthew T. O'Connor
On Wed, 2004-05-19 at 21:59, Robert Creager wrote:
 When grilled further on (Wed, 19 May 2004 21:20:20 -0400 (EDT)),
 Bruce Momjian [EMAIL PROTECTED] confessed:
 
  
  Did we ever come to a conclusion about excessive SMP context switching
  under load?
  
 
 I just figured out what was causing the problem on my system Monday.  I'm using
 the pg_autovacuum daemon, and it was not vacuuming my db.  I've no idea why and
 didn't get a chance to investigate.

Strange.  There is a known bug in the 7.4.2 version of pg_autovacuum
related to data type mismatches which is fixed in CVS.  But that bug
doesn't cause pg_autovacuum to stop vacuuming but rather to vacuum too
often.  So perhaps this is a different issue?  Please let me know what
you find.

Thanks,

Matthew O'Connor





Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-19 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Tom Lane) 
transmitted:
 ObQuote: Research is what I am doing when I don't know what I am
 doing. - attributed to Werner von Braun, but has anyone got a
 definitive reference?

http://www.quotationspage.com/search.php3?Author=Wernher+von+Braun&file=other

That points to a bunch of seemingly authoritative sources...
-- 
(reverse (concatenate 'string moc.enworbbc @ enworbbc))
http://www.ntlug.org/~cbbrowne/lsf.html
Terific. -- Ford Prefect



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-02 Thread Robert Creager

Found some co-workers at work yesterday to load up my library...

The sample period is 5 minutes long (vs 2 minutes previously):

Context switches -       avg    max

Default 7.4.1 code :   48784 107354
Default patch - 10 :   20400  28160
patch at 100   :   38574  85372
patch at 1000  :   41188 106569

The reading at 1000 was not produced under the same circumstances as the prior
readings as I had to replace my device under test with a simulated one.  The
real one died.

The previous run with smaller database and 120 second averages:

Context switches -       avg    max

Default 7.4.1 code :   10665  69470
Default patch - 10 :   17297  21929
patch at 100   :   26825  87073
patch at 1000  :   37580 110849

-- 
 20:13:50 up 3 days,  2:58,  4 users,  load average: 2.12, 2.14, 2.10
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-05-01 Thread Dave Cramer
No, don't go away and be quiet. Keep testing, it may be that under
normal operation the context switching goes up but under the conditions
that you were seeing the high CS it may not be as bad.

As others have mentioned the real solution to this is to rewrite the
buffer management so that the lock isn't quite as coarse grained.

Dave
On Sat, 2004-05-01 at 00:03, Robert Creager wrote:
 When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
 Josh Berkus [EMAIL PROTECTED] confessed:
 
  spins_per_delay was not beneficial.   Instead, try increasing them, one step 
  at a time:
  
  (take baseline measurement at 100)
  250
  500
  1000
  1500
  2000
  3000
  5000
  
  ... until you find an optimal level.   Then report the results to us!
  
 
 Some results.  The patch mentioned is what Dave Cramer posted to the Performance
 list on 4/21.
 
 A Perl script monitored vmstat 1 for 120 seconds and generated max and average
 values.  Unfortunately, I am not present on site, so I cannot physically change
 the device under test to increase the db load to where it hit about 10 days ago.
  That will have to wait till the 'real' work week on Monday.
 
 Context switches -       avg    max
 
 Default 7.4.1 code :   10665  69470
 Default patch - 10 :   17297  21929
 patch at 100   :   26825  87073
 patch at 1000  :   37580 110849
 
 Now granted, the db isn't showing the CS swap problem in a bad way (at all), but
 should the numbers be trending the way they are with the patched code?  Or will
 these numbers potentially change dramatically when I can load up the db?
 
 And, presuming I can re-produce what I was seeing previously (200K CS/s), you
 folks want me to carry on with more testing of the patch and report the results?
  Or just go away and be quiet...
 
 The information is provided from an HP Proliant DL380 G3 with 2x 2.4 GHz Xeons
 (with HT enabled) 2 GB ram, running 2.4.22-26mdkenterprise kernel, RAID
 controller w/128 Mb battery backed cache RAID 1 on 2x 15K RPM drives for WAL
 drive, RAID 0+1 on 4x 10K RPM drives for data.  The only job this box has is
 running this db.
 
 Cheers,
 Rob
-- 
Dave Cramer
519 939 0336
ICQ # 14675561




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-30 Thread Robert Creager
When grilled further on (Thu, 29 Apr 2004 11:21:51 -0700),
Josh Berkus [EMAIL PROTECTED] confessed:

 spins_per_delay was not beneficial.   Instead, try increasing them, one step 
 at a time:
 
 (take baseline measurement at 100)
 250
 500
 1000
 1500
 2000
 3000
 5000
 
 ... until you find an optimal level.   Then report the results to us!
 

Some results.  The patch mentioned is what Dave Cramer posted to the Performance
list on 4/21.

A Perl script monitored vmstat 1 for 120 seconds and generated max and average
values.  Unfortunately, I am not present on site, so I cannot physically change
the device under test to increase the db load to where it hit about 10 days ago.
 That will have to wait till the 'real' work week on Monday.

Context switches -       avg    max

Default 7.4.1 code :   10665  69470
Default patch - 10 :   17297  21929
patch at 100   :   26825  87073
patch at 1000  :   37580 110849

Now granted, the db isn't showing the CS swap problem in a bad way (at all), but
should the numbers be trending the way they are with the patched code?  Or will
these numbers potentially change dramatically when I can load up the db?

And, presuming I can re-produce what I was seeing previously (200K CS/s), you
folks want me to carry on with more testing of the patch and report the results?
 Or just go away and be quiet...

The information is provided from an HP Proliant DL380 G3 with 2x 2.4 GHz Xeons
(with HT enabled) 2 GB ram, running 2.4.22-26mdkenterprise kernel, RAID
controller w/128 Mb battery backed cache RAID 1 on 2x 15K RPM drives for WAL
drive, RAID 0+1 on 4x 10K RPM drives for data.  The only job this box has is
running this db.

Cheers,
Rob

-- 
 21:54:48 up 2 days,  4:39,  4 users,  load average: 2.00, 2.03, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-29 Thread ohp
Hi

I'd LOVE to contribute on this but I don't have vmstat and I'm not running
Linux.

How can I help?
Regards
On Wed, 28 Apr 2004, Robert Creager wrote:

 Date: Wed, 28 Apr 2004 18:57:53 -0600
 From: Robert Creager [EMAIL PROTECTED]
 To: Josh Berkus [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], Dirk_Lutzebäck [EMAIL PROTECTED], [EMAIL PROTECTED],
  Tom Lane [EMAIL PROTECTED], Joe Conway [EMAIL PROTECTED],
  scott.marlowe [EMAIL PROTECTED],
  Bruce Momjian [EMAIL PROTECTED], [EMAIL PROTECTED],
  Neil Conway [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Wierd context-switching issue on Xeon

 When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
 Josh Berkus [EMAIL PROTECTED] confessed:

  Dave,
 
   After some testing if you use the current head code for s_lock.c which
   has some mods in it to alleviate this situation, and change
   SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
   I am seeing a slight degradation in throughput using pgbench -c 10 -t
   1000 but it might be liveable, considering the alternative is unbearable
   in some situations.
  
   Can anyone else replicate my results?
 
  Can you produce a patch against 7.4.1?   I'd like to test your fix against a
  real-world database.

 I would like to see the same, as I have a system that exhibits the same behavior
 on a production db that's running 7.4.1.

 Cheers,
 Rob




-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-29 Thread Josh Berkus
Rob,

 I would like to see the same, as I have a system that exhibits the same behavior
 on a production db that's running 7.4.1.

If you checked the thread follow-ups,  you'd see that *decreasing* 
spins_per_delay was not beneficial.   Instead, try increasing them, one step 
at a time:

(take baseline measurement at 100)
250
500
1000
1500
2000
3000
5000

... until you find an optimal level.   Then report the results to us!

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-28 Thread Robert Creager
When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
Josh Berkus [EMAIL PROTECTED] confessed:

 Dave,
 
  After some testing if you use the current head code for s_lock.c which
  has some mods in it to alleviate this situation, and change
  SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
  I am seeing a slight degradation in throughput using pgbench -c 10 -t
  1000 but it might be liveable, considering the alternative is unbearable
  in some situations.
 
  Can anyone else replicate my results?
 
 Can you produce a patch against 7.4.1?   I'd like to test your fix against a 
 real-world database.

I would like to see the same, as I have a system that exhibits the same behavior
on a production db that's running 7.4.1.

Cheers,
Rob


-- 
 18:55:22 up  1:40,  4 users,  load average: 2.00, 2.04, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-27 Thread Josh Berkus
Dave,

 Are you testing this with Tom's code, you need to do a baseline
 measurement with 10 and then increase it, you will still get lots of cs,
 but it will be less.

No, that was just a test of 1000 straight up.  Tom outlined a method, but I 
didn't see any code that would help me find a better level, other than just 
trying each +100 increase one at a time.   This would take days of testing 
...
-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-27 Thread Dave Cramer
Josh,

I think you can safely increase by orders of magnitude here, instead of
by +100, my wild ass guess is that the sweet spot is the spin time
should be approximately the time it takes to consume the resource. So if
you have a really fast machine then the spin count should be higher. 

Also you have to take into consideration your memory bus speed, with the
pause instruction inserted in the loop the timing is now dependent on
memory speed.

But... you need a baseline first.

Dave
On Tue, 2004-04-27 at 14:05, Josh Berkus wrote:
 Dave,
 
  Are you testing this with Tom's code, you need to do a baseline
  measurement with 10 and then increase it, you will still get lots of cs,
  but it will be less.
 
 No, that was just a test of 1000 straight up.  Tom outlined a method, but I 
 didn't see any code that would help me find a better level, other than just 
 trying each +100 increase one at a time.   This would take days of testing 
 ...
-- 
Dave Cramer
519 939 0336
ICQ # 14675561




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-27 Thread Josh Berkus
Dave,

 But... you need a baseline first.

A baseline on CS?   I have that 

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-26 Thread Kenneth Marshall
On Wed, Apr 21, 2004 at 02:51:31PM -0400, Tom Lane wrote:
 The context swap storm is happening because of contention at the next
 level up (LWLocks rather than spinlocks).  It could be an independent
 issue that just happens to be triggered by the same sort of access
 pattern.  I put forward a hypothesis that the cache miss storm caused by
 the test-and-set ops induces the context swap storm by making the code
 more likely to be executing in certain places at certain times ... but
 it's only a hypothesis.
 
If the context swap storm derives from LWLock contention, maybe using
a random order to assign buffer locks in buf_init.c would prevent
simple adjacency of buffer allocation to cause the storm. Just offsetting
the assignment by the cacheline size should work. I notice that when
initializing the buffers in shared memory, both the buf->meta_data_lock
and the buf->cntx_lock are immediately adjacent in memory. I am not
familiar enough with the flow through postgres to see if there could
be fighting for those two locks. If so, offsetting those by the cache
line size would also stop the context swap storm.

--Ken



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-26 Thread Josh Berkus
Magnus,

 It would be interesting to see what a locking implementation ala FUTEX 
 style would give on an 2.6 kernel, as i understood it that would work 
 cross process with some work.

I'm working on testing a FUTEX patch, but am having some trouble with it.  
Will let you know the results.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-26 Thread Josh Berkus
Dave,

 Yeah, I did some more testing myself, and actually get better numbers
 with increasing spins per delay to 1000, but my suspicion is that it is
 highly dependent on finding the right delay for the processor you are
 on.

Well, it certainly didn't help here:

procs  memory  swap  io  system  cpu
 r  b   swpd     free   buff   cache  si  so  bi  bo   in     cs us sy id wa
 2  0      0 14870744 123872 1129912   0   0   0   0 1027 187341 48 27 26  0
 2  0      0 14869912 123872 1129912   0   0   0  48 1030 126490 65 18 16  0
 2  0      0 14867032 123872 1129912   0   0   0   0 1021 106046 72 16 12  0
 2  0      0 14869912 123872 1129912   0   0   0   0 1025  90256 76 14 10  0
 2  0      0 14870424 123872 1129912   0   0   0   0 1022 135249 63 22 16  0
 2  0      0 14872664 123872 1129912   0   0   0   0 1023     13 63 20 17  0
 1  0      0 14871128 123872 1129912   0   0   0  48 1024 155728 57 22 20  0
 2  0      0 14871128 123872 1129912   0   0   0   0 1028 189655 49 29 22  0
 2  0      0 14871064 123872 1129912   0   0   0   0 1018 190744 48 29 23  0
 2  0      0 14871064 123872 1129912   0   0   0   0 1027 186812 51 26 23  0


-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-26 Thread Dave Cramer
Are you testing this with Tom's code, you need to do a baseline
measurement with 10 and then increase it, you will still get lots of cs,
but it will be less.

Dave
On Mon, 2004-04-26 at 20:03, Josh Berkus wrote:
 Dave,
 
  Yeah, I did some more testing myself, and actually get better numbers
  with increasing spins per delay to 1000, but my suspicion is that it is
  highly dependent on finding the right delay for the processor you are
  on.
 
 Well, it certainly didn't help here:
 
 procs  memory  swap  io  system  cpu
  r  b   swpd     free   buff   cache  si  so  bi  bo   in     cs us sy id wa
  2  0      0 14870744 123872 1129912   0   0   0   0 1027 187341 48 27 26  0
  2  0      0 14869912 123872 1129912   0   0   0  48 1030 126490 65 18 16  0
  2  0      0 14867032 123872 1129912   0   0   0   0 1021 106046 72 16 12  0
  2  0      0 14869912 123872 1129912   0   0   0   0 1025  90256 76 14 10  0
  2  0      0 14870424 123872 1129912   0   0   0   0 1022 135249 63 22 16  0
  2  0      0 14872664 123872 1129912   0   0   0   0 1023     13 63 20 17  0
  1  0      0 14871128 123872 1129912   0   0   0  48 1024 155728 57 22 20  0
  2  0      0 14871128 123872 1129912   0   0   0   0 1028 189655 49 29 22  0
  2  0      0 14871064 123872 1129912   0   0   0   0 1018 190744 48 29 23  0
  2  0      0 14871064 123872 1129912   0   0   0   0 1027 186812 51 26 23  0
-- 
Dave Cramer
519 939 0336
ICQ # 14675561




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-25 Thread Andrew McMillan
On Thu, 2004-04-22 at 10:37 -0700, Josh Berkus wrote:
 Tom,
 
  The tricky
  part is that a slow adaptation rate means we can't have every backend
  figuring this out for itself --- the right value would have to be
  maintained globally, and I'm not sure how to do that without adding a
  lot of overhead.
 
 This may be a moot point, since you've stated that changing the loop timing 
 won't solve the problem, but what about making the test part of make?   I 
 don't think too many systems are going to change processor architectures once 
 in production, and those that do can be told to re-compile.

Sure they do - PostgreSQL is regularly provided as a pre-compiled
distribution.  I haven't compiled PostgreSQL for years, and we have it
running on dozens of machines, some SMP, some not, but most running
Debian Linux.

Even having a compiler _installed_ on one of our client's database
servers would usually be considered against security procedures, and
would get a black mark when the auditors came through.

Regards,
Andrew McMillan
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053,  Manners St,  Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201   MOB: +64(21)635-694  OFFICE: +64(4)499-2267
 Planning an election?  Call us!
-




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Dave Cramer
Yeah, I did some more testing myself, and actually get better numbers
with increasing spins per delay to 1000, but my suspicion is that it is
highly dependent on finding the right delay for the processor you are
on.

My hypothesis is that if you spin approximately the same or more time
than the average time it takes to get finished with the shared resource
then this should reduce cs.

Certainly more ideas are required here.

Dave 
On Wed, 2004-04-21 at 22:35, Tom Lane wrote:
 Dave Cramer [EMAIL PROTECTED] writes:
  diff -c -r1.16 s_lock.c
  *** backend/storage/lmgr/s_lock.c   8 Aug 2003 21:42:00 -   1.16
  --- backend/storage/lmgr/s_lock.c   21 Apr 2004 20:27:34 -
  ***
  *** 76,82 
   * The select() delays are measured in centiseconds (0.01 sec) because 10
   * msec is a common resolution limit at the OS level.
   */
  ! #define SPINS_PER_DELAY   100
#define NUM_DELAYS1000
#define MIN_DELAY_CSEC1
#define MAX_DELAY_CSEC100
  --- 76,82 
   * The select() delays are measured in centiseconds (0.01 sec) because 10
   * msec is a common resolution limit at the OS level.
   */
  ! #define SPINS_PER_DELAY   10
#define NUM_DELAYS1000
#define MIN_DELAY_CSEC1
#define MAX_DELAY_CSEC100
 
 
 As far as I can tell, this does reduce the rate of semop's
 significantly, but it does so by bringing the overall processing rate
 to a crawl :-(.  I see 97% CPU idle time when using this patch.
 I believe what is happening is that the select() delay in s_lock.c is
 being hit frequently because the spin loop isn't allowed to run long
 enough to let the other processor get out of the spinlock.
 
   regards, tom lane
 
 
 
 
 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561




Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Dave Cramer
More data

On a dual xeon with HTT enabled:

I tried increasing the NUM_SPINS to 1000 and it works better.

NUM_SPINLOCKS   CS      ID    pgbench

100             250K    59%   230 TPS
1000            125K    55%   228 TPS

This is certainly heading in the right direction, although it looks
like it is highly dependent on the system you are running on.

--dc--   



On Wed, 2004-04-21 at 22:53, Josh Berkus wrote:
 Tom,
 
  As far as I can tell, this does reduce the rate of semop's
  significantly, but it does so by bringing the overall processing rate
  to a crawl :-(.  I see 97% CPU idle time when using this patch.
  I believe what is happening is that the select() delay in s_lock.c is
  being hit frequently because the spin loop isn't allowed to run long
  enough to let the other processor get out of the spinlock.
 
 Also, I tested it on production data, and it reduces the CSes by about 40%.  
 An improvement, but not a magic bullet.
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-22 Thread Tom Lane
Paul Tuckfield [EMAIL PROTECTED] writes:
 I used the taskset command:
 taskset 01 -p pid for backend of test_run.sql 1
 taskset 01 -p pid for backend of test_run.sql 1
 
 I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on 
 the first Xeon processor in the box.

AFAICT, what you've actually done here is to bind both backends to the
first logical processor of the first Xeon.  If you'd used 01 and 02
as the affinity masks then you'd have bound them to the two cores of
that Xeon, but what you actually did simply reduces the system to a
uniprocessor.  In that situation the context swap rate will be normally
one swap per scheduler timeslice, and at worst two swaps per timeslice
(if a process is swapped away from while it holds a lock the other one
wants).  It doesn't prove a lot about our SMP problem though.

I don't have access to a Xeon with both taskset and hyperthreading
enabled, so I can't check what happens when you do the taskset correctly
... could you retry?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Tom Lane
Dave Cramer [EMAIL PROTECTED] writes:
 My hypothesis is that if you spin approximately the same or more time
 than the average time it takes to get finished with the shared resource
 then this should reduce cs.

The only thing we use spinlocks for nowadays is to protect LWLocks, so
the average time involved is fairly small and stable --- or at least
that was the design intention.  What we seem to be seeing is that on SMP
machines, cache coherency issues cause the TAS step itself to be
expensive and variable.  However, in the experiments I did, strace'ing
showed that actual spin timeouts (manifested by the execution of a
delaying select()) weren't actually that common; the big source of
context switches is semop(), which indicates contention at the LWLock
level rather than the spinlock level.  So while tuning the spinlock
limit count might be a useful thing to do in general, I think it will
have only negligible impact on the particular problems we're discussing
in this thread.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Bruce Momjian
Josh Berkus wrote:
 Tom,
 
  Having to recompile to run on single- vs dual-processor machines doesn't
  seem like it would fly.
 
 Oh, I don't know.  Many applications require compiling for a target 
 architecture; SQL Server, for example, won't use a 2nd processor without 
 re-installation.   I'm not sure about Oracle.
 
 It certainly wasn't too long ago that Linux gurus were espousing re-compiling 
 the kernel for the machine.
 
 And it's not like they would *have* to re-compile to use PostgreSQL after 
 adding an additional processor.  Just if they wanted to maximize performance 
 benefit.
 
 Also, this is a fairly rare circumstance, I think; to judge by my clients, 
 once a database server is in production nobody touches the hardware.

A much simpler solution would be for the postmaster to run a test during
startup.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Josh Berkus
Tom,

 Having to recompile to run on single- vs dual-processor machines doesn't
 seem like it would fly.

Oh, I don't know.  Many applications require compiling for a target 
architecture; SQL Server, for example, won't use a 2nd processor without 
re-installation.   I'm not sure about Oracle.

It certainly wasn't too long ago that Linux gurus were espousing re-compiling 
the kernel for the machine.

And it's not like they would *have* to re-compile to use PostgreSQL after 
adding an additional processor.  Just if they wanted to maximize performance 
benefit.

Also, this is a fairly rare circumstance, I think; to judge by my clients, 
once a database server is in production nobody touches the hardware.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Josh Berkus
Tom,

 The tricky
 part is that a slow adaptation rate means we can't have every backend
 figuring this out for itself --- the right value would have to be
 maintained globally, and I'm not sure how to do that without adding a
 lot of overhead.

This may be a moot point, since you've stated that changing the loop timing 
won't solve the problem, but what about making the test part of make?   I 
don't think too many systems are going to change processor architectures once 
in production, and those that do can be told to re-compile.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-22 Thread Rod Taylor
On Thu, 2004-04-22 at 13:55, Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
  This may be a moot point, since you've stated that changing the loop timing 
  won't solve the problem, but what about making the test part of make?   I 
  don't think too many systems are going to change processor architectures once
  in production, and those that do can be told to re-compile.
 
 Having to recompile to run on single- vs dual-processor machines doesn't
 seem like it would fly.

Is it something the postmaster could quickly determine and set a global
during the startup cycle?



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-22 Thread Anjan Dave
Tested the sql on Quad 2.0GHz XEON/8GB RAM:
 
During the first run, the CS rate shot up to more than 100k, and was randomly high/low.
A second process made it consistently high, 100k+.
A third brought it down to an average of 80-90k.
A fourth brought it down to an average of 50-60k/s.
 
By cancelling the queries one-by-one, the CS started going up again.
 
8 logical CPUs in 'top', none of them particularly busy; the load average stood around 2 
all the time.
 
Thanks.
Anjan
 
-Original Message- 
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Tue 4/20/2004 12:59 PM 
To: Anjan Dave; Dirk Lutzebck; Tom Lane 
Cc: [EMAIL PROTECTED]; Neil Conway 
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon



Anjan,

 Quad 2.0GHz XEON with highest load we have seen on the applications, DB
 performing great -

Can you run Tom's test?   It takes a particular pattern of data access to
reproduce the issue.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[EMAIL PROTECTED] root]# vmstat 2
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si  so    bi    bo   in     cs  us  sy  id
 2  0  4  25068  30752 339164 6899660   0   0 1 20 2   0   1   2
 1  1  2  25068  21608 339164 6909292   0   0     0 20240  623  31025  12   9  79
 2  1  1  25068  24580 339168 6909292   0   0     0 22446  523    824  12   1  87
 1  0  0  25068 241244 339168 6691372   0   0     0   498  141  79995  13   6  81
 1  0  0  25068 241172 339168 6691372   0   0     0     0  117     23  13   2  86
 1  0  0  25068 241208 339168 6691372   0   0     0    68  124     32  13   0  88
 1  0  1  25068 241208 339168 6691372   0   0     0     0  119     23  13   0  88
 1  0  0  25068 241208 339168 6691372   0   0     0     0  114     23  13   2  86
 1  0  0  25068 241208 339168 6691372   0   0     0    74  132    284  13   0  88
 1  0  0  25068 241208 339168 6691372   0   0     0     0  117     18  13   2  86
 2  0  0  25068 240256 339168 6691376   0   0     0    82  145  13905  14   2  84
 1  0  0  25068 240168 339168 6691380   0   0     0   338  177   4746  13   1  86
 1  0  0  25068 240168 339168 6691380   0   0     0    56  128    221  12   2  86
 1  0  0  25068 240180 339168 6691380   0   0     0    90  131  12633  14   1  85
 2  0  1  25068 240140 339168 6691380   0   0     0   104  144 100919  18   6  76
 2  0  0  25068 240136 339168 6691380   0   0     0   138  138 106567  18   5  77
 2  0  0  25068 240132 339168 6691380   0   0     0    50  138 108254  16   5  79
 2  0  0  25068 240128 339168 6691380   0   0     0    86  127 102183  16   7  77
 1  0  0  25068 240132 339168 6691380   0   0     0     0  119 110382  17   5  78
 2  0  0  25068 239980 339168 6691380   0   0     0     0  125 106970  18   4  78
 2  0  0  25068 239972 339168 6691380   0   0     0   136  140 103389  17   7  76
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si  so    bi    bo   in     cs  us  sy  id
 2  0  0  25068 240008 339168 6691380   0   0     0    82  134 107627  19   4  77
 2  0  0  25068 240012 339168 6691380   0   0     0    90  128  94183  16   9  75
 2  0  0  25068 213520 339168 6715988   0   0     0   114  156  82781  16   7  78
 2  0  1  25068 120356 339168 6803692   0   0     0 30790  522  31866  15  10  76
 1  1  3  25068  55384 339168 6870940   0   0     0 21904  466  25549  15  11  73
 1  1  2  25068  22804 339168 6903996   0   0     0 21786  538  29445  13   7  80
 1  1  1  25068  22284 339168 6905036   0   0     0 20678  634   3428  12   1  87
 2  0  0  25068  26232 339168 6906028   0   0     0 12054  332   3577  12   3  84


2 Processes running -
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si  so    bi    bo   in     cs  us  sy  id
 2  0  0  25068 244412 339192 6691392   0   0     0    66  150 144059  14   7  79
 2  0  1  25068 244368 339196 6691388   0   0     0   134  123 147517  16   7  77
 2  0  0  25068 244356 339196 6691388   0   0     0     0  119 134576  16   8  76
 2  0  0  25068 244340 339196 6691388   0   0     0    92  143 103336  17   4  79
 2  0  0  25068 244172 339196 6691388   0   0     0   156  158 105336  18   6  75
 2  0  0  25068 244104 339196 6691388   0   0     0     0  118 105222  18   5  77
 2  0  0  25068 244104 339196

Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread pginfo
Hi,

Dual Xeon P4 2.8
linux RedHat AS 3
kernel 2.4.21-4-EL-smp
2 GB ram

I can see the same problem:

procs                       memory      swap          io     system         cpu
 r  b   swpd   free   buff    cache   si   so    bi    bo   in     cs us sy id wa
 1  0      0  96212  61056  1720240    0    0     0     0  101     11 25  0 75  0
 1  0      0  96212  61056  1720240    0    0     0     0  108    139 25  0 75  0
 1  0      0  96212  61056  1720240    0    0     0     0  104    173 25  0 75  0
 1  0      0  96212  61056  1720240    0    0     0     0  102     11 25  0 75  0
 1  0      0  96212  61056  1720240    0    0     0     0  101     11 25  0 75  0
 2  0      0  96204  61056  1720240    0    0     0     0  110  53866 31  4 65  0
 2  0      0  96204  61056  1720240    0    0     0     0  101  83176 41  5 54  0
 2  0      0  96204  61056  1720240    0    0     0     0  102  86050 39  6 55  0
 2  0      0  96204  61056  1720240    0    0     0    49  113  73642 41  5 54  0
 2  0      0  96204  61056  1720240    0    0     0     0  102  84211 40  5 55  0
 2  0      0  96204  61056  1720240    0    0     0     0  101 105165 39  7 54  0
 2  0      0  96204  61056  1720240    0    0     0     0  103  97754 38  6 56  0
 2  0      0  96204  61056  1720240    0    0     0     0  103 113668 36  7 57  0
 2  0      0  96204  61056  1720240    0    0     0     0  103 112003 37  7 56  0

regards,
ivan.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread ohp
How long is this test supposed to run?

I've launched just one for testing; the plan seems horrible. The test is CPU-bound
and hasn't finished yet after 17:02 min of CPU time, dual XEON 2.6G,
UnixWare 7.1.3

The machine is a Fujitsu-Siemens TX 200 server
 On Mon, 19 Apr 2004, Tom Lane wrote:

 Date: Mon, 19 Apr 2004 20:01:56 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: Joe Conway [EMAIL PROTECTED], scott.marlowe [EMAIL PROTECTED],
  Bruce Momjian [EMAIL PROTECTED], [EMAIL PROTECTED],
  [EMAIL PROTECTED], Neil Conway [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Wierd context-switching issue on Xeon

 Here is a test case.  To set up, run the test_setup.sql script once;
 then launch two copies of the test_run.sql script.  (For those of
 you with more than two CPUs, see whether you need one per CPU to make
 trouble, or whether two test_runs are enough.)  Check that you get a
 nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

 In isolation, test_run.sql should do essentially no syscalls at all once
 it's past the initial ramp-up.  On a machine that's functioning per
 expectations, multiple copies of test_run show a relatively low rate of
 semop() calls --- a few per second, at most --- and maybe a delaying
 select() here and there.

 What I actually see on Josh's client's machine is a context swap storm:
 vmstat 1 shows CS rates around 170K/sec.  strace'ing the backends
 shows a corresponding rate of semop() syscalls, with a few delaying
 select()s sprinkled in.  top(1) shows system CPU percent of 25-30
 and idle CPU percent of 16-20.

 I haven't bothered to check how long the test_run query takes, but if it
 ends while you're still examining the behavior, just start it again.

 Note the test case assumes you've got shared_buffers set to at least
 1000; with smaller values, you may get some I/O syscalls, which will
 probably skew the results.

   regards, tom lane



-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Dave Cramer
After some testing if you use the current head code for s_lock.c which
has some mods in it to alleviate this situation, and change
SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
I am seeing a slight degradation in throughput using pgbench -c 10 -t
1000 but it might be liveable, considering the alternative is unbearable
in some situations.

Can anyone else replicate my results?

Dave
On Wed, 2004-04-21 at 08:10, Dirk_Lutzebäck wrote:
 It is intended to run indefinitely.
 
 Dirk
 
 [EMAIL PROTECTED] wrote:
 
 How long is this test supposed to run?
 
 I've launched just 1 for testing, the plan seems horrible; the test is cpu
 bound and hasn't finished yet after 17:02 min of CPU time, dual XEON 2.6G
 Unixware 713
 
 The machine is a Fujitsu-Siemens TX 200 server
   
 
 
 
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 
 
 
 
 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Josh Berkus
Dave,

 After some testing if you use the current head code for s_lock.c which
 has some mods in it to alleviate this situation, and change
 SPINS_PER_DELAY to 10 you can drastically reduce the cs with tom's test.
 I am seeing a slight degradation in throughput using pgbench -c 10 -t
 1000 but it might be liveable, considering the alternative is unbearable
 in some situations.

 Can anyone else replicate my results?

Can you produce a patch against 7.4.1?   I'd like to test your fix against a 
real-world database.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Paul Tuckfield
Dave:

Why would test-and-set increase context switches?
Note that it *does not increase* context switches when the two threads 
are on the two cores of a single Xeon processor. (Use taskset to force 
affinity on Linux.)

Scenario:
If the two test and set processes are testing and setting the same bit 
as each other, then they'll see worst case cache coherency misses.  
They'll ping a cache line back and forth between CPUs.  Another case 
might be that they're testing and setting different bits or words, but 
those bits or words are always in the same cache line, again causing 
worst case cache coherency misses.  The fact that this doesn't 
happen when the threads are bound to the 2 cores of a single Xeon 
suggests it's because they're now sharing L1 cache. No pings/bounces.

I wonder: do the threads stall so badly when pinging cache lines back 
and forth that the kernel sees it as an opportunity to put the 
process to sleep? Or do these worst-case misses cause an interrupt?

My question is:  What is it that the two threads are waiting for when they 
spin? Is it exactly the same resource, or two resources that happen to 
have test-and-set flags in the same cache line?

On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote:

I modified the code in s_lock.c to remove the spins

#define SPINS_PER_DELAY 1

and it doesn't exhibit the behaviour

This effectively changes the code to

while (TAS(lock))
    select(1);   /* 10 ms */

Can anyone explain why executing TAS 100 times would increase context
switches?
Dave

On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
Anjan,

Quad 2.0GHz XEON with highest load we have seen on the applications, 
DB
performing great -
Can you run Tom's test?   It takes a particular pattern of data 
access to
reproduce the issue.
--
Dave Cramer
519 939 0336
ICQ # 14675561
---(end of 
broadcast)---
TIP 8: explain analyze is your friend



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Tom Lane
Paul Tuckfield [EMAIL PROTECTED] writes:
 I wonder do the threads stall so badly when pinging cache lines back 
 and forth,  that the kernel sees it as an opportunity to put the 
 process to sleep? or do these worst case misses cause an interrupt?

No; AFAICS the kernel could not even be aware of that behavior.

The context swap storm is happening because of contention at the next
level up (LWLocks rather than spinlocks).  It could be an independent
issue that just happens to be triggered by the same sort of access
pattern.  I put forward a hypothesis that the cache miss storm caused by
the test-and-set ops induces the context swap storm by making the code
more likely to be executing in certain places at certain times ... but
it's only a hypothesis.

Yesterday evening I had pretty well convinced myself that they were
indeed independent issues: profiling on a single-CPU machine was telling
me that the test case I proposed spends over 10% of its time inside
ReadBuffer, which certainly seems like enough to explain a high rate of
contention on the BufMgrLock, without any assumptions about funny
behavior at the hardware level.  However, your report and Dave's suggest
that there really is some linkage.  So I'm still confused.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Dave Cramer
FYI,

I am doing my testing on non-hyperthreading dual Athlons.

Also, the test and set is attempting to set the same resource, and not
simply a bit. It's really a lock;xchg in assembler.

Also we are using the PAUSE mnemonic, so we should not be seeing any
cache coherency issues, as the cache is being taken out of the picture
AFAICS?

Dave

On Wed, 2004-04-21 at 14:19, Paul Tuckfield wrote:
 Dave:
 
 Why would test-and-set increase context switches?
 Note that it *does not increase* context switches when the two threads 
 are on the two cores of a single Xeon processor. (Use taskset to force 
 affinity on Linux.)
 
 Scenario:
 If the two test and set processes are testing and setting the same bit 
 as each other, then they'll see worst case cache coherency misses.  
 They'll ping a cache line back and forth between CPUs.  Another case 
 might be that they're testing and setting different bits or words, but 
 those bits or words are always in the same cache line, again causing 
 worst case cache coherency misses.  The fact that this doesn't 
 happen when the threads are bound to the 2 cores of a single Xeon 
 suggests it's because they're now sharing L1 cache. No pings/bounces.
 
 
 I wonder do the threads stall so badly when pinging cache lines back 
 and forth,  that the kernel sees it as an opportunity to put the 
 process to sleep? or do these worst case misses cause an interrupt?
 
 My question is:  What is it that the two threads are waiting for when they 
 spin? Is it exactly the same resource, or two resources that happen to 
 have test-and-set flags in the same cache line?
 
 On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote:
 
  I modified the code in s_lock.c to remove the spins
 
  #define SPINS_PER_DELAY 1
 
  and it doesn't exhibit the behaviour
 
  This effectively changes the code to
 
 
  while(TAS(lock))
  select(1); // 10ms
 
  Can anyone explain why executing TAS 100 times would increase context
  switches ?
 
  Dave
 
 
  On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
  Anjan,
 
  Quad 2.0GHz XEON with highest load we have seen on the applications, 
  DB
  performing great -
 
  Can you run Tom's test?   It takes a particular pattern of data 
  access to
  reproduce the issue.
  -- 
  Dave Cramer
  519 939 0336
  ICQ # 14675561
 
 
  ---(end of 
  broadcast)---
  TIP 8: explain analyze is your friend
 
 
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match
 
 
 
 
 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Dave Cramer
attached.
-- 
Dave Cramer
519 939 0336
ICQ # 14675561
Index: backend/storage/lmgr/s_lock.c
===
RCS file: /usr/local/cvs/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.16
diff -c -r1.16 s_lock.c
*** backend/storage/lmgr/s_lock.c	8 Aug 2003 21:42:00 -	1.16
--- backend/storage/lmgr/s_lock.c	21 Apr 2004 20:27:34 -
***
*** 76,82 
  	 * The select() delays are measured in centiseconds (0.01 sec) because 10
  	 * msec is a common resolution limit at the OS level.
  	 */
! #define SPINS_PER_DELAY		100
  #define NUM_DELAYS			1000
  #define MIN_DELAY_CSEC		1
  #define MAX_DELAY_CSEC		100
--- 76,82 
  	 * The select() delays are measured in centiseconds (0.01 sec) because 10
  	 * msec is a common resolution limit at the OS level.
  	 */
! #define SPINS_PER_DELAY		10
  #define NUM_DELAYS			1000
  #define MIN_DELAY_CSEC		1
  #define MAX_DELAY_CSEC		100
***
*** 88,93 
--- 88,94 
  
  	while (TAS(lock))
  	{
+ 		__asm__ __volatile__ ( "rep;nop": : :"memory");
  		if (++spins  SPINS_PER_DELAY)
  		{
  			if (++delays  NUM_DELAYS)
Index: include/storage/s_lock.h
===
RCS file: /usr/local/cvs/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.115.2.1
diff -c -r1.115.2.1 s_lock.h
*** include/storage/s_lock.h	4 Nov 2003 09:43:56 -	1.115.2.1
--- include/storage/s_lock.h	21 Apr 2004 20:26:25 -
***
*** 103,110 
  	register slock_t _res = 1;
  
  	__asm__ __volatile__(
! 			"lock			\n"
  			"xchgb	%0,%1	\n"
  :		"=q"(_res), "=m"(*lock)
  :		"0"(_res));
  	return (int) _res;
--- 103,113 
  	register slock_t _res = 1;
  
  	__asm__ __volatile__(
! 		   "cmpb $0,%1  \n"
! 		   "jne 1f  \n"
! 			"lock		\n"
  			"xchgb	%0,%1	\n"
+ 		   "1:\n"
  :		"=q"(_res), "=m"(*lock)
  :		"0"(_res));
  	return (int) _res;

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-21 Thread Tom Lane
Kenneth Marshall [EMAIL PROTECTED] writes:
 If the context swap storm derives from LWLock contention, maybe using
 a random order to assign buffer locks in buf_init.c would prevent
 simple adjacency of buffer allocation to cause the storm.

Good try, but no cigar ;-).  The test cases I've been looking at take
only shared locks on the per-buffer locks, so that's not where the
context swaps are coming from.  The swaps have to be caused by the
BufMgrLock, because that's the only exclusive lock being taken.

I did try increasing the allocated size of the spinlocks to 128 bytes
to see if it would do anything.  It didn't ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Tom Lane
Dave Cramer [EMAIL PROTECTED] writes:
 diff -c -r1.16 s_lock.c
 *** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 -   1.16
 --- backend/storage/lmgr/s_lock.c 21 Apr 2004 20:27:34 -
 ***
 *** 76,82 
* The select() delays are measured in centiseconds (0.01 sec) because 10
* msec is a common resolution limit at the OS level.
*/
 ! #define SPINS_PER_DELAY 100
   #define NUM_DELAYS  1000
   #define MIN_DELAY_CSEC  1
   #define MAX_DELAY_CSEC  100
 --- 76,82 
* The select() delays are measured in centiseconds (0.01 sec) because 10
* msec is a common resolution limit at the OS level.
*/
 ! #define SPINS_PER_DELAY 10
   #define NUM_DELAYS  1000
   #define MIN_DELAY_CSEC  1
   #define MAX_DELAY_CSEC  100


As far as I can tell, this does reduce the rate of semop's
significantly, but it does so by bringing the overall processing rate
to a crawl :-(.  I see 97% CPU idle time when using this patch.
I believe what is happening is that the select() delay in s_lock.c is
being hit frequently because the spin loop isn't allowed to run long
enough to let the other processor get out of the spinlock.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1

2004-04-21 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 For BSDOS it has:

 #if (CLIENT_OS == OS_FREEBSD) || (CLIENT_OS == OS_BSDOS) || \
 (CLIENT_OS == OS_OPENBSD) || (CLIENT_OS == OS_NETBSD)
 { /* comment out if inappropriate for your *bsd - cyp (25/may/1999) */
   int ncpus; size_t len = sizeof(ncpus);
   int mib[2]; mib[0] = CTL_HW; mib[1] = HW_NCPU;
   if (sysctl( &mib[0], 2, &ncpus, &len, NULL, 0 ) == 0)
   //if (sysctlbyname("hw.ncpu", &ncpus, &len, NULL, 0 ) == 0)
 cpucount = ncpus;
 }

Multiplied by how many platforms?  Ewww...

I was wondering about some sort of dynamic adaptation, roughly along the
lines of whenever a spin loop successfully gets the lock after
spinning, decrease the allowed loop count by one; whenever we fail to
get the lock after spinning, increase by 100; if the loop count reaches
some large ceiling, decide we are on a uniprocessor and irreversibly set it to
1.  As written this would tend to incur a select() delay once per
hundred spinlock acquisitions, which is way too much, but I think we
could make it work with a sufficiently slow adaptation rate.  The tricky
part is that a slow adaptation rate means we can't have every backend
figuring this out for itself --- the right value would have to be
maintained globally, and I'm not sure how to do that without adding a
lot of overhead.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread ohp
Hi Tom,

You still have an account on my UnixWare bi-Xeon hyperthreaded machine.
Feel free to use it for your tests.
On Mon, 19 Apr 2004, Tom Lane wrote:

 Date: Mon, 19 Apr 2004 20:53:09 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: Joe Conway [EMAIL PROTECTED], scott.marlowe [EMAIL PROTECTED],
  Bruce Momjian [EMAIL PROTECTED], [EMAIL PROTECTED],
  [EMAIL PROTECTED], Neil Conway [EMAIL PROTECTED]
 Subject: Re: [PERFORM] Wierd context-switching issue on Xeon

 I wrote:
  Here is a test case.

 Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
 which seems to pretty much let the Xeon per se off the hook.  Anybody
 got a multiple Opteron to try?  Totally non-Intel CPUs?

 It would be interesting to see results with non-Linux kernels, too.

   regards, tom lane

 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dave Cramer
Dual Athlon

With one process running: 30 cs/second
With two processes running: 15000 cs/second

Dave
On Tue, 2004-04-20 at 08:46, Jeff wrote:
 On Apr 19, 2004, at 8:01 PM, Tom Lane wrote:
 [test case]
 
 Quad P3-700MHz, ServerWorks, pg 7.4.2:   1 process:  10-30 cs/sec
                                          2 processes: 100k cs/sec
                                          3 processes: 140k cs/sec
                                          8 processes: 115k cs/sec
 
 Dual P2-450MHz, non-ServerWorks (PIIX):  1 process:   15-20 cs/sec
                                          2 processes:  30k cs/sec
                                          3 (up to 7) processes: 15k cs/sec
 
 (Yes, I verified with more processes the cs's drop)
 
 And finally,
 
 6 cpu sun e4500, solaris 2.6, pg 7.4.2: 1 - 10 processes: hovered 
 between 2-3k cs/second (there was other stuff running on the machine as 
 well)
 
 
 Verrry interesting.
 I've got a dual G4 at home, but for convenience Apple doesn't ship a 
 vmstat that tells context switches
 
 --
 Jeff Trout [EMAIL PROTECTED]
 http://www.jefftrout.com/
 http://www.stuarthamm.net/
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 
 
 
 
 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Matt Clark
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), 
Quad Xeon 700MHz/1MB L2 cache, 3GB RAM.

Idle-ish (it's a production server) cs/sec ~5000

3 test queries running:
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si  so    bi    bo   in     cs   us  sy  id
 3  0  0  23380 577680 105912 2145140   0   0     0     0  107 116890  50  14  35
 2  0  0  23380 577680 105912 2145140   0   0     0     0  114 118583  50  15  34
 2  0  0  23380 577680 105912 2145140   0   0     0     0  107 115842  54  14  32
 2  1  0  23380 577680 105920 2145140   0   0     0    32  156 117549  50  16  35

HTH

Matt

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane
 Sent: 20 April 2004 01:02
 To: [EMAIL PROTECTED]
 Cc: Joe Conway; scott.marlowe; Bruce Momjian; [EMAIL PROTECTED];
 [EMAIL PROTECTED]; Neil Conway
 Subject: Re: [PERFORM] Wierd context-switching issue on Xeon 
 
 
 Here is a test case.  To set up, run the test_setup.sql script once;
 then launch two copies of the test_run.sql script.  (For those of
 you with more than two CPUs, see whether you need one per CPU to make
 trouble, or whether two test_runs are enough.)  Check that you get a
 nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
 
 In isolation, test_run.sql should do essentially no syscalls at all once
 it's past the initial ramp-up.  On a machine that's functioning per
 expectations, multiple copies of test_run show a relatively low rate of
 semop() calls --- a few per second, at most --- and maybe a delaying
 select() here and there.
 
 What I actually see on Josh's client's machine is a context swap storm:
 vmstat 1 shows CS rates around 170K/sec.  strace'ing the backends
 shows a corresponding rate of semop() syscalls, with a few delaying
 select()s sprinkled in.  top(1) shows system CPU percent of 25-30
 and idle CPU percent of 16-20.
 
 I haven't bothered to check how long the test_run query takes, but if it
 ends while you're still examining the behavior, just start it again.
 
 Note the test case assumes you've got shared_buffers set to at least
 1000; with smaller values, you may get some I/O syscalls, which will
 probably skew the results.
 
   regards, tom lane
 
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Sven Geisler
Hi Tom,

Just to explain our hardware situation related to the FSB of the XEONs.
We have older XEON DP in operation with FSB 400 and 2.4 GHz.
The XEON MP box runs with 2.5 GHz.
The XEON MP box is a Fujitsu Siemens Primergy RX600 with ServerWorks GC LE
as chipset.

The box which Dirk used to compare the behavior is our newest XEON DP
system.
This XEON DP box runs with 2.8 GHz and FSB 533 using the Intel 7501 chipset
(Supermicro).

I would agree with Josh: if PostgreSQL has an issue with the Intel XEON MP
hardware, it is more related to the chipset.

Back to the SQL level. We use SELECT FOR UPDATE as a semaphore.
Should we try another implementation for this semaphore on the client side to
prevent this issue?

Regards
Sven.

- Original Message - 
From: Tom Lane [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Josh Berkus [EMAIL PROTECTED]; [EMAIL PROTECTED];
Neil Conway [EMAIL PROTECTED]
Sent: Sunday, April 18, 2004 11:47 PM
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon


 After some further digging I think I'm starting to understand what's up
 here, and the really fundamental answer is that a multi-CPU Xeon MP box
 sucks for running Postgres.

 I did a bunch of oprofile measurements on a machine belonging to one of
 Josh's clients, using a test case that involved heavy concurrent access
 to a relatively small amount of data (little enough to fit into Postgres
 shared buffers, so that no I/O or kernel calls were really needed once
 the test got going).  I found that by nearly any measure --- elapsed
 time, bus transactions, or machine-clear events --- the spinlock
 acquisitions associated with grabbing and releasing the BufMgrLock took
 an unreasonable fraction of the time.  I saw about 15% of elapsed time,
 40% of bus transactions, and nearly 100% of pipeline-clear cycles going
 into what is essentially two instructions out of the entire backend.
 (Pipeline clears occur when the cache coherency logic detects a memory
 write ordering problem.)

 I am not completely clear on why this machine-level bottleneck manifests
 as a lot of context swaps at the OS level.  I think what is happening is
 that because SpinLockAcquire is so slow, a process is much more likely
 than you'd normally expect to arrive at SpinLockAcquire while another
 process is also acquiring the spinlock.  This puts the two processes
 into a lockstep condition where the second process is nearly certain
 to observe the BufMgrLock as locked, and be forced to suspend itself,
 even though the time the first process holds the BufMgrLock is not
 really very long at all.

 If you google for Xeon and cache coherency you'll find quite a bit of
 suggestive information about why this might be more true on the Xeon
 setup than others.  A couple of interesting hits:

 http://www.theinquirer.net/?article=10797
 says that Xeon MP uses a *slower* FSB than Xeon DP.  This would
 translate directly to more time needed to transfer a dirty cache line
 from one processor to the other, which is the basic operation that we're
 talking about here.

 http://www.aceshardware.com/Spades/read.php?article_id=3187
 says that Opterons use a different cache coherency protocol that is
 fundamentally superior to the Xeon's, because dirty cache data can be
 transferred directly between two processor caches without waiting for
 main memory.

 So in the short term I think we have to tell people that Xeon MP is not
 the most desirable SMP platform to run Postgres on.  (Josh thinks that
 the specific motherboard chipset being used in these machines might
 share some of the blame too.  I don't have any evidence for or against
 that idea, but it's certainly possible.)

 In the long run, however, CPUs continue to get faster than main memory
 and the price of cache contention will continue to rise.  So it seems
 that we need to give up the assumption that SpinLockAcquire is a cheap
 operation.  In the presence of heavy contention it won't be.

 One thing we probably have got to do soon is break up the BufMgrLock
 into multiple finer-grain locks so that there will be less contention.
 However I am wary of doing this incautiously, because if we do it in a
 way that makes for a significant rise in the number of locks that have
 to be acquired to access a buffer, we might end up with a net loss.

 I think Neil Conway was looking into how the bufmgr might be
 restructured to reduce lock contention, but if he had come up with
 anything he didn't mention exactly what.  Neil?

 regards, tom lane




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dirk Lutzebäck
Dirk Lutzebaeck wrote:

c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)

performs well and I could not observe context switch peaks here (one 
user active), almost no extra semop calls
Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 
300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks 
chipset here nor bigmem.

Dirk



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Paul Tuckfield
I tried to test how this is related to cache coherency, by forcing 
affinity of the two test_run.sql processes to the two cores (pipelines? 
threads) of a single hyperthreaded xeon processor in an smp xeon box.

When the processes are allowed to run on distinct chips in the smp box, 
the CS storm happens.  When they are bound to the two cores of a 
single hyperthreaded Xeon in the smp box, the CS storm *does* happen.



I used the taskset command:
taskset 01 -p pid for backend of test_run.sql 1
taskset 01 -p pid for backend of test_run.sql 1
I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on 
the first Xeon processor in the box.

I did this on RedHat Fedora core1 on an intel motherboard (I'll get the 
part no if it matters)

during storms :  300k CS/sec, 75% idle (on a dual xeon (four core)) 
machine (suggesting serializing/sleeping processes)
no storm:   50k CS/sec,  50% idle (suggesting 2 cpu bound processes)

Maybe there's a hot block that is bouncing back and forth between 
caches? or maybe the page holding semaphores?
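For anyone repeating this binding experiment without taskset, the same affinity can be set from inside the process with the Python stdlib (Linux-only; the assumption that logical CPUs 0 and 1 are the two hyper-threads of the first physical package is just that, an assumption — check /proc/cpuinfo on the machine):

```python
import os

# Bind the calling process (pid 0 = self) to logical CPUs 0 and 1,
# falling back to whatever CPUs are actually available so the sketch
# also runs on a single-CPU or cpuset-restricted machine.
allowed = os.sched_getaffinity(0)
wanted = ({0, 1} & allowed) or allowed
os.sched_setaffinity(0, wanted)
print(sorted(os.sched_getaffinity(0)))   # CPUs we may now run on
```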

On Apr 19, 2004, at 5:53 PM, Tom Lane wrote:

I wrote:
Here is a test case.
Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
which seems to pretty much let the Xeon per se off the hook.  Anybody
got a multiple Opteron to try?  Totally non-Intel CPUs?
It would be interesting to see results with non-Linux kernels, too.

			regards, tom lane

---(end of 
broadcast)---
TIP 4: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Paul Tuckfield
Ooops, what I meant to say was that 2 threads bound to one 
(hyperthreaded) cpu does *NOT* cause the storm, even on an smp xeon.

Therefore, the context switches may be a result of cache-coherency 
related delays.  (2 threads on one hyperthreaded cpu presumably have 
tightly coupled L1/L2 caches.)

On Apr 20, 2004, at 1:02 PM, Paul Tuckfield wrote:

I tried to test how this is related to cache coherency, by forcing 
affinity of the two test_run.sql processes to the two cores 
(pipelines? threads) of a single hyperthreaded xeon processor in an 
smp xeon box.

When the processes are allowed to run on distinct chips in the smp 
box, the CS storm happens.  When they are bound to the two cores of 
a single hyperthreaded Xeon in the smp box, the CS storm *does* 
happen.
 er, meant *NOT HAPPEN*


I used the taskset command:
taskset 01 -p pid for backend of test_run.sql 1
taskset 01 -p pid for backend of test_run.sql 1
I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on 
the first Xeon processor in the box.

I did this on RedHat Fedora core1 on an intel motherboard (I'll get 
the part no if it matters)

during storms :  300k CS/sec, 75% idle (on a dual xeon (four core)) 
machine (suggesting serializing/sleeping processes)
no storm:   50k CS/sec,  50% idle (suggesting 2 cpu bound processes)

Maybe there's a hot block that is bouncing back and forth between 
caches? or maybe the page holding semaphores?

On Apr 19, 2004, at 5:53 PM, Tom Lane wrote:

I wrote:
Here is a test case.
Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
which seems to pretty much let the Xeon per se off the hook.  Anybody
got a multiple Opteron to try?  Totally non-Intel CPUs?
It would be interesting to see results with non-Linux kernels, too.

			regards, tom lane

---(end of 
broadcast)---
TIP 4: Don't 'kill -9' the postmaster



---(end of 
broadcast)---
TIP 5: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faqs/FAQ.html



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Josh Berkus
Dirk, Tom,

OK, off IRC, I have the following reports:

Linux 2.4.21 or 2.4.20 on dual Pentium III : problem verified
Linux 2.4.21 or 2.4.20 on dual Pentium II : problem cannot be reproduced
Solaris 2.6 on 6 cpu e4500 (using 8 processes) : problem not reproduced

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread J. Andrew Rogers
I verified problem on a Dual Opteron server.  I temporarily killed the
normal load, so the server was largely idle when the test was run.

Hardware:
2x Opteron 242
Rioworks HDAMA server board
4Gb RAM

OS Kernel:
RedHat9 + XFS


1 proc: 10-15 cs/sec
2 proc: 400,000-420,000 cs/sec



j. andrew rogers




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Anjan Dave
If this helps - 

Quad 2.0GHz XEON with highest load we have seen on the applications, DB performing 
great - 

   procs                  memory      swap          io     system        cpu
 r  b  w   swpd   free   buff     cache   si   so   bi    bo    in    cs us sy id
 1  0  0   1616 351820  66144 1081370400 2 01 1  0  2  7
 3  0  0   1616 349712  66144 1081373600 8  1634 1362  4650  4  2 95
 0  0  0   1616 347768  66144 1081412000   188  1218 1158  4203  5  1 93
 0  0  1   1616 346596  66164 1081418400 8  1972 1394  4773  4  1 94
 2  0  1   1616 345424  66164 108142720020  1392 1184  4197  4  2 94

Around 4k CS/sec
Chipset is Intel ServerWorks GC-HE.
Linux Kernel 2.4.20-28.9bigmem #1 SMP

Thanks,
Anjan


-Original Message-
From: Dirk Lutzebäck [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 20, 2004 10:29 AM
To: Tom Lane; Josh Berkus
Cc: [EMAIL PROTECTED]; Neil Conway
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon

Dirk Lutzebaeck wrote:

 c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)

 performs well and I could not observe context switch peaks here (one 
 user active), almost no extra semop calls

Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 
300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks 
chipset here nor bigmem.

Dirk



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Bruce Momjian
Dirk Lutzebäck wrote:
 Dirk Lutzebaeck wrote:
 
  c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro)
 
  performs well and I could not observe context switch peaks here (one 
  user active), almost no extra semop calls
 
 Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 
 300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks 
 chipset here nor bigmem.

Dave Cramer reproduced the problem on my SuperMicro dual Xeon on BSD/OS.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Josh Berkus
Anjan,

 Quad 2.0GHz XEON with highest load we have seen on the applications, DB
 performing great -

Can you run Tom's test?   It takes a particular pattern of data access to 
reproduce the issue.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Dave Cramer
I modified the code in s_lock.c to remove the spins

#define SPINS_PER_DELAY 1

and it doesn't exhibit the behaviour

This effectively changes the code to 


while(TAS(lock))
select(1); // 10ms

Can anyone explain why executing TAS 100 times would increase context
switches?
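The spin-then-sleep structure being modified can be sketched in Python — a toy model of the logic in s_lock.c for illustration only (the real lock is a hardware test-and-set in C; a Python threading.Lock stands in for the TAS word here):

```python
import threading
import time

SPINS_PER_DELAY = 100   # the constant being discussed; Dave set it to 1

class SpinThenSleepLock:
    """Toy model of the acquire loop: try a test-and-set up to
    SPINS_PER_DELAY times, and only then back off with a ~10 ms sleep,
    mirroring while(TAS(lock)) select(1)."""

    def __init__(self):
        self._flag = threading.Lock()    # stands in for the TAS word

    def acquire(self):
        while True:
            for _ in range(SPINS_PER_DELAY):
                if self._flag.acquire(blocking=False):   # the "TAS"
                    return
            time.sleep(0.01)             # the delaying select()

    def release(self):
        self._flag.release()

lock = SpinThenSleepLock()
total = 0

def worker(iterations):
    global total
    for _ in range(iterations):
        lock.acquire()
        total += 1                       # short critical section
        lock.release()

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)   # prints 20000
```

With SPINS_PER_DELAY set to 1, every contended acquire falls through to the sleep almost immediately, trading spin-loop context switches for fixed 10 ms waits.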

Dave


On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
 Anjan,
 
  Quad 2.0GHz XEON with highest load we have seen on the applications, DB
  performing great -
 
 Can you run Tom's test?   It takes a particular pattern of data access to 
 reproduce the issue.
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Joe Conway
Tom Lane wrote:
In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up.  On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.
Here's results for 7.4 on a dual Athlon server running fedora core:

CPU states:    cpu    user    nice  system    irq  softirq  iowait    idle
             total   86.0%    0.0%   52.4%   0.0%     0.0%    0.0%   61.2%
             cpu00   37.6%    0.0%   29.7%   0.0%     0.0%    0.0%   32.6%
             cpu01   48.5%    0.0%   22.7%   0.0%     0.0%    0.0%   28.7%

procs -----------memory---------- ---swap-- -----io---- --system--
 r  b   swpd   free   buff   cache   si   so    bi    bo    in      cs
 1  0 120448  25764  48300 1094576    0    0     0   124   170     187
 1  0 120448  25780  48300 1094576    0    0     0     0   152      89
 2  0 120448  25744  48300 1094580    0    0     0    60   141   78290
 2  0 120448  25752  48300 1094580    0    0     0     0   131  140326
 2  0 120448  25756  48300 1094576    0    0     0    40   122  140100
 2  0 120448  25764  48300 1094584    0    0     0    60   133  136595
 2  0 120448  24284  48300 1094584    0    0     0   200   138  135151

The jump in cs corresponds to starting the query in the second session.

Joe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Dirk Lutzebäck
Josh, I cannot reproduce the excessive semop() on a Dual XEON DP on a 
non-bigmem kernel, HT on. Interesting to know if the problem is related 
to XEON MP (as Tom wrote) or bigmem.

Josh Berkus wrote:

Dirk,

 

I'm not sure if this semop() problem is still an issue but the database 
behaves a bit out of bounds in this situation, i.e. consuming system 
resources with semop() calls 95% while tables are locked very often and 
longer.
   

It would be helpful to us if you could test this with the indexes disabled on 
the non-Bigmem system.   I'd like to eliminate Bigmem as a factor, if 
possible.

 



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Anjan Dave
What about quad-XEON setups? Could that be worse? (have dual, and quad setups both) 
Shall we re-consider XEON-MP CPU machines with high cache (4MB+)?
 
Very generally, what number would be considered high, especially, if it coincides with 
expected heavy load?
 
Not sure a specific chipset was mentioned...
 
Thanks,
Anjan

-Original Message- 
From: Greg Stark [mailto:[EMAIL PROTECTED] 
Sent: Sun 4/18/2004 8:40 PM 
To: Tom Lane 
Cc: [EMAIL PROTECTED]; Josh Berkus; [EMAIL PROTECTED]; Neil Conway 
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon




Tom Lane [EMAIL PROTECTED] writes:

 So in the short term I think we have to tell people that Xeon MP is not
 the most desirable SMP platform to run Postgres on.  (Josh thinks that
 the specific motherboard chipset being used in these machines might
 share some of the blame too.  I don't have any evidence for or against
 that idea, but it's certainly possible.)

 In the long run, however, CPUs continue to get faster than main memory
 and the price of cache contention will continue to rise.  So it seems
 that we need to give up the assumption that SpinLockAcquire is a cheap
 operation.  In the presence of heavy contention it won't be.

There's nothing about the way Postgres spinlocks are coded that affects this?

Is it something the kernel could help with? I've been wondering whether
there's any benefits postgres is missing out on by using its own hand-rolled
locking instead of using the pthreads infrastructure that the kernel is often
involved in.

--
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread J. Andrew Rogers

I decided to check the context-switching behavior here for baseline
since we have a rather diverse set of postgres server hardware, though
nothing using Xeon MP that is also running a postgres instance, and
everything looks normal under load.  Some platforms are better than
others, but nothing is outside of what I would consider normal bounds.

Our biggest database servers are Opteron SMP systems, and these servers
are particularly well-behaved under load with Postgres 7.4.2.  If there
is a problem with the locking code and context-switching, it sure isn't
manifesting on our Opteron SMP systems.  Under rare confluences of
process interaction, we occasionally see short spikes in the 2-3,000
cs/sec range.  It typically peaks at a couple hundred cs/sec under load.
Obviously this is going to be a function of our load profile a certain
extent.

The Opterons have proven to be very good database hardware in general
for us.


j. andrew rogers








---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 The other thing I'd like your comment on, Tom, is that Dirk appears to have 
 reported that when he installed a non-bigmem kernel, the issue went away.   
 Dirk, is this correct?

I'd be really surprised if that had anything to do with it.  AFAIR
Dirk's test changed more than one variable and so didn't prove a
connection.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Joe Conway
scott.marlowe wrote:
On Mon, 19 Apr 2004, Bruce Momjian wrote:
I have BSD on a SuperMicro dual Xeon, so if folks want another
hardware/OS combination to test, I can give out logins to my machine.
I can probably do some nighttime testing on a dual 2800MHz non-MP Xeon 
machine as well.  It's a Dell 2600 series machine and very fast.  It has 
the moderately fast 533MHz FSB so may not have as many problems as the MP 
type CPUs seem to be having.
I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does 
anyone have a test set that can reliably reproduce the problem?

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Josh Berkus
Joe,

 I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does 
 anyone have a test set that can reliably reproduce the problem?

Unfortunately we can't seem to come up with one.  So far we have 2 machines 
that exhibit the issue, and their databases are highly confidential (State of 
WA education data).  

It does seem to require a database which is in the many GB (> 10GB) range, and a 
situation where a small subset of the data is getting hit repeatedly by 
multiple processes.   So you could try your own data warehouse, making sure 
that you have at least 4 connections hitting one query after another.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does 
 anyone have a test set that can reliably reproduce the problem?

 Unfortunately we can't seem to come up with one.

 It does seem to require a database which is in the many GB (> 10GB) range, and a 
 situation where a small subset of the data is getting hit repeatedly by 
 multiple processes.

I do not think a large database is actually necessary; the test case
Josh's client has is only hitting a relatively small amount of data.
The trick seems to be to cause lots and lots of ReadBuffer/ReleaseBuffer
activity without much else happening, and to do this from multiple
backends concurrently.

I believe the best way to make this happen is a lot of relatively simple
(but not short) indexscan queries that in aggregate touch just a bit
less than shared_buffers worth of data.  I have not tried to make a
self-contained test case, but based on what I know now I think it should
be possible.

I'll give this a shot later tonight --- it does seem that trying to
reproduce the problem on different kinds of hardware is the next useful
step we can take.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
Here is a test case.  To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script.  (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.)  Check that you get a
nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up.  On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.

What I actually see on Josh's client's machine is a context swap storm:
vmstat 1 shows CS rates around 170K/sec.  strace'ing the backends
shows a corresponding rate of semop() syscalls, with a few delaying
select()s sprinkled in.  top(1) shows system CPU percent of 25-30
and idle CPU percent of 16-20.

I haven't bothered to check how long the test_run query takes, but if it
ends while you're still examining the behavior, just start it again.

Note the test case assumes you've got shared_buffers set to at least
1000; with smaller values, you may get some I/O syscalls, which will
probably skew the results.

regards, tom lane

-- test_setup.sql
drop table test_data;

create table test_data(f1 int);

insert into test_data values (random() * 100);
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;
insert into test_data select random() * 100 from test_data;

create index test_index on test_data(f1);

vacuum verbose analyze test_data;
checkpoint;
-- test_run.sql
-- force nestloop indexscan plan
set enable_seqscan to 0;
set enable_mergejoin to 0;
set enable_hashjoin to 0;

explain
select count(*) from test_data a, test_data b, test_data c
where a.f1 = b.f1 and b.f1 = c.f1;

select count(*) from test_data a, test_data b, test_data c
where a.f1 = b.f1 and b.f1 = c.f1;
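While the two test_run sessions loop, the context-switch rate can also be sampled directly from the kernel's cumulative counter, equivalent to watching the cs column of "vmstat 1" (a Linux-specific sketch reading /proc/stat):

```python
import time

def context_switches():
    # Cumulative context switches since boot, from the kernel counter.
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt"):
                return int(line.split()[1])
    raise RuntimeError("no ctxt line in /proc/stat")

before = context_switches()
time.sleep(1)
rate = context_switches() - before
print(rate, "context switches/sec")
```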

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Tom Lane
I wrote:
 Here is a test case.

Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
which seems to pretty much let the Xeon per se off the hook.  Anybody
got a multiple Opteron to try?  Totally non-Intel CPUs?

It would be interesting to see results with non-Linux kernels, too.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Joe Conway
Tom Lane wrote:
Here is a test case.  To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script.  (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.)  Check that you get a
nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
Check.

In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up.  On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.
What I actually see on Josh's client's machine is a context swap storm:
vmstat 1 shows CS rates around 170K/sec.  strace'ing the backends
shows a corresponding rate of semop() syscalls, with a few delaying
select()s sprinkled in.  top(1) shows system CPU percent of 25-30
and idle CPU percent of 16-20.
Your test case works perfectly. I ran 4 concurrent psql sessions on a 
quad Xeon (IBM x445, 2.8GHz, 4GB RAM), hyperthreaded. Here's what 'top' 
looks like:

177 processes: 173 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:    cpu    user    nice  system    irq  softirq  iowait    idle
             total   35.9%    0.0%    7.2%   0.0%     0.0%    0.0%   56.8%
             cpu00   19.6%    0.0%    4.9%   0.0%     0.0%    0.0%   75.4%
             cpu01   44.1%    0.0%    7.8%   0.0%     0.0%    0.0%   48.0%
             cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
             cpu03   32.3%    0.0%   13.7%   0.0%     0.0%    0.0%   53.9%
             cpu04   21.5%    0.0%   10.7%   0.0%     0.0%    0.0%   67.6%
             cpu05   42.1%    0.0%    9.8%   0.0%     0.0%    0.0%   48.0%
             cpu06  100.0%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
             cpu07   27.4%    0.0%   10.7%   0.0%     0.0%    0.0%   61.7%
Mem:  4123700k av, 3933896k used,  189804k free,       0k shrd,  221948k buff
      2492124k actv,  760612k in_d,   41416k in_c
Swap: 2040244k av,    5632k used, 2034612k free                 3113272k cached
Note that cpu06 is not a postgres process. The output of vmstat looks 
like this:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo    in      cs us sy id wa
 4  0   5632 184264 221948 3113308    0    0     0     0     0       0  0  0  0  0
 3  0   5632 184264 221948 3113308    0    0     0     0   112  211894 36  9 55  0
 5  0   5632 184264 221948 3113308    0    0     0     0   125  222071 39  8 53  0
 4  0   5632 184264 221948 3113308    0    0     0     0   110  215097 39 10 52  0
 1  0   5632 184588 221948 3113308    0    0     0    96   139  187561 35 10 55  0
 3  0   5632 184588 221948 3113308    0    0     0     0   114  241731 38 10 52  0
 3  0   5632 184920 221948 3113308    0    0     0     0   132  257168 40  9 51  0
 1  0   5632 184912 221948 3113308    0    0     0     0   114  251802 38  9 54  0

Note the test case assumes you've got shared_buffers set to at least
1000; with smaller values, you may get some I/O syscalls, which will
probably skew the results.
 shared_buffers
----------------
 16384
(1 row)
I found that killing three of the four concurrent queries dropped 
context switches to about 70,000 to 100,000. Two or more sessions brings 
it up to 200K+.

Joe

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Robert Creager
When grilled further on (Mon, 19 Apr 2004 20:53:09 -0400),
Tom Lane [EMAIL PROTECTED] confessed:

 I wrote:
  Here is a test case.
 
 Hmmm ... I've been able to reproduce the CS storm on a dual Athlon,
 which seems to pretty much let the Xeon per se off the hook.  Anybody
 got a multiple Opteron to try?  Totally non-Intel CPUs?
 
 It would be interesting to see results with non-Linux kernels, too.
 

Same problem on my dual AMD MP with 2.6.5 kernel using two sessions of your
test, but maybe not quite as severe. The highest CS value I saw was 102k, with
some non-db number crunching going on in parallel with the test.  'Average'
about 80k with two instances.  Using the anticipatory scheduler.

A single instance pulls in around 200-300 CS, and no tests running around
200-300 CS (i.e. no CS difference).

A snipet:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa
 3  0    284  90624  93452 1453740    0    0     0     0  1075 76548 83 17  0  0
 6  0    284 125312  93452 1470196    0    0     0     0  1073 87702 78 22  0  0
 3  0    284 178392  93460 1420208    0    0    76   298  1083 67721 77 24  0  0
 4  0    284 177120  93460 1421500    0    0  1104     0  1054 89593 80 21  0  0
 5  0    284 173504  93460 1425172    0    0  3584     0  1110 65536 81 19  0  0
 4  0    284 169984  93460 1428708    0    0  3456     0  1098 66937 81 20  0  0
 6  0    284 170944  93460 1428708    0    0     8     0  1045 66065 81 19  0  0
 6  0    284 167288  93460 1428776    0    0     0     8  1097 75560 81 19  0  0
 6  0    284 136296  93460 1458356    0    0     0     0  1036 80808 75 26  0  0
 5  0    284 132864  93460 1461688    0    0     0     0  1007 76071 84 17  0  0
 4  0    284 132880  93460 1461688    0    0     0     0  1079 86903 82 18  0  0
 5  0    284 132880  93460 1461688    0    0     0     0  1078 79885 83 17  0  0
 6  0    284 132648  93460 1461688    0    0     0   760  1228 66564 86 14  0  0
 6  0    284 132648  93460 1461688    0    0     0     0  1047 69741 86 15  0  0
 6  0    284 132672  93460 1461688    0    0     0     0  1057 79052 84 16  0  0
 5  0    284 132672  93460 1461688    0    0     0     0  1054 81109 82 18  0  0
 5  0    284 132736  93460 1461688    0    0     0     0  1043 91725 80 20  0  0


Cheers,
Rob

-- 
 21:33:03 up 3 days,  1:10,  3 users,  load average: 5.05, 4.67, 4.22
Linux 2.6.5-01 #5 SMP Tue Apr 6 21:32:39 MDT 2004




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread jelle

Same problem with dual 1GHz P3's running Postgres 7.4.2, Linux 2.4.x, and 
2GB RAM, under load, with long transactions (i.e. more than one 'cannot 
serialize' rollback per minute). 200K was the worst observed with vmstat.

Finally moved the DB to a single Xeon box.




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
After some further digging I think I'm starting to understand what's up
here, and the really fundamental answer is that a multi-CPU Xeon MP box
sucks for running Postgres.

I did a bunch of oprofile measurements on a machine belonging to one of
Josh's clients, using a test case that involved heavy concurrent access
to a relatively small amount of data (little enough to fit into Postgres
shared buffers, so that no I/O or kernel calls were really needed once
the test got going).  I found that by nearly any measure --- elapsed
time, bus transactions, or machine-clear events --- the spinlock
acquisitions associated with grabbing and releasing the BufMgrLock took
an unreasonable fraction of the time.  I saw about 15% of elapsed time,
40% of bus transactions, and nearly 100% of pipeline-clear cycles going
into what is essentially two instructions out of the entire backend.
(Pipeline clears occur when the cache coherency logic detects a memory
write ordering problem.)

I am not completely clear on why this machine-level bottleneck manifests
as a lot of context swaps at the OS level.  I think what is happening is
that because SpinLockAcquire is so slow, a process is much more likely
than you'd normally expect to arrive at SpinLockAcquire while another
process is also acquiring the spinlock.  This puts the two processes
into a lockstep condition where the second process is nearly certain
to observe the BufMgrLock as locked, and be forced to suspend itself,
even though the time the first process holds the BufMgrLock is not
really very long at all.

If you google for Xeon and cache coherency you'll find quite a bit of
suggestive information about why this might be more true on the Xeon
setup than others.  A couple of interesting hits:

http://www.theinquirer.net/?article=10797
says that Xeon MP uses a *slower* FSB than Xeon DP.  This would
translate directly to more time needed to transfer a dirty cache line
from one processor to the other, which is the basic operation that we're
talking about here.

http://www.aceshardware.com/Spades/read.php?article_id=3187
says that Opterons use a different cache coherency protocol that is
fundamentally superior to the Xeon's, because dirty cache data can be
transferred directly between two processor caches without waiting for
main memory.

So in the short term I think we have to tell people that Xeon MP is not
the most desirable SMP platform to run Postgres on.  (Josh thinks that
the specific motherboard chipset being used in these machines might
share some of the blame too.  I don't have any evidence for or against
that idea, but it's certainly possible.)

In the long run, however, CPUs continue to get faster than main memory
and the price of cache contention will continue to rise.  So it seems
that we need to give up the assumption that SpinLockAcquire is a cheap
operation.  In the presence of heavy contention it won't be.

One thing we probably have got to do soon is break up the BufMgrLock
into multiple finer-grain locks so that there will be less contention.
However I am wary of doing this incautiously, because if we do it in a
way that makes for a significant rise in the number of locks that have
to be acquired to access a buffer, we might end up with a net loss.

I think Neil Conway was looking into how the bufmgr might be
restructured to reduce lock contention, but if he had come up with
anything he didn't mention exactly what.  Neil?

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Dave Cramer
So the kernel/OS is irrelevant here? This happens on any dual Xeon?

What about hyperthreading? Does it still happen if HTT is turned off?

Dave
On Sun, 2004-04-18 at 17:47, Tom Lane wrote:
 After some further digging I think I'm starting to understand what's up
 here, and the really fundamental answer is that a multi-CPU Xeon MP box
 sucks for running Postgres.
 
 I did a bunch of oprofile measurements on a machine belonging to one of
 Josh's clients, using a test case that involved heavy concurrent access
 to a relatively small amount of data (little enough to fit into Postgres
 shared buffers, so that no I/O or kernel calls were really needed once
 the test got going).  I found that by nearly any measure --- elapsed
 time, bus transactions, or machine-clear events --- the spinlock
 acquisitions associated with grabbing and releasing the BufMgrLock took
 an unreasonable fraction of the time.  I saw about 15% of elapsed time,
 40% of bus transactions, and nearly 100% of pipeline-clear cycles going
 into what is essentially two instructions out of the entire backend.
 (Pipeline clears occur when the cache coherency logic detects a memory
 write ordering problem.)
 
 I am not completely clear on why this machine-level bottleneck manifests
 as a lot of context swaps at the OS level.  I think what is happening is
 that because SpinLockAcquire is so slow, a process is much more likely
 than you'd normally expect to arrive at SpinLockAcquire while another
 process is also acquiring the spinlock.  This puts the two processes
 into a lockstep condition where the second process is nearly certain
 to observe the BufMgrLock as locked, and be forced to suspend itself,
 even though the time the first process holds the BufMgrLock is not
 really very long at all.
 
 If you google for Xeon and cache coherency you'll find quite a bit of
 suggestive information about why this might be more true on the Xeon
 setup than others.  A couple of interesting hits:
 
 http://www.theinquirer.net/?article=10797
 says that Xeon MP uses a *slower* FSB than Xeon DP.  This would
 translate directly to more time needed to transfer a dirty cache line
 from one processor to the other, which is the basic operation that we're
 talking about here.
 
 http://www.aceshardware.com/Spades/read.php?article_id=3187
 says that Opterons use a different cache coherency protocol that is
 fundamentally superior to the Xeon's, because dirty cache data can be
 transferred directly between two processor caches without waiting for
 main memory.
 
 So in the short term I think we have to tell people that Xeon MP is not
 the most desirable SMP platform to run Postgres on.  (Josh thinks that
 the specific motherboard chipset being used in these machines might
 share some of the blame too.  I don't have any evidence for or against
 that idea, but it's certainly possible.)
 
 In the long run, however, CPUs continue to get faster than main memory
 and the price of cache contention will continue to rise.  So it seems
 that we need to give up the assumption that SpinLockAcquire is a cheap
 operation.  In the presence of heavy contention it won't be.
 
 One thing we probably have got to do soon is break up the BufMgrLock
 into multiple finer-grain locks so that there will be less contention.
 However I am wary of doing this incautiously, because if we do it in a
 way that makes for a significant rise in the number of locks that have
 to be acquired to access a buffer, we might end up with a net loss.
 
 I think Neil Conway was looking into how the bufmgr might be
 restructured to reduce lock contention, but if he had come up with
 anything he didn't mention exactly what.  Neil?
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 
 
 
 !DSPAM:4082feb7326901956819835!
 
 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 So in the short term I think we have to tell people that Xeon MP is not
 the most desirable SMP platform to run Postgres on.  (Josh thinks that
 the specific motherboard chipset being used in these machines might
 share some of the blame too.  I don't have any evidence for or against
 that idea, but it's certainly possible.)
 
 In the long run, however, CPUs continue to get faster than main memory
 and the price of cache contention will continue to rise.  So it seems
 that we need to give up the assumption that SpinLockAcquire is a cheap
 operation.  In the presence of heavy contention it won't be.

There's nothing about the way Postgres spinlocks are coded that affects this?

Is it something the kernel could help with? I've been wondering whether
there's any benefit Postgres is missing out on by using its own hand-rolled
locking instead of the pthreads infrastructure that the kernel is often
involved in.

-- 
greg




Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
Dave Cramer [EMAIL PROTECTED] writes:
 So the kernel/OS is irrelevant here? This happens on any dual Xeon?

I believe so.  The context-switch behavior might possibly be a little
more pleasant on other kernels, but the underlying spinlock problem is
not dependent on the kernel.

 What about hyperthreading? Does it still happen if HTT is turned off?

The problem comes from keeping the caches synchronized between multiple
physical CPUs.  AFAICS enabling HTT wouldn't make it worse, because a
hyperthreaded processor still only has one cache.

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 There's nothing about the way Postgres spinlocks are coded that affects this?

No.  AFAICS our spinlock sequences are pretty much equivalent to the way
the Linux kernel codes its spinlocks, so there's no deep dark knowledge
to be mined there.

We could possibly use some more-efficient blocking mechanism than semop()
once we've decided we have to block (it's a shame Linux still doesn't
have cross-process POSIX semaphores).  But the striking thing I learned
from looking at the oprofile results is that most of the inefficiency
comes at the very first TAS() operation, before we've even spun let
alone decided we have to block.  The s_lock() subroutine does not
account for more than a few percent of the runtime in these tests,
compared to 15% at the inline TAS() operations in LWLockAcquire and
LWLockRelease.  I interpret this to mean that once it's acquired
ownership of the cache line, a Xeon can get through the spinning
loop in s_lock() mighty quickly.

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-18 Thread Tom Lane
 What about hyperthreading? Does it still happen if HTT is turned off?

 The problem comes from keeping the caches synchronized between multiple
 physical CPUs.  AFAICS enabling HTT wouldn't make it worse, because a
 hyperthreaded processor still only has one cache.

Also, I forgot to say that the numbers I'm quoting *are* with HTT off.

regards, tom lane



RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-16 Thread Dirk Lutzebäck
Tom, Josh,

I think we have the problem resolved after I found the following note 
from Tom:

 A large number of semops may mean that you have excessive contention
 on some lockable resource, but I don't have enough info to guess what
 resource.

This was the key to look at: we were missing all indices on a table which 
is used heavily and does lots of locking. After recreating the missing 
indices the production system performed normally. No more excessive 
semop() calls, load way below 1.0, CS over 20,000 very rare, more in the 
thousands realm and less.

This is quite a relief, but I am sorry that the problem was so stupid and 
you wasted some time, although Tom said he had also seen excessive 
semop() calls on another dual Xeon system.

Hyperthreading was turned off so far but will be turned on again in the 
next few days. I don't expect any problems then.

I'm not sure if this semop() problem is still an issue, but the database 
behaves a bit out of bounds in this situation, i.e. consuming 95% of 
system resources with semop() calls while tables are locked very often 
and for longer.

Thanks for your help,

Dirk

At last here is the current vmstat 1 excerpt where the problem has been 
resolved:



procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   2308 232508 201924 697653200   136   464  628   812  5  1 94  0
 0  0   2308 232500 201928 69766280096   296  495   484  4  0 95  0
 0  1   2308 232492 201928 697662800 0   176  347   278  1  0 99  0
 0  0   2308 233484 201928 69765960040   580  443   351  8  2 90  0
 1  0   2308 233484 201928 69766960076   692  792   651  9  2 88  0
 0  0   2308 233484 201928 697669600 020  13234  0  0 100  0
 0  0   2308 233484 201928 697669600 076  17790  0  0 100  0
 0  1   2308 233484 201928 697669600 0   216  321   250  4  0 96  0
 0  0   2308 233484 201928 697669600 0   116  417   240  8  0 92  0
 0  0   2308 233484 201928 69767840048   600  403   270  8  0 92  0
 0  0   2308 233464 201928 69768600076   452 1064  2611 14  1 84  0
 0  0   2308 233460 201932 69769000032   256  587   587 12  1 87  0
 0  0   2308 233460 201932 69769320032   188  379   287  5  0 94  0
 0  0   2308 233460 201932 697693200 0 0  103 8  0  0 100  0
 0  0   2308 233460 201932 697693200 0 0  10214  0  0 100  0
 0  1   2308 233444 201948 697693200 0   348  300   180  1  0 99  0
 1  0   2308 233424 201948 69769480016   380  739   906  4  2 93  0
 0  0   2308 233424 201948 69770320068   260  724   987  7  0 92  0
 0  0   2308 231924 201948 69771280096   344 1130   753 11  1 88  0
 1  0   2308 231924 201948 697724800   112   324  687   628  3  0 97  0
 0  0   2308 231924 201948 697724800 0   192  575   430  5  0 95  0
 1  0   2308 231924 201948 697724800 0   264  208   124  0  0 100  0
 0  0   2308 231924 201948 69772640016   272  380   230  3  2 95  0
 0  0   2308 231924 201948 697726400 0 0  104 8  0  0 100  0
 0  0   2308 231924 201948 697726400 048  25892  1  0 99  0
 0  0   2308 231816 201948 697748400   212   268  456   384  2  0 98  0
 0  0   2308 231816 201948 697748400 088  453   770  0  0 99  0
 0  0   2308 231452 201948 697768000   196   476  615   676  5  0 94  0
 0  0   2308 231452 201948 697768000 0   228  431   400  2  0 98  0
 0  0   2308 231452 201948 697768000 0 0  23758  3  0 97  0
 0  0   2308 231448 201952 697768000 0 0  36584  2  0 97  0
 0  0   2308 231448 201952 697768000 040  246   108  1  0 99  0
 0  0   2308 231448 201952 6960096   352  606  1026  4  2 94  0
 0  0   2308 231448 201952 69600 0   240  295   266  5  0 95  0





Re: RESOLVED: Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-16 Thread Tom Lane
Dirk Lutzebäck [EMAIL PROTECTED] writes:
 This was the key to look at: we were missing all indices on a table which 
 is used heavily and does lots of locking. After recreating the missing 
 indices the production system performed normally. No more excessive 
 semop() calls, load way below 1.0, CS over 20,000 very rare, more in the 
 thousands realm and less.

Hmm ... that's darn interesting.  AFAICT the test case I am looking at
for Josh's client has no such SQL-level problem ... but I will go back
and double check ...

regards, tom lane



Re: [PERFORM] Wierd context-switching issue on Xeon

2003-11-25 Thread Josh Berkus
Tom,

 Strictly a WAG ... but what this sounds like to me is disastrously bad
 behavior of the spinlock code under heavy contention.  We thought we'd
 fixed the spinlock code for SMP machines awhile ago, but maybe
 hyperthreading opens some new vistas for misbehavior ...

Yeah, I thought of that based on the discussion on -Hackers.  But we tried 
turning off hyperthreading, with no change in behavior.

 If you can't try 7.4, or want to gather more data first, it would be
 good to try to confirm or disprove the theory that the context switches
 are coming from spinlock delays.  If they are, they'd be coming from the
 select() calls in s_lock() in s_lock.c.  Can you strace or something to
 see what kernel calls the context switches occur on?

Might be worth it ... will suggest that.  Will also try 7.4.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

