So the kernel/OS is irrelevant here? This happens on any dual Xeon? What about hyperthreading: does it still happen if HTT is turned off?
Dave

On Sun, 2004-04-18 at 17:47, Tom Lane wrote:
> After some further digging I think I'm starting to understand what's up
> here, and the really fundamental answer is that a multi-CPU Xeon MP box
> sucks for running Postgres.
>
> I did a bunch of oprofile measurements on a machine belonging to one of
> Josh's clients, using a test case that involved heavy concurrent access
> to a relatively small amount of data (little enough to fit into Postgres
> shared buffers, so that no I/O or kernel calls were really needed once
> the test got going). I found that by nearly any measure --- elapsed
> time, bus transactions, or machine-clear events --- the spinlock
> acquisitions associated with grabbing and releasing the BufMgrLock took
> an unreasonable fraction of the time. I saw about 15% of elapsed time,
> 40% of bus transactions, and nearly 100% of pipeline-clear cycles going
> into what is essentially two instructions out of the entire backend.
> (Pipeline clears occur when the cache coherency logic detects a memory
> write ordering problem.)
>
> I am not completely clear on why this machine-level bottleneck manifests
> as a lot of context swaps at the OS level. I think what is happening is
> that because SpinLockAcquire is so slow, a process is much more likely
> than you'd normally expect to arrive at SpinLockAcquire while another
> process is also acquiring the spinlock. This puts the two processes
> into a "lockstep" condition where the second process is nearly certain
> to observe the BufMgrLock as locked, and be forced to suspend itself,
> even though the time the first process holds the BufMgrLock is not
> really very long at all.
>
> If you google for Xeon and "cache coherency" you'll find quite a bit of
> suggestive information about why this might be more true on the Xeon
> setup than others. A couple of interesting hits:
>
> http://www.theinquirer.net/?article=10797
> says that Xeon MP uses a *slower* FSB than Xeon DP. This would
> translate directly to more time needed to transfer a dirty cache line
> from one processor to the other, which is the basic operation that we're
> talking about here.
>
> http://www.aceshardware.com/Spades/read.php?article_id=30000187
> says that Opterons use a different cache coherency protocol that is
> fundamentally superior to the Xeon's, because dirty cache data can be
> transferred directly between two processor caches without waiting for
> main memory.
>
> So in the short term I think we have to tell people that Xeon MP is not
> the most desirable SMP platform to run Postgres on. (Josh thinks that
> the specific motherboard chipset being used in these machines might
> share some of the blame too. I don't have any evidence for or against
> that idea, but it's certainly possible.)
>
> In the long run, however, CPUs continue to get faster than main memory
> and the price of cache contention will continue to rise. So it seems
> that we need to give up the assumption that SpinLockAcquire is a cheap
> operation. In the presence of heavy contention it won't be.
>
> One thing we probably have got to do soon is break up the BufMgrLock
> into multiple finer-grain locks so that there will be less contention.
> However I am wary of doing this incautiously, because if we do it in a
> way that makes for a significant rise in the number of locks that have
> to be acquired to access a buffer, we might end up with a net loss.
>
> I think Neil Conway was looking into how the bufmgr might be
> restructured to reduce lock contention, but if he had come up with
> anything he didn't mention exactly what. Neil?
>
> regards, tom lane
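
For anyone trying to picture the "finer-grain locks" idea from the quoted message, here is a rough sketch in C11. It is not PostgreSQL's buffer manager code and not what anyone has proposed on the list; the names part_lock, partition_for, NUM_PARTITIONS and the 64-byte cache line size are all assumptions made up for illustration. It shows two things: a spinlock that reads the lock word before attempting the atomic exchange (so waiters do not keep pulling the cache line over in exclusive mode), and one big lock split into partitions picked by hashing a buffer id.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing: 2^4 = 16 lock partitions. */
#define PARTITION_BITS 4
#define NUM_PARTITIONS (1u << PARTITION_BITS)

/* Assumed 64-byte cache line. Each lock gets a line of its own so that
 * contention on one partition does not bounce a neighbour's line. */
typedef struct {
    _Alignas(64) atomic_int locked;   /* 0 = free, 1 = held */
} part_lock;

static_assert(sizeof(part_lock) == 64, "one lock per assumed cache line");

/* Static storage is zero-initialized, so every partition starts free. */
static part_lock partition_locks[NUM_PARTITIONS];

/* Test-and-test-and-set: do a plain read first, and only attempt the
 * atomic exchange when the lock looks free. */
static inline bool part_trylock(part_lock *l)
{
    if (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
        return false;
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static inline void part_lock_acquire(part_lock *l)
{
    while (!part_trylock(l))
        ;   /* busy-wait; real code would back off or sleep after a while */
}

static inline void part_lock_release(part_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Map a (hypothetical) buffer id to a partition with a multiplicative
 * hash, keeping the top PARTITION_BITS bits of the 32-bit product. */
static inline unsigned partition_for(uint32_t buffer_id)
{
    return (uint32_t)(buffer_id * 2654435761u) >> (32 - PARTITION_BITS);
}
```

A caller would do part_lock_acquire(&partition_locks[partition_for(id)]), touch the buffer, then part_lock_release() on the same lock. Each access here takes exactly one partition lock, which sidesteps the concern in the quoted message about raising the number of locks per buffer access; any operation that needed several partitions at once would bring that cost back.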
-- 
Dave Cramer
519 939 0336
ICQ # 14675561

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings