Re: [HACKERS] spinlocks on powerpc
On 2012-01-03 04:44, Robert Haas wrote:
> On read-only workloads, you get spinlock contention, because everyone
> who wants a snapshot has to take the LWLock mutex to increment the
> shared lock count and again (just a moment later) to decrement it.

Does the LWLock protect anything but the shared lock count?  If not,
then the usually quickest manipulation is along the lines of:

   loop: lwarx   r5,0,r3    #load and reserve
         add     r0,r4,r5   #increment word
         stwcx.  r0,0,r3    #store new value if still reserved
         bne-    loop       #loop if lost reservation

(per IBM's software ref manual,
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2)

The same sort of thing generally holds on other instruction sets also.

Also, heavy-contention locks should be placed in cache lines away from
other data (to avoid thrashing the data cache lines when processors
are fighting over the lock cache lines).

--
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlocks on powerpc
On Tue, Jan 3, 2012 at 3:05 PM, Jeremy Harris j...@wizmail.org wrote:
> On 2012-01-03 04:44, Robert Haas wrote:
>> On read-only workloads, you get spinlock contention, because everyone
>> who wants a snapshot has to take the LWLock mutex to increment the
>> shared lock count and again (just a moment later) to decrement it.
>
> Does the LWLock protect anything but the shared lock count?  If not,
> then the usually quickest manipulation is along the lines of:
>
>    loop: lwarx   r5,0,r3    #load and reserve
>          add     r0,r4,r5   #increment word
>          stwcx.  r0,0,r3    #store new value if still reserved
>          bne-    loop       #loop if lost reservation
>
> (per IBM's software ref manual,
> https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2)
>
> The same sort of thing generally holds on other instruction sets also.

Sure, but the actual critical section is not that simple.  You might
look at the code for LWLockAcquire() if you're curious.

> Also, heavy-contention locks should be placed in cache lines away from
> other data (to avoid thrashing the data cache lines when processors
> are fighting over the lock cache lines).

Yep.  This is possibly a problem, and has been discussed before, but I
don't think we have any firm evidence that it's a problem, or how much
padding helps.  The heavily contended LWLocks are mostly
non-consecutive, except perhaps for the buffer mapping locks.

It's been suggested to me that we should replace our existing LWLock
implementation with a CAS-based implementation that crams all the
relevant details into a single 8-byte word.  The pointer to the head
of the wait queue, for example, could be stored as an offset into the
allProcs array rather than as a pointer value, which would allow it to
be stored in 24 bits rather than 8 bytes.  But there's not quite
enough bit space to make it work without making compromises -- most
likely, reducing the theoretical upper limit on MaxBackends from 2^24
to, say, 2^22.
Even if we were willing to do that, the performance benefits of using
atomics here are so far unproven ... which doesn't mean they don't
exist, but someone's going to have to do some work to show that they
do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] spinlocks on powerpc
Robert Haas robertmh...@gmail.com writes:
> On Tue, Jan 3, 2012 at 3:05 PM, Jeremy Harris j...@wizmail.org wrote:
>> Also, heavy-contention locks should be placed in cache lines away
>> from other data (to avoid thrashing the data cache lines when
>> processors are fighting over the lock cache lines).

> Yep.  This is possibly a problem, and has been discussed before, but
> I don't think we have any firm evidence that it's a problem, or how
> much padding helps.  The heavily contended LWLocks are mostly
> non-consecutive, except perhaps for the buffer mapping locks.

We are in fact already padding and aligning LWLocks on (if memory
serves) 16- or 32-byte boundaries, depending on platform.  So there
might be 2 to 4 LWLocks in the same cache line, depending on platform.
It's been suggested before that we pad more to reduce this number, but
nobody's demonstrated any performance win from doing so.

> It's been suggested to me that we should replace our existing LWLock
> implementation with a CAS-based implementation that crams all the
> relevant details into a single 8-byte word.

I'm under the impression that the main costs are associated with
trading cache line ownership between CPUs, which makes me think that
this'd be essentially a waste of effort, quite aside from the
portability problems involved.

			regards, tom lane
Re: [HACKERS] spinlocks on powerpc
Robert Haas robertmh...@gmail.com writes:
> I'm unconvinced by these numbers.  There is a measurable change but
> it is pretty small.  The Itanium changes resulted in an enormous gain
> at higher concurrency levels.

Yeah, that was my problem with it also: I couldn't measure enough gain
to convince me it was a real effect, and not an artifact of the
specific machine being tested.

It occurred to me though that we already know that pgbench itself is a
bottleneck in tests like this, and there's an easy way to take it out
of the picture: move the selects into a plpgsql function that iterates
multiple times per client query.  The attached testing script reduces
the client interaction costs by a thousandfold compared to plain
"pgbench -S", and also takes parse/plan time out of the loop, so that
it becomes easier to see the effects of contention.

With this script, I still see a loss of 1% or so from adding the
unlocked test in TAS_SPIN at moderate contention levels, but there's a
very clear jump when the machine is saturated.  So this convinces me
that Manabu-san's results are reproducible, and I've committed the
TAS_SPIN addition.
git head as of this morning, on 8-core IBM 8406-71Y:

pgbench -c 1 -j 1 -f bench.script -T 300 bench
tps = 50.142878 (including connections establishing)
pgbench -c 2 -j 1 -f bench.script -T 300 bench
tps = 97.179234 (including connections establishing)
pgbench -c 8 -j 4 -f bench.script -T 300 bench
tps = 341.731982 (including connections establishing)
pgbench -c 16 -j 8 -f bench.script -T 300 bench
tps = 402.114111 (including connections establishing)
pgbench -c 32 -j 16 -f bench.script -T 300 bench
tps = 371.338156 (including connections establishing)
pgbench -c 64 -j 32 -f bench.script -T 300 bench
tps = 359.785087 (including connections establishing)
pgbench -c 96 -j 48 -f bench.script -T 300 bench
tps = 363.879950 (including connections establishing)
pgbench -c 128 -j 64 -f bench.script -T 300 bench
tps = 376.794934 (including connections establishing)

after re-adding TAS_SPIN macro:

pgbench -c 1 -j 1 -f bench.script -T 300 bench
tps = 50.182676 (including connections establishing)
pgbench -c 2 -j 1 -f bench.script -T 300 bench
tps = 96.751910 (including connections establishing)
pgbench -c 8 -j 4 -f bench.script -T 300 bench
tps = 327.108510 (including connections establishing)
pgbench -c 16 -j 8 -f bench.script -T 300 bench
tps = 395.425611 (including connections establishing)
pgbench -c 32 -j 16 -f bench.script -T 300 bench
tps = 444.291852 (including connections establishing)
pgbench -c 64 -j 32 -f bench.script -T 300 bench
tps = 486.151168 (including connections establishing)
pgbench -c 96 -j 48 -f bench.script -T 300 bench
tps = 496.379981 (including connections establishing)
pgbench -c 128 -j 64 -f bench.script -T 300 bench
tps = 494.058124 (including connections establishing)

> For Itanium, I was able to find some fairly official-looking
> documentation that said this is how you should do it.  It would be
> nice to find something similar for PPC64, instead of testing every
> machine and reinventing the wheel ourselves.
You are aware that our spinlock code is pretty much verbatim from the
PPC ISA spec, no?  The issue here is that the official documentation
has been a moving target over the decades the ISA has been in
existence.

			regards, tom lane

#! /bin/sh

psql bench <<'EOF'
create or replace function bench(scale int, reps int) returns void
language plpgsql as
$$
declare
  naccounts int := 10 * scale;
  v_aid int;
  v_abalance int;
begin
  for i in 1 .. reps loop
    v_aid := round(random() * naccounts + 1);
    SELECT abalance INTO v_abalance FROM pgbench_accounts
      WHERE aid = v_aid;
  end loop;
end;
$$;
EOF

cat >bench.script <<'EOF'
select bench(:scale, 1000);
EOF

# warm the caches a bit
pgbench -c 1 -j 1 -f bench.script -T 30 bench >/dev/null

echo pgbench -c 1 -j 1 -f bench.script -T 300 bench
pgbench -c 1 -j 1 -f bench.script -T 300 bench | grep including
echo pgbench -c 2 -j 1 -f bench.script -T 300 bench
pgbench -c 2 -j 1 -f bench.script -T 300 bench | grep including
echo pgbench -c 8 -j 4 -f bench.script -T 300 bench
pgbench -c 8 -j 4 -f bench.script -T 300 bench | grep including
echo pgbench -c 16 -j 8 -f bench.script -T 300 bench
pgbench -c 16 -j 8 -f bench.script -T 300 bench | grep including
echo pgbench -c 32 -j 16 -f bench.script -T 300 bench
pgbench -c 32 -j 16 -f bench.script -T 300 bench | grep including
echo pgbench -c 64 -j 32 -f bench.script -T 300 bench
pgbench -c 64 -j 32 -f bench.script -T 300 bench | grep including
echo pgbench -c 96 -j 48 -f bench.script -T 300 bench
pgbench -c 96 -j 48 -f bench.script -T 300 bench | grep including
echo pgbench -c 128 -j 64 -f bench.script -T 300 bench
pgbench -c 128 -j 64 -f bench.script -T 300 bench | grep including
Re: [HACKERS] spinlocks on powerpc
On Tue, Jan 3, 2012 at 4:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> For Itanium, I was able to find some fairly official-looking
>> documentation that said this is how you should do it.  It would be
>> nice to find something similar for PPC64, instead of testing every
>> machine and reinventing the wheel ourselves.
>
> You are aware that our spinlock code is pretty much verbatim from the
> PPC ISA spec, no?  The issue here is that the official documentation
> has been a moving target over the decades the ISA has been in
> existence.

I wasn't aware of that, but I think my basic point still stands: it's
gonna be painful if we have to test large numbers of different PPC
boxes to figure all this out...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] spinlocks on powerpc
On Mon, Jan 2, 2012 at 12:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> (It's depressing that these numbers have hardly moved since August ---
> at least on this test, the work that Robert's done has not made any
> difference.)

Most of the scalability work that's been committed since August has
really been about ProcArrayLock, which does have an impact on read
scalability, but is a much more serious problem on write workloads.
On read-only workloads, you get spinlock contention, because everyone
who wants a snapshot has to take the LWLock mutex to increment the
shared lock count and again (just a moment later) to decrement it.
But on write workloads, transactions must take ProcArrayLock in
exclusive mode to commit, so you have the additional problem of
snapshot-taking forcing committers to wait and (probably to a lesser
degree) vice versa.  Most of the benefit we've gotten so far has come
from shortening the time for which ProcArrayLock is held in shared
mode while taking snapshots, which is going to primarily benefit write
workloads.  I'm a bit surprised that you haven't seen any benefit at
all on read workloads - I would have expected a small but measurable
gain - but I'm not totally shocked if there isn't one.

The architecture may play into it, too.  Most of the testing that I
have done has been on AMD64 or Itanium, and those have significantly
different performance characteristics.  The Itanium machine I've used
for testing is faster in absolute terms than the AMD64 box, but it
seems to also suffer more severely in the presence of spinlock
contention.  This means that, on current sources, ProcArrayLock is a
bigger problem on Itanium than it is on AMD64.  I don't have a PPC64
box to play with ATM, so I can't speculate on what the situation is
there.  It's occurred to me to wonder whether the Itanium vs. AMD64
effects are specific to those architectures or general characteristics
of strong memory ordering architectures vs.
weak memory architectures, but I don't really have enough data to
know.

I'm concerned by this whole issue of spinlocks, since the previous
round of testing on Itanium pretty much proves that getting the
spinlock implementation wrong is a death sentence.  If PPC64 is going
to require specific tweaks for every subarchitecture, that's going to
be a colossal nuisance, but probably a necessary one if we don't want
to suck there.  For Itanium, I was able to find some fairly
official-looking documentation that said this is how you should do
it.  It would be nice to find something similar for PPC64, instead of
testing every machine and reinventing the wheel ourselves.  I wonder
whether the gcc folks have done any meaningful thinking about this in
their builtin atomics; if so, that might be an argument for using
that as more than just a fallback.  If not, it's a pretty good
argument against it, at least IMHO.

All that having been said...

> That last is clearly a winner for reasonable numbers of processes,
> so I committed it that way, but I'm a little worried by the fact
> that it looks like it might be a marginal loss when the system is
> overloaded.  I would like to see results from your machine.

I'm unconvinced by these numbers.  There is a measurable change but it
is pretty small.  The Itanium changes resulted in an enormous gain at
higher concurrency levels.  I've seen several cases where improving
one part of the code actually makes performance worse, because of
things like: once lock A is less contended, lock B becomes more
contended, and for some reason the effect on lock B is greater than
the effect on lock A.
It was precisely this sort of effect that led to the sinval
optimizations committed as b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4;
the lock manager optimizations improved things with moderate numbers
of processes but were much worse at high numbers of processes,
precisely because the lock manager (which is partitioned) wasn't there
to throttle the beating on SInvalReadLock (which isn't).  I'd be
inclined to say we should optimize for architectures where either or
both of these techniques make the sort of big splash Manabu Ori is
seeing on his machine, and assume that the much smaller changes you're
seeing on your machines are as likely to be artifacts as real effects.
When and if enough evidence emerges to say otherwise, we can decide
whether to rethink.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] spinlocks on powerpc
I wrote:
> it might be that the only machines that actually spit up on the hint
> bit (rather than ignore it) were 32-bit, in which case this would be
> a usable heuristic.  Not sure how we can research that ... do we
> want to just assume the kernel guys know what they're doing?

I did a bit of research on this.  The EH hint bit for LWARX was
introduced in Power ISA 2.05 of 2007,
https://www.power.org/resources/reading/PowerISA_V2.05.pdf
which states:

	Warning: On some processors that comply with versions of the
	architecture that precede Version 2.00, executing a Load And
	Reserve instruction in which EH = 1 will cause the illegal
	instruction error handler to be invoked.

According to Wikipedia and some other sites, 2.00 corresponds to
POWER4 (introduced in 2001), which was *not* the first 64-bit POWER
architecture; before that there was POWER3 (used in IBM RS/6000 for a
couple of years) and PPC 620 (not commercially successful, in
particular never used by Apple).  However, I could not find anything
definitive as to whether those particular chips would give SIGILL for
the hint bit being set.

So there may or may not be a small population of old 64-bit PPC chips
on which the hint bit would cause trouble.  Certainly it should be
safe on the vast majority of currently operational PPC64 machines.
In the other direction, it's at least possible that someone would want
to build PG as 32-bit on a PPC machine that is new enough to support
the hint bit.

What I suggest we should do about this is provide an overridable
option in pg_config_manual.h, along the lines of

#if defined(__ppc64__) || defined(__powerpc64__)
#define USE_PPC_LWARX_MUTEX_HINT
#endif

and then test that symbol in s_lock.h.  This will provide an escape
hatch for anyone who doesn't want the decision tied to 64-bit-ness,
while still enabling the option automatically for the majority of
people who could use it.
BTW, while reading the ISA document I couldn't help noticing that
LWARX is clearly specified to operate on 4-byte quantities (there's
LDARX if you want to use 8-byte).  Which seems to mean that this bit
in s_lock.h just represents a bogus waste of space:

#if defined(__ppc64__) || defined(__powerpc64__)
typedef unsigned long slock_t;
#else
typedef unsigned int slock_t;
#endif

Shouldn't we just make slock_t be int for both PPC and PPC64?

			regards, tom lane
Re: [HACKERS] spinlocks on powerpc
Tom, thank you for your advice.

On 2012/01/01, at 3:39, Tom Lane wrote:
> What I suggest we should do about this is provide an overridable
> option in pg_config_manual.h, along the lines of
>
> #if defined(__ppc64__) || defined(__powerpc64__)
> #define USE_PPC_LWARX_MUTEX_HINT
> #endif
>
> and then test that symbol in s_lock.h.  This will provide an escape
> hatch for anyone who doesn't want the decision tied to 64-bit-ness,
> while still enabling the option automatically for the majority of
> people who could use it.

Fair enough.  I recreated the patch as you advised.

ppc-TAS_SPIN-20120101.diff
Description: Binary data

> BTW, while reading the ISA document I couldn't help noticing that
> LWARX is clearly specified to operate on 4-byte quantities (there's
> LDARX if you want to use 8-byte).  Which seems to mean that this bit
> in s_lock.h just represents a bogus waste of space:
>
> #if defined(__ppc64__) || defined(__powerpc64__)
> typedef unsigned long slock_t;
> #else
> typedef unsigned int slock_t;
> #endif
>
> Shouldn't we just make slock_t be int for both PPC and PPC64?

I'd like that to be left untouched for this TAS_SPIN-for-powerpc
discussion, since it seems it has remained like this for several years
and maybe it needs some more careful consideration, especially for
sign extension…

Regards,
Manabu Ori
Re: [HACKERS] spinlocks on powerpc
Manabu Ori manabu@gmail.com writes:
> I recreated the patch as you advised.

Hmm, guess I wasn't clear --- we still need a configure test, since
even if we are on PPC64 there's no guarantee that the assembler will
accept the hint bit.  I revised the patch to include a configure test
and committed it.

However, I omitted the part that added an unlocked test in TAS_SPIN,
because (1) that's logically a separate change, and (2) in my testing
the unlocked test produces a small but undeniable performance loss
(see numbers below).  We need to investigate a bit more to understand
why I'm getting results different from yours.  If the bottom line is
that the unlocked test loses for smaller numbers of processors and
only helps with lots of them, I have to question whether it's a good
idea to apply it.

>> Shouldn't we just make slock_t be int for both PPC and PPC64?

> I'd like that to be left untouched for this TAS_SPIN-for-powerpc
> discussion, since it seems it has remained like this for several
> years and maybe it needs some more careful consideration

I ran a test and could not see any consistent performance difference
between 4-byte and 8-byte slock_t, so I've committed that change too.
Obviously that can be revisited if anyone comes up with evidence in
the other direction.

While I was looking at this, I noticed that PPC ISA 2.03 and later
recommend use of lwsync rather than isync and sync in lock acquisition
and release, and sure enough I can measure an improvement from making
that change too.  So again the problem is to know whether it's safe to
use that instruction.  Googling shows that there's at least one
current 32-bit PPC chip that gives SIGILL (Freescale's E500 ...
thanks for nothing, Freescale ...); but at least some projects are
using 64-bitness as a proxy test for whether it's safe to use lwsync.
So for the moment I've also committed a patch that switches to using
lwsync on PPC64.
We can perhaps improve on that too, but it's got basically the same
issues as the hint bit with respect to how to know at compile time
whether the instruction is safe at run time.

I would be interested to see results from your 750 Express machine as
to the performance impact of each of these successive patches, and
then perhaps the TAS_SPIN change on top of that.

While working on this, I repeated the tests I did in
http://archives.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us

With current git head, I get:

pgbench -c 1 -j 1 -S -T 300 bench
tps = 8703.264346 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench
tps = 12207.827348 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench
tps = 48593.65 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench
tps = 91155.555180 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench
tps = 124648.093971 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench
tps = 129488.449355 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench
tps = 124958.553086 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench
tps = 134195.370726 (including connections establishing)

(It's depressing that these numbers have hardly moved since August ---
at least on this test, the work that Robert's done has not made any
difference.)  These numbers are repeatable in the first couple of
digits, but there's some noise in the third digit.
With your patch (hint bit and TAS_SPIN change) I get:

pgbench -c 1 -j 1 -S -T 300 bench
tps = 8751.930270 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench
tps = 12211.160964 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench
tps = 48608.877131 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench
tps = 90827.234014 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench
tps = 123267.062954 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench
tps = 128951.585059 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench
tps = 126551.870909 (including connections establishing)
pgbench -c 128 -j 64 -S -T 300 bench
tps = 133311.793735 (including connections establishing)

With the TAS_SPIN change only, no hint bit:

pgbench -c 1 -j 1 -S -T 300 bench
tps = 8764.703599 (including connections establishing)
pgbench -c 2 -j 1 -S -T 300 bench
tps = 12163.321040 (including connections establishing)
pgbench -c 8 -j 4 -S -T 300 bench
tps = 48580.673497 (including connections establishing)
pgbench -c 16 -j 8 -S -T 300 bench
tps = 90672.227488 (including connections establishing)
pgbench -c 32 -j 16 -S -T 300 bench
tps = 121604.634146 (including connections establishing)
pgbench -c 64 -j 32 -S -T 300 bench
tps = 129088.387379 (including connections establishing)
pgbench -c 96 -j 48 -S -T 300 bench
tps = 126291.854733
Re: [HACKERS] spinlocks on powerpc
Manabu Ori manabu@gmail.com writes:
> 2011/12/30 Tom Lane t...@sss.pgh.pa.us
>> The info that I've found says that the hint exists beginning in
>> POWER6, and there were certainly 64-bit Power machines before that.
>> However, it might be that the only machines that actually spit up
>> on the hint bit (rather than ignore it) were 32-bit, in which case
>> this would be a usable heuristic.  Not sure how we can research
>> that ... do we want to just assume the kernel guys know what
>> they're doing?

> I'm a bit confused and might miss the point, but...
> If we can decide whether to use the hint operand when we build
> postgres, I think it's better to check if we can compile and run a
> sample code with lwarx hint operand than to refer to some arbitrary
> defines, such as FOO_PPC64 or something.

Well, there are two different conditions we have to deal with:

(1) does gcc+assembler understand the hint operand for lwarx?  This we
can reasonably check with configure, since it's a property of the
build environment.

(2) does the machine where the executable will run understand the hint
bit, or failing that at least treat it as a no-op?  We cannot
determine that at configure time, unless we can fall back on some
approximate proxy condition like testing 64-bit vs 32-bit.

(I see that the kernel boys dodged point 1 by writing the lwarx
instruction as a numeric constant, but that seems far too ugly and
fragile for my taste.  In any case point 2 is the big issue.)

If you don't like the 64-bit hack or something much like it, I think
we have got three other alternatives:

* Do nothing, ie reject the patch.

* Push the problem onto the user by offering a configure option.  I
don't care for this in the least, notably because packagers such as
Linux distros couldn't safely enable the option, so in practice it
would be unavailable to a large fraction of users.

* Perform a runtime test.
I'm not sure if there's a better way, but if nothing else we could
fork a subprocess during postmaster start, have it try an lwarx with
hint bit, observe whether it dumps core, and set a flag to tell future
TAS calls whether to use the hint bit.  Ick.  In any case, adding a
conditional branch to the TAS code would lose some of the performance
benefit of the patch.  Given that you don't get any benefit at all
until you have a large number of cores, this would be a net loss for a
lot of people.

None of those look better than an approximate proxy condition to me.

			regards, tom lane
Re: [HACKERS] spinlocks on powerpc
On Fri, 2011-12-30 at 14:47 +0900, Manabu Ori wrote:
> If we can decide whether to use the hint operand when we build
> postgres, I think it's better to check if we can compile and run a
> sample code with lwarx hint operand than to refer to some arbitrary
> defines, such as FOO_PPC64 or something.

But you can't be sure that the host you are running this on has the
same capability as the build system.  Packaging systems only classify
architectures in broad categories such as i386 or powerpc or maybe
powerpc64.  So a package built for powerpc64 has to run on all
powerpc64 hosts.  Imagine you are using some Pentium instruction and
run the program on an 80486: it's the same architecture as far as the
kernel, package management, etc. are concerned, but your program will
break.
Re: [HACKERS] spinlocks on powerpc
On 12/30/2011 11:23 AM, Tom Lane wrote:
> Manabu Ori manabu@gmail.com writes:
>> 2011/12/30 Tom Lane t...@sss.pgh.pa.us
>>> The info that I've found says that the hint exists beginning in
>>> POWER6, and there were certainly 64-bit Power machines before
>>> that.  However, it might be that the only machines that actually
>>> spit up on the hint bit (rather than ignore it) were 32-bit, in
>>> which case this would be a usable heuristic.  Not sure how we can
>>> research that ... do we want to just assume the kernel guys know
>>> what they're doing?
>
>> I'm a bit confused and might miss the point, but...
>> If we can decide whether to use the hint operand when we build
>> postgres, I think it's better to check if we can compile and run a
>> sample code with lwarx hint operand than to refer to some arbitrary
>> defines, such as FOO_PPC64 or something.
>
> Well, there are two different conditions we have to deal with:
>
> (1) does gcc+assembler understand the hint operand for lwarx?  This
> we can reasonably check with configure, since it's a property of the
> build environment.
>
> (2) does the machine where the executable will run understand the
> hint bit, or failing that at least treat it as a no-op?  We cannot
> determine that at configure time, unless we can fall back on some
> approximate proxy condition like testing 64-bit vs 32-bit.
>
> (I see that the kernel boys dodged point 1 by writing the lwarx
> instruction as a numeric constant, but that seems far too ugly and
> fragile for my taste.  In any case point 2 is the big issue.)
>
> If you don't like the 64-bit hack or something much like it, I think
> we have got three other alternatives:
>
> * Do nothing, ie reject the patch.
>
> * Push the problem onto the user by offering a configure option.  I
> don't care for this in the least, notably because packagers such as
> Linux distros couldn't safely enable the option, so in practice it
> would be unavailable to a large fraction of users.
>
> * Perform a runtime test.
> I'm not sure if there's a better way, but if nothing else we could
> fork a subprocess during postmaster start, have it try an lwarx with
> hint bit, observe whether it dumps core, and set a flag to tell
> future TAS calls whether to use the hint bit.  Ick.  In any case,
> adding a conditional branch to the TAS code would lose some of the
> performance benefit of the patch.  Given that you don't get any
> benefit at all until you have a large number of cores, this would be
> a net loss for a lot of people.
>
> None of those look better than an approximate proxy condition to me.

#3 in particular is unspeakably ugly.

cheers

andrew
[HACKERS] spinlocks on powerpc
2011/12/30 Tom Lane t...@sss.pgh.pa.us
> Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
>> The Linux kernel does this (arch/powerpc/include/asm/ppc-opcode.h):
>
>> Yeah, I was looking at that too.  We can't copy-paste code from
>> Linux directly, and I'm not sure I like that particular phrasing of
>> the macro, but perhaps we should steal the idea and only use the
>> hint on 64-bit PowerPC processors?
>
> The info that I've found says that the hint exists beginning in
> POWER6, and there were certainly 64-bit Power machines before that.
> However, it might be that the only machines that actually spit up on
> the hint bit (rather than ignore it) were 32-bit, in which case this
> would be a usable heuristic.  Not sure how we can research that ...
> do we want to just assume the kernel guys know what they're doing?

I'm a bit confused and might miss the point, but...

If we can decide whether to use the hint operand when we build
postgres, I think it's better to check if we can compile and run a
sample code with lwarx hint operand than to refer to some arbitrary
defines, such as FOO_PPC64 or something.

I still wonder when to judge the hint availability, at compile time or
at runtime.  I don't have any idea how to decide that at runtime,
though.

P.S. I changed the subject since it's no longer related to HPUX.

Regards,
Manabu Ori