Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On 2016-01-18 16:56:22 -0600, Kevin Grittner wrote: > On Mon, Jan 18, 2016 at 4:50 PM, Andres Freund wrote: > > > Now I'm equally unconvinced that it's worthwhile to do anything > > here. I just don't think benchmarking plays a role either way. > > Well, that would be the crucial point on which we differ -- the > rest is all agreement. I don't think we should accept the patch > *in the absence* of benchmarking to show a result that is neutral > or better. Spinlocks are just too performance-critical and too > fussy to accept a change on the basis that "the source code looks > fine". IMO, anyway. By that justification we need to start benchmarking adding new variables on the stack, that'll most of the time have a bigger performance impact than this. Benchmarking minor source code cleanups is just not worth the time. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Mon, Jan 18, 2016 at 4:50 PM, Andres Freund wrote: > Now I'm equally unconvinced that it's worthwhile to do anything > here. I just don't think benchmarking plays a role either way. Well, that would be the crucial point on which we differ -- the rest is all agreement. I don't think we should accept the patch *in the absence* of benchmarking to show a result that is neutral or better. Spinlocks are just too performance-critical and too fussy to accept a change on the basis that "the source code looks fine". IMO, anyway. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Mon, Jan 18, 2016 at 2:39 PM, Kevin Grittner wrote: >> but I don't think that Andreas' patch is necessarily a >> performance patch. There can be value in removing superfluous >> code; doing so sometimes clarifies intent and understanding. > > Well, that's why I said I would be satisfied with a neutral > benchmark result -- when there is a tie, the shorter, simpler code > should generally win. I'm not really sure what there was in what I > said to argue about; since that I've just been trying figure that > out. If we all agree, let's let it drop. If we don't want to apply this, then I think that a sensible compromise would be to add a code comment that says that we don't believe the LOCK prefix matters. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On 2016-01-18 16:14:05 -0600, Kevin Grittner wrote: > Unconvinced that we should do performance testing on a proposed > performance patch before accepting it I'm unconvinced that it makes sense to view this as a performance patch. And unconvinced that you can sanely measure it. The lock prefix is a one byte instruction prefix, and lock xchg, and xchg are exactly the same, leaving the instruction width aside. It's just a littlebit less work for the instruction decoder. The point about alignment and such is, that changing some code somewhere is likely to have a bigger performance impact than the actual effect of the removal of those few bytes. So when you benchmark, you'd just benchmark a slightly changed code layout. objdump -d build/postgres/dev-assert/vpath/src/backend/postgres |grep 'lock xchg'|head -n1 4b732f:f0 86 01 lock xchg %al,(%rcx) the f0 is the lock prefix. In total there's 22 of them in the postgres codebase, when compiled with my flags/compiler. I think it's unrealistic to benchmark slight codemovements on a regular basis, particularly using a large machine. There's just not enough time and hardware around for that. Now I'm equally unconvinced that it's worthwhile to do anything here. I just don't think benchmarking plays a role either way. >, that the changes in NUMA > scheduling in the Linux 3.8 kernel have a major effect on how well > our code performs at high concurrency on NUMA machines with a lot > of memory nodes That I believe immediately. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Mon, Jan 18, 2016 at 4:24 PM, Peter Geoghegan wrote: > On Mon, Jan 18, 2016 at 2:14 PM, Kevin Grittner wrote: >> It's hard to understand quite what you're saying there. If you're >> saying that code changes that should be performance neutral can >> sometimes affect performance because of alignment of code with >> cache line boundaries -- I absolutely agree; is that an argument >> against performance testing performance patches? > > No, it isn't an argument against performance testing patches like > this, but I don't think anyone suggested otherwise. I'm still not sure what argument he *was* making, but I'm glad to hear it wasn't that. > Of course every performance related patch should be tested to > make sure it meets its goals and at acceptable cost, I argued that it should be tested to ensure that it caused no regression in a case which has been a problem for some of our customers (spinlock contention at high concurrency on a machine with 4 or more NUMA nodes). We have been able to solve those problems, but it has been a fussy business -- sometimes we have tweaked spinlock code and sometimes that has not worked but an OS upgrade has. I am quite unconvinced about whether the change will help, hurt, or have no impact on these problems; I'm only arguing for testing to find out. > but I don't think that Andreas' patch is necessarily a > performance patch. There can be value in removing superfluous > code; doing so sometimes clarifies intent and understanding. Well, that's why I said I would be satisfied with a neutral benchmark result -- when there is a tie, the shorter, simpler code should generally win. I'm not really sure what there was in what I said to argue about; since that I've just been trying figure that out. If we all agree, let's let it drop. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Mon, Jan 18, 2016 at 2:14 PM, Kevin Grittner wrote: >> You get just as much churn by changing code elsewhere, which >> often causes code movement and alignment changes. > > It's hard to understand quite what you're saying there. If you're > saying that code changes that should be performance neutral can > sometimes affect performance because of alignment of code with > cache line boundaries -- I absolutely agree; is that an argument > against performance testing performance patches? No, it isn't an argument against performance testing patches like this, but I don't think anyone suggested otherwise. Of course every performance related patch should be tested to make sure it meets its goals and at acceptable cost, but I don't think that Andreas' patch is necessarily a performance patch. There can be value in removing superfluous code; doing so sometimes clarifies intent and understanding. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Mon, Jan 18, 2016 at 3:47 PM, Andres Freund wrote: > On January 18, 2016 10:42:42 PM GMT+01:00, Kevin Grittner > wrote: >> I took a look at this and agree that the shorter, simpler code >> proposed in this patch should make no *logical* difference, and >> looks like it *should* have a neutral or beneficial affect; but >> performance tuning in general, and spinlock performance in >> particular, is full of surprises. We have seen customers suffer >> poor scaling on their brand new monster machines because of the >> interaction between NUMA scheduling and our spinlock >> implementation, and seen those problems go away with an upgrade >> from pre-3.8 to post-3.8 kernels. I would be hesitant to accept >> this change without seeing a benchmark on a large NUMA machine with >> 4 or more memory nodes, on Linux kernels both before and after 3.8, >> to make sure that the effects are at least neutral. > > Unconvinced. Unconvinced that we should do performance testing on a proposed performance patch before accepting it, that the changes in NUMA scheduling in the Linux 3.8 kernel have a major effect on how well our code performs at high concurrency on NUMA machines with a lot of memory nodes, that patches to improve performance sometimes cause surprising regressions, that the results will come out any particular way, or that the difference will be out of the noise? Personally I'm only convinced on the first three of those. > You get just as much churn by changing code elsewhere, which > often causes code movement and alignment changes. It's hard to understand quite what you're saying there. If you're saying that code changes that should be performance neutral can sometimes affect performance because of alignment of code with cache line boundaries -- I absolutely agree; is that an argument against performance testing performance patches? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On January 18, 2016 10:42:42 PM GMT+01:00, Kevin Grittner wrote: >On Mon, Jan 18, 2016 at 3:04 PM, Andres Freund >wrote: >> On January 18, 2016 7:27:59 PM GMT+01:00, Robert Haas > wrote: >>> On Sun, Jan 17, 2016 at 6:38 AM, Andreas Seltenreich > wrote: > While discussing issues with its developers, it was pointed out to >me that our spinlock inline assembly is less than optimal. Attached >is a patch that addresses this. > >>> I did a Google search and everybody seems to agree that the LOCK >>> prefix is redundant. I found a source agreeing with the idea that >it >>> doesn't clobber registers > >>> So I guess it would be safe to change this. Scares me slightly, >>> though. >> >> Clobbers IIRC are implicit on x86 anyway. Rather doubt that the >> space for the prefix makes any sorry of practical difference, but >> there indeed seems no reason to have it. > >I took a look at this and agree that the shorter, simpler code >proposed in this patch should make no *logical* difference, and >looks like it *should* have a neutral or beneficial affect; but >performance tuning in general, and spinlock performance in >particular, is full of surprises. We have seen customers suffer >poor scaling on their brand new monster machines because of the >interaction between NUMA scheduling and our spinlock >implementation, and seen those problems go away with an upgrade >from pre-3.8 to post-3.8 kernels. I would be hesitant to accept >this change without seeing a benchmark on a large NUMA machine with >4 or more memory nodes, on Linux kernels both before and after 3.8, >to make sure that the effects are at least neutral. Unconvinced. You get just as much churn by changing code elsewhere, which often causes code movement and alignment changes. --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Mon, Jan 18, 2016 at 3:04 PM, Andres Freund wrote: > On January 18, 2016 7:27:59 PM GMT+01:00, Robert Haas > wrote: >> On Sun, Jan 17, 2016 at 6:38 AM, Andreas Seltenreich >> wrote: >>> While discussing issues with its developers, it was pointed out to me >>> that our spinlock inline assembly is less than optimal. Attached is >>> a patch that addresses this. >> I did a Google search and everybody seems to agree that the LOCK >> prefix is redundant. I found a source agreeing with the idea that it >> doesn't clobber registers >> So I guess it would be safe to change this. Scares me slightly, >> though. > > Clobbers IIRC are implicit on x86 anyway. Rather doubt that the > space for the prefix makes any sorry of practical difference, but > there indeed seems no reason to have it. I took a look at this and agree that the shorter, simpler code proposed in this patch should make no *logical* difference, and looks like it *should* have a neutral or beneficial affect; but performance tuning in general, and spinlock performance in particular, is full of surprises. We have seen customers suffer poor scaling on their brand new monster machines because of the interaction between NUMA scheduling and our spinlock implementation, and seen those problems go away with an upgrade from pre-3.8 to post-3.8 kernels. I would be hesitant to accept this change without seeing a benchmark on a large NUMA machine with 4 or more memory nodes, on Linux kernels both before and after 3.8, to make sure that the effects are at least neutral. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On January 18, 2016 7:27:59 PM GMT+01:00, Robert Haas wrote: >On Sun, Jan 17, 2016 at 6:38 AM, Andreas Seltenreich > wrote: >> I'm currently experimenting with just-in-time compilation using >libfirm. >> While discussing issues with its developers, it was pointed out to me >> that our spinlock inline assembly is less than optimal. Attached is >a >> patch that addresses this. >> >> , >> | Remove the LOCK prefix from the XCHG instruction. Locking is >implicit >> | with XCHG and the prefix wastes a byte. Also remove the "cc" >register >> | from the clobber list as the XCHG instruction does not modify any >flags. >> | >> | Reported by Christoph Mallon. >> ` > >I did a Google search and everybody seems to agree that the LOCK >prefix is redundant. I found a source agreeing with the idea that it >doesn't clobber registers, too: > >http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_6/CH06-1.html#HEADING1-85 > >So I guess it would be safe to change this. Scares me slightly, >though. Clobbers IIRC are implicit on x86 anyway. Rather doubt that the space for the prefix makes any sorry of practical difference, but there indeed seems no reason to have it. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.
On Sun, Jan 17, 2016 at 6:38 AM, Andreas Seltenreich wrote: > I'm currently experimenting with just-in-time compilation using libfirm. > While discussing issues with its developers, it was pointed out to me > that our spinlock inline assembly is less than optimal. Attached is a > patch that addresses this. > > , > | Remove the LOCK prefix from the XCHG instruction. Locking is implicit > | with XCHG and the prefix wastes a byte. Also remove the "cc" register > | from the clobber list as the XCHG instruction does not modify any flags. > | > | Reported by Christoph Mallon. > ` I did a Google search and everybody seems to agree that the LOCK prefix is redundant. I found a source agreeing with the idea that it doesn't clobber registers, too: http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_6/CH06-1.html#HEADING1-85 So I guess it would be safe to change this. Scares me slightly, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers