Re: Memory corruption due to word sharing

2012-02-10 Thread Richard Henderson
On 02/03/2012 12:00 PM, Linus Torvalds wrote: >do { > load-link %r,%m > if (r == value) > return 0; > add >} while (store-conditional %r,%m) >return 1; > > and it is used to implement two *very* common (and critical) > reference-counting use cases: > > - decre

Re: Memory corruption due to word sharing

2012-02-06 Thread Torvald Riegel
On Fri, 2012-02-03 at 12:00 -0800, Linus Torvalds wrote: > Of course, it you expose some intrinsic for the whole "ll/sc" model > (and you then turn it into cmpxchg on demand), we could literally > open-code it. > > That is likely the most *flexible* approach for a compiler. I think > pretty much e

Re: Memory corruption due to word sharing

2012-02-03 Thread Paul E. McKenney
On Fri, Feb 03, 2012 at 12:00:03PM -0800, Linus Torvalds wrote: > On Fri, Feb 3, 2012 at 11:16 AM, Andrew MacLeod wrote: [ . . . ] > Having access to __ATOMIC_ACQUIRE would actually be an improvement - > it's just that the architectures that really care about things like > that simply don't matt

Re: Memory corruption due to word sharing

2012-02-03 Thread Linus Torvalds
On Fri, Feb 3, 2012 at 11:16 AM, Andrew MacLeod wrote: >>    The special cases are because older x86 cannot do the generic >> "add_return" efficiently - it needs xadd - but can do atomic versions >> that test the end result and give zero or sign information. > > Since these are older x86 only, cou

Re: Memory corruption due to word sharing

2012-02-03 Thread Andrew MacLeod
On 02/03/2012 12:16 PM, Linus Torvalds wrote: So we have several atomics we use in the kernel, with the more common being - add (and subtract) and cmpchg of both 'int' and 'long' This would be __atomic_fetch_add, __atomic_fetch_sub, and __atomic_compare_exchange. For 4.8 __atomic_compare_e

Re: Memory corruption due to word sharing

2012-02-03 Thread Linus Torvalds
On Fri, Feb 3, 2012 at 8:38 AM, Andrew MacLeod wrote: > > The atomic intrinsics were created for c++11  memory model compliance, but I > am certainly open to enhancements that would make them more useful.   I am > planning some enhancements for 4.8 now, and it sounds like you may have some > sugge

Re: Memory corruption due to word sharing

2012-02-03 Thread Andrew MacLeod
And I assume that since the compiler does them, that would now make it impossible for us to gather a list of all the 'lock' prefixes so that we can undo them if it turns out that we are running on a UP machine. When we do SMP operations, we don't just add a "lock" prefix to it. We do this:

Re: Memory corruption due to word sharing

2012-02-03 Thread Matthew Gretton-Dann
On Fri, Feb 03, 2012 at 09:37:22AM +, Richard Guenther wrote: > On Fri, 3 Feb 2012, DJ Delorie wrote: > > > > > Jan Kara writes: > > > we've spotted the following mismatch between what kernel folks expect > > > from a compiler and what GCC really does, resulting in memory corruption > > >

Re: Memory corruption due to word sharing

2012-02-03 Thread Richard Guenther
On Fri, 3 Feb 2012, DJ Delorie wrote: > > Jan Kara writes: > > we've spotted the following mismatch between what kernel folks expect > > from a compiler and what GCC really does, resulting in memory corruption on > > some architectures. Consider the following structure: > > struct x { > >

Re: Memory corruption due to word sharing

2012-02-02 Thread DJ Delorie
Jan Kara writes: > we've spotted the following mismatch between what kernel folks expect > from a compiler and what GCC really does, resulting in memory corruption on > some architectures. Consider the following structure: > struct x { > long a; > unsigned int b1; > unsigned int b2:

Re: Memory corruption due to word sharing

2012-02-02 Thread Paul E. McKenney
On Thu, Feb 02, 2012 at 11:08:25AM -0800, Linus Torvalds wrote: > On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney > wrote: > >> > >> SMP-atomic or percpu atomic? Or both? > > > > Only SMP-atomic. > > And I assume that since the compiler does them, that would now make it > impossible for us to g

Re: Memory corruption due to word sharing

2012-02-02 Thread Linus Torvalds
On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney wrote: >> >> SMP-atomic or percpu atomic? Or both? > > Only SMP-atomic. And I assume that since the compiler does them, that would now make it impossible for us to gather a list of all the 'lock' prefixes so that we can undo them if it turns out t

Re: Memory corruption due to word sharing

2012-02-02 Thread Paul E. McKenney
On Wed, Feb 01, 2012 at 03:11:00PM -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 2:45 PM, Paul E. McKenney > wrote: > > > > My (perhaps forlorn and naive) hope is that C++11 memory_order_relaxed > > will eventually allow ACCESS_ONCE() to be upgraded so that (for example) > > access-once in

Re: Memory corruption due to word sharing

2012-02-02 Thread Linus Torvalds
On Thu, Feb 2, 2012 at 8:28 AM, Michael Matz wrote: > > Sure.  Simplest example: struct s {int i:24;} __attribute__((packed)). > > You must access only three bytes, no matter what.  The basetype (int) is > four bytes. Ok, so here's a really *stupid* (but also really really simple) patch attached.

Re: Memory corruption due to word sharing

2012-02-02 Thread Michael Matz
Hi, On Thu, 2 Feb 2012, Aldy Hernandez wrote: > > Seriously - is there any real argument *against* just using the base > > type as a hint for access size? > > If I'm on the hook for attempting to fix this again, I'd also like to > know if there are any arguments against using the base type. S

Re: Memory corruption due to word sharing

2012-02-02 Thread Aldy Hernandez
Linus Torvalds writes: > Seriously - is there any real argument *against* just using the base > type as a hint for access size? If I'm on the hook for attempting to fix this again, I'd also like to know if there are any arguments against using the base type.

Re: Memory corruption due to word sharing

2012-02-02 Thread Michael Matz
Hi, On Wed, 1 Feb 2012, Linus Torvalds wrote: > But I also think that gcc is simply *buggy*, and has made them much > nastier than they should be. What gcc *should* have done is to turn > bitfield accesses into shift-and-masking of the underlying field as > early as possible, and then do all o

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Thu, 2 Feb 2012, James Courtier-Dutton wrote: > On 1 February 2012 15:19, Jan Kara wrote: > >  Hello, > > > >  we've spotted the following mismatch between what kernel folks expect > > from a compiler and what GCC really does, resulting in memory corruption on > > some architectures. Consider

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Thu, 2 Feb 2012, David Sterba wrote: > On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote: > > We actually spotted this race in practice in btrfs on structure > > fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got > > corrupted due to update of following bitfield and there

Re: Memory corruption due to word sharing

2012-02-02 Thread Ingo Molnar
* Linus Torvalds wrote: > [...] > > And I realize that compiler people tend to think that loop > hoisting etc is absolutely critical for performance, and some > big hammer like "barrier()" makes a compiler person wince. You > think it results in horrible code generation problems. > > It rea

Re: Memory corruption due to word sharing

2012-02-02 Thread David Sterba
On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote: > We actually spotted this race in practice in btrfs on structure > fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got > corrupted due to update of following bitfield and there seem to be other > places in kernel where this c

Re: Memory corruption due to word sharing

2012-02-02 Thread James Courtier-Dutton
On 1 February 2012 15:19, Jan Kara wrote: >  Hello, > >  we've spotted the following mismatch between what kernel folks expect > from a compiler and what GCC really does, resulting in memory corruption on > some architectures. Consider the following structure: > struct x { >    long a; >    unsign

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Wed, 1 Feb 2012, Linus Torvalds wrote: > IT IS A GCC BUG. I think you are wrong when you assume that we think it is not a bug. Repeating it repeatedly in all-caps doesn't make the fix appear faster though. Richard.

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Wed, 1 Feb 2012, Linus Torvalds wrote: > Just out of morbid curiosity, what happens if you have totally > *separate* variables that just happen to link together? IOW, something > like > >static struct { unsigned bit:1; } onebit; >static volatile int var; > > and they just *happen* to

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Wed, 1 Feb 2012, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz wrote: > > > > One problem is that it's not a new problem, GCC emitted similar code since > > about forever, and still they turned up only now (well, probably because > > ia64 is dead, but sparc64 should have

RE: Memory corruption due to word sharing

2012-02-02 Thread Bernd Petrovitsch
On Mit, 2012-02-01 at 21:04 +, Boehm, Hans wrote: [...] > The C11 memory model potentially adds overhead in only two cases: > > 1. When current code involves touching a field that wouldn't otherwise > be touched. There are odd cases in which this measurably slows down > code, but I think all

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 2:45 PM, Paul E. McKenney wrote: > > My (perhaps forlorn and naive) hope is that C++11 memory_order_relaxed > will eventually allow ACCESS_ONCE() to be upgraded so that (for example) > access-once increments can generate a single increment-memory instruction > on x86. I don

Re: Memory corruption due to word sharing

2012-02-01 Thread Paul E. McKenney
On Wed, Feb 01, 2012 at 12:59:24PM -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 12:41 PM, Torvald Riegel wrote: > > > > You do rely on the compiler to do common transformations I suppose: > > hoist loads out of loops, CSE, etc.  How do you expect the compiler to > > know whether they are

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 1:25 PM, Boehm, Hans wrote: > > Here are some more interesting ones that illustrate the issues (all > declarations are non-local, unless stated otherwise): > > struct { char a; int b:9; int c:7; char d} x; > > Is x.b = 1 allowed to overwrite x.a?  C11 says no, essentially r

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
> From: Torvald Riegel > > Oh, one of my favorite (NOT!) pieces of code in the kernel is the > > implementation of the > > > >smp_read_barrier_depends() > > > > macro, which on every single architecture except for one (alpha) is a > no-op. > > > > We have basically 30 or so empty definitions fo

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 1:24 PM, Torvald Riegel wrote: >> It's not the only thing we do. We have cases where it's not that you >> can't hoist things outside of loops, it's that you have to read things >> exactly *once*, and then use that particular value (ie the compiler >> can't be allowed to relo

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 13:20 -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 12:53 PM, Torvald Riegel wrote: > > > > For volatile, I agree. > > > > However, the original btrfs example was *without* a volatile, and that's > > why I raised the memory model point. This triggered an error in a >

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
> From: Linus Torvalds > Don't try to make it anything more complicated. This has *nothing* to > do with threads or functions or anything else. > > If you do massive inlining, and you don't see any barriers or > conditionals or other reasons not to write to it, just write to it. > > Don't try to

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 12:59 -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 12:41 PM, Torvald Riegel wrote: > > > > You do rely on the compiler to do common transformations I suppose: > > hoist loads out of loops, CSE, etc. How do you expect the compiler to > > know whether they are allowed

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:53 PM, Torvald Riegel wrote: > > For volatile, I agree. > > However, the original btrfs example was *without* a volatile, and that's > why I raised the memory model point.  This triggered an error in a > concurrent execution, so that's memory model land, at least in C > l

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
> From: Linus Torvalds > > > > We need a proper memory model. > > Not really. > > The fact is, the kernel will happily take the memory model of the > underlying hardware. Trying to impose some compiler description of the > memory model is actually horribly bad, because it automatically also > inv

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:41 PM, Torvald Riegel wrote: > > You do rely on the compiler to do common transformations I suppose: > hoist loads out of loops, CSE, etc.  How do you expect the compiler to > know whether they are allowed for a particular piece of code or not? We have barriers. Compile

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 09:29 -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 9:08 AM, Torvald Riegel wrote: > > > > What do the kernel folks think about the C11 memory model? If you can > > spot any issues in there, the GCC community would certainly like to > > know. > > I don't think this

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:16 PM, Jakub Jelinek wrote: >> >> So the kernel really doesn't care what you do to things *within* the >> bitfield. > > But what is *within* the bitfield?  Do you consider s4 or t2 fields > (non-bitfield fields that just the ABI wants to pack together with > the bitfield

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 11:47 -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 9:42 AM, Torvald Riegel wrote: > > > > We need a proper memory model. > > Not really. > > The fact is, the kernel will happily take the memory model of the > underlying hardware. You do rely on the compiler to do

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:01 PM, Linus Torvalds wrote: > >  - However, while using the *smallest* possible access may generate > correct code, it often generates really *crappy* code. Which is > exactly the bug that I reported in > >   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48696 Btw, as I a

Re: Memory corruption due to word sharing

2012-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2012 at 12:01:46PM -0800, Linus Torvalds wrote: > On Wed, Feb 1, 2012 at 11:40 AM, Jakub Jelinek wrote: > > struct S { long s1; unsigned int s2 : 5; unsigned int s3 : 19; unsigned > > char s4; unsigned int s5; }; > > struct T { long t1 : 16; unsigned int t2; }; > > > > on e.g. x86

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 11:40 AM, Jakub Jelinek wrote: > > Well, the C++11/C11 model doesn't allow to use the underlying type > for accesses, consider e.g. > > struct S { long s1; unsigned int s2 : 5; unsigned int s3 : 19; unsigned char > s4; unsigned int s5; }; > struct T { long s1 : 16; unsigned

Re: Memory corruption due to word sharing

2012-02-01 Thread Alan Cox
> So here's basically what the kernel needs: > > - if we don't touch a field, the compiler doesn't touch it. > >This is the rule that gcc now violates with bitfields. > >This is a gcc bug. End of story. The "volatile" example proves it - > anybody who argues otherwise is simply wrong, a

Re: Memory corruption due to word sharing

2012-02-01 Thread Jeff Law
On 02/01/2012 12:44 PM, Boehm, Hans wrote: C11 is a published standard. Last I checked, gcc did not follow many of the above rules. It looks like major changes were recently merged into the gcc trunk, and I haven't had a chance to test those, so it may well be fixed. But more importantly,

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:42 AM, Torvald Riegel wrote: > > We need a proper memory model. Not really. The fact is, the kernel will happily take the memory model of the underlying hardware. Trying to impose some compiler description of the memory model is actually horribly bad, because it automati

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
use.cz; rguent...@suse.de; gcc@gcc.gnu.org > Subject: Re: Memory corruption due to word sharing > > On Wed, 2012-02-01 at 08:41 -0800, Linus Torvalds wrote: > > If the gcc people aren't willing to agree that this is actually a > flaw > > in the standard (one that is being a

Re: Memory corruption due to word sharing

2012-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2012 at 06:42:54PM +0100, Torvald Riegel wrote: > We need a proper memory model. No vague assumptions with lots of > hand-waving. If you think that this is simple stuff and can > sufficiently described by "don't do anything stupid", then please have a > look at the issues that the

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 08:41 -0800, Linus Torvalds wrote: > If the gcc people aren't willing to agree that this is actually a flaw > in the standard (one that is being addressed, no less) It has been addressed in the standards. > and try to fix > it, Again, this is being worked on, see http://gcc

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 10:45 AM, Jeff Law wrote: > > Torvald Riegel & I were told that was kernel policy when we brought up the > upcoming bitfield semantic changes with some of the linux kernel folks last > year. Btw, one reason this is true is that the bitfield ordering/packing is so unspecifie

Re: Memory corruption due to word sharing

2012-02-01 Thread Peter Bergner
On Wed, 2012-02-01 at 13:09 -0500, David Miller wrote: > From: Michael Matz > Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) > > > One problem is that it's not a new problem, GCC emitted similar code since > > about forever, and still they turned up only now (well, probably because > > ia64 is dead

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 10:09 AM, David Miller wrote: > > Personally I've avoided C bitfields like the plague in any code I've > written. I do agree with that. The kernel largely tries to avoid bitfields, usually because we have some really strict rules about different bitfields, but also because

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz wrote: > > One problem is that it's not a new problem, GCC emitted similar code since > about forever, and still they turned up only now (well, probably because > ia64 is dead, but sparc64 should have similar problems).  The bitfield > handling code is

Re: Memory corruption due to word sharing

2012-02-01 Thread Jeff Law
On 02/01/2012 11:09 AM, David Miller wrote: From: Michael Matz Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) One problem is that it's not a new problem, GCC emitted similar code since about forever, and still they turned up only now (well, probably because ia64 is dead, but sparc64 should have sim

Re: Memory corruption due to word sharing

2012-02-01 Thread David Miller
From: Michael Matz Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) > One problem is that it's not a new problem, GCC emitted similar code since > about forever, and still they turned up only now (well, probably because > ia64 is dead, but sparc64 should have similar problems). Indeed, on sparc64 i

Re: Memory corruption due to word sharing

2012-02-01 Thread Dennis Clarke
> I have actually tried exactly this earlier today (because while looking at > this, I had an idea that putting volatile in place could be a workaround, > causing gcc to generate a saner code), but it doesn't work either: > > # cat x.c > struct x { > long a; > volatile unsigned int lock; >

Re: Memory corruption due to word sharing

2012-02-01 Thread Michael Matz
Hi, On Wed, 1 Feb 2012, Jiri Kosina wrote: > # cat x.c > struct x { > long a; > volatile unsigned int lock; > unsigned int full:1; > }; > > void > wrong(struct x *ptr) > { > ptr->full = 1; > } > > In my opinion, this is a clear bug Even that depends (sadly) on who you ask.

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:11 AM, Jiri Kosina wrote: > On Wed, 1 Feb 2012, Linus Torvalds wrote: >> >> And I suspect it really is a generic bug that can be shown even with >> the above trivial example. > > I have actually tried exactly this earlier today (because while looking at > this, I had an id

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:08 AM, Torvald Riegel wrote: > > What do the kernel folks think about the C11 memory model?  If you can > spot any issues in there, the GCC community would certainly like to > know. I don't think this is about memory models except very tangentially. Gcc currently accesse

Re: Memory corruption due to word sharing

2012-02-01 Thread Jiri Kosina
On Wed, 1 Feb 2012, Linus Torvalds wrote: > But the compiler turns the access to the bitfield (in a 32-bit aligned > word) into a 64-bit access that accesses the word *next* to it. > > That word next to it might *be* the lock, for example. > > So we could literally have this kind of situation: >

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 16:19 +0100, Jan Kara wrote: > I've raised the issue with our GCC guys and they said to me that: "C does > not provide such guarantee, nor can you reliably lock different > structure fields with different locks if they share naturally aligned > word-size memory regions. The C

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 8:37 AM, Colin Walters wrote: > > 1) Use the same lock for a given bitfield That's not the problem. All the *bitfield* fields are all accessed under the same word already. > 2) Split up the bitfield into different words Again, it's not the bitfield that is the problem. T

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 7:19 AM, Jan Kara wrote: > >  we've spotted the following mismatch between what kernel folks expect > from a compiler and what GCC really does, resulting in memory corruption on > some architectures. This is sad. We've had something like this before due to architectural re

Re: Memory corruption due to word sharing

2012-02-01 Thread Colin Walters
On Wed, 2012-02-01 at 16:19 +0100, Jan Kara wrote: > I've raised the issue with our GCC guys and they said to me that: "C does > not provide such guarantee, nor can you reliably lock different > structure fields with different locks if they share naturally aligned > word-size memory regions. The

Re: Memory corruption due to word sharing

2012-02-01 Thread Markus Trippelsdorf
On 2012.02.01 at 16:19 +0100, Jan Kara wrote: > we've spotted the following mismatch between what kernel folks expect > from a compiler and what GCC really does, resulting in memory corruption on > some architectures. Consider the following structure: > struct x { > long a; > unsigned int

Memory corruption due to word sharing

2012-02-01 Thread Jan Kara
Hello, we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. Consider the following structure: struct x { long a; unsigned int b1; unsigned int b2:1; }; We have two proc