Re: Cmm Memory Model (Understanding #15449)

2018-12-07 Thread Simon Marlow
FWIW, my main reference at the time when this stuff was implemented was
this page by Doug Lea: http://gee.cs.oswego.edu/dl/jmm/cookbook.html

As Ben says, things have evolved a lot since then. I'm not an expert at
all, but I know from experience that getting this stuff right is really
hard. Even on x86 we had a tough time figuring out where to put this
barrier:
https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2FWSDeque.c$135-137

My understanding of the current memory model is this:
- we give no guarantees about ordering between non-atomic IORef operations,
except that: doing things in parallel shouldn't segfault". So if a
processor can see a pointer in an IORef, it can safely follow the pointer
and find the memory it points to correctly initialized. This may require
barriers on some architectures, but not x86(_64) as I understand it.
- MVar operations and atomicModifyIORef are full barriers. Or something.

Cheers
Simon


On Thu, 29 Nov 2018 at 04:44, Travis Whitaker 
wrote:

> Hello GHC Devs,
>
> I'm trying to get my head around ticket #15449 (
> https://ghc.haskell.org/trac/ghc/ticket/15449). This gist of things is
> that GHC generates incorrect aarch64 code that causes memory corruption in
> multithreaded programs run on out-of-order machines. User trommler
> discovered that similar issues are present on PowerPC, and indeed ARMv7 and
> PowerPC support the same types of load/store reorderings. The LLVM code
> emitted by GHC may be incorrect with respect to LLVM's memory model, but
> this isn't a problem on architectures with minimal reordering like x86.
>
> I had initially thought that GHC simply wasn't emitting the appropriate
> LLVM fences; there's an elephant-gun-approach here (
> https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449) that
> guards each atomic operation with a full barrier. I still believe that GHC
> is omitting necessary LLVM fences, but this change is insufficient to fix
> the behavior of the test case (which is simply GHC itself compiling a test
> package with '-jN', N > 1).
>
> It seems there's a long and foggy history of the Cmm memory model. Edward
> Yang discusses this a bit in his post here (
> http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/)
> and issues similar to #15449 have plagued GHC in the past, like #12469 (
> https://ghc.haskell.org/trac/ghc/ticket/12469). Worryingly, GHC only has
> MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and
> full memory barriers. On ARM an instruction memory barrier might be
> required as well, but I don't know enough about STG/Cmm to say for sure,
> and it'd likely be LLVM's responsibility to emit that anyway.
>
> I'm hoping that someone with more tribal knowledge than I might be able to
> give me some pointers with regards to the following areas:
>
>
>- Does STG itself have anything like a memory model? My intuition says
>'definitely not', but given that STG expressions may contain Cmm operations
>(via StgCmmPrim), there may be STG-to-STG transformations that need to care
>about the target machine's memory model.
>- With respect to Cmm, what reorderings does GHC perform? What are the
>relevant parts of the compiler to begin studying?
>- Are the LLVM atomics that GHC emits correct with respect to the LLVM
>memory model? As it stands now LLVM fences are only emitted for
>MO_WriteBarrier. Without fences accompanying the atomics, it seems the LLVM
>compiler could float dependent loads/stores past atomic operations.
>- Why is MO_WriteBarrier the only provided memory barrier? My hunch is
>that it's because this is the only sort of barrier required on x86, which
>only allows loads to be reordered with older stores, but perhaps I'm
>missing something? Is it plausible that Cmm simply needs additional barrier
>primitives to target these weaker memory models? Conversely, is there some
>property of Cmm that let's us get away without read barriers at all?
>
>
> Naturally, if I've got any of this wrong or are otherwise barking up the
> wrong tree, please let me know.
>
> Thanks for all your efforts!
>
> Travis Whitaker
> ___
> ghc-devs mailing list
> ghc-devs@haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Cmm Memory Model (Understanding #15449)

2018-12-06 Thread Carter Schonwald
Awesome summary by Ben!

theres also some crazy hijinks needed currently to do compare and swap
style stuff

see
https://hackage.haskell.org/package/atomic-primops-0.8.2/docs/Data-Atomics.html
for some discussion. (i'm not sure what the current state of the world is
on CAS tech)

On Thu, Nov 29, 2018 at 9:45 AM Ben Gamari  wrote:

> Travis Whitaker  writes:
>
> > Hello GHC Devs,
> >
> > I'm trying to get my head around ticket #15449 (
> > https://ghc.haskell.org/trac/ghc/ticket/15449). This gist of things is
> that
> > GHC generates incorrect aarch64 code that causes memory corruption in
> > multithreaded programs run on out-of-order machines. User trommler
> > discovered that similar issues are present on PowerPC, and indeed ARMv7
> and
> > PowerPC support the same types of load/store reorderings. The LLVM code
> > emitted by GHC may be incorrect with respect to LLVM's memory model, but
> > this isn't a problem on architectures with minimal reordering like x86.
> >
> Thank you for picking this up!
>
> > I had initially thought that GHC simply wasn't emitting the appropriate
> > LLVM fences; there's an elephant-gun-approach here (
> > https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449) that
> > guards each atomic operation with a full barrier. I still believe that
> GHC
> > is omitting necessary LLVM fences, but this change is insufficient to fix
> > the behavior of the test case (which is simply GHC itself compiling a
> test
> > package with '-jN', N > 1).
> >
> > It seems there's a long and foggy history of the Cmm memory model. Edward
> > Yang discusses this a bit in his post here (
> >
> http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/
> )
> > and issues similar to #15449 have plagued GHC in the past, like #12469 (
> > https://ghc.haskell.org/trac/ghc/ticket/12469). Worryingly, GHC only has
> > MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and
> > full memory barriers. On ARM an instruction memory barrier might be
> > required as well, but I don't know enough about STG/Cmm to say for sure,
> > and it'd likely be LLVM's responsibility to emit that anyway.
> >
> In my opinion GHC's current memory model story is quite unsustainable. As
> you point out, we currently have a limited selection of plain barriers
> and a few atomic operations, none of which are adequately documented
> given the subtlety they involve.
>
> I think we would at this point be foolish not to take advantage of the
> research that has been done in this area by moving to C11-style
> acquire/release semantics wherever possible. While following through on
> this is not a small task, these semantics are both well defined and
> easier to reason about intuitively. We might also be able to benefit
> from the wealth of model checking tools that are now available for this
> formalism.
>
> Ulan Degenbaev (CC'd) and I discussed him possibly picking up the task
> of moving to C11 atomics a few weeks ago.
>
> > I'm hoping that someone with more tribal knowledge than I might be able
> to
> > give me some pointers with regards to the following areas:
> >
> >
> >- Does STG itself have anything like a memory model? My intuition says
> >'definitely not', but given that STG expressions may contain Cmm
> operations
> >(via StgCmmPrim), there may be STG-to-STG transformations that need
> to care
> >about the target machine's memory model.
>
> As far as I know, it has nothing formally, or even informally, defined.
> That being said, relatively few of our primops make any guarantees about
> their operation in a concurrent setting. The cases that I can think of
> include `casArray#`, `atomicModifyIORef#`, `atomic{Read,Write}*Array#`
> and the STM operations.
>
> >- With respect to Cmm, what reorderings does GHC perform? What are the
> >relevant parts of the compiler to begin studying?
>
> As far as I know, very few. We only perform a handful of optimizations
> on C--. These include,
>
>  * Sinking of assignments (see compiler/cmm/CmmSink.hs); see
>CmmSink.conflicts for what commutation we allow. We notably don't
>allow sinking of any memory assignments past calls (including
>primops)
>
>  * Simple constant folding (see CmmOpt.cmmMachOpFoldM)
>
>  * Common block elimination (see CmmCommonBlockElim)
>
>  * Some simple control-flow optimisation (see CmmContFlowOpt)
>
> >- Are the LLVM atomics that GHC emits correct with respect to the LLVM
> >memory model? As it stands now LLVM fences are only emitted for
> >MO_WriteBarrier. Without fences accompanying the atomics, it seems
> the LLVM
> >compiler could float dependent loads/stores past atomic operations.
>
> Frankly, I would be surprised if they are correct. Few people have
> really looked at GHC's memory ordering properties and even fewer have
> looked at those of the LLVM backend.
>
> >- Why is MO_WriteBarrier the only provided memory barrier? My hunch is
> >that it's because 

Re: Cmm Memory Model (Understanding #15449)

2018-11-29 Thread Ben Gamari
Travis Whitaker  writes:

> Hello GHC Devs,
>
> I'm trying to get my head around ticket #15449 (
> https://ghc.haskell.org/trac/ghc/ticket/15449). This gist of things is that
> GHC generates incorrect aarch64 code that causes memory corruption in
> multithreaded programs run on out-of-order machines. User trommler
> discovered that similar issues are present on PowerPC, and indeed ARMv7 and
> PowerPC support the same types of load/store reorderings. The LLVM code
> emitted by GHC may be incorrect with respect to LLVM's memory model, but
> this isn't a problem on architectures with minimal reordering like x86.
>
Thank you for picking this up!

> I had initially thought that GHC simply wasn't emitting the appropriate
> LLVM fences; there's an elephant-gun-approach here (
> https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449) that
> guards each atomic operation with a full barrier. I still believe that GHC
> is omitting necessary LLVM fences, but this change is insufficient to fix
> the behavior of the test case (which is simply GHC itself compiling a test
> package with '-jN', N > 1).
>
> It seems there's a long and foggy history of the Cmm memory model. Edward
> Yang discusses this a bit in his post here (
> http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/)
> and issues similar to #15449 have plagued GHC in the past, like #12469 (
> https://ghc.haskell.org/trac/ghc/ticket/12469). Worryingly, GHC only has
> MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and
> full memory barriers. On ARM an instruction memory barrier might be
> required as well, but I don't know enough about STG/Cmm to say for sure,
> and it'd likely be LLVM's responsibility to emit that anyway.
>
In my opinion GHC's current memory model story is quite unsustainable. As
you point out, we currently have a limited selection of plain barriers
and a few atomic operations, none of which are adequately documented
given the subtlety they involve.

I think we would at this point be foolish not to take advantage of the
research that has been done in this area by moving to C11-style
acquire/release semantics wherever possible. While following through on
this is not a small task, these semantics are both well defined and
easier to reason about intuitively. We might also be able to benefit
from the wealth of model checking tools that are now available for this
formalism.

Ulan Degenbaev (CC'd) and I discussed him possibly picking up the task
of moving to C11 atomics a few weeks ago.

> I'm hoping that someone with more tribal knowledge than I might be able to
> give me some pointers with regards to the following areas:
>
>
>- Does STG itself have anything like a memory model? My intuition says
>'definitely not', but given that STG expressions may contain Cmm operations
>(via StgCmmPrim), there may be STG-to-STG transformations that need to care
>about the target machine's memory model.

As far as I know, it has nothing formally, or even informally, defined.
That being said, relatively few of our primops make any guarantees about
their operation in a concurrent setting. The cases that I can think of
include `casArray#`, `atomicModifyIORef#`, `atomic{Read,Write}*Array#`
and the STM operations.

>- With respect to Cmm, what reorderings does GHC perform? What are the
>relevant parts of the compiler to begin studying?

As far as I know, very few. We only perform a handful of optimizations
on C--. These include,

 * Sinking of assignments (see compiler/cmm/CmmSink.hs); see
   CmmSink.conflicts for what commutation we allow. We notably don't
   allow sinking of any memory assignments past calls (including
   primops)

 * Simple constant folding (see CmmOpt.cmmMachOpFoldM)

 * Common block elimination (see CmmCommonBlockElim)

 * Some simple control-flow optimisation (see CmmContFlowOpt)

>- Are the LLVM atomics that GHC emits correct with respect to the LLVM
>memory model? As it stands now LLVM fences are only emitted for
>MO_WriteBarrier. Without fences accompanying the atomics, it seems the LLVM
>compiler could float dependent loads/stores past atomic operations.

Frankly, I would be surprised if they are correct. Few people have
really looked at GHC's memory ordering properties and even fewer have
looked at those of the LLVM backend.

>- Why is MO_WriteBarrier the only provided memory barrier? My hunch is
>that it's because this is the only sort of barrier required on x86, which
>only allows loads to be reordered with older stores, but perhaps I'm
>missing something? Is it plausible that Cmm simply needs additional barrier
>primitives to target these weaker memory models? Conversely, is there some
>property of Cmm that let's us get away without read barriers at all?
>
Note that there are a few others defined in SMP.h (namely
store_load_barrier and load_load_barrier). However, see the discussion
above. My understanding is 

Cmm Memory Model (Understanding #15449)

2018-11-28 Thread Travis Whitaker
Hello GHC Devs,

I'm trying to get my head around ticket #15449 (
https://ghc.haskell.org/trac/ghc/ticket/15449). This gist of things is that
GHC generates incorrect aarch64 code that causes memory corruption in
multithreaded programs run on out-of-order machines. User trommler
discovered that similar issues are present on PowerPC, and indeed ARMv7 and
PowerPC support the same types of load/store reorderings. The LLVM code
emitted by GHC may be incorrect with respect to LLVM's memory model, but
this isn't a problem on architectures with minimal reordering like x86.

I had initially thought that GHC simply wasn't emitting the appropriate
LLVM fences; there's an elephant-gun-approach here (
https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449) that
guards each atomic operation with a full barrier. I still believe that GHC
is omitting necessary LLVM fences, but this change is insufficient to fix
the behavior of the test case (which is simply GHC itself compiling a test
package with '-jN', N > 1).

It seems there's a long and foggy history of the Cmm memory model. Edward
Yang discusses this a bit in his post here (
http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/)
and issues similar to #15449 have plagued GHC in the past, like #12469 (
https://ghc.haskell.org/trac/ghc/ticket/12469). Worryingly, GHC only has
MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and
full memory barriers. On ARM an instruction memory barrier might be
required as well, but I don't know enough about STG/Cmm to say for sure,
and it'd likely be LLVM's responsibility to emit that anyway.

I'm hoping that someone with more tribal knowledge than I might be able to
give me some pointers with regards to the following areas:


   - Does STG itself have anything like a memory model? My intuition says
   'definitely not', but given that STG expressions may contain Cmm operations
   (via StgCmmPrim), there may be STG-to-STG transformations that need to care
   about the target machine's memory model.
   - With respect to Cmm, what reorderings does GHC perform? What are the
   relevant parts of the compiler to begin studying?
   - Are the LLVM atomics that GHC emits correct with respect to the LLVM
   memory model? As it stands now LLVM fences are only emitted for
   MO_WriteBarrier. Without fences accompanying the atomics, it seems the LLVM
   compiler could float dependent loads/stores past atomic operations.
   - Why is MO_WriteBarrier the only provided memory barrier? My hunch is
   that it's because this is the only sort of barrier required on x86, which
   only allows loads to be reordered with older stores, but perhaps I'm
   missing something? Is it plausible that Cmm simply needs additional barrier
   primitives to target these weaker memory models? Conversely, is there some
   property of Cmm that let's us get away without read barriers at all?


Naturally, if I've got any of this wrong or are otherwise barking up the
wrong tree, please let me know.

Thanks for all your efforts!

Travis Whitaker
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs