>> - membar_enter and membar_exit speak of loads "reach[ing] global
>>   visibility", which makes no sense to me.  What does it mean for a
>>   load to reach global visibility?

> I actually have no idea what these do and when one would use them.

Based on what I think is intended, I think accurate descriptions would
be something like

membar_enter
        All stores preceding membar_enter() will reach global
        visibility before any store after it does, and before any load
        after it starts.

membar_exit
        All loads and stores preceding membar_exit() will complete and
        reach global visibility (respectively) before any store that
        follows it reaches global visibility.

I don't totally understand their utility, despite the examples; I need
to think about them more.

>> - membar_consumer is described with "All loads preceding the memory
>>   barrier will complete before any loads after the memory barrier
>>   complete".  That last "complete" needs to be "start" for this to
>>   be a useful guarantee.

> I don't know about this,

Well, consider:

        /* must get datum_1 value before reading datum_2 */
        i = datum_1;
        membar_consumer();
        j = datum_2;

With the wording as it stands, this could turn into

On CPU 1:
        ask memory subsystem to load datum_1 into i
        ask memory subsystem to load datum_2 into j
        memory subsystem reads datum_2
On CPU 2:
        write to datum_1 and datum_2
On CPU 1:
        memory subsystem reads datum_1
        barrier causes:
                wait for memory subsystem to write i
                wait for memory subsystem to write j

Now, effectively, the j=datum_2 load has occurred before the i=datum_1
load.  That is, the memory barrier has not done its job; it is not
useful.  (Why would this happen?  Perhaps the read of datum_1 requires
evicting a dirty line from cache, but datum_2 doesn't.  Perhaps datum_2
is in the same cache line as something else which was requested even
earlier.  Perhaps something else....)

> it seems like you could replace both "complete"s with "start"s, or
> either one, and still come out about right.

No; if you replace both "complete"s with "start"s, then you get "All
loads preceding the memory barrier will start before any loads after
the memory barrier start".
This too is not useful; consider

On CPU 1:
        ask memory subsystem to load datum_1 into i
        barrier causes: do the previous line before the next line
        ask memory subsystem to load datum_2 into j
        memory subsystem reads datum_2
On CPU 2:
        write to datum_1 and datum_2
On CPU 1:
        memory subsystem reads datum_1
        wait for memory subsystem to write j

Again, the effective order has come out wrong.

> I think the guarantee is that if one guy does
>         i = j = 0;
>         ...
>         i = 1;
>         membar_producer();
>         j = 1;
> and the other guy does
>         my_j = j;
>         membar_consumer();
>         my_i = i;
> then the latter guy will never see
>         (my_i == 0 && my_j == 1)

Yes.  That's what I believe it's intended to do, and that's what I
think the current wordings don't promise.

> It doesn't really matter when the loads "start" and "complete" as
> long as the ordering of the writer is observed that way by the
> reader.

Right.  But the current wordings don't promise that.

>> - membar_sync has each of the above issues (mutatis mutandis).

> I think this one guarantees that [...]

Yes, I think that's what they're intended to promise.  I don't think
it's what they actually do promise.

> How would this be better described?

Apply all applicable fixes from the foregoing:

membar_sync
        All loads and stores preceding the memory barrier will complete
        and reach global visibility, respectively, before any loads and
        stores after the memory barrier start and reach global
        visibility, respectively.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
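
For anyone who wants to check the membar_consumer argument above
mechanically, here is a rough sketch in C.  It is a toy model, not
anything taken from the membar_ops implementation or manual page: each
load is assumed to have three events (the CPU asks for it, the memory
subsystem reads the value, the value is written to the destination),
and all the names below (permits_reordering, the enum constants, and
so on) are made up for illustration.  It enumerates every ordering of
the six events for the two loads and asks whether a given wording
still allows datum_2 to be read before datum_1.

```c
#include <stdbool.h>

/*
 * Toy model (an assumption for illustration only): load k has three
 * events, in this order: START_k (CPU asks the memory subsystem),
 * SAMPLE_k (memory subsystem reads the value), DONE_k (value written
 * to the destination variable).
 */
enum event { START_1, SAMPLE_1, DONE_1, START_2, SAMPLE_2, DONE_2, NEVENTS };

/* The three candidate wordings for membar_consumer discussed above. */
enum wording {
    COMPLETE_BEFORE_COMPLETE,   /* the wording as it stands */
    START_BEFORE_START,         /* both "complete"s replaced by "start" */
    COMPLETE_BEFORE_START       /* only the last "complete" replaced */
};

/* Does timeline t[] satisfy what the given wording promises? */
static bool barrier_allows(enum wording w, const int t[NEVENTS])
{
    switch (w) {
    case COMPLETE_BEFORE_COMPLETE: return t[DONE_1] < t[DONE_2];
    case START_BEFORE_START:       return t[START_1] < t[START_2];
    case COMPLETE_BEFORE_START:    return t[DONE_1] < t[START_2];
    }
    return false;
}

/* Each load must start, then sample, then complete. */
static bool well_formed(const int t[NEVENTS])
{
    return t[START_1] < t[SAMPLE_1] && t[SAMPLE_1] < t[DONE_1] &&
           t[START_2] < t[SAMPLE_2] && t[SAMPLE_2] < t[DONE_2];
}

/*
 * Assign distinct time slots 0..5 to the six events, one event per
 * recursion level, and report whether some assignment is well formed,
 * is allowed by the wording, and still samples datum_2 first.
 */
static bool search(enum wording w, int t[NEVENTS], bool used[NEVENTS], int ev)
{
    if (ev == NEVENTS)
        return well_formed(t) && barrier_allows(w, t) &&
               t[SAMPLE_2] < t[SAMPLE_1];
    for (int slot = 0; slot < NEVENTS; slot++) {
        if (used[slot])
            continue;
        used[slot] = true;
        t[ev] = slot;
        if (search(w, t, used, ev + 1))
            return true;
        used[slot] = false;
    }
    return false;
}

/* True if the wording permits the loads to be effectively reordered. */
bool permits_reordering(enum wording w)
{
    int t[NEVENTS] = {0};
    bool used[NEVENTS] = {false};
    return search(w, t, used, 0);
}
```

Under this model, permits_reordering() returns true for
COMPLETE_BEFORE_COMPLETE and for START_BEFORE_START, and false for
COMPLETE_BEFORE_START, which matches the two bad schedules shown
above and the claim that only the last "complete" should become
"start".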