On Sun, Nov 02, 2025 at 08:05:24PM +0000, Philipp Stanner wrote:
> On Sun, 2025-11-02 at 10:37 -0800, Paul E. McKenney wrote:
> > On Sun, Nov 02, 2025 at 04:18:48PM +0000, Philipp Stanner wrote:
> > > 
> 
> […]
> 
> > > 
> > > "However, the CPU need not actually invalidate the cache line before
> > > sending the acknowledgement." [1]
> > > 
> > > Well yes, I think it absolutely needs to. The previous examples relied
> > > precisely on this. What a CPU sending an Invalidate Message actually is
> > > saying is: "I will modify this cache line that you currently have read-
> > > only in your local cache. Once you have sent me the Invalidate-ACK I know
> > > that you have invalidated it and I can safely modify it."
> > > 
> > > A CPU sending an Invalidate-ACK without actually having invalidated its
> > > cache line is, bluntly, lying and endangering the entire cache
> > > coherence.
> > > 
> > > Now don't get me wrong, I accept that this is obviously what is really
> > > happening. But the chapter got me to the point of interpreting a
> > > truthful Invalidate-ACK as an essential part of cache coherence.
> > 
> > The following sentence was intended to help: "It could instead queue
> > the invalidate message with the understanding that the message will
> > be processed before the CPU sends any further messages regarding that
> > cache line."
> 
> So it can refuse to invalidate the cache line as long as it only keeps
> it in its current state, the "worst damage" being that the CPU reads
> potentially outdated data?
> 
> > 
> 
> […]
> 
> > Fourth, you are right that strict unoptimized MESI
> > would absolutely require that the cache line be invalidated prior to
> > acknowledging the invalidation.
> 
> My understanding of that chapter is that classic MESI needs few
> memory barriers, but store buffers and invalidate queues inform the
> necessity for wmb() and rmb(), respectively.

You still need full barriers for things like the LB pattern, but
yes, classic MESI has less room for mischief than more elaborate
cache-coherence protocols.
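
To make the invalidate-queue half of that concrete, here is a toy,
single-threaded model. It is only a sketch: the names (toy_cpu,
toy_rmb() and so on) are made up for illustration and come neither from
perfbook nor from the kernel, and the model assumes one variable, one
reader CPU, and a queue that is drained only by the barrier. Real
hardware drains the queue on its own sooner or later, and must do so
before sending further messages about that cache line.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT real hardware: one reader CPU caching a single line. */
struct toy_cpu {
	int  cached_value;   /* value held in the local cache line */
	bool line_valid;     /* false once the line has been invalidated */
	bool inval_pending;  /* an invalidate is sitting in the queue */
};

/* "Memory" for the one variable in this model. */
static int toy_memory;

/* Receive an invalidate: queue it and acknowledge immediately. */
static bool toy_receive_invalidate(struct toy_cpu *cpu)
{
	cpu->inval_pending = true;
	return true;	/* the Invalidate-ACK, sent before the line is touched */
}

/* Read barrier: apply any queued invalidate before later loads. */
static void toy_rmb(struct toy_cpu *cpu)
{
	if (cpu->inval_pending) {
		cpu->line_valid = false;
		cpu->inval_pending = false;
	}
}

/* Load: hit in the cache if the line is still valid, else refetch. */
static int toy_load(struct toy_cpu *cpu)
{
	if (!cpu->line_valid) {
		cpu->cached_value = toy_memory;
		cpu->line_valid = true;
	}
	return cpu->cached_value;
}

/* The writing CPU updates the variable once it holds the ACK. */
static void toy_remote_store(struct toy_cpu *reader, int value)
{
	if (toy_receive_invalidate(reader))
		toy_memory = value;
}

/* Walk through the scenario discussed above. */
static void toy_demo(void)
{
	struct toy_cpu cpu = { .cached_value = 0, .line_valid = true,
			       .inval_pending = false };

	toy_memory = 0;
	toy_remote_store(&cpu, 1);	/* ACKed, but only queued */
	assert(toy_load(&cpu) == 0);	/* stale value still readable */
	toy_rmb(&cpu);			/* queue drained */
	assert(toy_load(&cpu) == 1);	/* new value now visible */
}
```

Calling toy_demo() from main() exercises both cases: the stale read
before the barrier and the fresh read after it.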

> > > The previous section detailing the store-buffer, on the contrary, makes
> > > more sense: "Although not owning this cache line yet, I can store my new
> > > value in the store buffer already because whatever the current value
> > > is, I will overwrite it anyways." whereas with the invalidate queue the
> > > reader just ignores that the variable might have changed.
> > 
> > Well, if the cacheline is in Modified or Exclusive state, then the
> > CPU must transition it to at least Shared (with extra state saying
> > "doomed" or some such).  Or not, given yet more protocol complexity.
> > If the CPU receiving the invalidation request knows that the CPU sending
> > that request doesn't care what the current value of the cacheline is,
> > then the receiving CPU can pretend that any stores happened before it
> > received the invalidation request.  Again, assuming that there are no
> > ordering instructions that prohibit this.
> 
> I'm losing track of why MESI even exists in the first place, to be honest.
> I had thought it's about guaranteeing that a given cache line can only
> have one value at a given time; but it seems protocols like that are
> more about getting a cache line at all, sooner or later, and every
> ordering must be ensured by the instructions.
> 
> This might seem strange or trivial to you.. I guess the crucial point
> is my (false?) understanding of the Invalidate message serving to
> guarantee that the receiver will see the sender's update of the cache
> line. And it will see that update, just too late…
> 
> So the invalidate ack message is interpreted as a hard, reliable
> synchronization point.

If you were to go back to 1983 when MESI was first described, then
microprocessor CPUs simply did not have enough electronics to get into
too much trouble, so at that time, you would be right that you didn't
need much more on top of MESI.  Here in 2025, CPUs are very strange and
complex beasts.  But to be fair, they don't cause anywhere near as much
confusion as optimizing compilers.

> > > I guess this is legal because the only real guarantee of CPUs is that
> > > one particular CPU sees all its accesses in order? But even then, as
> > > above, for store buffers it makes sense, because the storing CPU
> > > doesn't care about other values. The *reading* CPU sending the fake
> > > Invalidate-ACK, on the contrary, should very well care about reading
> > > the truthful value from the cache line.
> > 
> > Also, different types of CPUs have different underlying ordering
> > guarantees.  And speculative execution can often ignore those guarantees
> > as long as it can avoid the user-visible state seeing any violations.  And
> > given multiple CPUs reading and modifying a given variable concurrently,
> > what exactly is the truthful value at any given point in time?
> > (Referring to Figure 15.10 ("A Variable With More Simultaneous Values").)
> 
> Argh.

Quite!

> There is this famous quote from a famous book, where the Roman governor
> asks:
> 
> "What is truth?"

It also figured in a Johnny Cash (not cache!) song from 1970.  ;-)

> > > And if it all works like that, then what even is the point of
> > > Invalidate messages at all, if you can not rely on them being followed
> > > before you yourself start modifying the cache line?
> > 
> > Because they are needed for things like memory-ordering instructions
> > to work correctly.  But on a weakly ordered system, if there are no
> > memory-ordering instructions in the code, then there are precious few
> > memory-ordering guarantees anyway.  ;-)
> > 
> > > Or is the point that a CPU temporarily ignoring an Invalidate message
> > > can still validly (without memory barriers) use data in that cache line
> > > which does *not* get modified by the other CPU? So memory barriers in
> > > this scenario would allow for more efficiency by "segmenting" cache
> > > lines?
> > 
> > The point is mostly that on weakly ordered systems in the absence of
> > memory-ordering instructions, there are very few guarantees.  See again
> > Figure 15.10.
> 
> Is the store buffer the exact equivalent of the invalidate queue, or is
> the latter "more evil" as to my explanations above?
> 
> I'd say:
>  * The store buffer allows the CPU to ignore that it should (in an
>    ideal, performant world) store this  _right_now_, only caring about
>    _itself_ seeing the stores in order, ignoring that the other CPU
>    should see it _now_ (or before that other, subsequent store in the
>    example).
>  * The invalidate queue allows a CPU to keep its read-only cacheline,
>    ignoring that it might get updated _right now_, ignoring the other
>    CPU's store, only caring about its own previous stores / reads (or,
>    more simply: relying on the fact that the cache line had been r-o so
>    far)
> 
> Still, the latter seems less wise to me.

As to which is "more evil", I suspect that this depends on the fertile
imaginations of hardware architects and designers.  Much also depends
on one's expectations of orderliness.  And in CPUs, as in real life,
additional orderliness comes at a price.  At any given point, is the
price worth it?  Choose wisely!  ;-)
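
For comparison with the invalidate queue, here is a toy,
single-threaded model of the store-buffer side. Again this is only a
sketch under stated assumptions: the sb_ names are hypothetical, the
buffer holds a single entry, and the write barrier is modelled, as a
simplification, as draining the buffer outright.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT real hardware: one variable, a one-entry store buffer. */
static int sb_memory;	/* what the other CPU can observe */

struct sb_cpu {
	bool pending;	/* true while a store waits in the buffer */
	int  value;	/* the buffered store's value */
};

/* Store: park the value in the buffer instead of waiting for the line. */
static void sb_store(struct sb_cpu *cpu, int v)
{
	cpu->pending = true;
	cpu->value = v;
}

/* The storing CPU sees its own store via store forwarding. */
static int sb_load_own(const struct sb_cpu *cpu)
{
	return cpu->pending ? cpu->value : sb_memory;
}

/* The other CPU sees only what has left the buffer. */
static int sb_load_remote(void)
{
	return sb_memory;
}

/* Write barrier, modelled here as draining the buffer. */
static void sb_wmb(struct sb_cpu *cpu)
{
	if (cpu->pending) {
		sb_memory = cpu->value;
		cpu->pending = false;
	}
}

/* The storing CPU sees its store at once; the other CPU sees it later. */
static void sb_demo(void)
{
	struct sb_cpu cpu = { .pending = false, .value = 0 };

	sb_memory = 0;
	sb_store(&cpu, 1);
	assert(sb_load_own(&cpu) == 1);	/* own store visible immediately */
	assert(sb_load_remote() == 0);	/* still hidden from the other CPU */
	sb_wmb(&cpu);
	assert(sb_load_remote() == 1);	/* visible after the barrier */
}
```

In this model the store buffer delays when others see a CPU's store,
whereas an invalidate queue delays when a CPU sees the stores of
others, so neither is obviously the more evil of the two.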

> > > Quite confusing. Parallel programming is hard and discussing it is one
> > > thing we can do about it :]
> > 
> > Agreed!!!
> > 
> > Do the explanations above help?  If so, I will rework that paragraph
> > with attribution.
> 
> Well, yes, the reminder of there being no "true value" anyways helps
> and shows why barriers exist.
> 
> But in fact, the question about the Invalidate message just being
> temporarily *ignored and falsely answered* seemed so obvious to me that
> I was searching for the Quick Quiz that would answer it, but there was
> none :(
> 
> Something like that would sound very paul-ish to me:
> 
> " Quick Quiz:
> But wait! Wasn't the entire point of the invalidate-ack message to
> assure the receiver that it's now safe to modify the cache line without
> there being readers of the old data left? How can the CPU possibly get
> away with falsely claiming it has invalidated the cache line?
> "
> 
> I think that would help. Or rather: the answer would help.

Thank you very much!  I was going to just expand that paragraph, but
you are right, a Quick Quiz would work much better.

                                                        Thanx, Paul

> Regards
> P.
> 
> 
> > 
> >  Thanx, Paul
> > 
> > > Thanks,
> > > Philipp
> > > 
> > > 
> > > [1] 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git/tree/appendix/whymb/whymemorybarriers.tex#n1127
> > > 
> 
