On 09/04/2011 11:16 AM, Michael S. Tsirkin wrote:
>> I mean argue for a richer set of barriers, with per-arch minimal
>> implementations instead of the large but portable hammer of
>> __sync_synchronize, if you will.
>
> That's what I'm saying really. On x86 the richer set of barriers
> need not insert any code at all for the wmb and rmb macros. All we
> might need is an 'optimization barrier', e.g. what Linux does:
> __asm__ __volatile__("": : :"memory")
> ppc needs something like __sync_synchronize() there.

No, rmb and wmb need to generate code. You are right that in some
places this will result in extra barriers.

If you want a richer set of barriers, that must be something like
{rr,rw,wr,ww}_mb{_acq,_rel,} (again not counting the Alpha). On x86,
then, all the rr/rw/ww barriers will be compiler barriers because the
hardware already enforces ordering. The other three map to
lfence/sfence/mfence:

    barrier      assembly   why?
    ---------------------------------------------------------------------
    wr_mb_acq    lfence     prevents the read from moving up -> acquire
    wr_mb_rel    sfence     prevents the write from moving down -> release
    wr_mb        mfence     (full barrier)
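
For concreteness, a minimal sketch of how the x86 side of such a set
could look, using the hypothetical macro names above and GCC-style
inline assembly:

    /* x86: rr, rw and ww ordering is guaranteed by the hardware,
     * so only the compiler has to be restrained. */
    #define barrier()    __asm__ __volatile__("" ::: "memory")

    #define rr_mb()      barrier()
    #define rw_mb()      barrier()
    #define ww_mb()      barrier()

    /* Write-then-read is the one reordering x86 can do, so the wr
     * variants are the only ones that emit an instruction. */
    #define wr_mb_acq()  __asm__ __volatile__("lfence" ::: "memory")
    #define wr_mb_rel()  __asm__ __volatile__("sfence" ::: "memory")
    #define wr_mb()      __asm__ __volatile__("mfence" ::: "memory")

The _acq and _rel variants of rr/rw/ww would reduce to barrier() as
well.
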
But if you stick to rmb/wmb/mb, then the correct definition of rmb is
"the least strict barrier that provides all three of rr_mb(),
rw_mb_rel() and wr_mb_acq()". This is, as expected, an lfence.
Similarly, wmb must provide all three of ww_mb(), wr_mb_rel() and
rw_mb_acq(), and this is an sfence.
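
Spelled out in the same style, the classic trio on x86 would then
come out as (same assumptions as in the sketch above):

    #define rmb()  __asm__ __volatile__("lfence" ::: "memory")
    #define wmb()  __asm__ __volatile__("sfence" ::: "memory")
    #define mb()   __asm__ __volatile__("mfence" ::: "memory")
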
So the right place to put an #ifdef is not "wmb()", but the _uses_ of
wmb() where you know you need a barrier that is less strict. That's
why I say David's patch is correct; on top of that, you may change
particular uses of wmb() in virtio.c to compiler barriers, for example
when you only care about ordering writes after writes.
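
For example, a write-after-write use site could be annotated like
this; the code and field names are made up for illustration, not the
actual virtio.c source:

    /* The descriptor contents must become visible before the index
     * that publishes them: write-after-write ordering.  On x86 a
     * compiler barrier suffices; other architectures need the real
     * wmb(). */
    #if defined(__i386__) || defined(__x86_64__)
        __asm__ __volatile__("" ::: "memory");
    #else
        wmb();
    #endif
    vring->used->idx = new_idx;
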
Likewise, there may even be places in which you could #ifdef out a
full memory barrier. For example, if you only care about ordering a
read before a subsequent write, the x86 hardware already provides
that and you could omit the mb() entirely.
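
A sketch of such a conditional barrier; smp_mb_rw is a made-up name:

    /* Read-then-write ordering: x86 never reorders a read with a
     * later write, so only a compiler barrier is needed there. */
    #if defined(__i386__) || defined(__x86_64__)
    #define smp_mb_rw()  __asm__ __volatile__("" ::: "memory")
    #else
    #define smp_mb_rw()  __sync_synchronize()
    #endif
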
I think in general it is premature optimization, though.

Regarding specific examples in virtio where lfence and sfence could
be used, there may be one when using event signaling. In the backend
you first write the index of your response, then you check whether to
generate an event. (I think) the following requirements hold:
* if you read the event-index too early, you might skip an event and
  deadlock. So you need at least a read barrier.
* you can write the response-index after reading the event-index, as
  long as you write it before waking up the guest.

So, in that case an x86 lfence should be enough, though again,
without more careful consideration, I would use a full barrier just
to be sure.
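
In code, the backend path might look roughly like this; all names are
illustrative and only loosely follow the vring event-index layout,
and the final test is the usual wraparound-safe event-index
comparison:

    uint16_t old_idx = vring->used->idx;
    uint16_t new_idx = old_idx + 1;

    /* 1. Publish the index of the response. */
    vring->used->idx = new_idx;

    /* 2. Keep the read of the event index from moving up too early.
     * Per the analysis above an lfence might do on x86; a full
     * barrier is the safe choice. */
    mb();

    /* 3. Read the event index published by the guest, and notify
     * only if the guest asked for an event at an index we just
     * crossed. */
    uint16_t event = vring->used_event;
    if ((uint16_t)(new_idx - event - 1) < (uint16_t)(new_idx - old_idx))
        notify_guest();
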
Paolo