On Wednesday, 7 October 2009, at 11:51:37, you wrote:
[snip]
> So, this produces "normal" mov instructions, which fetch data from the
> cache if available. But on a modern computer with several cores (or
> several processors), you have a cache per processor, or at least per
> group of cores (on a Core 2 Quad, you have 2 caches, one per pair of
> cores).
>
> On the contrary, fetch-and-set and friends issue a LOCK# signal that
> ensures cache consistency between your processors (see the Intel
> Architectures Software Developer's Manual, Volume 3A, Section 8.1.4
> for details). So even if get/set are atomic operations, they do not
> provide the same thread-safety as provided by lock+xadd, lock+cmpxchg
> or xchg, since they don't ensure we're not fetching the data from an
> outdated cache line.
I don't think the LOCK# signal is necessary here. Nor do I think it's even possible for plain loads and stores.
The LOCK# signal is necessary to guarantee exclusive access to memory shared between processors. But we don't need exclusive access when we're merely reading from or writing to that memory.
According to Volume 2A of the same manual, the entry for the LOCK prefix says:
> The LOCK prefix can be prepended only to the following instructions and
> only to those forms of the instructions where the destination operand is a
> memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC,
> NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used
> with one of these instructions and the source operand is a memory operand,
> an undefined opcode exception (#UD) may be generated. An undefined opcode
> exception will also be generated if the LOCK prefix is used with any
> instruction not in the above list.
Note that MOV isn't on that list. So a LOCK prefix on MOV wouldn't merely have no effect: it would generate #UD (delivered to the process as SIGILL).
Anyway, what you're asking for isn't necessary. If we forget for a moment all
the other variables in memory and concentrate only on the atomic variable
itself, what you're asking for is that:
volatile int i;

thread1:
    i = 1;

thread2:
    if (i == 1)
        do something
    else
        do something else
Note that there's no actual synchronisation between the two threads. You could say you want to know whether thread1 has finished its work (imagine the variable is actually "volatile bool finished"), but it isn't really synchronisation: thread2 could miss the store to i by a single instruction, and that would be enough to take the wrong branch. Any code working like this needs to go back and test again. That's the principle of the spinlock.
That was the read-after-write (RAW) case. The same reasoning applies to the write-after-write (WAW) case:

thread1:
    i = 1;

thread2:
    i = 2;
There's no synchronisation. Either result is possible.
Therefore, we don't need anything special to load or store to memory, not on
x86 or x86-64 at least.
It only gets interesting when we deal with more than one variable, like the
case in the blog:
volatile int x, y, z;

thread1:
    x = 1;
    y = 2;
    z = 3;

thread2:
    if (z == 3)
        function(x, y)
    else
        function(-1, -1)
The x86 and x86-64 processors don't have explicit memory-ordering variants of their load and store instructions; the ordering is implicit. Ordinary stores become externally visible in program order, and ordinary loads are performed in program order (the one relaxation is that a store may be reordered with a later load from a different location). So the compiler must generate the necessary Store instructions in the right order, and the processor must either execute them in that order or make their externally-visible effects happen in that order (this is the Release memory barrier).
In the other thread, the reverse also applies: the compiler generates the
Loads in the proper order and the processor must guarantee that loads happen
in the right order, or that they behave as if they did. That is, since z is
loaded first, the processor must ensure that any writes that happened to x and
y before z was written are also observed when x and y are read (that's the
Acquire memory barrier).
The example I gave in the blog was y living in a separate cache line that had recently been touched. So if y is still cached, but the read of z causes a cache miss and a fetch from memory, then the stale cached copy of y must be discarded too.
How? I don't care. The processor has to ensure it, because x86 guarantees this ordering for ordinary loads and stores.
On IA-64 it gets interesting, because the instructions don't have memory barrier semantics unless you ask for them. So here's that code translated to IA-64 assembly (a very naive translation, with none of the optimisation or reordering a compiler would certainly apply):
common:
    mov loc0 = x
    mov loc1 = y
    mov loc2 = z
    ;;

thread1:
    mov r8 = 1
    mov r9 = 2
    mov r10 = 3
    ;;
    st4 [loc0] = r8
    st4 [loc1] = r9
    st4 [loc2] = r10

thread2:
    ld4 r10 = [loc2]
    ;;
    cmp.eq p6, p7 = 3, r10
    ;;
(p6) ld4 out0 = [loc0]
(p6) ld4 out1 = [loc1]
(p7) mov out0 = -1
(p7) mov out1 = -1
    ;;
    br.call rp = function
If every instruction had full memory barrier semantics, then the only possible values for out0 and out1 would be 1 and 2, respectively. But ld4 has no barrier semantics, so, as I said in the blog, the possible values (assuming both were initialised to 0 beforehand) are:

    x  y
    0  0
    1  0
    0  2
    1  2
To fix this, we need to do proper memory barriers. We do that by making the
final store a store-release and the initial load a load-acquire:
thread1:
    mov r8 = 1
    mov r9 = 2
    mov r10 = 3
    ;;
    st4 [loc0] = r8
    st4 [loc1] = r9
    st4.rel [loc2] = r10

thread2:
    ld4.acq r10 = [loc2]
    ;;
    cmp.eq p6, p7 = 3, r10
    ;;
(p6) ld4 out0 = [loc0]
(p6) ld4 out1 = [loc1]
(p7) mov out0 = -1
(p7) mov out1 = -1
    ;;
    br.call rp = function
In reality, since we declared x and y volatile too, the compiler would generate st4.rel and ld4.acq for all of them, even though only the last store and the first load strictly need it. (The compiler doesn't know which access is the synchronisation point.)
If we don't declare them volatile, the compiler could reorder the loads or reuse previously-loaded values -- i.e., it could move them across the memory barriers, defeating their purpose.
The inline assembly functions carry a special marker telling the compiler not to do any reordering across them. There's no such marker on the volatiles, so this is the only point where I'm not sure we're doing the right thing...
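For reference, that "special marker" is GCC's "memory" clobber in extended inline asm. A sketch of the idiom (my addition; the function names are mine):

```c
/* An empty asm with a "memory" clobber: it generates no instructions,
 * but forbids the compiler from caching memory values in registers
 * across it or moving loads/stores past it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Without the barrier the compiler may hoist *ready into a register
 * and spin forever; with it, every iteration reloads from memory. */
int wait_for(int *ready)
{
    while (*ready == 0)
        compiler_barrier();
    return *ready;
}
```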
--
Thiago Macieira - thiago.macieira (AT) nokia.com
Senior Product Manager - Nokia, Qt Development Frameworks
Sandakerveien 116, NO-0402 Oslo, Norway
_______________________________________________
Qt4-preview-feedback mailing list
[email protected]
http://lists.trolltech.com/mailman/listinfo/qt4-preview-feedback
