> Manfred Spraul <[EMAIL PROTECTED]> writes:
>> Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained.
>
> Don't you have to put it in a specific place in the loop to make that work? If not, why not? I doubt that rep;nop is magic enough to recognize the loop that will be generated from s_lock()'s code.

Rep;nop is just a short delay - that's all. It means that the cpu pipelines have a chance to drain, and that the other thread gets enough cpu resources. Below is the full instruction documentation, from the latest ia32 doc set from Intel:
Improves the performance of spin-wait loops. When executing a spin-wait loop, a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor's power consumption.

This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation).
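To make the placement question concrete, here is a minimal sketch of a spin-wait loop with the delay in its body. It is not the patch itself: the names spin_delay(), tas(), spin_acquire() and spin_release() are hypothetical, and it assumes GCC or Clang (inline asm syntax and the __sync builtins). The delay only has to execute somewhere inside the retry loop; on non-x86 targets spin_delay() compiles to nothing.

```c
/* Sketch only; all names here are hypothetical, not taken from the patch. */
typedef volatile int slock_t;

/* Short delay for use inside a busy loop. */
static inline void spin_delay(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "rep; nop" is the encoding of PAUSE; earlier IA-32 CPUs run it as a NOP. */
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Test-and-set: atomically store 1 and return the previous value
 * (GCC/Clang builtin, assumed to be available). */
static inline int tas(slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Retry until the previous value was 0, delaying between attempts. */
static void spin_acquire(slock_t *lock)
{
    while (tas(lock))
        spin_delay();
}

/* Release the lock by storing 0 with release semantics. */
static void spin_release(slock_t *lock)
{
    __sync_lock_release(lock);
}
```

The delay sits in the loop body, so it runs once per failed attempt and never on an uncontended acquire.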
I think a separate function is better than adding it into TAS: if it's part of TAS, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. There might also be other busy loops, besides TAS, that could use a delay function.
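A rough sketch of that split, under the same assumptions (GCC or Clang builtins; every name below is hypothetical and only illustrates the idea): the macro expanded at each call site contains just the test-and-set, while the delay lives in one contended-path function that other busy loops could call as well.

```c
/* Sketch only; hypothetical names, GCC/Clang builtins assumed. */
typedef volatile int slock_t;

/* Expanded at every call site, so it stays a single atomic exchange. */
#define TAS(lock) __sync_lock_test_and_set((lock), 1)

/* The delay is a separate function: not part of TAS, reusable elsewhere. */
static void cpu_delay(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Contended path: entered only after the first TAS at the call site failed. */
static void s_lock_contended(slock_t *lock)
{
    do {
        cpu_delay();
    } while (TAS(lock));
}

/* Call-site macros: fast path is one TAS, the slow path is a function call. */
#define SPIN_LOCK_ACQUIRE(lock) \
    do { if (TAS(lock)) s_lock_contended(lock); } while (0)
#define SPIN_LOCK_RELEASE(lock) __sync_lock_release(lock)
```

With this layout the delay instruction appears once in the contended-path function instead of being repeated in every expansion of the acquire macro.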
I'll post a new patch that doesn't rely on __inline__ in the i386-independent part.