Manfred Spraul <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Don't you have to put it in a specific place in the loop to make that >> work? If not, why not? >> > Rep;nop is just a short delay - that's all.
That view seems to me to be directly contradicted by this statement: > The PAUSE instruction provides a hint to the processor that > the code sequence is a spin-wait loop. The processor uses this hint to > avoid the memory order violation in most situations, which greatly > improves processor performance. It's not apparent to me how a short delay translates into avoiding a memory order violation (possibly some docs on what that means exactly might help...). I suspect strongly that there needs to be some near proximity between the PAUSE instruction and the lock-test instruction for this to work as advertised. It would help if Intel were less coy about what the instruction really does. > This instruction does not change the architectural state of the > processor (that is, it performs essentially a delaying noop > operation). This can be rephrased as "we're not telling you what this instruction really does, because its interesting effects are below the level of the instruction set architecture". Great. How are we supposed to know how to use it? > I think a separate function is better than adding it into TAS: if it's > part of tas, then it would automatically be included by every > SpinLockAcquire call - unnecessary .text bloat. Why do you think it's unnecessary? One thing that I find particularly vague in the quoted documentation is the statement that the PAUSE instruction is needed to avoid a delay when *exiting* the spin-wait loop. Doesn't this mean that a PAUSE is needed in the success path when the first TAS succeeds (i.e, the normal no-contention path)? If not, why not? If so, does it go before or after the lock instruction? Also, if the principal effect is a "short delay", do we really need it at all considering that our inner loop in s_lock is rather more than an "xchgb" followed by a conditional branch? There will be time for the write queue to drain while we're incrementing and testing our spin counter (which I trust is in a register...). The reason I'm so full of questions is that I spent some time several days ago looking at exactly this issue, and came away with only the conclusion that I had to find some more-detailed documentation before I could figure out what we should do about the spinlocks for Xeons. You have not convinced me that you know more about the issue than I do. A "10% speedup" is nice, but how do we know that that's what we should expect to get? Maybe there's a lot more to be won by doing it correctly (for some value of "correctly"). regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly