Re: [HACKERS] spinlocks: generalizing non-locking test

2004-10-19 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Granted, but I think you've mostly conceded my point: every _subsequent_
 time TAS() is invoked, the non-locking test is a clear win (with the
 possible exception of PPC).

I'm not real sure.  One point here is that the standard advice about
this stuff is generally thinking in terms of an *extremely* tight spin
loop, ie

while (TAS(lock))
;

The loop in s_lock.c has a bit more overhead than that.  Also, because
we only use spinlocks to protect LWLocks, the expected hold time for a
spinlock is just a couple dozen instructions, which is probably less
than the expected time in most other uses of spinlocks.  So I think it's
less than clear that we should expect TAS to fail, even within the loop.

Basically I'd like to see some tests proving that there's actually any
value in it before we go complicating the assembly-code API ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] spinlocks: generalizing non-locking test

2004-10-18 Thread Mark Wong
On Sun, Oct 17, 2004 at 11:16:50PM +1000, Neil Conway wrote:
 Currently, the assembly for TAS() on x86 does a non-locking test before 
 using an atomic operation to attempt to acquire the spinlock:
 
   __asm__ __volatile__(
  cmpb$0,%1   \n
  jne 1f  \n
  lock\n
  xchgb   %0,%1   \n
   1: \n
 : +q(_res), +m(*lock)
 :
 : memory, cc);
 
 The reason this is a good idea is that if we fail to immediately acquire 
 the spinlock, s_lock() will spin SPINS_PER_DELAY times in userspace 
 calling TAS() each time before going to sleep. If we do an atomic 
 operation for each spin, this generates a lot more bus traffic than is 
 necessary. Doing a non-locking test (followed by an atomic operation to 
 acquire the spinlock if appropriate) is therefore better on SMP systems.
 
 Currently x86 is the only platform on which we do this -- ISTM that all 
 the other platforms that implement spinlocks via atomic operations could 
 benefit from this technique.
 
 We could fix this by tweaking each platform's assembler to add a 
 non-blocking test, but there might be an easier way. Rather than 
 modifying platform-specific assembler, I believe this C sequence is 
 equivalent to the non-locking test:
 
  volatile slock_t *lock = ...;
 
  if (*lock == 0)
  TAS(lock);
 
 Because the lock variable is volatile, the compiler should reload it 
 from memory for each loop iteration. (If this is actually not a 
 sufficient non-locking test, please let me know...)
 
 We could add a new s_lock.h macro, TAS_INNER_LOOP(), whose default 
 implementation would be:
 
  #define TAS_INNER_LOOP(lock) \
  if ((*lock) == 0) \
  TAS(lock);
 
 And then remove the x86-specific non-locking test from TAS.
 
 Comments?
 
 -Neil
 

Steve Hemminger had this to say:


The linux kernel code is:
#define UNLOCKED 1

...
while (atomic_dec(lock) != 0) {
  do {
  rep_nop();
  } while(*lock != UNLOCKED);
}

To do the equivalent thing in postgres would mean

while(TAS(lock)) {
   do {
  rep_nop();
   }  while (*lock);

The point is do the locking test first (assume success is possible)
if that doesn't work spin without doing locked operations and make
sure and do the rep; nop; for the hyperthreaded CPU's

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] spinlocks: generalizing non-locking test

2004-10-18 Thread Neil Conway
On Mon, 2004-10-18 at 11:53, Tom Lane wrote:
 Only once we've begun to spin.  The first time through, it's not at all
 clear whether the extra test is worthwhile --- it's certainly a win if
 the lock is always already held, and certainly a loss if the lock is
 always free

Granted, but I think you've mostly conceded my point: every _subsequent_
time TAS() is invoked, the non-locking test is a clear win (with the
possible exception of PPC). Therefore we have two cases: initial TAS
and TAS inside loop, so so the logical implementation is two distinct
macros. Of course, there may well be platforms on which TAS() is defined
to TAS_INNER_LOOP() or vice versa, but this decision will vary by
platform.

 you have to do some benchmarking to decide if you want it or not. 

I agree that benchmarking is worth doing before making changes.

 We have the ASM-level test on those platforms
 where people seem to think that it is worthwhile, but not everywhere.

That is certainly an optimistic interpretation :-) I would say an
equally likely theory is that there is only one platform on which people
have bothered to try a non-blocking test and see if it improves
performance, and accordingly we only have one platform on which a
non-locking test is used.

(If anyone out there _has_ modified the spinlock implementation for
PostgreSQL on a particular platform to use a non-locking initial test
and found it hasn't improved performance, please speak up.)

-Neil



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] spinlocks: generalizing non-locking test

2004-10-17 Thread Neil Conway
Currently, the assembly for TAS() on x86 does a non-locking test before 
using an atomic operation to attempt to acquire the spinlock:

__asm__ __volatile__(
  cmpb$0,%1   \n
  jne 1f  \n
  lock\n
  xchgb   %0,%1   \n
1: \n
:   +q(_res), +m(*lock)
:
:   memory, cc);
The reason this is a good idea is that if we fail to immediately acquire 
the spinlock, s_lock() will spin SPINS_PER_DELAY times in userspace 
calling TAS() each time before going to sleep. If we do an atomic 
operation for each spin, this generates a lot more bus traffic than is 
necessary. Doing a non-locking test (followed by an atomic operation to 
acquire the spinlock if appropriate) is therefore better on SMP systems.

Currently x86 is the only platform on which we do this -- ISTM that all 
the other platforms that implement spinlocks via atomic operations could 
benefit from this technique.

We could fix this by tweaking each platform's assembler to add a 
non-blocking test, but there might be an easier way. Rather than 
modifying platform-specific assembler, I believe this C sequence is 
equivalent to the non-locking test:

volatile slock_t *lock = ...;
if (*lock == 0)
TAS(lock);
Because the lock variable is volatile, the compiler should reload it 
from memory for each loop iteration. (If this is actually not a 
sufficient non-locking test, please let me know...)

We could add a new s_lock.h macro, TAS_INNER_LOOP(), whose default 
implementation would be:

#define TAS_INNER_LOOP(lock) \
if ((*lock) == 0) \
TAS(lock);
And then remove the x86-specific non-locking test from TAS.
Comments?
-Neil
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] spinlocks: generalizing non-locking test

2004-10-17 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Currently x86 is the only platform on which we do this -- ISTM that all 
 the other platforms that implement spinlocks via atomic operations could 
 benefit from this technique.

Do you have any actual evidence for that opinion?  ISTM this is
dependent on a large set of assumptions about the CPU's bus behavior,
boiling down to the conclusion that an extra conditional branch is
cheaper than a locked bus cycle.  It would be a serious error to
transfer that conclusion to non-Intel chips without investigation.
(For that matter, I'm not that thrilled about it even for the Intel
chips, since the extra test is certainly 100% wasted cycles on any
single-CPU machine, or indeed anywhere that the lock is not contended
for.)

I believe that an extra test would be a dead loss on PPC, for instance,
because the initial load is already nonblocking there.

  if (*lock == 0)
  TAS(lock);

This will certainly not work on HPPA, and it really shouldn't assume
that zero is the non-locked state, and what happened to testing the
TAS result?

On the whole it seems far easier to stick an extra load into the asm
code on those platforms where we think it's a win.  Especially since
we have already done that ;-)

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] spinlocks: generalizing non-locking test

2004-10-17 Thread Neil Conway
On Mon, 2004-10-18 at 04:13, Tom Lane wrote:
 Do you have any actual evidence for that opinion?  ISTM this is
 dependent on a large set of assumptions about the CPU's bus behavior,
 boiling down to the conclusion that an extra conditional branch is
 cheaper than a locked bus cycle.

I think the conditional branch is effectively free: we're spinning in a
busy-wait loop, so we're effectively throwing away cycles until the lock
is free anyway. The important things are: (a) we interfere as little as
possible with concurrent system activity while spinning (b) we notice
promptly that the lock is free. ISTM that doing a non-locking test
before the locking test achieves those two conditions.

 (For that matter, I'm not that thrilled about it even for the Intel
 chips, since the extra test is certainly 100% wasted cycles on any
 single-CPU machine, or indeed anywhere that the lock is not contended
 for.)

Yes, but the proper fix for this is not to spin at all on UP machines; I
have a grungy patch to do this that I will clean up and send in for 8.1.

   if (*lock == 0)
   TAS(lock);
 
 This will certainly not work on HPPA, and it really shouldn't assume
 that zero is the non-locked state, and what happened to testing the
 TAS result?

Erm, perhaps I should have emphasized that that was pseudo-code :) A
more realistic implementation would be:

#define TAS_INNER_LOOP(lock) (S_LOCK_FREE(lock) ? TAS(lock) : 1)

-Neil



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] spinlocks: generalizing non-locking test

2004-10-17 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 On Mon, 2004-10-18 at 04:13, Tom Lane wrote:
 Do you have any actual evidence for that opinion?  ISTM this is
 dependent on a large set of assumptions about the CPU's bus behavior,
 boiling down to the conclusion that an extra conditional branch is
 cheaper than a locked bus cycle.

 I think the conditional branch is effectively free: we're spinning in a
 busy-wait loop, so we're effectively throwing away cycles until the lock
 is free anyway.

Only once we've begun to spin.  The first time through, it's not at all
clear whether the extra test is worthwhile --- it's certainly a win if
the lock is always already held, and certainly a loss if the lock is
always free, and otherwise you have to do some benchmarking to decide
if you want it or not.  We have the ASM-level test on those platforms
where people seem to think that it is worthwhile, but not everywhere.
As near as I can tell, your proposal is to never have the extra test on
the initial TAS (because you'll remove it from the ASM) and to have it
always on repeated tests (inside the spin loop).  This is not what is
done elsewhere AFAICS, and I'm not prepared to take on faith that it's
a win.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings