Re: [HACKERS] Spinlock backoff algorithm
Gregory Stark wrote:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> Mark Mielke <[EMAIL PROTECTED]> writes:
>>> Tom Lane wrote:
>>>> My goodness that's a hardware-dependent proposal.  Shall we discuss
>>>> how many CPUs there are where an integer division is *slower* than
>>>> a floating-point op?
>>> Do you have one in mind, or is this a straw man? :-)
>> I've got one upstairs (HPPA), and I believe that it's actually a pretty
>> common situation in scientifically-oriented workstations from a few
>> years back.
> I think floating point is fast on many common platforms, even many i386
> variants.  But usually that's assuming you're comparing doing a whole
> bunch of work in floating point or integer math.  Converting a bunch of
> integers to floating point for a single operation doesn't seem like a
> case that's going to shine on any floating point unit.

Just for completeness: a task context switch is more complex (slower) when the application uses FP operations.  It is not important for PostgreSQL, because it already uses FP operations in many places, but it is good to know.

Zdenek

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Spinlock backoff algorithm
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Mark Mielke <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> My goodness that's a hardware-dependent proposal.  Shall we discuss
>>> how many CPUs there are where an integer division is *slower* than
>>> a floating-point op?
>
>> Do you have one in mind, or is this a straw man? :-)
>
> I've got one upstairs (HPPA), and I believe that it's actually a pretty
> common situation in scientifically-oriented workstations from a few
> years back.

I think floating point is fast on many common platforms, even many i386 variants.  But usually that's assuming you're comparing doing a whole bunch of work in floating point or integer math.  Converting a bunch of integers to floating point for a single operation doesn't seem like a case that's going to shine on any floating point unit.

And there are plenty of platforms with *no* floating point.  On embedded 486, ARM, etc. the floating point will be emulated by either the compiler or the kernel.  But as Tom said, there are plenty of places using floating point arithmetic in the backend.  The planner does all of its selectivity and costing in floating point, for example.

I wonder if you wouldn't be better off going whole hog with soft floating point emulation.  I bet Sun's cc has an option to implement floating point arithmetic entirely in user-side software.  Another option might be to compile just some modules with soft float and do utils/adt/*.c using hard float -- so user-defined types still use the floating point unit, but the database's internal arithmetic is done using software-emulated math.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Off-Topic: Floating point division academia? (was: Re: [HACKERS] Spinlock backoff algorithm)
Martijn van Oosterhout wrote:
> On Wed, Nov 14, 2007 at 06:57:18PM -0800, Josh Berkus wrote:
>> The question is, for our most common platforms (like AMD and Intel) is
>> the FPU notably slower/more contended than integer division?  I had the
>> impression that it was, but my knowledge of chip architectures is liable
>> to be out of date.  Can we have a hardware geek speak up?
> I did some searching and the best figures I can find for integer division
> are 15-40 cycles, whereas for floating point the best I found was 5
> cycles.  The FP units on modern processors are very highly optimised, but
> the figures quoted are for pipelined division, which is not what we're
> doing here.

I agree with Tom that, although the user might be experiencing odd issues because PostgreSQL was not designed for 32 light-weight cores like the Niagara's, this particular issue is unlikely to be the primary one.

In response to the academic discussion above: did you also look up the time to convert from INT -> FLOAT, and then FLOAT -> INT?  Also, when dividing by a constant, compilers perform automatic optimization tricks, such as:

$ cat a.c
int f(int i) { return i / 7; }
$ gcc -O3 -S a.c
$ cat a.s
...
.globl f
        .type   f, @function
f:
.LFB2:
        movl    %edi, %eax
        movl    $-1840700269, %edx
        imull   %edx
        leal    (%rdx,%rdi), %eax
        sarl    $31, %edi
        sarl    $2, %eax
        subl    %edi, %eax
        ret

No divide.  No need to convert from INT -> FLOAT -> INT.
Here is non-conclusive evidence both that integer division by a constant may still be faster on a fairly modern processor (AMD X2 3800+), and that, as Tom believes, this issue is a waste of time:

$ cat a.c
#define MAX_RANDOM_VALUE (0x7FFF)

int old_f(int i) { return ((double) i / (double) MAX_RANDOM_VALUE) + 0.5; }
int new_f(int i) { return (i + MAX_RANDOM_VALUE - 1) / MAX_RANDOM_VALUE; }
$ cat old.c
int old_f(int);
int new_f(int);
int main() { for (int i = 0; i < 100000000; i++) old_f(50); return 0; }
$ cat new.c
int old_f(int);
int new_f(int);
int main() { for (int i = 0; i < 100000000; i++) new_f(50); return 0; }
$ gcc -o old -O3 -std=c99 old.c a.c
$ gcc -o new -O3 -std=c99 new.c a.c
$ time ./old
./old  1.58s user 0.00s system 99% cpu 1.590 total
$ time ./old
./old  1.31s user 0.00s system 99% cpu 1.314 total
$ time ./old
./old  1.14s user 0.00s system 99% cpu 1.142 total
$ time ./old
./old  1.46s user 0.00s system 99% cpu 1.461 total
$ time ./new
./new  0.68s user 0.00s system 99% cpu 0.682 total
$ time ./new
./new  0.70s user 0.01s system 99% cpu 0.703 total
$ time ./new
./new  0.70s user 0.00s system 99% cpu 0.705 total
$ time ./new
./new  0.70s user 0.00s system 99% cpu 0.703 total

I say non-conclusive, because:

1) My CPU is probably doing branch prediction and may possibly be doing out-of-order execution.  Since it has more integer units than floating point units, this works in favour of integers.  The contention case described by the original poster does not allow for multiple integer units to be in use at the same time.

2) As Tom has already said, 100 million divides in either 0.7 seconds or 1.5 seconds is still orders of magnitude more than the contended case would ever be called.

For the future, please consider doing integer division when working with integers and the divisor is a constant.  For this case, it doesn't seem worth the risk of breaking the code.

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
Re: [HACKERS] Spinlock backoff algorithm
On Wed, Nov 14, 2007 at 06:57:18PM -0800, Josh Berkus wrote:
> The question is, for our most common platforms (like AMD and Intel) is the
> FPU notably slower/more contended than integer division?  I had the
> impression that it was, but my knowledge of chip architectures is liable
> to be out of date.
>
> Can we have a hardware geek speak up?

I did some searching and the best figures I can find for integer division are 15-40 cycles, whereas for floating point the best I found was 5 cycles.  The FP units on modern processors are very highly optimised, but the figures quoted are for pipelined division, which is not what we're doing here.

http://www.behardware.com/art/imprimer/623/

I did find a thread about how integer division on the Itanium was implemented by copying the integers to the FP registers, doing the division and rounding there, and copying back.  So at least there it's not true.

http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01518.html

Hope this helps,
--
Martijn van Oosterhout <[EMAIL PROTECTED]>  http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution
> inevitable.
>   -- John F Kennedy
Re: [HACKERS] Spinlock backoff algorithm
Josh Berkus <[EMAIL PROTECTED]> writes:
> Nah, if it's only Niagara, it's not worth bothering.

It's not only that aspect of it --- it's that I am 100% convinced that Magne has misidentified the source of whatever FPU contention he's seeing.  The floating-point code in s_lock() is executed only just after having returned from a sleep that is at least one millisecond and often many times that.  If Niagara cannot handle a few kiloflops then you need to find some other company to work for ;-)

I am interested to find out what the true cause of the reported FPU contention is, but I'll bet our next lunch that s_lock.c ain't it.

regards, tom lane
Re: [HACKERS] Spinlock backoff algorithm
Greg,

> That says precisely nothing about the matter at hand.  Someone should
> simply change it and benchmark it in pgsql.  I doubt you'll see a
> difference there on regular AMD/Intel ... and if it makes the sun
> hyperthreaded cpu happier...

Nah, if it's only Niagara, it's not worth bothering.  I was under the impression that most x86 chips had similar issues around having fewer FP units than integer units, but I could be wrong.  It's also worth thinking about the new Intel multi-core architectures; do they starve FP as well?

On a busy OLTP system, spinlock waits get called hundreds of times per second, so, like ProcArrayLock, it's frequent enough to call for micro-optimization.  I think maybe we should try making the change and testing it on Niagara and on some standard x86 platforms, and then on the new x86 architectures, to see if it makes any difference in CPU utilization.

FYI, if nobody had guessed, this is coming out of a study Magne is doing on improving PostgreSQL SMP scalability.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] Spinlock backoff algorithm
Gregory Maxwell wrote:
> On Nov 14, 2007 10:12 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>> http://www.intel.com/performance/server/xeon/intspd.htm
>> http://www.intel.com/performance/server/xeon/fpspeed.htm
> That says precisely nothing about the matter at hand.  Someone should
> simply change it and benchmark it in pgsql.  I doubt you'll see a
> difference there on regular AMD/Intel ... and if it makes the sun
> hyperthreaded cpu happier...

Are you offering?

Joshua D. Drake
Re: [HACKERS] Spinlock backoff algorithm
On Nov 14, 2007 10:12 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> http://www.intel.com/performance/server/xeon/intspd.htm
> http://www.intel.com/performance/server/xeon/fpspeed.htm

That says precisely nothing about the matter at hand.  Someone should simply change it and benchmark it in pgsql.  I doubt you'll see a difference there on regular AMD/Intel ... and if it makes the sun hyperthreaded cpu happier...
Re: [HACKERS] Spinlock backoff algorithm
On Nov 14, 2007, at 6:57 PM, Josh Berkus wrote:
> Tom,
>> I've got one upstairs (HPPA), and I believe that it's actually a pretty
>> common situation in scientifically-oriented workstations from a few
>> years back.
> Last I checked, scientific workstations aren't exactly a common platform
> for PostgreSQL servers.  The question is, for our most common platforms
> (like AMD and Intel) is the FPU notably slower/more contended than
> integer division?  I had the impression that it was, but my knowledge of
> chip architectures is liable to be out of date.  Can we have a hardware
> geek speak up?

Somewhat.  The last version of K7 I looked at had three integer execution units versus one floating point unit.  They're also scheduled fairly independently, meaning that casts from double to integer or back again will have some minor negative effects on the pipeline or the scheduler, more than the use of floating point itself.

In the grand scheme of things, though, I don't believe it's a big deal for typical code on most modern desktop CPUs, certainly not compared to memory starvation, use of less than optimal compilers, and all the other reasons the pipeline might stall.  I might care in the innermost of inner loops, but possibly not even then unless a profiler told me differently.

Cheers,
  Steve
Re: [HACKERS] Spinlock backoff algorithm
Josh Berkus wrote:
> Tom,
>> I've got one upstairs (HPPA), and I believe that it's actually a pretty
>> common situation in scientifically-oriented workstations from a few
>> years back.
> Last I checked, scientific workstations aren't exactly a common platform
> for PostgreSQL servers.  The question is, for our most common platforms
> (like AMD and Intel) is the FPU notably slower/more contended than
> integer division?  I had the impression that it was, but my knowledge of
> chip architectures is liable to be out of date.  Can we have a hardware
> geek speak up?

http://www.intel.com/performance/server/xeon/intspd.htm
http://www.intel.com/performance/server/xeon/fpspeed.htm

Sincerely,

Joshua D. Drake
Re: [HACKERS] Spinlock backoff algorithm
Tom,

> I've got one upstairs (HPPA), and I believe that it's actually a pretty
> common situation in scientifically-oriented workstations from a few
> years back.

Last I checked, scientific workstations aren't exactly a common platform for PostgreSQL servers.

The question is, for our most common platforms (like AMD and Intel), is the FPU notably slower/more contended than integer division?  I had the impression that it was, but my knowledge of chip architectures is liable to be out of date.

Can we have a hardware geek speak up?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] Spinlock backoff algorithm
On 11/14/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> The other problem with using modulo is that it makes the result depend
> mostly on the low-order bits of the random() result, rather than mostly
> on the high-order bits; with lower-grade implementations of random(),
> the lower bits are materially less random than the higher.  Now
> admittedly high-grade randomness is probably not too important for this
> specific context, but I dislike putting in poor coding practices that
> someone might see and copy without thinking...

If there's a dependency on a particular quality of random() implementation, why not just include one?  Mersenne Twister is easy, while not being cryptographic strength.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
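For scale: the Mersenne Twister reference implementation is a few hundred lines of C, and a bundled generator could be far smaller still.  As a purely illustrative sketch, here is Marsaglia's xorshift32, a different and statistically weaker algorithm than MT, shown only to suggest how little code a self-contained PRNG can take; nothing here is proposed PostgreSQL code:

```c
#include <stdint.h>

/* Minimal xorshift32 PRNG (George Marsaglia's published 13/17/5 triple).
 * Not cryptographic, but all 32 output bits are of far more uniform
 * quality than the low-order bits of a low-grade libc random(). */
static uint32_t xorshift32_state = 2463534242u;   /* any nonzero seed */

static uint32_t xorshift32(void)
{
    uint32_t x = xorshift32_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return xorshift32_state = x;                  /* state is never zero */
}
```

Because the update is a bijection on nonzero 32-bit states, the generator cycles through all 2^32 - 1 nonzero values before repeating.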
Re: [HACKERS] Spinlock backoff algorithm
On Wed, 14 Nov 2007, Mark Mielke wrote:
>> The other problem with using modulo is that it makes the result depend
>> mostly on the low-order bits of the random() result, rather than mostly
>> on the high-order bits; with lower-grade implementations of random(),
>> the lower bits are materially less random than the higher.
> If this was a serious problem, there is the >> operator.  I see it as a
> poor coding practice to make assumptions about which bits are most
> "random" in a call to random().

There are many types of pseudo-random number generators where the low-order bits are not so random, and the assumption Tom has described is pretty likely to be true.  See http://www.fourmilab.ch/random/ as one comment about the badness of the standard UNIX random generator, for example.

There is an interesting discussion of this issue, along with code showing a way to improve things while only using integer math (which in some cases uses >> as you suggest), as part of the Java standard library:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/Random.html#nextInt(int)

--
* Greg Smith  [EMAIL PROTECTED]  http://www.gregsmith.com  Baltimore, MD
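The integer-only technique behind the linked Java Random.nextInt(int), rejecting raw values that would bias the modulo, can be sketched in C.  This is an illustration of the general rejection idea, not the exact Java algorithm, and the little LCG here is a hypothetical stand-in generator, not libc's random():

```c
/* Stand-in generator for illustration: a minimal LCG returning 0..32767. */
static unsigned long lcg_state = 12345;

static long lcg_next(void)
{
    lcg_state = lcg_state * 1103515245UL + 12345UL;
    return (long) ((lcg_state >> 16) & 0x7FFF);
}

/* Uniform value in [0, n) using only integer math: reject raw values at
 * or above the largest multiple of n that fits in the generator's range,
 * so the final modulo introduces no bias. */
static long bounded_random(long n)
{
    long range = 0x8000;                /* generator produces 0..range-1 */
    long limit = range - (range % n);   /* largest multiple of n <= range */
    long r;

    do
        r = lcg_next();
    while (r >= limit);                 /* rare rejection, then retry */

    return r % n;
}
```

For n = 7, only the single raw value 32767 is rejected, so the expected number of retries is negligible while the result is exactly uniform.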
Re: [HACKERS] Spinlock backoff algorithm
Mark Mielke <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> My goodness that's a hardware-dependent proposal.  Shall we discuss
>> how many CPUs there are where an integer division is *slower* than
>> a floating-point op?

> Do you have one in mind, or is this a straw man? :-)

I've got one upstairs (HPPA), and I believe that it's actually a pretty common situation in scientifically-oriented workstations from a few years back.

>> Why do you think that a couple of FP ops here are a problem, anyway?
>> This is a code path where we've already yielded the processor, so
>> by definition the repetition rate has to be pretty low.

> Yielded the processor?

Yielded the processor, as in pg_usleep.  It is absolutely impossible for any thread to execute that line of code more than 1000 times per second, and the expected rate would be very much less.  Furthermore, the entire point of the function is to try to make processes come out of the sleep at different times, so they shouldn't be ganging up on the FPU anyway.

There may be some place where we have an unnecessarily high amount of FP usage, but I very much doubt that this is it.

regards, tom lane
Re: [HACKERS] Spinlock backoff algorithm
Tom Lane wrote:
> Magne Mæhre <[EMAIL PROTECTED]> writes:
>> I understand the reasoning for the backoff (as of the discussion on
>> 2003-08-05), but is there any particular reason for using floating
>> point operations here?  Maybe a modulo would be just as good (or
>> better since it doesn't involve the FPU)?
> My goodness that's a hardware-dependent proposal.  Shall we discuss
> how many CPUs there are where an integer division is *slower* than
> a floating-point op?

Hi Tom:

Do you have one in mind, or is this a straw man? :-)

> Why do you think that a couple of FP ops here are a problem, anyway?
> This is a code path where we've already yielded the processor, so
> by definition the repetition rate has to be pretty low.

Yielded the processor?  Or yielded the lock?  With 32 active threads contending for the lock, but first contending for the FPU, one could see how it might be relevant.  I think I agree with you that this won't be the only problem; however, the FPU need not contribute to the problem either.

> The other problem with using modulo is that it makes the result depend
> mostly on the low-order bits of the random() result, rather than mostly
> on the high-order bits; with lower-grade implementations of random(),
> the lower bits are materially less random than the higher.  Now
> admittedly high-grade randomness is probably not too important for this
> specific context, but I dislike putting in poor coding practices that
> someone might see and copy without thinking...

If this was a serious problem, there is the >> operator.  I see it as a poor coding practice to make assumptions about which bits are most "random" in a call to random().  If anybody fixes the problem you describe, then the opposite may become true: perhaps the lowest bits become the most random.  If random() is broken, random() should be fixed.  Coding in specific implementation details about specific implementations of random() is just as bad. :-)

IMHO, use of the FPU should be avoided wherever possible on any platform.  On some platforms, the FPU may be fully implemented in software.  My memory is faint, but I think SPARC v7 either implemented floating-point division in software, or had a trap that implemented it in software.

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
Re: [HACKERS] Spinlock backoff algorithm
Magne Mæhre <[EMAIL PROTECTED]> writes:
> I understand the reasoning for the backoff (as of the discussion on
> 2003-08-05), but is there any particular reason for using floating
> point operations here ? Maybe a modulo would be just as good (or
> better since it doesn't involve the FPU) ?

My goodness that's a hardware-dependent proposal.  Shall we discuss how many CPUs there are where an integer division is *slower* than a floating-point op?

Why do you think that a couple of FP ops here are a problem, anyway?  This is a code path where we've already yielded the processor, so by definition the repetition rate has to be pretty low.

The other problem with using modulo is that it makes the result depend mostly on the low-order bits of the random() result, rather than mostly on the high-order bits; with lower-grade implementations of random(), the lower bits are materially less random than the higher.  Now admittedly high-grade randomness is probably not too important for this specific context, but I dislike putting in poor coding practices that someone might see and copy without thinking...

regards, tom lane
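For what it's worth, there is also an integer-only way to key the result off random()'s high-order bits: scale by a 64-bit fixed-point multiply instead of taking a modulo.  A hedged sketch, with MAX_RANDOM_VALUE assumed to be 0x7FFFFFFF, and not proposed code for the tree:

```c
#define MAX_RANDOM_VALUE 0x7FFFFFFF     /* assumed 31-bit random() range */

/* Random increment in [0, cur_delay), computed without the FPU and without
 * modulo.  The 64-bit multiply effectively scales cur_delay by
 * random()/(MAX_RANDOM_VALUE+1), so the result is driven by the generator's
 * high-order bits, which is what the plain modulo version gets backwards. */
static int backoff_increment(int cur_delay, long r)
{
    return (int) (((long long) r * cur_delay) /
                  ((long long) MAX_RANDOM_VALUE + 1));
}
```

Unlike the existing floating-point expression, this truncates instead of rounding, so the increment tops out at cur_delay - 1 rather than cur_delay; for backoff purposes that difference should be immaterial.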
Re: [HACKERS] Spinlock backoff algorithm
Magne Mæhre wrote:
> I was playing with a Nevada server and noticed a rush on the FPU (the
> Nevada has a single shared FPU for its 32 threads).

Probably you mean Niagara ;-).

Zdenek
[HACKERS] Spinlock backoff algorithm
I was playing with a Nevada server and noticed a rush on the FPU (the Nevada has a single shared FPU for its 32 threads).  Looking at the spinlock code, I found:

    cur_delay += (int) (cur_delay * ((double) random() /
                        (double) MAX_RANDOM_VALUE) + 0.5);

I understand the reasoning for the backoff (as of the discussion on 2003-08-05), but is there any particular reason for using floating point operations here?  Maybe a modulo would be just as good (or better, since it doesn't involve the FPU)?  Something like:

    cur_delay += random() % (cur_delay + 1);

--Magne
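For reference, the existing floating-point expression and the proposed modulo both add a random increment of up to roughly cur_delay, so they are behaviorally interchangeable.  A self-contained sketch of the two, taking the raw random value as a parameter so they can be compared deterministically, with MAX_RANDOM_VALUE assumed to be 0x7FFFFFFF:

```c
#define MAX_RANDOM_VALUE 0x7FFFFFFF     /* assumed 31-bit random() range */

/* Existing s_lock.c style: scale cur_delay by random()/MAX in floating
 * point, rounding to the nearest integer; increment is in [0, cur_delay]. */
static int fp_backoff(int cur_delay, long r)
{
    return cur_delay +
        (int) (cur_delay * ((double) r / (double) MAX_RANDOM_VALUE) + 0.5);
}

/* Proposed integer style: modulo keeps the increment in [0, cur_delay],
 * but, as noted in the thread, it depends on r's low-order bits. */
static int mod_backoff(int cur_delay, long r)
{
    return cur_delay + (int) (r % (cur_delay + 1));
}
```

Either way, cur_delay at most doubles per iteration, so the backoff remains exponential with random jitter.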