Martijn van Oosterhout wrote:
On Wed, Nov 14, 2007 at 06:57:18PM -0800, Josh Berkus wrote:
The question is, for our most common platforms (like AMD and Intel), is the FPU notably slower/more contended than integer division? I had the impression that it was, but my knowledge of chip architectures is liable to be out of date.

Can we have a hardware geek speak up?
I did some searching and the best figures I can find for integer
division are 15-40 cycles, whereas for floating point the best I found
was 5 cycles. The FP units on modern processors are very highly
optimised, but the figures quoted are for pipelined division, which is
not what we're doing here.
I agree with Tom that, although the user might be experiencing odd issues because PostgreSQL was not designed for 32 light-weight cores like the Niagara's, this particular division is unlikely to be the primary issue.

In response to the academic discussion above, did you also look up the time to convert from INT -> FLOAT, and then FLOAT -> INT?
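
To illustrate what I mean, here is a minimal sketch (not PostgreSQL code). On x86-64, gcc compiles each conversion into its own instruction, and each of those has its own latency on top of the divide itself:

double int_to_double(int i)
{
   return i;       /* typically compiles to cvtsi2sd */
}

int double_to_int(double d)
{
   return d;       /* typically compiles to cvttsd2si (truncates) */
}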

Also, when dividing by a constant, the compiler performs automatic optimization tricks, such as:

$ cat a.c
int f(int i)
{
   return i / 7;
}
$ gcc -O3 -S a.c
$ cat a.s
...
.globl f
       .type   f, @function
f:
.LFB2:
       movl    %edi, %eax
       movl    $-1840700269, %edx
       imull   %edx
       leal    (%rdx,%rdi), %eax
       sarl    $31, %edi
       sarl    $2, %eax
       subl    %edi, %eax
       ret

No divide. No need to convert from INT -> FLOAT -> INT.
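
For anyone curious what that assembly is doing: it is the standard multiply-by-reciprocal trick, multiplying by a fixed-point approximation of 1/7 and keeping the high bits. Here is a hand-written C sketch of the same idea, assuming 32-bit two's complement int and arithmetic right shift of negative values (as on x86); the constant and shift counts are taken from the gcc output above, not derived independently:

int div7(int i)
{
   /* 0x92492493 is approximately 2^34 / 7, rounded up. */
   long long product = (long long) i * -1840700269LL;  /* same magic constant gcc used */
   int q = (int) (product >> 32) + i;  /* high 32 bits of the product; adding i compensates
                                          for the constant being applied as a negative value */
   q >>= 2;                            /* finish the divide by 2^34 */
   q -= (i >> 31);                     /* adjust so negative dividends truncate toward zero */
   return q;
}

For i = 50 this gives 7, matching 50 / 7.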

Here is non-conclusive evidence both that integer division by a constant may still be faster on a fairly modern processor (AMD X2 3800+), and that, as Tom believes, this issue is a waste of time:

$ cat a.c
#define MAX_RANDOM_VALUE  (0x7FFFFFFF)

int old_f(int i)
{
   return ((double) i / (double) MAX_RANDOM_VALUE) + 0.5;
}

int new_f(int i)
{
   return (i + MAX_RANDOM_VALUE - 1) / MAX_RANDOM_VALUE;
}
$ cat old.c
int old_f(int);
int new_f(int);

int main()
{
   for (int i = 0; i < 100000000; i++)
       old_f(50);
   return 0;
}
$ cat new.c
int old_f(int);
int new_f(int);

int main()
{
   for (int i = 0; i < 100000000; i++)
       new_f(50);
   return 0;
}
$ gcc -o old -O3 -std=c99 old.c a.c
$ gcc -o new -O3 -std=c99 new.c a.c
$ time ./old
./old  1.58s user 0.00s system 99% cpu 1.590 total
$ time ./old
./old  1.31s user 0.00s system 99% cpu 1.314 total
$ time ./old
./old  1.14s user 0.00s system 99% cpu 1.142 total
$ time ./old
./old  1.46s user 0.00s system 99% cpu 1.461 total
$ time ./new
./new  0.68s user 0.00s system 99% cpu 0.682 total
$ time ./new
./new  0.70s user 0.01s system 99% cpu 0.703 total
$ time ./new
./new  0.70s user 0.00s system 99% cpu 0.705 total
$ time ./new
./new  0.70s user 0.00s system 99% cpu 0.703 total


I say non-conclusive, because:
1) My CPU is probably doing branch prediction and may possibly be doing out-of-order execution. Since it has more integer units than floating point units, this works in favour of the integer version. The contention case described by the original poster does not allow for multiple integer units to be in use at the same time.

2) As Tom has already said, 100 million divides, whether they take 0.7 seconds or 1.5 seconds, is still orders of magnitude more divisions than the contended code path would ever perform.

For the future, please consider doing integer division when working with integers and the divisor is a constant. For this particular case, though, it doesn't seem worth the risk of breaking the code.

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
