Martijn van Oosterhout wrote:
On Wed, Nov 14, 2007 at 06:57:18PM -0800, Josh Berkus wrote:
The question is, for our most common platforms (like AMD and Intel), is the FPU notably slower/more contended than integer division? I had the impression that it was, but my knowledge of chip architectures is liable to be out of date.

Can we have a hardware geek speak up?
I did some searching and the best figures I can find for integer
division are 15-40 cycles, whereas for floating point the best I found
was 5 cycles. The FP units on modern processors are very highly
optimised, but the figures quoted are for pipelined division, which is
not what we're doing here.
I agree with Tom that, although the user might be experiencing odd issues because PostgreSQL was not designed for 32 light-weight cores like the Niagara's, this particular division is unlikely to be the primary issue.

In response to the academic discussion above, did you also look up the time to convert from INT -> FLOAT, and then FLOAT -> INT?
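
To illustrate what I mean, here is a minimal sketch (not PostgreSQL code). On x86-64, gcc compiles each conversion into its own instruction, and each of those has its own latency on top of the divide itself:

double int_to_double(int i)
{
   return i;       /* typically compiles to cvtsi2sd */
}

int double_to_int(double d)
{
   return d;       /* typically compiles to cvttsd2si (truncates) */
}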

Also, when dividing by a constant, the compiler performs automatic optimization tricks, such as:

$ cat a.c
int f(int i)
{
   return i / 7;
}
$ gcc -O3 -S a.c
$ cat a.s
...
.globl f
       .type   f, @function
f:
.LFB2:
       movl    %edi, %eax
       movl    $-1840700269, %edx
       imull   %edx
       leal    (%rdx,%rdi), %eax
       sarl    $31, %edi
       sarl    $2, %eax
       subl    %edi, %eax
       ret

No divide. No need to convert from INT -> FLOAT -> INT.
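
For anyone curious what that assembly is doing: it is the standard multiply-by-reciprocal trick, multiplying by a fixed-point approximation of 1/7 and keeping the high bits. Here is a hand-written C sketch of the same idea, assuming 32-bit two's complement int and arithmetic right shift of negative values (as on x86); the constant and shift counts are taken from the gcc output above, not derived independently:

int div7(int i)
{
   /* 0x92492493 is approximately 2^34 / 7, rounded up. */
   long long product = (long long) i * -1840700269LL;  /* same magic constant gcc used */
   int q = (int) (product >> 32) + i;  /* high 32 bits of the product; adding i compensates
                                          for the constant being applied as a negative value */
   q >>= 2;                            /* finish the divide by 2^34 */
   q -= (i >> 31);                     /* adjust so negative dividends truncate toward zero */
   return q;
}

For i = 50 this gives 7, matching 50 / 7.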

Here is non-conclusive evidence both that integer division by a constant may still be faster on a fairly modern processor (AMD X2 3800+), and that, as Tom believes, this issue is a waste of time:

$ cat a.c
#define MAX_RANDOM_VALUE  (0x7FFFFFFF)

int old_f(int i)
{
   return ((double) i / (double) MAX_RANDOM_VALUE) + 0.5;
}

int new_f(int i)
{
   return (i + MAX_RANDOM_VALUE - 1) / MAX_RANDOM_VALUE;
}
$ cat old.c
int old_f(int);
int new_f(int);

int main()
{
   for (int i = 0; i < 100000000; i++)
       old_f(50);
   return 0;
}
$ cat new.c
int old_f(int);
int new_f(int);

int main()
{
   for (int i = 0; i < 100000000; i++)
       new_f(50);
   return 0;
}
$ gcc -o old -O3 -std=c99 old.c a.c
$ gcc -o new -O3 -std=c99 new.c a.c
$ time ./old
./old  1.58s user 0.00s system 99% cpu 1.590 total
$ time ./old
./old  1.31s user 0.00s system 99% cpu 1.314 total
$ time ./old
./old  1.14s user 0.00s system 99% cpu 1.142 total
$ time ./old
./old  1.46s user 0.00s system 99% cpu 1.461 total
$ time ./new
./new  0.68s user 0.00s system 99% cpu 0.682 total
$ time ./new
./new  0.70s user 0.01s system 99% cpu 0.703 total
$ time ./new
./new  0.70s user 0.00s system 99% cpu 0.705 total
$ time ./new
./new  0.70s user 0.00s system 99% cpu 0.703 total


I say non-conclusive, because:
1) My CPU is probably doing branch prediction and may possibly be doing out-of-order execution. Since it has more integer units than floating point units, this works in favour of the integer version. The contention case described by the original poster does not allow for multiple integer units to be in use at the same time.

2) As Tom has already said, 100 million divides, whether they take 0.7 seconds or 1.5 seconds, is still orders of magnitude more divisions than the contended code path would ever perform.

For the future, please consider doing integer division when working with integers and the divisor is a constant. For this particular case, though, it doesn't seem worth the risk of breaking the code.

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
