On Sun, Mar 29, 2009 at 03:32:20PM -0700, Mark Butler wrote:
>
> a0 += ip[0] + (ip[0] >> 32);
That has certain weaknesses too. What we really want is a 128-bit add
across two 64-bit registers. If you write the code like this:

	value = ip[0];
	a0 += value;
	if (a0 < value)		/* 64-bit overflow implies need to carry */
		a1++;
	b0 += a0;
	if (b0 < a0)
		b1++;

then you get the desired effect. The pair a1:a0 is the 128-bit sum
of the 64-bit ip[] values, and the pair b1:b0 is the 128-bit sum of
the running a0 values (as written, a1 is not added into b1). Best of
all, the compiler (at least, our compiler) is smart enough to detect
the carry-detection construct and turn it into branchless
add-with-carry instructions. Very efficient.
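
For illustration, here is a rough sketch of how that fragment might sit
inside a complete loop. The function name fletcher2c_sketch, its
signature, the one-word-per-iteration loop, and the out[] layout are all
assumptions made for this example; none of them come from the actual ZFS
source. It only shows the carry test applied to each word in turn.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch only: name, signature and output layout are
 * assumptions for illustration, not the real ZFS checksum code.
 *
 * a1:a0 accumulates the 128-bit sum of the input words.
 * b1:b0 accumulates the 128-bit sum of the running a0 values.
 */
void
fletcher2c_sketch(const uint64_t *ip, size_t nwords, uint64_t out[4])
{
	uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t value = ip[i];

		a0 += value;
		if (a0 < value)		/* unsigned wraparound: carry into a1 */
			a1++;

		b0 += a0;
		if (b0 < a0)		/* same carry test for the second sum */
			b1++;
	}

	out[0] = a0;
	out[1] = a1;
	out[2] = b0;
	out[3] = b1;
}
```

The carry test relies on unsigned arithmetic wrapping modulo 2^64: after
"a0 += value", the result is smaller than value exactly when the add
overflowed. For example, feeding in the two words UINT64_MAX and 1 wraps
a0 to 0 and leaves the carry in a1. Whether a given compiler turns the
comparison into an add-with-carry is worth checking in the generated
assembly rather than assuming.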

We've been meaning to introduce this "fletcher2c" for some time --
it just got lost in the sea of things to do. Thanks for the reminder.

Jeff