Thanks. We'll look into it and see if we can fix it.
Bill.
On 14 August 2017 at 16:51, wrote:
> Hi,
>
> I've dug a bit deeper, and it seems that there is an alignment issue
> within addmul_1. I've created two marginally different programs, of which
> one is much faster:
>
> $ gcc -O3 add
Hi,
I've dug a bit deeper, and it seems that there is an alignment issue within
addmul_1. I've created two marginally different programs, of which one is
much faster:
$ gcc -O3 addmul_1.s -o addmul_1.o -c; for i in a b; do gcc -O3 $i.cpp -o $i.out addmul_1.o; ./$i.out ; done
Time: 0.490647
Tim
Of course if you are using an AMD processor, surely it is a "performance
marginality problem". :-)
Bill.
On 11 August 2017 at 20:00, Bill Hart wrote:
> We've noticed similar sorts of things. One possibility is that the loop in
> your test code is not aligned as well in one version. Or perhaps y
We've noticed similar sorts of things. One possibility is that the loop in
your test code is not aligned as well in one version. Or perhaps your stack
is hitting the same location modulo 4096, which is a known issue on some
modern processors. There might be SSE code in the linker and AVX code in
th
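
One rough way to test the "same location modulo 4096" idea is to print where the operands sit relative to 4096-byte and 64-byte boundaries in both the fast and the slow build and compare. The sketch below is only an illustration: the helper names (page_offset, same_page_offset, report) are made up for this example and are not part of the test programs or of the library, and matching offsets would only be a hint, not proof of the cause.

```cpp
#include <cstdint>
#include <cstdio>

// Offset of an address within a 4096-byte page (its low 12 bits).
inline std::uintptr_t page_offset(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) & 0xFFFu;
}

// True when two addresses agree modulo 4096.
// Assumption: this is the condition to look for; the exact bits that
// matter may differ between processors.
inline bool same_page_offset(const void* a, const void* b) {
    return page_offset(a) == page_offset(b);
}

// Hypothetical helper: print an operand's address together with its
// offsets modulo 4096 and modulo 64 (a common cache-line size).
inline void report(const char* name, const void* p) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::printf("%s: %p  mod 4096 = %lu  mod 64 = %lu\n",
                name, const_cast<void*>(p),
                static_cast<unsigned long>(addr & 0xFFFu),
                static_cast<unsigned long>(addr & 63u));
}
```

Calling report() on the source and destination buffers just before the timed loop in each of the two programs, then diffing the output, would show whether the slow variant is the one whose operands share an offset.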