Matthew Knepley <[email protected]> writes:

> On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown <[email protected]> wrote:
>
>> Matthew Knepley <[email protected]> writes:
>>
>> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi <[email protected]>
>> > wrote:
>> >
>> >> I had weird issues where gcc (that I am using for my tests right now)
>> >> wasn't vectorising properly (even enabling all flags, from tree-vectorize,
>> >> to mavx). According to my tests, I know the Intel compiler was a bit better
>> >> at that.
>> >
>> > We are definitely at the mercy of the compiler for this. Maybe Jed has an
>> > idea why its not vectorizing.
>>
>> Is this so bad?
>>
>> 000000000024080e <VecMAXPY_Seq+0x2fe> mov    rax,QWORD PTR [rbp-0xb0]
>> 0000000000240815 <VecMAXPY_Seq+0x305> add    ebx,0x1
>> 0000000000240818 <VecMAXPY_Seq+0x308> vmulpd ymm0,ymm7,YMMWORD PTR [rax+r9*1]
>> 000000000024081e <VecMAXPY_Seq+0x30e> mov    rax,QWORD PTR [rbp-0xa8]
>> 0000000000240825 <VecMAXPY_Seq+0x315> vfmadd231pd ymm0,ymm8,YMMWORD PTR [rax+r9*1]
>> 000000000024082b <VecMAXPY_Seq+0x31b> mov    rax,QWORD PTR [rbp-0xb8]
>> 0000000000240832 <VecMAXPY_Seq+0x322> vfmadd231pd ymm0,ymm6,YMMWORD PTR [rax+r9*1]
>> 0000000000240838 <VecMAXPY_Seq+0x328> vfmadd231pd ymm0,ymm5,YMMWORD PTR [r10+r9*1]
>> 000000000024083e <VecMAXPY_Seq+0x32e> vaddpd ymm0,ymm0,YMMWORD PTR [r11+r9*1]
>> 0000000000240844 <VecMAXPY_Seq+0x334> vmovapd YMMWORD PTR [r11+r9*1],ymm0
>> 000000000024084a <VecMAXPY_Seq+0x33a> add    r9,0x20
>> 000000000024084e <VecMAXPY_Seq+0x33e> cmp    DWORD PTR [rbp-0xa0],ebx
>> 0000000000240854 <VecMAXPY_Seq+0x344> ja     000000000024080e <VecMAXPY_Seq+0x2fe>
>
> I agree that is what we should see. It cannot be what Fillippo has if he is
> getting ~4x with the template stuff.

I'm using gcc. Filippo, can you make an easy-to-run test that we can
evaluate on Xeon and KNL?
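
For reference, the loop in the disassembly above (one vmulpd, three
vfmadd231pd, then a vaddpd into the destination) has the shape of a
four-vector update y += a0*x0 + a1*x1 + a2*x2 + a3*x3. Below is a minimal
standalone sketch of a kernel of that shape. It is not the PETSc source;
the names maxpy4 and maxpy4_selfcheck are hypothetical and chosen only
for illustration. It could serve as a starting point for a test, built
with something like gcc -O3 -march=native and inspected with objdump -d
or gcc's -fopt-info-vec report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the four-vector case of a MAXPY-style kernel:
 *   y[i] += a[0]*x0[i] + a[1]*x1[i] + a[2]*x2[i] + a[3]*x3[i]
 * The restrict qualifiers promise the compiler that the arrays do not
 * overlap, which is usually needed before it will vectorize the loop. */
void maxpy4(size_t n, double *restrict y, const double a[4],
            const double *restrict x0, const double *restrict x1,
            const double *restrict x2, const double *restrict x3)
{
  /* Copy the coefficients to locals so they can stay in registers. */
  const double a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
  for (size_t i = 0; i < n; i++) {
    y[i] += a0 * x0[i] + a1 * x1[i] + a2 * x2[i] + a3 * x3[i];
  }
}

/* Small self-check using values that are exactly representable, so the
 * comparison does not depend on whether the compiler fuses into FMAs. */
void maxpy4_selfcheck(void)
{
  double y[3] = {1.0, 2.0, 3.0};
  const double a[4] = {1.0, 2.0, 3.0, 4.0};
  const double x0[3] = {1.0, 1.0, 1.0};
  const double x1[3] = {0.0, 1.0, 2.0};
  const double x2[3] = {1.0, 0.0, 1.0};
  const double x3[3] = {0.5, 0.5, 0.5};

  maxpy4(3, y, a, x0, x1, x2, x3);

  assert(y[0] == 7.0);
  assert(y[1] == 7.0);
  assert(y[2] == 13.0);
}
```

If a loop like this vectorizes on its own but the real code does not, the
difference is more likely in aliasing or alignment information than in the
compiler flags.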
