[Bug target/113827] MrBayes benchmark redundant load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

--- Comment #3 from Andrew Pinski ---
(In reply to Robin Dapp from comment #0)
> A hot block in the MrBayes benchmark (as used in the Phoronix testsuite) has
> a redundant scalar load when vectorized.
>
> Minimal example, compiled with -march=rv64gcv -O3
>
> int foo (float **a, float f, int n)
> {
>   for (int i = 0; i < n; i++)
>     {
>       a[i][0] /= f;
>       a[i][1] /= f;
>       a[i][2] /= f;
>       a[i][3] /= f;
>       a[i] += 4;
>     }
> }

LLVM for aarch64 with the above testcase:
```
.L3:
        ldr     x2, [x0]
        mov     x1, x2
        ldr     q31, [x2]
        fdiv    v31.4s, v31.4s, v0.4s
        str     q31, [x1], 16
        str     x1, [x0], 8     HERE
        cmp     x3, x0
        bne     .L3
```

There is a store of x1 there. I really think you messed up reducing the testcase.
[Bug target/113827] MrBayes benchmark redundant load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2024-02-09

--- Comment #2 from Andrew Pinski ---
>a redundant scalar load

I don't see any redundant load in that loop.
```
.L3:
        movq    (%rdi), %rax            ;; load a[i] from rdi
        vmovups (%rax), %xmm1           ;; load rax[0-3] into vector
        vdivps  %xmm0, %xmm1, %xmm1     ;; divide
        vmovups %xmm1, (%rax)           ;; store result back into rax[0-3]
        addq    $16, %rax               ;; add 4*4 to rax
        movq    %rax, (%rdi)            ;; store rax back into rdi
        addq    $8, %rdi                ;; add 8 to rdi
        cmpq    %rdi, %rdx
        jne     .L3                     ;; compare and loop back
```

That is, a[i] is different in each iteration. Maybe you reduced this code too much?
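
For anyone wanting to check that reading at the source level, here is a minimal sketch (not from the thread) that runs the reduced loop on two small rows and asserts the two visible side effects: four floats per row are divided, and each a[i] is stored back advanced by 4 elements. The names `foo_void` and `check_side_effects` are hypothetical, and the function is changed to return void as an assumption, since the reduced testcase declares `int` but never returns a value.

```c
#include <assert.h>

/* Same loop body as the reduced testcase in comment #0.
   Assumption: return type changed to void so the behaviour is well defined. */
void foo_void(float **a, float f, int n)
{
    for (int i = 0; i < n; i++) {
        a[i][0] /= f;
        a[i][1] /= f;
        a[i][2] /= f;
        a[i][3] /= f;
        a[i] += 4;          /* pointer update: written back to a[i] */
    }
}

/* Hypothetical helper: returns 0 if the observed side effects match. */
int check_side_effects(void)
{
    float row0[8] = {2, 4, 6, 8, 10, 12, 14, 16};
    float row1[8] = {8, 8, 8, 8, 1, 1, 1, 1};
    float *rows[2] = {row0, row1};

    foo_void(rows, 2.0f, 2);

    /* The first four elements of each row were divided by f. */
    assert(row0[0] == 1.0f && row0[1] == 2.0f);
    assert(row0[2] == 3.0f && row0[3] == 4.0f);
    assert(row1[0] == 4.0f && row1[3] == 4.0f);

    /* Elements past the first four were not touched. */
    assert(row0[4] == 10.0f && row1[4] == 1.0f);

    /* Each a[i] was loaded once, then stored back advanced by 4 floats. */
    assert(rows[0] == row0 + 4);
    assert(rows[1] == row1 + 4);
    return 0;
}
```

Calling `check_side_effects()` from a `main` should pass silently. It only shows what the source requires per iteration (one load of a[i], one store of the updated pointer); it says nothing about which load the original report considered redundant.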
[Bug target/113827] MrBayes benchmark redundant load on riscv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

--- Comment #1 from Robin Dapp ---
x86 (-march=native -O3 on an i7 12th gen) looks pretty similar:

.L3:
        movq    (%rdi), %rax
        vmovups (%rax), %xmm1
        vdivps  %xmm0, %xmm1, %xmm1
        vmovups %xmm1, (%rax)
        addq    $16, %rax
        movq    %rax, (%rdi)
        addq    $8, %rdi
        cmpq    %rdi, %rdx
        jne     .L3

So probably not target specific.  Costing?