https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

            Bug ID: 90579
           Summary: Huge store forward stall due to vectorizer
           Product: gcc
           Version: 9.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
  Target Milestone: ---
            Target: x86-64

The loop/avx256 branch at

https://gitlab.com/x86-benchmarks/microbenchmark

shows a huge store-forwarding stall caused by the vectorizer in

---
extern double r[6];
extern double a[];

double
loop (int k, double x)
{
  int i;
  double t=0;
  for (i=0;i<6;i++)
    r[i] = x * a[i + k];
  for (i=0;i<6;i++)
    t+=r[5-i];
  return t;
}
---
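
To try the function outside the linked benchmark, it needs definitions for r and a plus something that calls it. The following is only a minimal sketch and not the driver from the microbenchmark repository: the size and contents of a[], the run_loop helper, and the noinline attribute are all assumptions made for illustration.

```c
#include <assert.h>

/* The report only declares these as extern.  The size of a[] and its
   contents are assumptions chosen so that a[i + k] stays in bounds
   for 0 <= k <= 10.  */
double r[6];
double a[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };

/* The function from the report.  noinline (a GCC attribute) is an
   addition here so that the driver below really calls it each time.  */
__attribute__ ((noinline)) double
loop (int k, double x)
{
  int i;
  double t = 0;
  /* First loop: store six products into r[].  */
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];
  /* Second loop: immediately read r[] back in reverse order.  */
  for (i = 0; i < 6; i++)
    t += r[5 - i];
  return t;
}

/* Hypothetical driver: call loop() n times and accumulate the results
   so the calls cannot be discarded.  k cycles through 0..7.  */
double
run_loop (int n)
{
  double sum = 0;
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    sum += loop (i % 8, 2.0);
  return sum;
}
```

Calling run_loop from a small main, building with -O3 -march=skylake, and running the result under perf stat -e ld_blocks.store_forward should make it possible to compare counts in the same way as the runs in this report. GCC's -mprefer-vector-width=128 is one option that limits the vector width; whether the event-avx128 binary was built that way is not stated here.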

when compiled with -O3 -march=skylake:

[hjl@gnu-cfl-1 microbenchmark]$ perf stat -e ld_blocks.store_forward ./event
loop: 229408

 Performance counter stats for './event':

                 1      ld_blocks.store_forward:u                               

       0.000478529 seconds time elapsed

       0.000502000 seconds user
       0.000000000 seconds sys


[hjl@gnu-cfl-1 microbenchmark]$ perf stat -e ld_blocks.store_forward ./event-avx128
loop: 191390

 Performance counter stats for './event-avx128':

                 1      ld_blocks.store_forward:u                               

       0.000526154 seconds time elapsed

       0.000507000 seconds user
       0.000000000 seconds sys


[hjl@gnu-cfl-1 microbenchmark]$ perf stat -e ld_blocks.store_forward ./event-avx256
loop: 1312864

 Performance counter stats for './event-avx256':

            30,001      ld_blocks.store_forward:u                               

       0.000756643 seconds time elapsed

       0.000723000 seconds user
       0.000000000 seconds sys


[hjl@gnu-cfl-1 microbenchmark]$
