https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63148

            Bug ID: 63148
           Summary: r187042 causes auto-vectorization failure for X86 for
                    -m32.
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: doug.gilmore at imgtec dot com

Created attachment 33440
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33440&action=edit
test example

I noticed that MultiSource/Benchmarks/TSVC/LoopRestructuring-{flt,dbl}
from LLVM test-suite fail on X86 -m32 and I was able to bisect the
failure to commit r187042.

I attached a stripped-down example.

Before the revision, if we compile with -fdump-tree-vect-details,
we see that a loop-carried dependency is recorded:

(compute_affine_dependence
  stmt_a: D.1748_9 = global_data.b[D.1747_8];
  stmt_b: global_data.b[i.0_2] = D.1750_11;
(subscript_dependence_tester 
(analyze_overlapping_iterations 
  (chrec_a = {0, +, 1}_5)
  (chrec_b = {1, +, 1}_5)
(analyze_siv_subscript 
(analyze_subscript_affine_affine 
  (overlaps_a = [1 + 1 * x_1]
)
  (overlaps_b = [0 + 1 * x_1]
)
)
)
  (overlap_iterations_a = [1 + 1 * x_1]
)
  (overlap_iterations_b = [0 + 1 * x_1]
)
)
(analyze_overlapping_iterations 
  (chrec_a = 2816)
  (chrec_b = 2816)
  (overlap_iterations_a = [0]
)
  (overlap_iterations_b = [0]
)
)
(build_classic_dist_vector
  dist_vector = (  1 
  )
)
)
)

which results in the loop not being vectorized because of the memory
recurrence.
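
For readers without the attachment, here is a minimal sketch of this kind of
recurrence. It is not the attached test case: the function names, the
simplified loop body and the chunk width VF are all assumptions made for
illustration. The first function carries a distance-1 dependence (each
iteration reads the b element stored by the previous one). The second mimics
what happens if the loads are treated as independent of the stores and hoisted
ahead of them, a chunk at a time.

```c
#include <stddef.h>

#define VF 4  /* hypothetical vectorization factor, for illustration only */

/* Scalar form: b[i] depends on the b[i - 1] written one iteration earlier. */
void recurrence_scalar(double *b, const double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        b[i] = b[i - 1] + a[i];
}

/* Same loop, but each chunk of VF iterations performs all of its loads
   before any of its stores, as if no loop-carried dependence existed. */
void recurrence_loads_hoisted(double *b, const double *a, size_t n)
{
    size_t i = 1;
    for (; i + VF <= n; i += VF) {
        double old[VF];
        for (size_t k = 0; k < VF; k++)
            old[k] = b[i + k - 1];          /* stale values for k > 0 */
        for (size_t k = 0; k < VF; k++)
            b[i + k] = old[k] + a[i + k];
    }
    for (; i < n; i++)                      /* scalar remainder */
        b[i] = b[i - 1] + a[i];
}
```

With b = {1, 0, 0, 0, 0} and a filled with ones, the scalar form leaves
b[4] == 5 while the hoisted form leaves b[4] == 1, which is why the distance
vector (1) has to block vectorization here.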

After the change the dependency is not recorded:

(compute_affine_dependence
  stmt_a: D.1748_9 = global_data.b[D.1747_8];
  stmt_b: global_data.b[i.0_2] = D.1750_11;
(subscript_dependence_tester 
(analyze_overlapping_iterations 
  (chrec_a = {536870912, +, 1}_5)
  (chrec_b = {1, +, 1}_5)
(analyze_siv_subscript 
(analyze_subscript_affine_affine 
  (overlaps_a = no dependence
)
  (overlaps_b = no dependence
)
)
)
  (overlap_iterations_a = no dependence
)
  (overlap_iterations_b = no dependence
)
)
(dependence classified: scev_known)
)

This causes the loop to be incorrectly vectorized.
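
One observation about the new initial value of chrec_a: 536870912 is 2^29.
If the array elements are 8 bytes wide (an assumption; the element type is not
visible in the dump), then index 536870912 corresponds to a byte offset of
2^32, which wraps to 0 in 32-bit address arithmetic. Under that assumption,
{536870912, +, 1} and {0, +, 1} would name the same elements on -m32 even
though the index values look far apart. The sketch below only illustrates that
arithmetic; byte_offset32 is a hypothetical helper, not anything in GCC.

```c
#include <stdint.h>

/* Byte offset of element `index`, computed modulo 2^32 as on a 32-bit
   target. elem_size = 8 (e.g. double) is an assumption, not taken from
   the dump. */
uint32_t byte_offset32(uint32_t index, uint32_t elem_size)
{
    return (uint32_t)(index * elem_size);   /* unsigned wrap-around */
}
```

For example, byte_offset32(536870912u, 8) is 0 and byte_offset32(536870913u, 8)
is 8, the same offsets as indices 0 and 1.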

Note that when compiled with -m64 the loop is actually vectorized,
but it is determined that versioning is needed:

45: dependence distance == 0 between global_data.a[D.1767_2] and
global_data.a[D.1767_2]
45: versioning for alias required: can't determine dependence between
global_data.a[D.1767_2] and *D.1776_10
...
58: LOOP VECTORIZED.
s221_extract.c:40: note: vectorized 5 loops in function.
Merging blocks 2 and 41
Removing basic block 5
...

and the incorrectly vectorized code is removed.
