[Bug tree-optimization/96351] New: missed opportunity to optimize out redundant loop

2020-07-28 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96351

Bug ID: 96351
   Summary: missed opportunity to optimize out redundant loop
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---

inline unsigned int
stringLen(const short* const src)
{
if (src == 0 || *src == 0) {
return 0;
} else {
const short* pszTmp = src + 1;

while (*pszTmp)
++pszTmp;

return (unsigned int)(pszTmp - src);
}
}

extern void bar();

void foo(const short* const str) {
unsigned int len = stringLen(str);
if (!len) {
bar();
}
}

When stringLen is inlined into foo, the value computed in the else block of
stringLen is known to be non-zero, so the while loop could be eliminated. This
looks like a job for tree VRP, but the pass does not handle this test case as
expected.
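
To make the expected simplification concrete, here is a minimal sketch of what foo could reduce to if the compiler proved that the else branch always yields a non-zero length. The names foo_simplified, bar_calls and same_behaviour, and the counting stub for bar, are made up for illustration. The sketch also assumes strings shorter than 2^32 elements, so that the cast to unsigned int cannot wrap to zero.

```c
#include <stddef.h>

/* Counting stub standing in for the external bar() of the report. */
static int bar_calls;
static void bar(void) { ++bar_calls; }

static inline unsigned int stringLen(const short *const src)
{
    if (src == NULL || *src == 0)
        return 0;
    const short *pszTmp = src + 1;
    while (*pszTmp)
        ++pszTmp;
    return (unsigned int)(pszTmp - src);
}

/* foo as written in the report. */
static void foo(const short *const str)
{
    if (!stringLen(str))
        bar();
}

/* Hand-simplified form: the scan loop is gone because only the
   zero / non-zero distinction of the length is used. */
static void foo_simplified(const short *const str)
{
    if (str == NULL || *str == 0)
        bar();
}

/* Returns 1 when both versions call bar() equally often for STR. */
static int same_behaviour(const short *str)
{
    bar_calls = 0;
    foo(str);
    int expected = bar_calls;
    bar_calls = 0;
    foo_simplified(str);
    return expected == bar_calls;
}
```

Calling same_behaviour with a null pointer, an empty string and a non-empty string is enough to exercise all three paths through stringLen.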

$ g++ -S -O2 foo.cpp -fdump-tree-vrp

Consider function foo; the value ranges after VRP do not help here:

.MEM_1: <<< error >>> VARYING
str_3(D): const short int * const VARYING
_6: short int VARYING
str_7: const short int * const [1B, +INF]  EQUIVALENCES: { str_3(D) } (1 elements)
pszTmp_8: const short int * [1B, +INF]  EQUIVALENCES: { pszTmp_10 } (1 elements)
pszTmp_9: const short int * const [1B, +INF]
pszTmp_10: const short int * const [1B, +INF]
_11: short int VARYING
pszTmp_12: const short int * [1B, +INF]
_13: unsigned int [0, 0]
_14: long int VARYING
_15: long int [-4611686018427387904, 4611686018427387903]
_16: unsigned int VARYING
_18: unsigned int [0, 0]
pszTmp_19: const short int * [1B, +INF]  EQUIVALENCES: { pszTmp_10 } (1 elements)

 ..

[local count: 439750964]:
  pszTmp_9 = str_3(D) + 2;

[local count: 3997736055]:
  # pszTmp_10 = PHI
  _11 = *pszTmp_10;
  if (_11 == 0)
    goto ; [11.00%]
  else
    goto ; [89.00%]

[local count: 3557985095]:
  pszTmp_12 = pszTmp_10 + 2;
  goto ; [100.00%]

[local count: 439750964]:
  # pszTmp_8 = PHI
  _14 = pszTmp_8 - str_3(D);
  _15 = _14 /[ex] 2;
  _16 = (unsigned int) _15;
  if (_16 == 0)
    goto ; [3.91%]
  else
    goto ; [96.09%]

[local count: 354334798]:
  bar ();

[local count: 1073741824]:
  return;

Any suggestions on how to proceed?

[Bug other/96281] New: TBAA does not work as expected for a simple test case

2020-07-22 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96281

Bug ID: 96281
   Summary: TBAA does not work as expected for a simple test case
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: alias
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

Test case: foo.c

typedef struct state_t {
int threadid;
} state_t;

int history_h[8][12][64];

void history_good (state_t *s) {
int i, j;

if (s->threadid >= 0 && s->threadid < 8) {
for (i = 0; i < 12; i++) {
for (j = 0; j < 64; j++) {
history_h[s->threadid][i][j] = (history_h[s->threadid][i][j] +
1) >> 1;
}
}
}
}

$ gcc -S -O2 -ftree-loop-vectorize -funroll-loops foo.c -fopt-info
foo.c:14:13: optimized: loop unrolled 6 times

When the input parameter s is declared non-aliasing with the __restrict__ type
qualifier, as in:
void history_good (state_t * __restrict__ s)

the inner loop is auto-vectorized:
$ gcc -S -O2 -ftree-loop-vectorize -funroll-loops foo.c -fopt-info
foo.c:14:13: optimized: loop vectorized using 16 byte vectors
foo.c:8:6: optimized: loop with 15 iterations completely unrolled (header
execution count 16535624)

It looks like TBAA is not working for this case.  I then noticed the
following logic in tree-ssa-alias.c:

  /* When we are trying to disambiguate an access with a pointer dereference
     as base versus one with a decl as base we can use both the size
     of the decl and its dynamic type for extra disambiguation.
     ???  We do not know anything about the dynamic type of the decl
     other than that its alias-set contains base2_alias_set as a subset
     which does not help us here.  */
  /* As we know nothing useful about the dynamic type of the decl just
     use the usual conflict check rather than a subset test.
     ???  We could introduce -fvery-strict-aliasing when the language
     does not allow decls to have a dynamic type that differs from their
     static type.  Then we can check
     !alias_set_subset_of (base1_alias_set, base2_alias_set) instead.  */
  if (base1_alias_set != base2_alias_set
      && !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
    return false;

This was introduced by: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42834
From the comments, this depends on the language of the source code.
So at least for C and C++, could we check !alias_set_subset_of (base1_alias_set,
base2_alias_set) here instead?
Are there other languages supported by GCC for which this makes a difference?
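
For readers who want to see what the missing disambiguation amounts to at the source level, here is a small sketch that reads s->threadid once into a local before the loops. This is only a hand-written variant of the test case, not the compiler change under discussion; the name history_good_hoisted is made up, and whether the vectorizer then handles the inner loop is not verified here. It behaves like the original only under the assumption that s does not point into history_h.

```c
typedef struct state_t {
    int threadid;
} state_t;

int history_h[8][12][64];

/* Same computation as history_good, but s->threadid is read once into a
   local, so stores to history_h cannot be assumed to change it. */
void history_good_hoisted(state_t *s)
{
    const int tid = s->threadid;

    if (tid >= 0 && tid < 8) {
        for (int i = 0; i < 12; i++)
            for (int j = 0; j < 64; j++)
                history_h[tid][i][j] = (history_h[tid][i][j] + 1) >> 1;
    }
}
```

With the index held in a local, the loop body no longer reloads a value that a store to history_h might appear to modify, which is the same guarantee the __restrict__ variant gives the compiler.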

[Bug tree-optimization/95961] New: ICE: in exact_div, at poly-int.h:2182

2020-06-29 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95961

Bug ID: 95961
   Summary: ICE: in exact_div, at poly-int.h:2182
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

test case:
$ cat foo.c
typedef struct {
unsigned short mprr_2[5][16][16];
} ImageParameters;

int s[16][2];

void intrapred_luma_16x16(ImageParameters *img, int s0)
{
  for (int j = 0; j < 16; j++)
for (int i = 0; i < 16; i++)
  {
img->mprr_2[1][j][i] = s[j][1];
img->mprr_2[2][j][i] = s0;
  }
}

Command line to reproduce:
$ gcc -O3 -march=armv8.2-a+sve -fno-vect-cost-model foo.c

Call trace:
during GIMPLE pass: vect
dump file: a-foo.c.163t.vect
foo.c: In function ‘intrapred_luma_16x16’:
foo.c:7:6: internal compiler error: in exact_div, at poly-int.h:2182
7 | void intrapred_luma_16x16(ImageParameters *img, int s0)
  |  ^~~~
v0xdb6937 poly_int<2u, poly_result::result_kind>::type>
exact_div<2u, unsigned long, unsigned long>(poly_int_pod<2u, unsigned long>
const&, poly_int_pod<2u, unsigned long> const&)
../../gcc-git/gcc/poly-int.h:2182
0x22934ef vect_get_num_vectors
../../gcc-git/gcc/tree-vectorizer.h:1647
0x2297d5f vect_enhance_data_refs_alignment(_loop_vec_info*)
../../gcc-git/gcc/tree-vect-data-refs.c:1827
0x1686adf vect_analyze_loop_2
../../gcc-git/gcc/tree-vect-loop.c:2138
0x1688267 vect_analyze_loop(loop*, vec_info_shared*)
../../gcc-git/gcc/tree-vect-loop.c:2612
0x16c77e7 try_vectorize_loop_1
../../gcc-git/gcc/tree-vectorizer.c:955
0x16c7f6f try_vectorize_loop
../../gcc-git/gcc/tree-vectorizer.c:1110
0x16c811f vectorize_loops()
../../gcc-git/gcc/tree-vectorizer.c:1189
0x151e6df execute
../../gcc-git/gcc/tree-ssa-loop.c:414

In vect_enhance_data_refs_alignment, when we call vect_get_num_vectors, we
have:
(gdb) p nscalars
$11 = {> = {coeffs = {2, 2}}, }

(gdb) p debug_tree(vectype)
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type
0xb22305e8 precision:32 min  max

pointer_to_this >
VNx4SI
..

(gdb) p TYPE_VECTOR_SUBPARTS (vectype)
$13 = {> = {coeffs = {4, 4}}, }

nscalars is not a multiple of the number of elements of vectype, which triggers
the ICE.

In the vect pass, the vectorization factor computed by
vect_determine_vectorization_factor is [8,8]. But it is later updated to [1,1]
by vect_update_vf_for_slp, as shown in the pass dump:
foo.c:9:3: note:   === vect_make_slp_decision ===
foo.c:9:3: note:   Decided to SLP 2 instances. Unrolling factor [1,1]
foo.c:9:3: note:   === vect_detect_hybrid_slp ===
foo.c:9:3: note:   === vect_update_vf_for_slp ===
foo.c:9:3: note:   Loop contains only SLP stmts
foo.c:9:3: note:   Updating vectorization factor to [1,1].
foo.c:9:3: note:  vectorization_factor = [1,1], niters = 16

This logic here was once changed by commit
d9f21f6acb3aa615834e855e16b6311cd18c5668:

   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
     {
-      if (STMT_SLP_TYPE (stmt_info))
-        possible_npeel_number
-          = (vf * GROUP_SIZE (stmt_info)) / nelements;
-      else
-        possible_npeel_number = vf / nelements;
+      poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
+                              ? vf * GROUP_SIZE (stmt_info) : vf);
+      possible_npeel_number
+        = vect_get_num_vectors (nscalars, vectype);

Proposed fix:
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index eb8288e7a85..b30a7d8a3bb 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1823,8 +1823,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
             {
               poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
                                       ? vf * DR_GROUP_SIZE (stmt_info) : vf);
-              possible_npeel_number
-                = vect_get_num_vectors (nscalars, vectype);
+              if (maybe_lt (nscalars, TYPE_VECTOR_SUBPARTS (vectype)))
+                possible_npeel_number = 0;
+              else
+                possible_npeel_number
+                  = vect_get_num_vectors (nscalars, vectype);

               /* NPEEL_TMP is 0 when there is no misalignment, but also
                  allow peeling NELEMENTS.  */
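
The arithmetic behind the ICE and the guard in the proposed fix can be sketched with a reduced model of a two-coefficient polynomial integer. This is not GCC's poly_int implementation: struct poly2 and the three helper functions are hypothetical stand-ins written for this illustration, with a value read as c[0] + c[1] * x for a runtime x >= 0.

```c
#include <stdbool.h>

/* Simplified stand-in for GCC's two-coefficient poly_uint64:
   the value is c[0] + c[1] * x for a runtime x >= 0. */
struct poly2 {
    unsigned long c[2];
};

/* True if A might be less than B for some x >= 0. */
static bool poly2_maybe_lt(struct poly2 a, struct poly2 b)
{
    return a.c[0] < b.c[0] || a.c[1] < b.c[1];
}

/* True if A == q * B for one integer q valid for every x; stores q in *Q.
   An exact division is only meaningful when this holds. */
static bool poly2_multiple_p(struct poly2 a, struct poly2 b, unsigned long *q)
{
    if (b.c[0] == 0 || a.c[0] % b.c[0] != 0)
        return false;
    unsigned long quot = a.c[0] / b.c[0];
    if (a.c[1] != quot * b.c[1])
        return false;
    *q = quot;
    return true;
}

/* Guarded computation in the spirit of the proposed fix: report zero
   vectors when NSCALARS may be smaller than the vector length.
   Returns false when the division would still be inexact. */
static bool num_vectors_guarded(struct poly2 nscalars, struct poly2 subparts,
                                unsigned long *result)
{
    if (poly2_maybe_lt(nscalars, subparts)) {
        *result = 0;
        return true;
    }
    return poly2_multiple_p(nscalars, subparts, result);
}
```

With the values from the gdb session, nscalars {2, 2} is not a multiple of {4, 4}, which corresponds to the failing exact_div, while the guarded version takes the early exit and yields 0.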

[Bug tree-optimization/95570] New: ICE: Segmentation fault in vect_loop_versioning

2020-06-08 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95570

Bug ID: 95570
   Summary: ICE: Segmentation fault in vect_loop_versioning
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

foo.c:
int x[8][32];

void
foo (int start)
{
  for (int i = start; i < start + 16; i++)
x[start][i] = i;
}

$ gcc -S -O2 -ftree-loop-vectorize -march=armv8.2-a+sve -msve-vector-bits=256
-fno-vect-cost-model -fwrapv -mstrict-align foo.c
during GIMPLE pass: vect
foo.c: In function ‘foo’:
foo.c:4:1: internal compiler error: Segmentation fault
4 | foo (int start)
  | ^~~
0x12bb79b crash_signal
../../gcc-git/gcc/toplev.c:328
0x94cc4c contains_struct_check(tree_node const*, tree_node_structure_enum, char
const*, int, char const*)
../../gcc-git/gcc/tree.h:3665
0x9893db wi::extended_tree<576>::extended_tree(tree_node const*)
../../gcc-git/gcc/tree.h:5922
0x987e9b generic_wide_int >::generic_wide_int(tree_node const* const&)
../../gcc-git/gcc/wide-int.h:782
0x98796f wi::to_widest(tree_node const*)
../../gcc-git/gcc/tree.h:5849
0xa84e13 tree_int_cst_compare(tree_node const*, tree_node const*)
../../gcc-git/gcc/tree.h:6121
0x16911bb vect_create_cond_for_align_checks
../../gcc-git/gcc/tree-vect-loop-manip.c:3055
0x1691a1f vect_loop_versioning(_loop_vec_info*, gimple*)
../../gcc-git/gcc/tree-vect-loop-manip.c:3263
0x167ffbb vect_transform_loop(_loop_vec_info*, gimple*)
../../gcc-git/gcc/tree-vect-loop.c:8691
0x16ac56b try_vectorize_loop_1
../../gcc-git/gcc/tree-vectorizer.c:989
0x16ac7e7 try_vectorize_loop
../../gcc-git/gcc/tree-vectorizer.c:1046
0x16ac997 vectorize_loops()
../../gcc-git/gcc/tree-vectorizer.c:1125
0x1507e2b execute
../../gcc-git/gcc/tree-ssa-loop.c:414

Here, we are doing loop versioning for alignment.
The only dr here is a gather-scatter operation: x[start][i].
Scalar evolution analysis for this dr failed, so DR_STEP is NULL_TREE, which
leads to the segfault.
But gather-scatter operations should be filtered out in
vect_enhance_data_refs_alignment.
Like:
@@ -2206,6 +2228,12 @@ vect_enhance_data_refs_alignment (loop_vec_info
loop_vinfo)
  && DR_GROUP_FIRST_ELEMENT (stmt_info) != stmt_info))
continue;

+ /* For scatter-gather or invariant accesses there is nothing
+to enhance.  */
+ if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+   || integer_zerop (DR_STEP (dr)))
+   continue;
+
  if (STMT_VINFO_STRIDED_P (stmt_info))
{
  /* Strided loads perform only component accesses, alignment is

I also saw similar issues in vect_verify_datarefs_alignment,
vect_get_peeling_costs_all_drs and vect_peeling_supportable. Since the code is
similar, maybe we should introduce a new function for that.  Suggestions?
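
One possible shape for such a shared helper is sketched below on a heavily reduced model. struct data_ref, dr_relevant_for_alignment and count_relevant are hypothetical names for this illustration only, and the explicit NULL check on the step is an addition of the sketch rather than part of the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a vectorizer data reference. */
struct data_ref {
    bool gather_scatter;  /* models STMT_VINFO_GATHER_SCATTER_P */
    const long *step;     /* models DR_STEP; NULL when analysis failed */
};

/* One shared predicate for all alignment-related walks: gather/scatter
   and invariant (zero-step) accesses have nothing to enhance.  The
   gather/scatter test comes first so that a NULL step is never read. */
static bool dr_relevant_for_alignment(const struct data_ref *dr)
{
    if (dr->gather_scatter)
        return false;
    if (dr->step == NULL || *dr->step == 0)
        return false;
    return true;
}

/* Counts the references an alignment walk would actually look at. */
static size_t count_relevant(const struct data_ref *drs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (dr_relevant_for_alignment(&drs[i]))
            count++;
    return count;
}
```

Keeping the filter in one place means each of the functions listed above would skip the same references in the same order, instead of repeating slightly different checks.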

[Bug target/95459] New: aarch64: ICE in in aarch64_short_vector_p, at config/aarch64/aarch64.c:16803

2020-06-01 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95459

Bug ID: 95459
   Summary: aarch64: ICE in in aarch64_short_vector_p, at
config/aarch64/aarch64.c:16803
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

Another SVE-related ICE, triggered by the option -mgeneral-regs-only.

Reduced test case:

#include <arm_sve.h>

svint8x2_t
callee_s8 (svint8_t x0, svint8_t x1)
{
  return svcreate2 (x0, x1);
}

$aarch64-linux-gnu-gcc -O2 -S -mgeneral-regs-only -march=armv8.2-a+sve bar.c

bar.c: In function ‘callee_s8’:
bar.c:6:10: error: ACLE function ‘svcreate2_s8’ is incompatible with the use of
‘-mgeneral-regs-only’
6 |   return svcreate2 (x0, x1);
  |  ^
bar.c:4:1: internal compiler error: in aarch64_short_vector_p, at
config/aarch64/aarch64.c:16803
4 | callee_s8 (svint8_t x0, svint8_t x1)
  | ^
0x17d5887 aarch64_short_vector_p
../../gcc-git/gcc/config/aarch64/aarch64.c:16803
0x17d5993 aarch64_composite_type_p
../../gcc-git/gcc/config/aarch64/aarch64.c:16838
0x17d5aab aarch64_vfp_is_call_or_return_candidate
../../gcc-git/gcc/config/aarch64/aarch64.c:16877
0x17b4a07 aarch64_init_cumulative_args(CUMULATIVE_ARGS*, tree_node const*,
rtx_def*, tree_node const*, unsigned int, bool)
../../gcc-git/gcc/config/aarch64/aarch64.c:5988
0xdbf60f assign_parms_initialize_all
../../gcc-git/gcc/function.c:2298
0xdc5b8b gimplify_parameters(gimple**)
../../gcc-git/gcc/function.c:3863
0xe86a8b gimplify_body(tree_node*, bool)
../../gcc-git/gcc/gimplify.c:14776
0xe872cf gimplify_function_tree(tree_node*)
../../gcc-git/gcc/gimplify.c:14934
0xbe4a4b cgraph_node::analyze()
../../gcc-git/gcc/cgraphunit.c:671
0xbe70ef analyze_functions
../../gcc-git/gcc/cgraphunit.c:1231
0xbecba3 symbol_table::finalize_compilation_unit()
../../gcc-git/gcc/cgraphunit.c:2975

Here, input param 'type' for aarch64_short_vector_p() looks like:

 
unit-size 
align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xb22300a8 precision:128 min  max
>
constant
elt0:   elt1:  >
...

aarch64_short_vector_p() calls aarch64_sve_mode_p(), which depends on
TARGET_SVE, and TARGET_SVE is false under the option -mgeneral-regs-only. As a
result, aarch64_sve_mode_p() returns false and this triggers the ICE.  I think
we are simply checking whether a type (and a mode) is a 64/128-bit short vector
or not, so TARGET_SVE should not make a difference here.

Proposed patch is trivial:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7feff77adf6..4f00a8c2063 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16800,7 +16800,7 @@ aarch64_short_vector_p (const_tree type,
 {
   /* Rely only on the type, not the mode, when processing SVE types.  */
   if (type && aarch64_some_values_include_pst_objects_p (type))
-   gcc_assert (aarch64_sve_mode_p (mode));
+   gcc_assert (TARGET_SVE ? aarch64_sve_mode_p (mode) : true);
   else
size = GET_MODE_SIZE (mode);
 }

With this fix, we have:
$aarch64-linux-gnu-gcc -O2 -S -mgeneral-regs-only -march=armv8.2-a+sve bar.c

bar.c: In function ‘callee_s8’:
bar.c:6:10: error: ACLE function ‘svcreate2_s8’ is incompatible with the use of
‘-mgeneral-regs-only’
6 |   return svcreate2 (x0, x1);
  |  ^
bar.c:4:1: fatal error: ‘callee_s8’ requires the SVE ISA extension
4 | callee_s8 (svint8_t x0, svint8_t x1)
  | ^
compilation terminated.
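
The effect of the one-line change on the asserted condition can be written out as a small truth-table model. This is a toy sketch, not aarch64.c code: the three functions are hypothetical, and sve_mode_p only models the dependence on TARGET_SVE described in the report.

```c
#include <stdbool.h>

/* Toy model: in the report, aarch64_sve_mode_p depends on TARGET_SVE,
   so it is false for every mode once SVE is disabled. */
static bool sve_mode_p(bool target_sve, bool mode_is_sve)
{
    return target_sve && mode_is_sve;
}

/* Condition asserted before the patch. */
static bool assert_cond_before(bool target_sve, bool mode_is_sve)
{
    return sve_mode_p(target_sve, mode_is_sve);
}

/* Condition asserted after the patch:
   TARGET_SVE ? aarch64_sve_mode_p (mode) : true. */
static bool assert_cond_after(bool target_sve, bool mode_is_sve)
{
    return target_sve ? sve_mode_p(target_sve, mode_is_sve) : true;
}
```

The two conditions differ only when TARGET_SVE is false, which is the -mgeneral-regs-only case that hit the assertion.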

[Bug target/95254] New: aarch64: gcc generate inefficient code with fixed sve vector length

2020-05-21 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95254

Bug ID: 95254
   Summary: aarch64: gcc generate inefficient code with fixed sve
vector length
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

Test case:

typedef short __attribute__((vector_size (8))) v4hi;

typedef union U4HI { v4hi v; short a[4]; } u4hi;

short b[4];

void pass_v4hi (v4hi v)
{
int i;
u4hi u;
u.v = v;
for (i = 0; i < 4; i++)
  b[i] = u.a[i];
};

$ gcc -O2 -ftree-slp-vectorize -S -march=armv8.2-a+sve foo.c
assembly code:
pass_v4hi:
.LFB0:
.cfi_startproc
adrp    x0, .LANCHOR0
str d0, [x0, #:lo12:.LANCHOR0]
ret
.cfi_endproc

$ gcc -O2 -ftree-slp-vectorize -S -march=armv8.2-a+sve -msve-vector-bits=256
foo.c
assembly code:
pass_v4hi:
.LFB0:
.cfi_startproc
sub sp, sp, #16
.cfi_def_cfa_offset 16
ptrue   p0.b, vl32
adrp    x0, .LANCHOR0
add x0, x0, :lo12:.LANCHOR0
str d0, [sp, 8]
ld1h    z0.d, p0/z, [sp, #1, mul vl]
st1h    z0.d, p0, [x0]
add sp, sp, 16
.cfi_def_cfa_offset 0
ret
.cfi_endproc


The root cause here is that we choose a different mode in
aarch64_vectorize_related_mode[1]: VNx2HImode instead of V4HImode.
Then in the final tree ssa forwprop pass, we need to do a VIEW_CONVERT from
V4HImode to VNx2HImode.
One way to fix this is to catch and simplify the pattern in
aarch64_expand_sve_mem_move, emitting a mov pattern of V4HImode instead.
I am assuming endianness does not make a difference here. Will propose a patch
for comments.
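
As a reference point for what the efficient sequence computes, the test case is equivalent to a single 8-byte copy of the vector argument into b. The sketch below places the original function next to a memcpy-based variant; pass_v4hi_copy is a name made up for this illustration, and the code relies on the GCC vector_size extension already used by the test case.

```c
#include <string.h>

typedef short __attribute__((vector_size (8))) v4hi;
typedef union U4HI { v4hi v; short a[4]; } u4hi;

short b[4];

/* The test case from the report: element-wise copy through a union. */
void pass_v4hi(v4hi v)
{
    u4hi u;
    u.v = v;
    for (int i = 0; i < 4; i++)
        b[i] = u.a[i];
}

/* The same effect expressed as one 8-byte copy, which is what the short
   adrp + str sequence in the first listing amounts to. */
void pass_v4hi_copy(v4hi v)
{
    memcpy(b, &v, sizeof v);
}
```

Both functions leave the same four values in b, so any code generated for pass_v4hi that goes through the stack and predicated loads is doing more work than the data movement requires.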


[1] call trace:
(gdb) bt
#0  aarch64_vectorize_related_mode (vector_mode=E_VNx8HImode, element_mode=...,
nunits=...) at ../../gcc-git/gcc/config/aarch64/aarch64.c:2377
#1  0x012983b4 in related_vector_mode (vector_mode=E_VNx8HImode,
element_mode=..., nunits=...) at ../../gcc-git/gcc/stor-layout.c:535
#2  0x01652918 in get_related_vectype_for_scalar_type
(prevailing_mode=E_VNx8HImode, scalar_type=0xb22da498, nunits=...)
at ../../gcc-git/gcc/tree-vect-stmts.c:11463
#3  0x01653304 in get_vectype_for_scalar_type (vinfo=0x2f0dc80,
scalar_type=0xb22da498, group_size=4)
at ../../gcc-git/gcc/tree-vect-stmts.c:11545
#4  0x016533a0 in get_vectype_for_scalar_type (vinfo=0x2f0dc80,
scalar_type=0xb22da498, node=0x2e5d460)
at ../../gcc-git/gcc/tree-vect-stmts.c:11569
#5  0x016987e8 in vect_get_constant_vectors (vinfo=0x2f0dc80,
slp_node=0x2e53080, op_num=0, vec_oprnds=0xc738)
at ../../gcc-git/gcc/tree-vect-slp.c:3562
#6  0x016993f8 in vect_get_slp_defs (vinfo=0x2f0dc80,
slp_node=0x2e53080, vec_oprnds=0xc7a8, n=1) at
../../gcc-git/gcc/tree-vect-slp.c:3786
#7  0x01631c70 in vect_get_vec_defs (vinfo=0x2f0dc80,
op0=0xb20e3120, op1=0x0, stmt_info=0x2feef60, vec_oprnds0=0xcdd0,
vec_oprnds1=0x0,
slp_node=0x2e53080) at ../../gcc-git/gcc/tree-vect-stmts.c:1726
#8  0x01648bc8 in vectorizable_store (vinfo=0x2f0dc80,
stmt_info=0x2feef60, gsi=0xdad0, vec_stmt=0xd5b0,
slp_node=0x2e53080,
cost_vec=0x0) at ../../gcc-git/gcc/tree-vect-stmts.c:8186
#9  0x01651808 in vect_transform_stmt (vinfo=0x2f0dc80,
stmt_info=0x2feef60, gsi=0xdad0, slp_node=0x2e53080,
slp_node_instance=0x2fefe70)
at ../../gcc-git/gcc/tree-vect-stmts.c:11184
#10 0x0169a4a0 in vect_schedule_slp_instance (vinfo=0x2f0dc80,
node=0x2e53080, instance=0x2fefe70) at ../../gcc-git/gcc/tree-vect-slp.c:4134
#11 0x0169aaac in vect_schedule_slp (vinfo=0x2f0dc80) at
../../gcc-git/gcc/tree-vect-slp.c:4258
#12 0x016972f0 in vect_slp_bb_region (region_begin=..., region_end=...,
datarefs=..., n_stmts=10) at ../../gcc-git/gcc/tree-vect-slp.c:3227
#13 0x01697c60 in vect_slp_bb (bb=0xb22ce340) at
../../gcc-git/gcc/tree-vect-slp.c:3350
#14 0x016a56f0 in (anonymous namespace)::pass_slp_vectorize::execute
(this=0x2e6aae0, fun=0xb2116000) at
../../gcc-git/gcc/tree-vectorizer.c:1320

[Bug target/94991] ICE: Segmentation fault with option -mgeneral-regs-only

2020-05-07 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94991

--- Comment #1 from Fei Yang  ---
For the given testcase, the scalar floating-point move expand pattern executes
FAIL, since TARGET_FLOAT is false with the option -mgeneral-regs-only. But a
move expand pattern cannot fail. It would be better to replace the FAIL with
code that bitcasts to the equivalent integer mode, using gen_lowpart.  Will
propose a patch for this.
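
A source-level analogue of that idea, moving a float as integer data without any floating-point operation, might look like the sketch below. It is not the RTL change itself: float_bits and move_float_as_int are hypothetical helpers, and the code assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer without any
   floating-point operation; a source-level analogue of taking the
   integer-mode lowpart of an SFmode value. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Move a float from SRC to DST purely as integer data. */
static void move_float_as_int(float *dst, const float *src)
{
    uint32_t bits;
    memcpy(&bits, src, sizeof bits);
    memcpy(dst, &bits, sizeof bits);
}
```

Because only the bit pattern is copied, such a move needs nothing beyond general-purpose registers, which is why it remains possible when the FP registers are unavailable.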

[Bug target/94991] New: ICE: Segmentation fault with option -mgeneral-regs-only

2020-05-07 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94991

Bug ID: 94991
   Summary: ICE: Segmentation fault with option
-mgeneral-regs-only
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

Yet another ICE with -mgeneral-regs-only:

foo.c
struct S { float d; };

void bar (struct S);

void
f0 (int x)
{
  struct S s = {.d = 0.0f };
  ((char *) &s)[0] = x;
  s.d *= 7.0;
  bar (s);
}

$aarch64-linux-gnu-gcc -S -O2 -mgeneral-regs-only foo.c

pr9.c: In function ‘f0’:
pr9.c:8:12: error: ‘-mgeneral-regs-only’ is incompatible with the use of
floating-point types
8 |   struct S s = {.d = 0.0f };
  |^
pr9.c:10:7: error: ‘-mgeneral-regs-only’ is incompatible with the use of
floating-point types
   10 |   s.d *= 7.0;
  |   ^~
pr9.c:10:7: error: ‘-mgeneral-regs-only’ is incompatible with the use of
floating-point types
pr9.c:10:7: error: ‘-mgeneral-regs-only’ is incompatible with the use of
floating-point types
during RTL pass: expand
pr9.c:10:7: internal compiler error: Segmentation fault
0x12a7a0b crash_signal
../../gcc-git/gcc/toplev.c:328
0xb05a70 single_set(rtx_insn const*)
../../gcc-git/gcc/rtl.h:3437
0xd1dd2f emit_move_insn(rtx_def*, rtx_def*)
../../gcc-git/gcc/expr.c:3858
0xb5c837 emit_library_call_value_1(int, rtx_def*, rtx_def*, libcall_type,
machine_mode, int, std::pair*)
../../gcc-git/gcc/calls.c:5597
0xb42dab emit_library_call_value(rtx_def*, rtx_def*, libcall_type,
machine_mode, rtx_def*, machine_mode, rtx_def*, machine_mode)
../../gcc-git/gcc/rtl.h:4257
0x10f0ff3 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*,
int, optab_methods)
../../gcc-git/gcc/optabs.c:1831
0xd05133 expand_mult(machine_mode, rtx_def*, rtx_def*, rtx_def*, int, bool)
../../gcc-git/gcc/expmed.c:3568
0xd3291f expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../gcc-git/gcc/expr.c:9046
0xd36c0f expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-git/gcc/expr.c:10054
0xd2f2e3 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-git/gcc/expr.c:8358
0xd10943 expand_normal
../../gcc-git/gcc/expr.h:288
0xd29ecb store_field
../../gcc-git/gcc/expr.c:7102
0xd2256b expand_assignment(tree_node*, tree_node*, bool)
../../gcc-git/gcc/expr.c:5374
0xb82f1b expand_gimple_stmt_1
../../gcc-git/gcc/cfgexpand.c:3749
0xb83387 expand_gimple_stmt
../../gcc-git/gcc/cfgexpand.c:3847
0xb8ba73 expand_gimple_basic_block
../../gcc-git/gcc/cfgexpand.c:5887
0xb8d8d3 execute
../../gcc-git/gcc/cfgexpand.c:6542

[Bug tree-optimization/94784] ICE: in simplify_vector_constructor, at tree-ssa-forwprop.c:2482

2020-04-27 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94784

--- Comment #2 from Fei Yang  ---
Will propose a patch for review.

[Bug tree-optimization/94784] ICE: in simplify_vector_constructor, at tree-ssa-forwprop.c:2482

2020-04-27 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94784

--- Comment #1 from Fei Yang  ---
I did some checking, and everything looks fine up to the point of the ICE.

The reason for the assert is that applying VIEW_CONVERT_EXPR to two general
vectors is dangerous in this context.  If through some bug we ended up with one
vector being V4HI and the other being V2SI (say), the assert stops us from
silently miscompiling the code.

In the testcase we have two vectors with the same ABI identity but with
different TYPE_MODEs. As suggested by Richard Sandiford, it would be better to
flip the assert around so that it checks that the two vectors have equal
TYPE_VECTOR_SUBPARTS and that converting the corresponding element types is a
useless_type_conversion_p.
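
The flipped check can be pictured with a reduced model in which a vector type is described only by its element count, element type and machine mode. Everything in the sketch is hypothetical (struct vec_type, the MODE_ constants and both predicates), and elt_conversion_useless_p is only a crude stand-in for useless_type_conversion_p.

```c
#include <stdbool.h>

/* Hypothetical reduced description of a vector type. */
struct vec_type {
    unsigned nunits;     /* models TYPE_VECTOR_SUBPARTS (constant case) */
    unsigned elt_bits;   /* element precision */
    bool elt_unsigned;   /* element signedness */
    int mode;            /* models TYPE_MODE, e.g. V4HI vs VNx2HI */
};

enum { MODE_V4HI = 1, MODE_VNX2HI = 2, MODE_V2SI = 3 };

/* Crude stand-in for useless_type_conversion_p on element types. */
static bool elt_conversion_useless_p(const struct vec_type *a,
                                     const struct vec_type *b)
{
    return a->elt_bits == b->elt_bits && a->elt_unsigned == b->elt_unsigned;
}

/* The flipped check: same element count and trivially convertible
   elements; the machine mode is deliberately not compared. */
static bool view_convert_safe_p(const struct vec_type *a,
                                const struct vec_type *b)
{
    return a->nunits == b->nunits && elt_conversion_useless_p(a, b);
}
```

Under this model the V4HI and VNx2HI types from the testcase pass the check, while the V4HI versus V2SI mix-up that the assert is meant to catch is still rejected.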

[Bug tree-optimization/94784] New: ICE: in simplify_vector_constructor, at tree-ssa-forwprop.c:2482

2020-04-27 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94784

Bug ID: 94784
   Summary: ICE: in simplify_vector_constructor, at
tree-ssa-forwprop.c:2482
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---
Target: aarch64

I see one gcc_assert was introduced in:
https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544271.html

It looks like this is causing an ICE when compiling the foo.c test.
Gimple input to forwprop4 pass looks like:
pass_v4hi (v4hi v)
{
  vector(4) short int * vectp.4;
  vector(4) short int * vectp_a.3;
  union u4hi u;
  int j;
  short int _2;
  short int _3;
  vector(4) short int _6;
  short int _25;
  short int _26;

   [local count: 214748368]:
  _3 = BIT_FIELD_REF ;
  _2 = BIT_FIELD_REF ;
  _26 = BIT_FIELD_REF ;
  _25 = BIT_FIELD_REF ;
  _6 = {_3, _2, _26, _25}; 
 <
  MEM  [(short int *)] = _6;  <
  u ={v} {CLOBBER};
  return;
}

Here at the crash site, we have two vector types with different modes:

(gdb) p debug_tree (src_type)
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type
0xb22ea498 precision:16 min  max

pointer_to_this >
sizes-gimplified V4HI
size  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xb23b1690 nunits:4 context >
$5 = void

(gdb) p debug_tree (type)
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type
0xb22ea498 precision:16 min  max

pointer_to_this >
VNx2HI
size  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0xb23b1690 nunits:4
pointer_to_this >
$6 = void

foo.c
typedef short __attribute__((vector_size (8))) v4hi;
typedef union U4HI { v4hi v; short a[4]; } u4hi;

short a[4];

void pass_v4hi (v4hi v) {
int j;
u4hi u;
u.v = v;
for (j = 0; j < 4; j++)
  a[j] = u.a[j];
};

$ aarch64-linux-gnu-gcc -S -O2 -ftree-slp-vectorize -march=armv8.2-a+sve
-msve-vector-bits=256 foo.c
during GIMPLE pass: forwprop
dump file: foo.c.190t.forwprop4
foo.c: In function ‘pass_v4hi’:
foo.c:7:6: internal compiler error: in simplify_vector_constructor, at
tree-ssa-forwprop.c:2482
7 | void pass_v4hi (v4hi v) {
  |  ^
0x147dbf7 simplify_vector_constructor
../../gcc-git/gcc/tree-ssa-forwprop.c:2482
0x1480a2b execute
../../gcc-git/gcc/tree-ssa-forwprop.c:3151

[Bug target/94678] aarch64: unexpected result with -mgeneral-regs-only and sve

2020-04-21 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94678

--- Comment #1 from Fei Yang  ---
crash log for test2.c:

during RTL pass: expand
foo.c: In function 'f2':
foo.c:14:10: internal compiler error: in emit_move_insn, at expr.c:3815
   14 |   return svadd_m (*x, *y, 1);
  |  ^~~
0xd21797 emit_move_insn(rtx_def*, rtx_def*)
../../gcc-git/gcc/expr.c:3814
0xcf52cb copy_to_mode_reg(machine_mode, rtx_def*)
../../gcc-git/gcc/explow.c:634
0x11021f7 maybe_legitimize_operand
../../gcc-git/gcc/optabs.c:7283
0x1102caf maybe_legitimize_operands(insn_code, unsigned int, unsigned int,
expand_operand*)
../../gcc-git/gcc/optabs.c:7415
0x1102d9b maybe_gen_insn(insn_code, unsigned int, expand_operand*)
../../gcc-git/gcc/optabs.c:7434
0x1103223 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
../../gcc-git/gcc/optabs.c:7477
0x11032c3 expand_insn(insn_code, unsigned int, expand_operand*)
../../gcc-git/gcc/optabs.c:7508
0x10edec3 expand_vector_broadcast(machine_mode, rtx_def*)
../../gcc-git/gcc/optabs.c:419
0x1859ec3 aarch64_sve::function_expander::add_input_operand(insn_code,
rtx_def*)
../../gcc-git/gcc/config/aarch64/aarch64-sve-builtins.cc:2790
0x185b1b3 aarch64_sve::function_expander::use_cond_insn(insn_code, unsigned
int)
../../gcc-git/gcc/config/aarch64/aarch64-sve-builtins.cc:3078
0x185b753 aarch64_sve::function_expander::map_to_rtx_codes(rtx_code, rtx_code,
int, unsigned int)
../../gcc-git/gcc/config/aarch64/aarch64-sve-builtins.cc:3224
0x18768d7
aarch64_sve::rtx_code_function::expand(aarch64_sve::function_expander&) const
../../gcc-git/gcc/config/aarch64/aarch64-sve-builtins-functions.h:211
0x185b9fb aarch64_sve::function_expander::expand()
../../gcc-git/gcc/config/aarch64/aarch64-sve-builtins.cc:3281
0x185dcab aarch64_sve::expand_builtin(unsigned int, tree_node*, rtx_def*)
../../gcc-git/gcc/config/aarch64/aarch64-sve-builtins.cc:3569
0x17c0ceb aarch64_expand_builtin
../../gcc-git/gcc/config/aarch64/aarch64.c:13147
0xb3782f expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
../../gcc-git/gcc/builtins.c:7736
0xd4039b expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-git/gcc/expr.c:11132
0xd3338f expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-git/gcc/expr.c:8359
0xd28337 store_expr(tree_node*, rtx_def*, int, bool, bool)
../../gcc-git/gcc/expr.c:5755
0xd26d27 expand_assignment(tree_node*, tree_node*, bool)
../../gcc-git/gcc/expr.c:5514
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/94678] New: aarch64: unexpected result with -mgeneral-regs-only and sve

2020-04-21 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94678

Bug ID: 94678
   Summary: aarch64: unexpected result with -mgeneral-regs-only
and sve
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
CC: richard.sandiford at arm dot com
  Target Milestone: ---
Target: aarch64

It looks like there are several issues out there for aarch64 sve codegen with
-mgeneral-regs-only.

>> test1.c:
#pragma GCC aarch64 "arm_sve.h" 

svbool_t
f1()
{
  return svptrue_b8 ();
}

$aarch64-linux-gnu-gcc -S -march=armv8.2-a+sve -mgeneral-regs-only test1.c 

Assembly output:
f1:
.LFB0:
.cfi_startproc
ptrue   p0.b, all   <== predicate register is used here even with
-mgeneral-regs-only
ret
.cfi_endproc
.LFE0:

>> test2.c:
#pragma GCC aarch64 "arm_sve.h"

svint8_t
f2 (svbool_t *x, svint8_t *y)
{
  return svadd_m (*x, *y, 1);
}

$aarch64-linux-gnu-gcc -S -march=armv8.2-a+sve -mgeneral-regs-only test2.c 

This will trigger an ICE.

We do ISA extension checks for SVE in check_required_extensions
(aarch64-sve-builtins.cc). I think we may also need to check
-mgeneral-regs-only there and issue an error message when this option is
specified.  This would be cheap compared with adding &&
TARGET_GENERAL_REGS_ONLY to TARGET_SVE and similar macros.  I have created a
patch for that.

[Bug tree-optimization/94269] New: widening_mul should consider block frequency

2020-03-23 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269

Bug ID: 94269
   Summary: widening_mul should consider block frequency
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---

Test case:

float
calc(long n, float *x, int inc_x,
     float *y, int inc_y)
{
  float dot = 0.0;
  int ix = 0, iy = 0;

  if (n < 0) {
    return dot;
  }

  int i = 0;
  while (i < n) {
    dot += y[iy] * x[ix];
    ix  += inc_x;
    iy  += inc_y;
    i++;
  }

  return dot;
}

Command line: aarch64-linux-gnu-gcc -S -O2 -fopt-info -ftree-loop-vectorize
-funsafe-math-optimizations -march=armv8.2-a+sve -msve-vector-bits=256 calc.c

calc:
.LFB0:
        .cfi_startproc
        cmp     x0, 0
        ble     .L4
        mov     w7, w0
        mov     x5, x3
        mov     w6, 32
        mov     x3, x1
        mov     x1, 0
        index   z4.s, #0, w4
        index   z3.s, #0, w2
        whilelo p0.s, wzr, w0
        mov     z0.s, #0
        .p2align 3,,7
.L3:
        ld1w    z1.s, p0/z, [x5, z4.s, sxtw 2]
        ld1w    z2.s, p0/z, [x3, z3.s, sxtw 2]
        add     x1, x1, 8
        fmla    z0.s, p0/m, z1.s, z2.s
        smaddl  x5, w4, w6, x5   <==
        whilelo p0.s, w1, w7
        smaddl  x3, w2, w6, x3   <==
        b.any   .L3
        ptrue   p0.b, vl32
        faddv   s0, p0, z0.s
        ret

Command line: aarch64-linux-gnu-gcc -S -O2 -fopt-info -ftree-loop-vectorize
-funsafe-math-optimizations -march=armv8.2-a+sve -msve-vector-bits=256 calc.c
-fdisable-tree-widening_mul

calc:
.LFB0:
        .cfi_startproc
        cmp     x0, 0
        ble     .L4
        sbfiz   x8, x4, 5, 32
        sbfiz   x7, x2, 5, 32
        mov     w6, w0
        mov     x5, x3
        mov     x3, x1
        mov     x1, 0
        index   z4.s, #0, w4
        index   z3.s, #0, w2
        whilelo p0.s, wzr, w0
        mov     z0.s, #0
        ptrue   p1.b, vl32
        .p2align 3,,7
.L3:
        ld1w    z1.s, p0/z, [x5, z4.s, sxtw 2]
        ld1w    z2.s, p0/z, [x3, z3.s, sxtw 2]
        add     x1, x1, 8
        fmul    z1.s, z1.s, z2.s
        add     x5, x5, x8       <==
        fadd    z0.s, p0/m, z0.s, z1.s
        add     x3, x3, x7       <==
        whilelo p0.s, w1, w6
        b.any   .L3
        faddv   s0, p1, z0.s
        ret

The widening_mul pass moves the two multiply instructions from outside the loop
to inside the loop, merging each of them with one of the two add instructions.
This increases the cost of the loop.

I think widening_mul should consider block frequency when doing such a
combination.
I mean something like:
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 54ba035..4439452 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2721,7 +2721,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple *stmt,
     {
       if (!has_single_use (rhs1)
          || !is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
-                                 &type2, &mult_rhs2))
+                                 &type2, &mult_rhs2)
+         || (gimple_bb (rhs1_stmt) != gimple_bb (stmt)
+             && gimple_bb (rhs1_stmt)->count.to_frequency(cfun)
+                < gimple_bb (stmt)->count.to_frequency(cfun)))
        return false;
       add_rhs = rhs2;
       conv_stmt = conv1_stmt;
@@ -2730,7 +2733,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple *stmt,
     {
       if (!has_single_use (rhs2)
          || !is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
-                                 &type2, &mult_rhs2))
+                                 &type2, &mult_rhs2)
+         || (gimple_bb (rhs2_stmt) != gimple_bb (stmt)
+             && gimple_bb (rhs2_stmt)->count.to_frequency(cfun)
+                < gimple_bb (stmt)->count.to_frequency(cfun)))
        return false;
       add_rhs = rhs1;
       conv_stmt = conv2_stmt;
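
To make the trade-off concrete, here is a small source-level model in C. It is only a sketch: advance_hoisted, advance_fused and fusion_allowed are hypothetical names, the frequencies are plain integers rather than GCC profile counts, and an optimizing compiler may well generate the same code for both functions. The first two functions correspond to the "add x5, x5, x8" and "smaddl x5, w4, w6, x5" address updates in the listings above; the third restates the guard from the diff as a predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Multiply hoisted out of the loop: one widening multiply up front,
   then a plain 64-bit add per iteration.  */
static int64_t
advance_hoisted (int64_t base, int32_t inc, int32_t scale, int iters)
{
  assert (iters >= 0);
  int64_t step = (int64_t) inc * scale;
  for (int i = 0; i < iters; i++)
    base += step;
  return base;
}

/* Multiply fused into the add: a widening multiply-add per iteration.  */
static int64_t
advance_fused (int64_t base, int32_t inc, int32_t scale, int iters)
{
  assert (iters >= 0);
  for (int i = 0; i < iters; i++)
    base += (int64_t) inc * scale;
  return base;
}

/* Hypothetical helper restating the condition in the diff: refuse to
   fuse when the multiply sits in a different block that executes less
   often than the block containing the add.  */
static bool
fusion_allowed (bool same_block, int mul_block_freq, int add_block_freq)
{
  return same_block || mul_block_freq >= add_block_freq;
}
```

Both functions compute the same result; the difference is only in how much work is repeated per iteration, which is what the block-frequency comparison is meant to catch.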

[Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero

2020-03-15 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #4 from Fei Yang  ---
(In reply to Fei Yang from comment #0)
> Created attachment 47966 [details]
> proposed patch to fix this issue
> 
> Simple test case:
> int
> foo (int c, int d)
> {
>   int a = (c >> d) & 7;
> 
>   if (a >= 2) {
>     return 1;
>   }
> 
>   return 0;
> }
> 
> Compile option: gcc -S -O2 test.c
> 
> 
> On aarch64, GCC trunk emits 4 instructions:
>         asr     w0, w0, 8
>         tst     w0, 6
>         cset    w0, ne
>         ret
> 
> which can be further simplified into:
>         tst     x0, 1536
>         cset    w0, ne
>         ret
> 
> We see the same issue on other targets such as i386 and x86-64.
> 
> Attached please find proposed patch for this issue.

The previously posted test case is not correct.
Test case should be:
int fifth (int c)
{
    int a = (c >> 8) & 7;

    if (a >= 2) {
        return 1;
    } else {
        return 0;
    }
}
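
For reference, the reasoning behind the shorter sequence can be checked at the source level. (c >> 8) & 7 is a 3-bit value, and it is >= 2 exactly when bit 1 or bit 2 of it is set, i.e. when ((c >> 8) & 6) != 0, which is the same as (c & (6 << 8)) != 0 with 6 << 8 == 1536. The sketch below compares the two forms over a range of inputs. The function names are hypothetical, and it assumes the usual arithmetic right shift for negative int values (implementation-defined in C; GCC sign-extends).

```c
#include <assert.h>

/* Form as written in the test case: shift, mask, then compare with 2.  */
static int
cmp_shift_mask (int c)
{
  return ((c >> 8) & 7) >= 2;
}

/* Candidate simplified form: test bits 9 and 10 of c directly.
   1536 == 6 << 8.  */
static int
cmp_single_mask (int c)
{
  return (c & 1536) != 0;
}

/* Count the inputs in [lo, hi] on which the two forms disagree.  */
static long
count_mismatches (int lo, int hi)
{
  assert (lo <= hi);
  long bad = 0;
  for (int c = lo; ; c++)
    {
      if (cmp_shift_mask (c) != cmp_single_mask (c))
        bad++;
      if (c == hi)   /* stop before c++ could overflow  */
        break;
    }
  return bad;
}
```

Calling count_mismatches over any range should return 0 if the two forms are equivalent, which is what a single tst against 1536 relies on.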

[Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero

2020-03-12 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

--- Comment #2 from Fei Yang  ---
The test case is reduced from the SPEC2017 benchmark suite.

int FastBoard::count_pliberties(const int i) {
    return count_neighbours(EMPTY, i);
}

// count neighbours of color c at vertex v
int FastBoard::count_neighbours(const int c, const int v) {
    assert(c == WHITE || c == BLACK || c == EMPTY);
    return (m_neighbours[v] >> (NBR_SHIFT * c)) & 7;
}

bool FastBoard::self_atari(int color, int vertex) {
    assert(get_square(vertex) == FastBoard::EMPTY);

    // 1) count new liberties, if we add 2 or more we're safe
    if (count_pliberties(vertex) >= 2) {
        return false;
    }

..

[Bug rtl-optimization/94026] New: combine missed opportunity to simplify comparisons with zero

2020-03-04 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026

Bug ID: 94026
   Summary: combine missed opportunity to simplify comparisons with zero
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
  Target Milestone: ---

Created attachment 47966
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47966&action=edit
proposed patch to fix this issue

Simple test case:
int
foo (int c, int d)
{
  int a = (c >> d) & 7;

  if (a >= 2) {
    return 1;
  }

  return 0;
}

Compile option: gcc -S -O2 test.c


On aarch64, GCC trunk emits 4 instructions:
        asr     w0, w0, 8
        tst     w0, 6
        cset    w0, ne
        ret

which can be further simplified into:
        tst     x0, 1536
        cset    w0, ne
        ret

We see the same issue on other targets such as i386 and x86-64.

Attached please find proposed patch for this issue.

[Bug tree-optimization/66804] Alignment issue caused by auto vectorization

2015-07-08 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804

--- Comment #4 from Fei Yang felix.yang at huawei dot com ---
(In reply to Markus Trippelsdorf from comment #2)
> You're invoking undefined behavior:
> 
> test.c:34:12: runtime error: store to misaligned address 0x00401c8c for type
> 'unsigned char *', which requires 8 byte alignment
> 0x00401c8c: note: pointer points here
>   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>   ^
> 
> So your testcase is invalid.

Any reference? The testcase works fine without vectorization. Thanks.


[Bug tree-optimization/66804] Alignment issue caused by auto vectorization

2015-07-08 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804

--- Comment #1 from Fei Yang felix.yang at huawei dot com ---
Also reproducible with GCC-5 and GCC-6.


[Bug tree-optimization/66804] New: Alignment issue caused by auto vectorization

2015-07-08 Thread felix.yang at huawei dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804

Bug ID: 66804
   Summary: Alignment issue caused by auto vectorization
   Product: gcc
   Version: 4.9.4
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: felix.yang at huawei dot com
CC: rguenther at suse dot de
  Target Milestone: ---
  Host: x86_64-SUSE-LINUX
Target: x86_64
 Build: x86_64-SUSE-LINUX

testcase(test.c):

#pragma pack(1)
typedef struct
{
unsigned char x[344];
} TEST1;

typedef struct
{
unsigned int x[12];
unsigned char y[2396];
TEST1 u;
} TEST2;

typedef struct
{
TEST2 v;
unsigned char reserved[4];
} TEST3;
#pragma pack()

TEST3 xxx;

void foo ()
{
    TEST3 *q = &xxx;
    unsigned char **p;
    unsigned int i, len = 0;

    p = (unsigned char **)(void *)&q->v.u;
    len = sizeof (TEST1);

    for (i = 0; i < len; i += sizeof (unsigned char *))
      {
        *p = q->v.y;
        p++;
      }
}

int main ()
{
  foo ();
  return 0;
}

compile options: gcc -O2 test.c -ftree-loop-vectorize -fvect-cost-model
-fopt-info
test.c:32:5: note: loop vectorized
test.c:32:5: note: loop peeled for vectorization to enhance alignment
test.c:32:5: note: loop turned into non-loop; it never loops
test.c:23:6: note: loop turned into non-loop; it never loops
test.c:32:5: note: loop vectorized
test.c:32:5: note: loop peeled for vectorization to enhance alignment
test.c:32:5: note: loop turned into non-loop; it never loops
test.c:39:5: note: loop turned into non-loop; it never loops

The generated code will trigger a Segmentation Fault on x86_64-SUSE-Linux.
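
For comparison, a variant of the loop that does not depend on the alignment of the destination is sketched below. With #pragma pack(1), member u sits at offset 48 + 2396 = 2444 within TEST2 (assuming a 4-byte unsigned int), which is not a multiple of 8, so an unsigned char ** pointing at it is misaligned for 8-byte pointer stores; C11 6.3.2.3p7 makes the conversion to a misaligned pointer type undefined. The helper name fill_with_pointer is hypothetical, and this is one possible workaround under those assumptions, not a statement about how the vectorizer should behave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill buf[0..len) with copies of the pointer
   value PTR.  memcpy makes no alignment assumption about BUF, so this
   is well defined even when BUF points into a packed struct.
   Returns the number of pointer-sized slots written.  */
static size_t
fill_with_pointer (unsigned char *buf, size_t len, const void *ptr)
{
  assert (buf != NULL);
  size_t stored = 0;
  for (size_t off = 0; off + sizeof ptr <= len; off += sizeof ptr)
    {
      memcpy (buf + off, &ptr, sizeof ptr);
      stored++;
    }
  return stored;
}
```

In terms of the test case, the loop in foo would become a call along the lines of fill_with_pointer (q->v.u.x, sizeof (TEST1), q->v.y).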