[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Richard Biener  ---
Latent issue fixed as well.

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:d16b5975ca985cbe97698479fc38b6a636886978

commit r11-3460-gd16b5975ca985cbe97698479fc38b6a636886978
Author: Richard Biener 
Date:   Fri Sep 25 11:13:13 2020 +0200

middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
with bit-precision elements correctly as the testcase shows before
the PR97085 fix.  The following makes it do the correct thing
(not 100% sure for CTOR of sub-vectors due to the lack of a testcase).

The alternative would be to assert such CTORs do not happen (and also
add IL verification for this).

The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
(thus the C FE needs that).

2020-09-25  Richard Biener  

PR middle-end/96814
* expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
CTORs correctly.

* gcc.target/i386/pr96814.c: New testcase.

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

--- Comment #6 from Richard Biener  ---
OK, so the extra folding for PR97085 also fixed this testcase, now expanding
from

  _1 = { 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
  _2 = .VCOND_MASK (_1, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0 });
  _3 = VIEW_CONVERT_EXPR(_2);

I still have a patch for the CTOR expansion which I'll test and propose.

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

--- Comment #5 from Richard Biener  ---
So the issue of the miscompile is that we do not expand the vector
constructor created by vector lowering correctly.  Of course we don't actually
_want_ CTORs of those ...

The following fixes the miscompile (but will still mishandle vector
CTORs concatenating multiple other vector)

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b397..8bb3b4caf1c 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -6922,7 +6922,9 @@ store_constructor (tree exp, rtx target, int cleared,
poly
_int64 size,
insn_code icode = CODE_FOR_nothing;
tree elt;
tree elttype = TREE_TYPE (type);
-   int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
+   int elt_size
+ = (VECTOR_BOOLEAN_TYPE_P (type) ? TYPE_PRECISION (elttype)
+: tree_to_uhwi (TYPE_SIZE (elttype)));
machine_mode eltmode = TYPE_MODE (elttype);
HOST_WIDE_INT bitsize;
HOST_WIDE_INT bitpos;
@@ -7045,7 +7047,9 @@ store_constructor (tree exp, rtx target, int cleared,
poly_int64 size,
HOST_WIDE_INT eltpos;
tree value = ce->value;

-   bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (value)));
+   bitsize = (VECTOR_BOOLEAN_TYPE_P (type)
+  ? elt_size
+  : tree_to_uhwi (TYPE_SIZE (TREE_TYPE (value;
if (cleared && initializer_zerop (value))
  continue;


Another interesting case is IMHO

typedef unsigned char __attribute__ ((__vector_size__ (32))) V;
unsigned char c = 8;
int
main (void)
{
  V x = ((V){c} > 0) == 0;
  for (unsigned i = 0; i < sizeof (x); i++)
if (x[i] != (i ? 0xff : 0)) __builtin_abort();
  return 0;
}

which shows different behavior of veclower:

+  vector(32)  _20;
+  vector(32)  _21;

:
   c.0_1 = c;
   _11 = {c.0_1};
   _2 = _11 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
-  _3 = VEC_COND_EXPR <_2, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1 }>;
+  _20 = _11 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+  _21 = ~_20;
+  _3 = _21;
   _4 = VEC_COND_EXPR <_3, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0 }>;

it shows that while the initial FE IL has the nested VEC_COND (maybe avoid
that for boolean vectors?) we fold one case but not the other?

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

--- Comment #4 from Martin Liška  ---
> 
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 1f861630225..0cc80adf632 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -12581,7 +12581,9 @@ fold_ternary_loc (location_t loc, enum tree_code
> code, t
> ree type,
>   && tree_fits_uhwi_p (op2))
> {
>   tree eltype = TREE_TYPE (TREE_TYPE (arg0));
> - unsigned HOST_WIDE_INT width = tree_to_uhwi (TYPE_SIZE (eltype));
> + unsigned HOST_WIDE_INT width
> +   = (TREE_CODE (eltype) == BOOLEAN_TYPE
> +  ? TYPE_PRECISION (eltype) : tree_to_uhwi (TYPE_SIZE
> (eltype)));
>   unsigned HOST_WIDE_INT n = tree_to_uhwi (arg1);
>   unsigned HOST_WIDE_INT idx = tree_to_uhwi (op2);
>  
> diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
> index 6d5d65195ae..e9dbe07dccc 100644
> --- a/gcc/tree-vect-generic.c
> +++ b/gcc/tree-vect-generic.c
> @@ -137,7 +137,7 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>  }
>if (bitpos)
>  {
> -  if (TREE_CODE (type) == BOOLEAN_TYPE)
> +  if (0 && TREE_CODE (type) == BOOLEAN_TYPE)
> {
>   tree itype
> = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
> 
> 
> makes the generated code a bit easier to follow but I guess still ends up
> miscompiling things?

Ok, so apparently you installed already the patch. And yes, we still miscompile
it :(

> 
> Note if I add -fdisable-tree-veclower ISEL ICEs.
> 
> So I'd see where this VEC_COND_EXPR comes from.

The vector is already build in C FE:

#0  build_opaque_vector_type (innertype=, nunits=...) at /home/marxin/Programming/gcc/gcc/tree.c:10964
#1  0x0082368e in build_binary_op (location=271264, code=GT_EXPR,
orig_op0=, orig_op1=, convert_p=true) at
/home/marxin/Programming/gcc/gcc/c/c-typeck.c:12164
#2  0x008028e3 in parser_build_binary_op (location=271264,
code=GT_EXPR, arg1=..., arg2=...) at
/home/marxin/Programming/gcc/gcc/c/c-typeck.c:3755
#3  0x0084dcfd in c_parser_binary_expression (parser=0x77fc4ab0,
after=, omp_atomic_lhs=) at
/home/marxin/Programming/gcc/gcc/c/c-parser.c:8087
#4  0x0084e9b6 in c_parser_conditional_expression
(parser=0x77fc4ab0, after=, omp_atomic_lhs=)
at /home/marxin/Programming/gcc/gcc/c/c-parser.c:7691
#5  0x0084efe1 in c_parser_expr_no_commas (parser=0x77fc4ab0,
after=, omp_atomic_lhs=) at
/home/marxin/Programming/gcc/gcc/c/c-parser.c:7606

for 

(gdb) p debug_tree(orig_op0)
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x775ec348 precision:8 min  max >
unsigned V16QI
size 
unit-size 
align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7740ec78 nunits:16>
side-effects
arg:0 >
side-effects
arg:0 
used unsigned ignored read decl_5 V16QI
/home/marxin/Programming/testcases/pr96814.c:8:12 size  unit-size 
align:128 warn_if_not_align:0 context  initial >
/home/marxin/Programming/testcases/pr96814.c:8:12 start:
/home/marxin/Programming/testcases/pr96814.c:8:12 finish:
/home/marxin/Programming/testcases/pr96814.c:8:12>>
$10 = void
(gdb) p debug_tree(orig_op1)
  constant 0>

So the '(V){8} > 0' if I see correctly. I tried to manually change 'intt' in
parser_build_binary_op to boolean_type_node, but
it leads to:

/home/marxin/Programming/testcases/pr96814.c:8:21: error: invalid operands to
binary == (have ‘__vector(16) _Bool’ and ‘int’)
8 |x = ((V){8} > 0) == 0;
  | ^~
  ||
  |__vector(16) _Bool

:/
Any hint how to resolve it?

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

--- Comment #3 from Richard Biener  ---
It's indeed odd that we have a VEC_COND_EXPR creating a boolean vector.  With
GCC 10 veclower saw

  _1 = { 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
  _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0 }>;

and _2 is vector while _1 is vector now we have

  _1 = { -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _2 = VEC_COND_EXPR <_1, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1 }>;
  _3 = VEC_COND_EXPR <_2, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0 }>;

where _2 is vector, appearantly inverting _1.  This is either
new or missed folding I guess (I see this in .original already).

Now, veclower better shouldn't touch this - definitely a vector
ctor from SSA names isn't something we want...   Though in principle
we should generate correct code here or give up.

So I'd say we hit a latent issue here?

I see we're doing a bad job in constant folding as well, mostly because
our representation of vector is special:

 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x766472a0 precision:1 min  max >
SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x76647348 nunits:32>
constant npatterns:1 nelts-per-pattern:1
elt:0:  
constant 0>>

see how we're having a vector type of a QImode boolean with size 8.
The folding code assumes it can work with TYPE_SIZE of the component:

case BIT_FIELD_REF:
  if (TREE_CODE (arg0) == VECTOR_CST
  && (type == TREE_TYPE (TREE_TYPE (arg0))
  || (VECTOR_TYPE_P (type)
  && TREE_TYPE (type) == TREE_TYPE (TREE_TYPE (arg0
  && tree_fits_uhwi_p (op1)
  && tree_fits_uhwi_p (op2))
{
  tree eltype = TREE_TYPE (TREE_TYPE (arg0));
  unsigned HOST_WIDE_INT width = tree_to_uhwi (TYPE_SIZE (eltype));
  unsigned HOST_WIDE_INT n = tree_to_uhwi (arg1);

but that's not true for these types.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 1f861630225..0cc80adf632 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -12581,7 +12581,9 @@ fold_ternary_loc (location_t loc, enum tree_code code,
t
ree type,
  && tree_fits_uhwi_p (op2))
{
  tree eltype = TREE_TYPE (TREE_TYPE (arg0));
- unsigned HOST_WIDE_INT width = tree_to_uhwi (TYPE_SIZE (eltype));
+ unsigned HOST_WIDE_INT width
+   = (TREE_CODE (eltype) == BOOLEAN_TYPE
+  ? TYPE_PRECISION (eltype) : tree_to_uhwi (TYPE_SIZE (eltype)));
  unsigned HOST_WIDE_INT n = tree_to_uhwi (arg1);
  unsigned HOST_WIDE_INT idx = tree_to_uhwi (op2);

diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 6d5d65195ae..e9dbe07dccc 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -137,7 +137,7 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 }
   if (bitpos)
 {
-  if (TREE_CODE (type) == BOOLEAN_TYPE)
+  if (0 && TREE_CODE (type) == BOOLEAN_TYPE)
{
  tree itype
= build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);


makes the generated code a bit easier to follow but I guess still ends up
miscompiling things?

Note if I add -fdisable-tree-veclower ISEL ICEs.

So I'd see where this VEC_COND_EXPR comes from.

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-09-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

Martin Liška  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
So before the revision we generated:

  _1 = { 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
  _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _3 = VIEW_CONVERT_EXPR(_2);
  x = _3;

but now we have:

   :
  _1 = { -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _24 = VIEW_CONVERT_EXPR(_1);
  _25 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _26 = BIT_FIELD_REF <_25, 1, 0>;
  _27 = _26;
  _28 = { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
  _29 = BIT_FIELD_REF <_28, 1, 0>;
  _30 = _29;
  _31 = _24 & 1;
  _32 = _31 != 0 ? _27 : _30;
...
  _151 = _24 & 32768;
  _152 = _151 != 0 ? _147 : _150;
  _2 = {_32, _40, _48, _56, _64, _72, _80, _88, _96, _104, _112, _120, _128,
_136, _144, _152};
  _3 = .VCOND_MASK (_2, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 });
  _4 = VIEW_CONVERT_EXPR(_3);
  x = _4;
  i_15 = 0;
  goto ; [INV]

It's because veclower sees:

  _1 = { -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _2 = VEC_COND_EXPR <_1, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }>;
  _3 = VEC_COND_EXPR <_2, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _4 = VIEW_CONVERT_EXPR(_3);

and if I see correctly the creation of the mask _2 is broken in RTL (probably
one can't build one from a vector costructor)?
Anyway SSA_NAMEs like _128 don't have any use.

@Richi: Can you please help me with that? I'm looking into it for quite some
time :/

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-08-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug target/96814] [11 Regression] wrong code with -march=cascadelake since r11-1445-g502d63b6d6141597

2020-08-27 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96814

Martin Liška  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot 
gnu.org
Summary|[11 Regression] wrong code  |[11 Regression] wrong code
   |with -march=cascadelake |with -march=cascadelake
   ||since
   ||r11-1445-g502d63b6d6141597
   Last reconfirmed||2020-08-27
 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Martin Liška  ---
Started with my r11-1445-g502d63b6d6141597.
Thank you for the report.