[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-30 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from Jakub Jelinek  ---
Fixed for 12.3+ too.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-29 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

--- Comment #11 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jakub Jelinek
:

https://gcc.gnu.org/g:030063c43f30a2335d3c03182df0beb82d003816

commit r12-8720-g030063c43f30a2335d3c03182df0beb82d003816
Author: Jakub Jelinek 
Date:   Mon Aug 15 13:56:57 2022 +0200

ifcvt: Fix up noce_convert_multiple_sets [PR106590]

The following testcase is miscompiled on x86_64-linux.
The problem is in the noce_convert_multiple_sets optimization.
We essentially have:
if (g == 1)
  {
g = 1;
f = 23;
  }
else
  {
g = 2;
f = 20;
  }
and for each insn try to create a conditional move sequence.
There is code to detect overlap with the regs used in the condition
and the destinations, so we actually try to construct:
tmp_g = g == 1 ? 1 : 2;
f = g == 1 ? 23 : 20;
g = tmp_g;
which is fine.  But, we actually try to create two different
conditional move sequences in each case, seq1 with the whole
(eq (reg/v:HI 82 [ g ]) (const_int 1 [0x1]))
condition and seq2 with cc_cmp
(eq (reg:CCZ 17 flags) (const_int 0 [0]))
to rely on the earlier present comparison.  In each case, we
compare the rtx costs and choose the cheaper sequence (seq1 if both
have the same cost).
The problem is that with the skylake tuning,
tmp_g = g == 1 ? 1 : 2;
is actually expanded as
tmp_g = (g == 1) + 1;
in seq1 (which clobbers (reg 17 flags)) and as a cmov in seq2
(which doesn't).  The tuning says both have the same cost, so we
pick seq1.  Next we check sequences for
f = g == 1 ? 23 : 20; and here the seq2 cmov is cheaper, but it
uses (reg 17 flags) which has been clobbered earlier.

The following patch fixes that by detecting if we in the chosen
sequence clobber some register mentioned in cc_cmp or rev_cc_cmp,
and if yes, arranges for only seq1 (i.e. sequences that emit the
comparison itself) to be used after that.

2022-08-15  Jakub Jelinek  

PR rtl-optimization/106590
* ifcvt.cc (check_for_cc_cmp_clobbers): New function.
(noce_convert_multiple_sets_1): If SEQ sets or clobbers any regs
mentioned in cc_cmp or rev_cc_cmp, don't consider seq2 for any
further conditional moves.

* gcc.dg/torture/pr106590.c: New test.

(cherry picked from commit 3a74a7bf62f47ed0d19866576378724be932ee17)

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|12.2|12.3

--- Comment #10 from Richard Biener  ---
GCC 12.2 is being released, retargeting bugs to GCC 12.3.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

--- Comment #9 from Jakub Jelinek  ---
Fixed on the trunk so far, 12.2 is frozen now and I think the fix needs some
time on the trunk before backporting, so probably only will be fixed for 12.3.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:3a74a7bf62f47ed0d19866576378724be932ee17

commit r13-2042-g3a74a7bf62f47ed0d19866576378724be932ee17
Author: Jakub Jelinek 
Date:   Mon Aug 15 13:56:57 2022 +0200

ifcvt: Fix up noce_convert_multiple_sets [PR106590]

The following testcase is miscompiled on x86_64-linux.
The problem is in the noce_convert_multiple_sets optimization.
We essentially have:
if (g == 1)
  {
g = 1;
f = 23;
  }
else
  {
g = 2;
f = 20;
  }
and for each insn try to create a conditional move sequence.
There is code to detect overlap with the regs used in the condition
and the destinations, so we actually try to construct:
tmp_g = g == 1 ? 1 : 2;
f = g == 1 ? 23 : 20;
g = tmp_g;
which is fine.  But, we actually try to create two different
conditional move sequences in each case, seq1 with the whole
(eq (reg/v:HI 82 [ g ]) (const_int 1 [0x1]))
condition and seq2 with cc_cmp
(eq (reg:CCZ 17 flags) (const_int 0 [0]))
to rely on the earlier present comparison.  In each case, we
compare the rtx costs and choose the cheaper sequence (seq1 if both
have the same cost).
The problem is that with the skylake tuning,
tmp_g = g == 1 ? 1 : 2;
is actually expanded as
tmp_g = (g == 1) + 1;
in seq1 (which clobbers (reg 17 flags)) and as a cmov in seq2
(which doesn't).  The tuning says both have the same cost, so we
pick seq1.  Next we check sequences for
f = g == 1 ? 23 : 20; and here the seq2 cmov is cheaper, but it
uses (reg 17 flags) which has been clobbered earlier.

The following patch fixes that by detecting if we in the chosen
sequence clobber some register mentioned in cc_cmp or rev_cc_cmp,
and if yes, arranges for only seq1 (i.e. sequences that emit the
comparison itself) to be used after that.

2022-08-15  Jakub Jelinek  

PR rtl-optimization/106590
* ifcvt.cc (check_for_cc_cmp_clobbers): New function.
(noce_convert_multiple_sets_1): If SEQ sets or clobbers any regs
mentioned in cc_cmp or rev_cc_cmp, don't consider seq2 for any
further conditional moves.

* gcc.dg/torture/pr106590.c: New test.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from Jakub Jelinek  ---
Created attachment 53447
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53447&action=edit
gcc13-pr106590.patch

Untested fix.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-12 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

--- Comment #6 from Jakub Jelinek  ---
Cleaned up testcase:

/* { dg-additional-options "-mtune=skylake" { target { i?86-*-* x86_64-*-* } }
} */

typedef struct A { short a; } A;
typedef A *B;
typedef struct C { int c, d; } C;
typedef C *D;

B
foo (void)
{
  static A r = { .a = 1 };
  return &r;
}

D
bar (void)
{
  static C r = { .c = 1, .d = 23 };
  return &r;
}

static inline int __attribute__((always_inline))
baz (short a)
{
  int e = 1, f;
  short g;
  D h;

  switch (a)
{
case 1:
  f = 23;
  g = 1;
  break;
case 2:
  f = 20;
  g = 2;
  break;
}

  h = bar ();

  if (h->d != f || h->c != g)
__builtin_abort ();
  return e;
}

int
qux (void)
{
  B i = foo ();
  int e = 1;

  switch (i->a)
{
case 1:
case 2:
  e = baz (i->a);
  break;
case 3:
  e = 0;
  break;
}

  return e;
}

int
main ()
{
  qux ();
  return 0;
}

I think this is a latent bug in noce_convert_multiple_sets_1.
For cond
(eq (reg/v:HI 82 [ g ])
(const_int 1 [0x1]))
and cc_cmp
(eq (reg:CCZ 17 flags)
(const_int 0 [0]))
when we try_emit_cmove_seq for
(insn 3 35 4 7 (set (reg/v:HI 82 [ g ])
(const_int 2 [0x2])) "pr106590.c":37:9 84 {*movhi_internal}
 (nil))
insn, we get seq1
(insn 65 0 66 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/v:HI 82 [ g ])
(const_int 1 [0x1]))) -1
 (nil))

(insn 66 65 67 (set (reg:QI 96)
(ne:QI (reg:CCZ 17 flags)
(const_int 0 [0]))) -1
 (nil))

(insn 67 66 68 (set (reg:HI 95)
(zero_extend:HI (reg:QI 96))) -1
 (nil))

(insn 68 67 0 (parallel [
(set (reg:HI 95)
(plus:HI (reg:HI 95)
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) -1
 (nil))
and seq2
(insn 69 0 70 (set (reg:HI 97)
(const_int 2 [0x2])) -1
 (nil))

(insn 70 69 0 (set (reg:HI 95)
(if_then_else:HI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg/v:HI 82 [ g ])
(reg:HI 97))) -1
 (nil))
and later for
(insn 4 3 36 7 (set (reg/v:SI 84 [ f ])
(const_int 20 [0x14])) "pr106590.c":36:9 83 {*movsi_internal}
 (nil))
we get seq1
(insn 72 0 71 (set (reg:SI 98)
(const_int 20 [0x14])) -1
 (nil))

(insn 71 72 73 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/v:HI 82 [ g ])
(const_int 1 [0x1]))) -1
 (nil))

(insn 73 71 0 (set (reg/v:SI 84 [ f ])
(if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg/v:SI 84 [ f ])
(reg:SI 98))) -1
 (nil))
and seq2
(insn 74 0 75 (set (reg:SI 99)
(const_int 20 [0x14])) -1
 (nil))

(insn 75 74 0 (set (reg/v:SI 84 [ f ])
(if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg/v:SI 84 [ f ])
(reg:SI 99))) -1
 (nil))
In the former case, cost1 == cost2, in the latter case cost1 > cost2.
So, we for the first insn choose seq1 and for the second seq2.
That is wrong, because the first seq1 clobbers the (reg 17 flags) register
which is used in cc_cmp/rev_cc_cmp.  So, if we pick that seq1, we may not use
any seq2 afterwards for the latter instructions, need to always select seq1
since then.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to fail||12.1.0
 CC||jakub at gcc dot gnu.org

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

--- Comment #5 from Andrew Pinski  ---
Created attachment 53442
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53442&action=edit
testcase that should be ready for gcc.c-torture/execute

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

--- Comment #4 from Andrew Pinski  ---
So the IR is slightly different entering CE1. The two BB for the sides of the
if are swapped. But that is the only difference.

This is definitely a latent bug that got exposed, it is not the first latent
bug in CE1 recently either.

[Bug rtl-optimization/106590] [12/13 Regression] x86-64 miscompilation starting with r12-8233-g1ceddd7497e15d w/ mtune=skylake

2022-08-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106590

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2022-08-12
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
  Component|target  |rtl-optimization

--- Comment #3 from Andrew Pinski  ---
CE goes bad:
IF-CASE-2 found, start 6, else 8
verify found no changes in insn with uid = 35.
changing bb of uid 5
  from 8 to 6
deleting block 8
Conversion succeeded on pass 1.