[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:a5c05005499dd323296008fda4f414d8647adf0c

commit r11-5923-ga5c05005499dd323296008fda4f414d8647adf0c
Author: Jakub Jelinek 
Date:   Fri Dec 11 00:36:21 2020 +0100

dojump: Fix up probabilities splitting in dojump.c comparison splitting
[PR98212]

When compiling:
void foo (void);
void bar (float a, float b) { if (__builtin_expect (a != b, 1)) foo (); }
void baz (float a, float b) { if (__builtin_expect (a == b, 1)) foo (); }
void qux (float a, float b) { if (__builtin_expect (a != b, 0)) foo (); }
void corge (float a, float b) { if (__builtin_expect (a == b, 0)) foo (); }
on x86_64, we get (unimportant cruft removed):
bar:ucomiss %xmm1, %xmm0
jp  .L4
je  .L1
.L4:jmp foo
.L1:ret
baz:ucomiss %xmm1, %xmm0
jp  .L6
jne .L6
jmp foo
.L6:ret
qux:ucomiss %xmm1, %xmm0
jp  .L13
jne .L13
ret
.L13:   jmp foo
corge:  ucomiss %xmm1, %xmm0
jnp .L18
.L14:   ret
.L18:   jne .L14
jmp foo
(note for bar and qux that changed with a patch I've posted earlier today).
This is all reasonable, except the last function, the overall jump to
the tail call is predicted unlikely (10%), so it is good jmp foo isn't on
the straight line path, but NaNs are (or should be) considered very
unlikely
in the programs, so IMHO the right code (and one emitted with the following
patch) is:
corge:  ucomiss %xmm1, %xmm0
jp  .L14
je  .L18
.L14:   ret
.L18:   jmp foo

Let's discuss the probabilities in the above testcase:
for !and_them it looks all correct, so for
bar we split
if (a != b) goto t; // prob 90%
goto f;
into:
if (a unord b) goto t; // first_prob = prob * cprob = 90% * 1% = 0.9%
if (a ltgt b) goto t; // adjusted prob = (prob - first_prob) / (1 -
first_prob) = (90% - 0.9%) / (1 - 0.9%) = 89.909%
and for qux we split
if (a != b) goto t; // prob 10%
goto f;
into:
if (a unord b) goto t; // first_prob = prob * cprob = 10% * 1% = 0.1%
if (a ltgt b) goto t; // adjusted prob = (prob - first_prob) / (1 -
first_prob) = (10% - 0.1%) / (1 - 0.1%) = 9.910%
Now, the and_them cases should be probability wise exactly the same
if we swap the f and t labels, because baz
if (a == b) goto t; // prob 90%
goto f;
is equivalent to:
if (a != b) goto f; // prob 10%
goto t;
which is in qux.  This means we could expand baz as:
if (a unord b) goto f; // 0.1%
if (a ltgt b) goto f; // 9.910%
goto t;
But we don't expand it exactly that way, but instead (as the comment says)
as:
if (a ord b) ; else goto f; // first_prob as probability of ;
if (a uneq b) goto t; // adjusted prob
goto f;
So, first_prob.invert () should be 0.1% and adjusted prob should be
1 - 9.910%.
Thus, the right thing is 4 inverts:
prob = prob.invert (); // baz is equivalent to qux with swap(t, f) and thus
inverted original prob
first_prob = prob.split (cprob.invert ()).invert ();
// cprob.invert because by doing if (cond) ; else goto f; we effectively
invert the condition
// the second invert because first_prob is probability of ; rather than
goto f
prob = prob.invert (); // lastly because adjusted prob we want is
// probability of goto t;, while the one from corresponding !and_them case
// would be if (...) goto f; goto t;

2020-12-11  Jakub Jelinek  

PR rtl-optimization/98212
* dojump.c (do_compare_rtx_and_jump): Change computation of
first_prob for and_them.  Add comment explaining and_them case.

* gcc.dg/predict-8.c: Adjust expected probability.

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:680e4202f23ce74f3b26c7f090b9d22a56765554

commit r11-5899-g680e4202f23ce74f3b26c7f090b9d22a56765554
Author: Jakub Jelinek 
Date:   Thu Dec 10 12:03:30 2020 +0100

dojump: Improve float != comparisons on x86 [PR98212]

The x86 backend doesn't have EQ or NE floating point comparisons,
so splits x != y into x unord y || x <> y.  The problem with that is
that unord comparison doesn't trap on qNaN operands but LTGT does.
The end effect is that it doesn't trap on qNaN operands, because x unord y
will be true for those and so LTGT will not be performed, but as the
backend
is currently unable to merge signalling and non-signalling comparisons (and
after all, with this exact exception it shouldn't unless the first one is
signalling and the second one is non-signalling) it means we end up with:
ucomiss %xmm1, %xmm0
jp  .L4
comiss  %xmm1, %xmm0
jne .L4
ret
.p2align 4,,10
.p2align 3
.L4:
xorl%eax, %eax
jmp foo
where the comiss is the signalling comparison, but we already know that
the right flags bits are already computed by the ucomiss insn.

The following patch, if target supports UNEQ comparisons, splits NE
as x unord y || !(x uneq y) instead, which in the end means we end up with
just:
ucomiss %xmm1, %xmm0
jp  .L4
jne .L4
ret
.p2align 4,,10
.p2align 3
.L4:
jmp foo
because UNEQ is like UNORDERED non-signalling.

2020-12-10  Jakub Jelinek  

PR rtl-optimization/98212
* dojump.c (do_compare_rtx_and_jump): When splitting NE and backend
can do UNEQ, prefer splitting x != y into x unord y || !(x uneq y)
instead of into x unord y || x ltgt y.

* gcc.target/i386/pr98212.c: New test.

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

--- Comment #6 from Jakub Jelinek  ---
Or shall it instead do:
--- gcc/dojump.c.jj 2020-12-09 15:11:17.042888002 +0100
+++ gcc/dojump.c2020-12-09 20:34:13.124398356 +0100
@@ -1148,9 +1148,8 @@ do_compare_rtx_and_jump (rtx op0, rtx op
  if (and_them)
{
  rtx_code_label *dest_label;
- prob = prob.invert ();
- profile_probability first_prob = prob.split (cprob).invert
();
- prob = prob.invert ();
+ profile_probability first_prob
+   = prob.split (cprob.invert ()).invert ();
  /* If we only jump if true, just bypass the second jump.  */
  if (! if_false_label)
{
?  With the rationale that for and_them we basically invert the first condition
because we pass non-NULL false label and NULL true label, so if first_code is
ORDERED and cprob is thus 99% for and_them we will emit UNORDERED and want it
to be very unlikely, while if first_code is UNORDERED and cprob is thus 1% for
and_them we will emit ORDERED and want it to be very likely (though that case
doesn't happen, if first_code is ORDERED, then and_them is always true, if
first_code is UNORDERED, then and_them is always false, plus there is one case
where first_code is neither of them, then cprob is even.

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

Jakub Jelinek  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
So, for the corge case, do_compare_rtx_and_jump is called with EQ,
if_false_label non-NULL, if_true_label NULL (i.e. fall through) and prob of
10%.
The target doesn't support it and so x == y is being split into
x ord y && x u== y.
first_prob is computed as 10.9% and prob 91.74%, it is expanded as
if (x unord y) goto if_false_label; // prob of goto 89.1%
if (x u== y) goto dummy; // prob of goto 91.74%
goto if_false_label;
dummy:;

1133  profile_probability cprob
1134= profile_probability::guessed_always ();
1135  if (first_code == UNORDERED)
1136cprob = cprob.apply_scale (1, 100);
1137  else if (first_code == ORDERED)
1138cprob = cprob.apply_scale (99, 100);
1139  else
1140cprob = profile_probability::even ();
1141  /* We want to split:
1142 if (x) goto t; // prob;
1143 into
1144 if (a) goto t; // first_prob;
1145 if (b) goto t; // prob;
1146 such that the overall probability of jumping to t
1147 remains the same and first_prob is prob * cprob.  */
1148  if (and_them)
...
1151  prob = prob.invert ();
1152  profile_probability first_prob = prob.split
(cprob).invert ();
1153  prob = prob.invert ();

The comment only describes how !and_them looks like, for and_them it is
actually:
if (x) goto t; // prob;
goto f;
into:
if (a) goto f; // 1 - first_prob;
if (b) goto t; // prob;
goto f;
(at least for the case where we have both f and t).
prob.split is documented as:
 Split *THIS (ORIG) probability into 2 probabilities, such that
 the returned one (FIRST) is *THIS * CPROB and *THIS is
 adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND
 == ORIG.  This is useful e.g. when splitting a conditional
 branch like:
 if (cond)
   goto lab; // ORIG probability
 into
 if (cond1)
   goto lab; // FIRST = ORIG * CPROB probability
 if (cond2)
   goto lab; // SECOND probability
So, if we talk about baz above with prob of jumping to t 90%, the computed
first_prob is 1-((1-90%)*99%), so 90.1%, which means if (op0 unord op1) goto f;
with 8.9% probability.
For corge prob of jumping to t is 10%, the computed first_prob is
1-((1-10%)*99%), so 10.9% which means
if (op0 unord op1) goto f; // with 89.1% probability.  NaNs are never that
common!
if (op0 uneq op1) goto t; // with 91.74% probability.
I think the right thing would be to for these < 50% initial prop and ORDERED
first_code would be to use something like:
profile_probability first_prob = prob.split (cprob).invert ();
without the two prob = prob.invert (); around it.

With:
--- gcc/dojump.c.jj 2020-12-09 15:11:17.042888002 +0100
+++ gcc/dojump.c2020-12-09 20:05:59.535234206 +0100
@@ -1148,9 +1148,15 @@ do_compare_rtx_and_jump (rtx op0, rtx op
  if (and_them)
{
  rtx_code_label *dest_label;
- prob = prob.invert ();
- profile_probability first_prob = prob.split (cprob).invert
();
- prob = prob.invert ();
+ profile_probability first_prob;
+ if (prob < profile_probability::even ())
+   first_prob = prob.split (cprob).invert ();
+ else
+   {
+ prob = prob.invert ();
+ first_prob = prob.split (cprob).invert ();
+ prob = prob.invert ();
+   }
  /* If we only jump if true, just bypass the second jump.  */
  if (! if_false_label)
{
I get the output I'm looking for, so bar/baz/qux stay the same and corge
becomes:
corge:  ucomiss %xmm1, %xmm0
jp  .L14
je  .L18
.L14:   ret
.L18:   jmp foo

Honza, what do you think?

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

--- Comment #4 from Jakub Jelinek  ---
As for f2, only the case of a == b when the jump is predicted unlikely looks
weird, e.g. take:
void foo (void);

void
bar (float a, float b)
{
  if (__builtin_expect (a != b, 1))
foo ();
}

void
baz (float a, float b)
{
  if (__builtin_expect (a == b, 1))
foo ();
}

void
qux (float a, float b)
{
  if (__builtin_expect (a != b, 0))
foo ();
}

void
corge (float a, float b)
{
  if (__builtin_expect (a == b, 0))
foo ();
}

bar:ucomiss %xmm1, %xmm0
jp  .L4
je  .L1
.L4:jmp foo
.L1:ret
baz:ucomiss %xmm1, %xmm0
jp  .L6
jne .L6
jmp foo
.L6:ret
qux:ucomiss %xmm1, %xmm0
jp  .L13
jne .L13
ret
.L13:   jmp foo
corge:  ucomiss %xmm1, %xmm0
jnp .L18
.L14:   ret
.L18:   jne .L14
jmp foo
I guess I need to debug the branch probabilities in that case, we should
certainly predict that operands are most of the time non-NaN and thus that PF
bit will not be set in 99% of cases or so.

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

Jakub Jelinek  changed:

   What|Removed |Added

   Last reconfirmed||2020-12-09
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Created attachment 49718
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49718&action=edit
gcc11-pr98212.patch

Untested attempt to fix f1.

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-09 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

--- Comment #2 from Uroš Bizjak  ---
f1 is currently unoptimal by design, the compiler is unable to merge trapping
and non-trapping instructions. There is already a PR for that.

f2 is not optimal. The conditional jump to the unconditional jump can be
merged, but the compiler currently does not perform that optimization.

[Bug rtl-optimization/98212] [10/11 Regression] X86 unoptimal code for float equallity comparison followed by jump

2020-12-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98212

Richard Biener  changed:

   What|Removed |Added

Summary|X86 unoptimal code for  |[10/11 Regression] X86
   |float equallity comparison  |unoptimal code for float
   |followed by jump|equallity comparison
   ||followed by jump
 Target||x86_64-*-* i?86-*-*
   Target Milestone|--- |10.3
   Host|x86 |