Re: Fix profile update in tree-ssa-reassoc

2023-08-24 Thread Hans-Peter Nilsson via Gcc-patches
> Date: Wed, 23 Aug 2023 11:10:02 +0200
> From: Jan Hubicka via Gcc-patches 

> Hi,
> this patch adds missing profile update to maybe_optimize_range_tests.

[...]

> Jakub, it seems that the code is originally yours.  Any idea why those are 
> not turned to
> constant true or false conditionals?
> 
> Bootstrapped/regtested x86_64-linux, does it seem to make sense?
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/110628
>   * tree-ssa-reassoc.cc (maybe_optimize_range_tests): Add profile update.

Hi.  Feeling somewhat guilty for not noticing that you had
posted a patch before me xfailing it, I went ahead and
tested this patch for cris-elf against
r14-3431-g7e05cd632fab, but unfortunately it regresses a few
tests, and it appears it's not just testcase (dumps) that
need tweaking.  Four test-cases regress (counting multiple
runs as just one):

Running /x/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
FAIL: gcc.c-torture/execute/pr95731.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
...

Running /x/gcc/gcc/testsuite/gcc.dg/dg.exp ...
FAIL: gcc.dg/pr46309-2.c scan-tree-dump-times reassoc2 "Optimizing range tests 
[^\r\n]*_[0-9]* -.0, 0. and -.128, 128.[\n\r]* into" 1
...

Running /x/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
FAIL: gcc.dg/torture/pr63464.c   -Os  execution test
...

Running /x/gcc/gcc/testsuite/gcc.dg/tree-ssa/tree-ssa.exp ...
...
FAIL: gcc.dg/tree-ssa/pr95731.c scan-tree-dump-times optimized " >= 0| < 0" 6

brgds, H-P


Fix profile update in tree-ssa-reassoc

2023-08-23 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds missing profile update to maybe_optimize_range_tests.
Jakub, I hope I got the code right: I think it basically analyzes the
chain of conditionals, finds some basic blocks involved in the range
testing and then puts all the test into first BB.

The patch fixes gcc.dg/tree-ssa/update-threading.c profile misupdate on
power-pc.  Curiously enough the code is produced differently for x86_64.
I tried to find testcase for x86_64 and found that

testsuite/gcc.dg/tree-ssa/reassoc-33.c
testsuite/gcc.dg/tree-ssa/reassoc-37.c
testsuite/gcc.dg/tree-ssa/reassoc-43.c

are testing this function. However sadly neighter of these testcases seems
to work as expected.  For example in testsuite/gcc.dg/tree-ssa/reassoc-33.c
we turn

;; basic block 3, loop depth 0, count 708669600 (estimated locally, freq 
0.6600), maybe hot
;;  prev block 2, next block 4, flags: (NEW, VISITED)
;;  pred:   2 [66.0% (guessed)]  count:708669600 (estimated locally, freq 
0.6600) (FALSE_VALUE,EXECUTABLE)
_4 = a_14(D) == 44;
_5 = a_14(D) == 78;
_30 = 0;
_6 = _4 | _5;
if (_30 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]
;;  succ:   7 [34.0% (guessed)]  count:240947667 (estimated locally, freq 
0.2244) (TRUE_VALUE,EXECUTABLE)
;;  4 [66.0% (guessed)]  count:467721933 (estimated locally, freq 
0.4356) (FALSE_VALUE,EXECUTABLE)

to

;; basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 
1.), maybe hot
;;  prev block 0, next block 3, flags: (NEW, VISITED)
;;  pred:   ENTRY [always]  count:1073741824 (estimated locally, freq 
1.) (FALLTHRU,EXECUTABLE)
_18 = (unsigned int) a_14(D);
_19 = _18 + 4294967253;
_24 = (unsigned int) a_14(D);
_25 = _24 + 4294967253;
_26 = _25 & 4294967260;
_27 = _26 == 0;
_20 = _19 <= 3;
_1 = a_14(D) == 43;
_21 = (unsigned int) a_14(D);
_22 = _21 + 4294967221;
_23 = _22 <= 3;
_2 = a_14(D) == 75;
_31 = _27;
_3 = _1 | _2;
if (_31 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]

which replaces later tests

;; basic block 4, loop depth 0, count 467721934 (estimated locally, freq 
0.4356), maybe hot
;;  prev block 3, next block 5, flags: (NEW, VISITED)
;;  pred:   3 [66.0% (guessed)]  count:467721933 (estimated locally, freq 
0.4356) (FALSE_VALUE,EXECUTABLE)
_7 = a_14(D) == 77;
_8 = a_14(D) == 46;
_29 = 0;
_9 = _7 | _8;
if (_29 != 0)
  goto ; [34.00%]
else
  goto ; [66.00%]

;; basic block 5, loop depth 0, count 308696475 (estimated locally, freq 
0.2875), maybe hot
;;  prev block 4, next block 6, flags: (NEW, VISITED)
;;  pred:   4 [66.0% (guessed)]  count:308696475 (estimated locally, freq 
0.2875) (FALSE_VALUE,EXECUTABLE)
_10 = a_14(D) == 76;
_11 = a_14(D) == 45;
_28 = 0;
_12 = _10 | _11;
if (_28 != 0)
  goto ; [50.00%]
else
  goto ; [50.00%]
;;  succ:   7 [50.0% (guessed)]  count:154348238 (estimated locally, freq 
0.1437) (TRUE_VALUE,EXECUTABLE)
;;  6 [50.0% (guessed)]  count:154348238 (estimated locally, freq 
0.1437) (FALSE_VALUE,EXECUTABLE)

However BB4 and BB5 is not updated to be unconditional by tree-ssa-reassoc pass
and we thus miss the profile update.

This happens later in forwprop but at that time it is too late to update the 
probabilities.
So we get:

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 
1.), maybe hot
;;prev block 0, next block 3, flags: (NEW, VISITED)
;;pred:   ENTRY [always]  count:1073741824 (estimated locally, freq 
1.) (FALLTHRU,EXECUTABLE)
  _24 = (unsigned int) a_14(D);
  _25 = _24 + 4294967253;
  _26 = _25 & 4294967260;
  _27 = _26 == 0;
  if (_26 == 0)
goto ; [34.00%]
  else
goto ; [66.00%]
;;succ:   4 [34.0% (guessed)]  count:365072224 (estimated locally, freq 
0.3400) (TRUE_VALUE,EXECUTABLE)
;;3 [66.0% (guessed)]  count:708669600 (estimated locally, freq 
0.6600) (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 154348237 (estimated locally, freq 
0.1437), maybe hot
;;   Invalid sum of incoming counts 708669600 (estimated locally, freq 0.6600), 
should be 154348237 (estimated locally, freq 0.1437)
;;prev block 2, next block 4, flags: (NEW, VISITED)
;;pred:   2 [66.0% (guessed)]  count:708669600 (estimated locally, freq 
0.6600) (FALSE_VALUE,EXECUTABLE)
;;succ:   4 [always]  count:154348237 (estimated locally, freq 0.1437) 
(FALLTHRU,EXECUTABLE) c.c:12:12

;;   basic block 4, loop depth 0, count 1073741824 (estimated locally, freq 
1.), maybe hot
;;   Invalid sum of incoming counts 519420461 (estimated locally, freq 0.4837), 
should be 1073741824 (estimated locally, freq 1.)
;;prev block 3, next block 1, flags: (NEW, VISITED)
;;pred:   2 [34.0% (guessed)]  count:365072224 (estimated locally, freq 
0.3400) (TRUE_VALUE,EXECUTABLE)
;;3 [always]  count:154348237 (estimated locally, freq 0.1437) 
(FALLTHRU,EXECUTABLE) c.c:12:12
  # _13 = PHI 
  return _13;

Jakub, it seems that the code is originally yours.  Any idea why those are not 
turned to
constant true