[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 Jakub Jelinek changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #49 from Jakub Jelinek --- Should be fixed now.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #48 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:172a72da368146e0fe34194020eb7a6636db4438 commit r14-5556-g172a72da368146e0fe34194020eb7a6636db4438 Author: Jakub Jelinek Date: Fri Nov 17 15:09:44 2023 +0100 vect: Fix check_reduction_path [PR112374] As mentioned in the PR, the intent of the r14-5076 changes was that it doesn't count one of the uses on the use_stmt, but what actually got implemented is that it does this processing on any op_use_stmt, even if it is not the use_stmt statement, which means that it can increase count even on debug stmts (-fcompare-debug failures), or if there would be some other use stmt with 2+ uses it could count that as a single use. Though, because it fails whenever cnt != 1 and I believe use_stmt must be one of the uses, it would probably fail in the latter case anyway. The following patch fixes that by doing this extra processing only when op_use_stmt is use_stmt, and using the normal processing otherwise (so ignore debug stmts, and increase on any uses on the stmt). 2023-11-17 Jakub Jelinek PR tree-optimization/112374 * tree-vect-loop.cc (check_reduction_path): Perform the cond_fn_p special case only if op_use_stmt == use_stmt, use as_a rather than dyn_cast in that case. * gcc.dg/pr112374-1.c: New test. * gcc.dg/pr112374-2.c: New test. * g++.dg/opt/pr112374.C: New test.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #47 from Robin Dapp --- And, just to confirm: Testsuite is unchanged on riscv with your patch.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #46 from Robin Dapp --- (In reply to Jakub Jelinek from comment #43) > Now, the patch changed it to allow one extra use in certain cases (but I > think only on use_stmt, because there should be one use on use_stmt and if > there is some other in the loop, that would be already too much). So I > think my patch ought to work and the above would be more expensive (as it > would need to check each op_use_stmt for being suitable internal call). > the code before Your patch looks good. What I wanted to do, originally, is to just skip the reduction variable's next use in the else value and this is what your patch ensures. I agree, I don't think we actually need to check other uses in that manner (because even the use as a proper op would be too much) so what I posted last will do more work than necessary - it might just make the whole thing a bit clearer.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #45 from Andrew Pinski --- (In reply to Jakub Jelinek from comment #44) > (In reply to Andrew Pinski from comment #42) > > Just one suggestion for the testcases, add: > > /* { dg-additional-options "-march=armv9-a" { target aarch64*-*-* } } */ > > > > So it at least it will be tested on another target. > > Does aarch64*-*-* support -march=armv9-a regardless of whether e.g. > assembler has support for it? I mean, x86 does that, but I think e.g. > powerpc didn't or doesn't. Yes -march=armv9-a is supported even if the assembler does not support it.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #44 from Jakub Jelinek --- (In reply to Andrew Pinski from comment #42) > Just one suggestion for the testcases, add: > /* { dg-additional-options "-march=armv9-a" { target aarch64*-*-* } } */ > > So it at least it will be tested on another target. Does aarch64*-*-* support -march=armv9-a regardless of whether e.g. assembler has support for it? I mean, x86 does that, but I think e.g. powerpc didn't or doesn't.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #43 from Jakub Jelinek --- (In reply to Robin Dapp from comment #40) > (In reply to Jakub Jelinek from comment #37) > [..] > > > The above isn't complete, so one just has to guess what you mean outside of > > that, but the above doesn't seem to be correct. There are many internal > > calls, and most of them shouldn't have the above handling. Say if there is > > .MUL_OVERFLOW (op.ops[opi], op.ops[opi]) call, it should count as 2 uses, > > not one. > > But we would count MUL_OVERFLOW as two uses because its else_pos = -1? > For clarity we could maybe move the else pos check into the if condition, > yes. > > One other thing, though. The else branch checks for is_gimple_debug and > flow_bb_inside_loop_p. We could handle this separately and continue. > Maybe something like this is clearer, replacing the whole loop (and moved > the else_idx check)? > > FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi]) > { > if (is_gimple_debug (op_use_stmt) > || (*code == ERROR_MARK > && !flow_bb_inside_loop_p (loop, >gimple_bb (op_use_stmt > continue; > > /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have >op1 twice (once as definition, once as else) in the same operation. >Allow this. */ > gcall *call = dyn_cast (op_use_stmt); > internal_fn ifn; > if (call && gimple_call_internal_p (call) > && (ifn = gimple_call_internal_fn (call)) > && internal_fn_else_index (ifn) > -1) > { > unsigned else_idx = internal_fn_else_index (ifn); > for (unsigned int j = 0; j < gimple_call_num_args (call); ++j) > { > if (j == else_idx) > continue; > if (gimple_call_arg (call, j) == op.ops[opi]) > cnt++; > } > } > else > FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) > cnt++; > } Then bool cond_fn_p = op.code.is_internal_fn () && (conditional_internal_fn_code (internal_fn (op.code)) != ERROR_MARK); would be unused and would cause bootstrap error. Anyway, it really depends on what you want to achieve with this code. My (admittedly limited) understanding of this is that use_stmt ought to be always one of the statements among the op_use_stmt statements (I believe it must be a use statement of op.ops[opi]) and I think should be also inside of the loop and because after this loop it fails if cnt != 1, I think before your changes it basically made sure that there is exactly one non-debug use in the loop (the use_stmt one) and not other, and that on use_stmt there is just one use. Now, the patch changed it to allow one extra use in certain cases (but I think only on use_stmt, because there should be one use on use_stmt and if there is some other in the loop, that would be already too much). So I think my patch ought to work and the above would be more expensive (as it would need to check each op_use_stmt for being suitable internal call). the code before
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #42 from Andrew Pinski --- Just one suggestion for the testcases, add: /* { dg-additional-options "-march=armv9-a" { target aarch64*-*-* } } */ So it at least it will be tested on another target.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #41 from Richard Biener --- (In reply to Jakub Jelinek from comment #39) > Created attachment 56601 [details] > gcc14-pr112374.patch > > So the following then? LGTM. Refactoring to more explicitly handle debug stmts like with if (is_gimple_debug (op_use_stmt)) continue; else if (cond_fn_p && ...) would also be fine but it isn't technically necessary after this fix.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #40 from Robin Dapp --- (In reply to Jakub Jelinek from comment #37) [..] > The above isn't complete, so one just has to guess what you mean outside of > that, but the above doesn't seem to be correct. There are many internal > calls, and most of them shouldn't have the above handling. Say if there is > .MUL_OVERFLOW (op.ops[opi], op.ops[opi]) call, it should count as 2 uses, > not one. But we would count MUL_OVERFLOW as two uses because its else_pos = -1? For clarity we could maybe move the else pos check into the if condition, yes. One other thing, though. The else branch checks for is_gimple_debug and flow_bb_inside_loop_p. We could handle this separately and continue. Maybe something like this is clearer, replacing the whole loop (and moved the else_idx check)? FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi]) { if (is_gimple_debug (op_use_stmt) || (*code == ERROR_MARK && !flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt continue; /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have op1 twice (once as definition, once as else) in the same operation. Allow this. */ gcall *call = dyn_cast (op_use_stmt); internal_fn ifn; if (call && gimple_call_internal_p (call) && (ifn = gimple_call_internal_fn (call)) && internal_fn_else_index (ifn) > -1) { unsigned else_idx = internal_fn_else_index (ifn); for (unsigned int j = 0; j < gimple_call_num_args (call); ++j) { if (j == else_idx) continue; if (gimple_call_arg (call, j) == op.ops[opi]) cnt++; } } else FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) cnt++; }
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #39 from Jakub Jelinek --- Created attachment 56601 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56601=edit gcc14-pr112374.patch So the following then?
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #38 from Jakub Jelinek --- And the #c29 change actually seems to fix all the testcases here with -fcompare-debug -gno-statement-frontiers: $ ~/src/gcc/obj48/gcc/xg++ -B ~/src/gcc/obj48/gcc/ -fcompare-debug -gno-statement-frontiers pr112374-0.C -O2 -march=skylake-avx512 -S $ ~/src/gcc/obj48/gcc/xgcc -B ~/src/gcc/obj48/gcc/ -fcompare-debug -gno-statement-frontiers pr112374-1.c -O2 -march=skylake-avx512 -S xgcc: error: pr112374-1.c: ‘-fcompare-debug’ failure (length) $ ~/src/gcc/obj48/gcc/xgcc -B ~/src/gcc/obj48/gcc/ -fcompare-debug -gno-statement-frontiers pr112374-2.c -O2 -march=skylake-avx512 -S xgcc: error: pr112374-2.c: ‘-fcompare-debug’ failure (length) $ ~/src/gcc/obj48/gcc/xg++ -B ~/src/gcc/obj48/gcc/ -fcompare-debug -gno-statement-frontiers pr112374-3.C -O2 -march=skylake-avx512 -S pr112374-3.C: In constructor ‘t::t(const long unsigned int&)’: pr112374-3.C:4:36: warning: narrowing conversion of ‘(long unsigned int)a’ from ‘long unsigned int’ to ‘long int’ [-Wnarrowing] 4 | t(const unsigned long ) : coef{a} {}; |^ xg++: error: pr112374-3.C: ‘-fcompare-debug’ failure $ ./xg++ -B ./ -fcompare-debug -gno-statement-frontiers pr112374-0.C -O2 -march=skylake-avx512 -S $ ./xgcc -B ./ -fcompare-debug -gno-statement-frontiers pr112374-1.c -O2 -march=skylake-avx512 -S $ ./xgcc -B ./ -fcompare-debug -gno-statement-frontiers pr112374-2.c -O2 -march=skylake-avx512 -S $ ./xg++ -B ./ -fcompare-debug -gno-statement-frontiers pr112374-3.C -O2 -march=skylake-avx512 -S pr112374-3.C: In constructor ‘t::t(const long unsigned int&)’: pr112374-3.C:4:36: warning: narrowing conversion of ‘(long unsigned int)a’ from ‘long unsigned int’ to ‘long int’ [-Wnarrowing] 4 | t(const unsigned long ) : coef{a} {}; |^ where ~/src/gcc/obj48/gcc/ is build dir of last night's trunk and ./ is build dir of trunk patched with #c29 patch. pr112374-0.C is #c19, pr112374-1.c is #c21, pr112374-2.c is #c25 and pr112374-3.C is #c30. So, pr112374-0.C is not useful with -gno-statement-frontiers, guess I'll add the rest into the testsuite for the patch after trying to fix the narrowing warning in #c30.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #37 from Jakub Jelinek --- (In reply to Robin Dapp from comment #35) > What does get rid of the comparison failures in the three last posted > reduced examples is: > > gcall *call = dyn_cast (op_use_stmt); > internal_fn ifn; > if (call && gimple_call_internal_p (call)) > { > ifn = gimple_call_internal_fn (call); > unsigned else_pos = internal_fn_else_index (ifn); > for (unsigned int j = 0; j < gimple_call_num_args (call); ++j) > { > if (j == else_pos) > continue; > if (gimple_call_arg (call, j) == op.ops[opi]) > cnt++; > } > } > > i.e. ignoring the else altogether. The above isn't complete, so one just has to guess what you mean outside of that, but the above doesn't seem to be correct. There are many internal calls, and most of them shouldn't have the above handling. Say if there is .MUL_OVERFLOW (op.ops[opi], op.ops[opi]) call, it should count as 2 uses, not one.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #36 from Jakub Jelinek --- Actually, while i386-expand.o compilation fails with -fcompare-debug even after the above patch, seems that is only because of the discriminator issue, the *.gkd difference is then: --- i386-expand.cc.gkd 2023-11-16 09:57:22.666238096 +0100 +++ i386-expand.gk.cc.gkd 2023-11-16 09:59:09.207704622 +0100 @@ -245675,14 +245675,14 @@ Declarations used by ix86_expand_vector_ (note # 0 0 NOTE_INSN_DELETED) (insn:TI # 0 0 11 (set (reg:QI 0 ax [360]) (mem:QI (plus:DI (reg/f:DI 0 ax [orig:192 rt ] [192]) -(const_int 3 [0x3])) [ MEM [(struct rtx_def *)tmp_295 + 3B]+0 S1 A8])) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 6# {*movqi_internal} +(const_int 3 [0x3])) [ MEM [(struct rtx_def *)tmp_295 + 3B]+0 S1 A8])) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 5# {*movqi_internal} (nil)) (insn:TI # 0 0 11 (parallel [ (set (reg:SI 0 ax [360]) (and:SI (reg:SI 0 ax [360]) (const_int -29 [0xffe3]))) (clobber (reg:CC 17 flags)) -]) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 6# {*andsi_1} +]) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 5# {*andsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn:TI # 0 0 11 (parallel [ @@ -245690,12 +245690,12 @@ Declarations used by ix86_expand_vector_ (ior:SI (reg:SI 0 ax [360]) (const_int 24 [0x18]))) (clobber (reg:CC 17 flags)) -]) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 6# {*iorsi_1} +]) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 5# {*iorsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn:TI # 0 0 11 (set (mem:QI (plus:DI (reg/v/f:DI 3 bx [orig:192 rt ] [192]) (const_int 3 [0x3])) [ MEM [(struct rtx_def *)tmp_295 + 3B]+0 S1 A8]) -(reg:QI 0 ax [362])) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 6# {*movqi_internal} +(reg:QI 0 ax [362])) "../../gcc/config/i386/i386-expand.cc":17709:451 discrim 5# {*movqi_internal} (expr_list:REG_DEAD (reg:QI 0 ax [362]) (nil))) (jump_insn # 0 0 11 (set (pc) So, this should no longer fail normal bootstrap comparison (but of course fail -fcompare-debug bootstrap). And with -fcompare-debug -gno-statement-frontiers it compiles fine.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #35 from Robin Dapp --- What does get rid of the comparison failures in the three last posted reduced examples is: gcall *call = dyn_cast (op_use_stmt); internal_fn ifn; if (call && gimple_call_internal_p (call)) { ifn = gimple_call_internal_fn (call); unsigned else_pos = internal_fn_else_index (ifn); for (unsigned int j = 0; j < gimple_call_num_args (call); ++j) { if (j == else_pos) continue; if (gimple_call_arg (call, j) == op.ops[opi]) cnt++; } } i.e. ignoring the else altogether.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #34 from Robin Dapp --- (In reply to Jakub Jelinek from comment #29) > --- gcc/tree-vect-loop.cc.jj 2023-11-14 10:35:52.0 +0100 > +++ gcc/tree-vect-loop.cc 2023-11-15 22:42:32.782007408 +0100 > @@ -4105,9 +4105,9 @@ pop: > /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have > op1 twice (once as definition, once as else) in the same operation. > Allow this. */ > - if (cond_fn_p) > + if (cond_fn_p && op_use_stmt == use_stmt) > { > - gcall *call = dyn_cast (use_stmt); > + gcall *call = as_a (use_stmt); > unsigned else_pos > = internal_fn_else_index (internal_fn (op.code)); > > doesn't cure the -fcompare-debug failure though. Yeah that's what I meant to mean in the first place ;) Thanks for the debugging efforts. I'm going to check with your reduced example, could reproduce it locally with it.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #33 from rguenther at suse dot de --- On Wed, 15 Nov 2023, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 > > --- Comment #29 from Jakub Jelinek --- > --- gcc/tree-vect-loop.cc.jj2023-11-14 10:35:52.0 +0100 > +++ gcc/tree-vect-loop.cc 2023-11-15 22:42:32.782007408 +0100 > @@ -4105,9 +4105,9 @@ pop: > /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have >op1 twice (once as definition, once as else) in the same operation. >Allow this. */ > - if (cond_fn_p) > + if (cond_fn_p && op_use_stmt == use_stmt) > { > - gcall *call = dyn_cast (use_stmt); > + gcall *call = as_a (use_stmt); > unsigned else_pos > = internal_fn_else_index (internal_fn (op.code)); > > doesn't cure the -fcompare-debug failure though. Looks like a relevant fix tho.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 --- Comment #32 from Andrew Pinski --- I suspect we need a dump option to dump the freed ssa names after each pass. This will allow proving if r14-5076 is at fault or not.
[Bug tree-optimization/112374] [14 Regression] Failed bootstrap with `--with-arch=skylake-avx512 --with-cpu=skylake-avx512`, causes a comparison failure since r14-5076-g01c18f58d37865d5f3bbe93e666183b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374 Andrew Pinski changed: What|Removed |Added Build|x86_64-linux-gnu| Host|x86_64-linux-gnu| Target|x86_64-linux-gnu|x86_64-linux-gnu ||aarch64-linux-gnu Component|target |tree-optimization --- Comment #31 from Andrew Pinski --- (In reply to Andrew Pinski from comment #30) > Here is another reduced testcase: Note the testcase in comment #30, I get comparison failure also on aarch64 with: `-O2 -fno-exceptions -fno-rtti -fno-common -Wno-narrowing -gno-statement-frontiers -march=armv9-a+sve2 -fcompare-debug` .