Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On Wed, Sep 2, 2015 at 9:01 AM, Kyrill Tkachovwrote: > > On 02/09/15 16:18, Zamyatin, Igor wrote: >>> >>> >>> On 19/08/15 17:57, Jeff Law wrote: On 08/12/2015 08:31 AM, Kyrill Tkachov wrote: > > 2015-08-10 Kyrylo Tkachov > >* ifcvt.c (struct noce_if_info): Add then_simple, else_simple, >then_cost, else_cost fields. Change branch_cost field to > unsigned int. >(end_ifcvt_sequence): Call set_used_flags on each insn in the >sequence. >Include rtl-iter.h. >(noce_simple_bbs): New function. >(noce_try_move): Bail if basic blocks are not simple. >(noce_try_store_flag): Likewise. >(noce_try_store_flag_constants): Likewise. >(noce_try_addcc): Likewise. >(noce_try_store_flag_mask): Likewise. >(noce_try_cmove): Likewise. >(noce_try_minmax): Likewise. >(noce_try_abs): Likewise. >(noce_try_sign_mask): Likewise. >(noce_try_bitop): Likewise. >(bbs_ok_for_cmove_arith): New function. >(noce_emit_all_but_last): Likewise. >(noce_emit_insn): Likewise. >(noce_emit_bb): Likewise. >(noce_try_cmove_arith): Handle non-simple basic blocks. >(insn_valid_noce_process_p): New function. >(contains_mem_rtx_p): Likewise. >(bb_valid_for_noce_process_p): Likewise. >(noce_process_if_block): Allow non-simple basic blocks >where appropriate. > > 2015-08-11 Kyrylo Tkachov > >* gcc.dg/ifcvt-1.c: New test. >* gcc.dg/ifcvt-2.c: Likewise. >* gcc.dg/ifcvt-3.c: Likewise. >> >> Looks like ifcvt-3.c fails on x86_64. I see >> >> New failures: >> FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made" >> >> Could you please take a look? > > > Hmm, these pass for me on x86_64-pc-linux-gnu. > The test is most probably failing due to branch costs being too low for the > transformation to kick in. The test passes for me with -mtune=intel and > -mtune=generic. > Do you know what the default tuning CPU is used for that failing test? I opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462 -- H.J.
RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
> > > On 19/08/15 17:57, Jeff Law wrote: > > On 08/12/2015 08:31 AM, Kyrill Tkachov wrote: > >> 2015-08-10 Kyrylo Tkachov> >> > >> * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, > >> then_cost, else_cost fields. Change branch_cost field to > >> unsigned int. > >> (end_ifcvt_sequence): Call set_used_flags on each insn in the > >> sequence. > >> Include rtl-iter.h. > >> (noce_simple_bbs): New function. > >> (noce_try_move): Bail if basic blocks are not simple. > >> (noce_try_store_flag): Likewise. > >> (noce_try_store_flag_constants): Likewise. > >> (noce_try_addcc): Likewise. > >> (noce_try_store_flag_mask): Likewise. > >> (noce_try_cmove): Likewise. > >> (noce_try_minmax): Likewise. > >> (noce_try_abs): Likewise. > >> (noce_try_sign_mask): Likewise. > >> (noce_try_bitop): Likewise. > >> (bbs_ok_for_cmove_arith): New function. > >> (noce_emit_all_but_last): Likewise. > >> (noce_emit_insn): Likewise. > >> (noce_emit_bb): Likewise. > >> (noce_try_cmove_arith): Handle non-simple basic blocks. > >> (insn_valid_noce_process_p): New function. > >> (contains_mem_rtx_p): Likewise. > >> (bb_valid_for_noce_process_p): Likewise. > >> (noce_process_if_block): Allow non-simple basic blocks > >> where appropriate. > >> > >> 2015-08-11 Kyrylo Tkachov > >> > >> * gcc.dg/ifcvt-1.c: New test. > >> * gcc.dg/ifcvt-2.c: Likewise. > >> * gcc.dg/ifcvt-3.c: Likewise. Looks like ifcvt-3.c fails on x86_64. I see New failures: FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made" Could you please take a look? Thanks, Igor
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 02/09/15 16:18, Zamyatin, Igor wrote: On 19/08/15 17:57, Jeff Law wrote: On 08/12/2015 08:31 AM, Kyrill Tkachov wrote: 2015-08-10 Kyrylo Tkachov* ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. Include rtl-iter.h. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-08-11 Kyrylo Tkachov * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Looks like ifcvt-3.c fails on x86_64. I see New failures: FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made" Could you please take a look? Hmm, these pass for me on x86_64-pc-linux-gnu. The test is most probably failing due to branch costs being too low for the transformation to kick in. The test passes for me with -mtune=intel and -mtune=generic. Do you know what the default tuning CPU is used for that failing test? Thanks, Kyrill Thanks, Igor
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 09/02/2015 09:18 AM, Zamyatin, Igor wrote: On 19/08/15 17:57, Jeff Law wrote: On 08/12/2015 08:31 AM, Kyrill Tkachov wrote: 2015-08-10 Kyrylo Tkachov* ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. Include rtl-iter.h. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-08-11 Kyrylo Tkachov * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Looks like ifcvt-3.c fails on x86_64. I see New failures: FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made" Could you please take a look? Hmm, FWIW, these are passing in my builds/tests: [law@localhost c-family]$ grep ifcvt-3 /tmp/gcc.sum PASS: gcc.dg/ifcvt-3.c (test for excess errors) PASS: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made" PASS: gcc.dg/vect/vect-ifcvt-3.c (test for excess errors) PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1 PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects (test for excess errors) PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects execution test PASS: gcc.dg/vect/vect-ifcvt-3.c execution test PASS: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized 1 loops" 1 Jeff
RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Ping. https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00609.html Thanks, Kyrill -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Kyrill Tkachov Sent: 12 August 2015 15:32 To: Jeff Law; Steven Bosscher Cc: Bernhard Reutner-Fischer; GCC Patches Subject: Re: [PATCH][RTL-ifcvt] Make non-conditional execution if- conversion more aggressive On 11/08/15 18:09, Kyrill Tkachov wrote: On 11/08/15 18:05, Jeff Law wrote: On 08/09/2015 03:20 PM, Steven Bosscher wrote: On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote: So there's a tight relationship between the implementation of bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then we'd probably be looking to use note_stores and note_uses. Perhaps I'm missing something, but what is wrong with using DF here instead of note_stores/note_uses? All the info on refs/mods of registers is available in the DF caches. Nothing inherently wrong with using DF here. I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work. Is that what you mean by DF? I'm doing some more testing and hope to post the updated version soon. Here it is, I've used the FOR_EACH* macros from dataflow to gather the uses and sets. Bootstrapped and tested on x86_64 and aarch64. How does this look? Thanks, Kyrill 2015-08-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. Include rtl-iter.h. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-08-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 19/08/15 17:57, Jeff Law wrote: On 08/12/2015 08:31 AM, Kyrill Tkachov wrote: 2015-08-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. Include rtl-iter.h. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-08-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks for pinging -- I thought I'd already approved this a few days ago! But I can't find it in my outbox... So clearly I didn't finish the final review. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 1f29646..c33fe24 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info) return FALSE; } +/* Helper for bb_valid_for_noce_process_p. Validate that + the rtx insn INSN is a single set that does not set + the conditional register CC and is in general valid for + if-conversion. */ + +static bool +insn_valid_noce_process_p (rtx_insn *insn, rtx cc) +{ + if (!insn + || !NONJUMP_INSN_P (insn) + || (cc set_of (cc, insn))) + return false; + + rtx sset = single_set (insn); + + /* Currently support only simple single sets in test_bb. */ + if (!sset + || !noce_operand_ok (SET_DEST (sset)) + || !noce_operand_ok (SET_SRC (sset))) +return false; + + return true; +} + + + /* Make sure this is a REG and not some instance +of ZERO_EXTRACT or SUBREG or other dangerous stuff. */ + if (!REG_P (SET_DEST (sset_b))) + { + BITMAP_FREE (bba_sets); + return false; + } BTW, this is overly conservative. You're working with pseudos here, so you can just treat a ZERO_EXTRACT or SUBREG as a read of the full underlying register. If this comes up in practice you might consider handling them as a follow-up patch. I don't think you need to handle that case immediately though. I agree, from my testing and investigation this patch strictly increases the available opportunities, so being conservative here should not cause any regressions against existing behaviour. Making it more aggressive can be done as a follow up though, as you said, I'm not sure how frequently this comes up in practice. I also can't remember if we discussed what happens if blocks A B write to the same register, do we handle that situation correctly? Hmmm... The function bb_valid_for_noce_process_p that we call early on in noce_process_if_block makes sure that the only live reg out of each basic block is the final common destination ('x' in the noce_if_info struct definition). Since both basic blocks satisfy that check I suppose that means that even if A and B do write to the same intermediate pseudo that should not affect correctness since the written-to common register would have to be read within A and B (and nowhere outside A and B), which would cause it to fail the bbs_ok_for_cmove_arith check that checks that no regs written in A are read in B (and vice versa). That's the only issue left in my mind. If we're handling that case correctly, then this is Ok for the trunk as-is. Else we'll need another iteration. Does the above explanation look ok to you? If so, I'll be away for a week from Monday so I'd rather commit the patch when I get back so I can deal with any fallout... Thanks for the reviews! Kyrill Thanks, Jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 08/12/2015 08:31 AM, Kyrill Tkachov wrote: 2015-08-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. Include rtl-iter.h. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-08-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks for pinging -- I thought I'd already approved this a few days ago! But I can't find it in my outbox... So clearly I didn't finish the final review. diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 1f29646..c33fe24 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info) return FALSE; } +/* Helper for bb_valid_for_noce_process_p. Validate that + the rtx insn INSN is a single set that does not set + the conditional register CC and is in general valid for + if-conversion. */ + +static bool +insn_valid_noce_process_p (rtx_insn *insn, rtx cc) +{ + if (!insn + || !NONJUMP_INSN_P (insn) + || (cc set_of (cc, insn))) + return false; + + rtx sset = single_set (insn); + + /* Currently support only simple single sets in test_bb. */ + if (!sset + || !noce_operand_ok (SET_DEST (sset)) + || !noce_operand_ok (SET_SRC (sset))) +return false; + + return true; +} + + + /* Make sure this is a REG and not some instance +of ZERO_EXTRACT or SUBREG or other dangerous stuff. */ + if (!REG_P (SET_DEST (sset_b))) + { + BITMAP_FREE (bba_sets); + return false; + } BTW, this is overly conservative. You're working with pseudos here, so you can just treat a ZERO_EXTRACT or SUBREG as a read of the full underlying register. If this comes up in practice you might consider handling them as a follow-up patch. I don't think you need to handle that case immediately though. I also can't remember if we discussed what happens if blocks A B write to the same register, do we handle that situation correctly? That's the only issue left in my mind. If we're handling that case correctly, then this is Ok for the trunk as-is. Else we'll need another iteration. Thanks, Jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 08/19/2015 11:20 AM, Kyrill Tkachov wrote: Hmmm... The function bb_valid_for_noce_process_p that we call early on in noce_process_if_block makes sure that the only live reg out of each basic block is the final common destination ('x' in the noce_if_info struct definition). Since both basic blocks satisfy that check I suppose that means that even if A and B do write to the same intermediate pseudo that should not affect correctness since the written-to common register would have to be read within A and B (and nowhere outside A and B), which would cause it to fail the bbs_ok_for_cmove_arith check that checks that no regs written in A are read in B (and vice versa). Excellent. That answers things quite well. In retrospect, I could have figured that out myself :-) That's the only issue left in my mind. If we're handling that case correctly, then this is Ok for the trunk as-is. Else we'll need another iteration. Does the above explanation look ok to you? If so, I'll be away for a week from Monday so I'd rather commit the patch when I get back so I can deal with any fallout... That's fine with me. Commit at your leisure. Thanks, jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 11/08/15 18:09, Kyrill Tkachov wrote: On 11/08/15 18:05, Jeff Law wrote: On 08/09/2015 03:20 PM, Steven Bosscher wrote: On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote: So there's a tight relationship between the implementation of bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then we'd probably be looking to use note_stores and note_uses. Perhaps I'm missing something, but what is wrong with using DF here instead of note_stores/note_uses? All the info on refs/mods of registers is available in the DF caches. Nothing inherently wrong with using DF here. I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work. Is that what you mean by DF? I'm doing some more testing and hope to post the updated version soon. Here it is, I've used the FOR_EACH* macros from dataflow to gather the uses and sets. Bootstrapped and tested on x86_64 and aarch64. How does this look? Thanks, Kyrill 2015-08-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. Include rtl-iter.h. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-08-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill jeff commit 76813a58e40d067bb3e64151617a975581bc81cb Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Wed Jul 8 15:45:04 2015 +0100 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 1f29646..c33fe24 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -53,6 +53,7 @@ #include tree-pass.h #include dbgcnt.h #include shrink-wrap.h +#include rtl-iter.h #include ifcvt.h #ifndef HAVE_incscc @@ -816,8 +817,17 @@ struct noce_if_info form as well. */ bool then_else_reversed; + /* True if the contents of then_bb and else_bb are a + simple single set instruction. */ + bool then_simple; + bool else_simple; + + /* The total rtx cost of the instructions in then_bb and else_bb. */ + unsigned int then_cost; + unsigned int else_cost; + /* Estimated cost of the particular branch instruction. */ - int branch_cost; + unsigned int branch_cost; }; static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int); @@ -1037,6 +1047,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info) set_used_flags (if_info-cond); set_used_flags (if_info-a); set_used_flags (if_info-b); + + for (insn = seq; insn; insn = NEXT_INSN (insn)) +set_used_flags (insn); + unshare_all_rtl_in_chain (seq); end_sequence (); @@ -1054,6 +1068,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info) return seq; } +/* Return true iff the then and else basic block (if it exists) + consist of a single simple set instruction. */ + +static bool +noce_simple_bbs (struct noce_if_info *if_info) +{ + if (!if_info-then_simple) +return false; + + if (if_info-else_bb) +return if_info-else_simple; + + return true; +} + /* Convert if (a != b) x = a; else x = b into x = a and if (a == b) x = a; else x = b into x = b. */ @@ -1068,6 +1097,9 @@ noce_try_move (struct noce_if_info *if_info) if (code != NE code != EQ) return FALSE; + if (!noce_simple_bbs (if_info)) +return FALSE; + /* This optimization isn't valid if either A or B could be a NaN or a signed zero. */ if (HONOR_NANS (if_info-x) @@ -1116,6 +1148,9 @@ noce_try_store_flag (struct noce_if_info *if_info) rtx target; rtx_insn *seq; + if (!noce_simple_bbs (if_info)) +return FALSE; + if (CONST_INT_P (if_info-b) INTVAL (if_info-b) == STORE_FLAG_VALUE if_info-a == const0_rtx) @@ -1165,6 +1200,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) bool can_reverse; machine_mode mode; + if (!noce_simple_bbs (if_info)) +return FALSE; + if (CONST_INT_P (if_info-a)
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 08/09/2015 03:20 PM, Steven Bosscher wrote: On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote: So there's a tight relationship between the implementation of bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then we'd probably be looking to use note_stores and note_uses. Perhaps I'm missing something, but what is wrong with using DF here instead of note_stores/note_uses? All the info on refs/mods of registers is available in the DF caches. Nothing inherently wrong with using DF here. jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 11/08/15 18:05, Jeff Law wrote: On 08/09/2015 03:20 PM, Steven Bosscher wrote: On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote: So there's a tight relationship between the implementation of bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then we'd probably be looking to use note_stores and note_uses. Perhaps I'm missing something, but what is wrong with using DF here instead of note_stores/note_uses? All the info on refs/mods of registers is available in the DF caches. Nothing inherently wrong with using DF here. I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work. Is that what you mean by DF? I'm doing some more testing and hope to post the updated version soon. Thanks, Kyrill jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote: So there's a tight relationship between the implementation of bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then we'd probably be looking to use note_stores and note_uses. Perhaps I'm missing something, but what is wrong with using DF here instead of note_stores/note_uses? All the info on refs/mods of registers is available in the DF caches. Ciao! Steven
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/27/2015 10:40 AM, Kyrill Tkachov wrote: On 27/07/15 17:09, Jeff Law wrote: On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. I think it is initialized (I look at df_get_live_out earlier on in the call chain). I suppose what we want is for the live in regs for BB_B to not include any of the set regs in BB_A? I was thinking more that you'd walk BB_A, which would give you a set of registers changed in BB_A. Then you could walk the uses of those regs using the DF structures and see if any are in BB_B. That may not be any better in practice. It'd depend on things like how many uses exist outside BB_B vs the size of BB_B. Perhaps it isn't worth the trouble and we should stick with a single walk of both blocks? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. That's the SET_PREV_INSN, SET_NEXT_INSN functions, right? Right. Passes would just unlink the insn from the chain, then link it back in at a new location. It's always been fairly ad-hoc at the RTL level. The current way the top-level noce_process_if_block is structured it expects the various ifcvt functions (like noce_try_cmove_arith) to generate a sequence, then it takes it, unshares it and removes the empty basic blocks. If we're to instead move insns around we'd need to further modify noce_process_if_block to handle differently this one case where we move insns instead of re-emitting them. I think this would make that function more convoluted than it needs to be. With the current approach we always call unshare_all_rtl_in_chain on the emitted sequence which should take care of any RTL sharing issues and in practice I don't expect to have more than 3-4 insns in these sequences since they will be guarded by the branch cost. So I would rather argue for re-emitting insns in this case to keep consistent with the dozen or so similar functions in ifcvt.c that already work that way. OK, then let's stay with re-emitting since that's the structure generally expected elsewhere in ifcvt.c. I just get worried anytime I spot the potential for invalid RTL sharing. Jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/28/2015 04:14 AM, Kyrill Tkachov wrote: [ Snip ] Here's a respin. I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record the set registers then go over BB_B once and look inside the SET_SRC of each insn for those registers. How does this look? Would you like me to investigate the data-flow infrastructure approach? Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes looking for a MEM with the FOR_EACH_SUBRTX machinery. As I said above, I think moving the insns rather than re-emitting them would make the function more convoluted than I think it needs to be. Bootstrapped and tested on arm, aarch64, x86_64. 2015-07-28 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (contains_mem_rtx_p): Likewise. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-07-28 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill jeff ifcvt.patch commit a02417f015b70a0e47170ac7c41a5569392085a5 Author: Kyrylo Tkachovkyrylo.tkac...@arm.com Date: Wed Jul 8 15:45:04 2015 +0100 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 2d97cc5..dae9b41 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -53,6 +53,7 @@ #include tree-pass.h #include dbgcnt.h #include shrink-wrap.h +#include rtl-iter.h #include ifcvt.h Needs to be mentioned in the ChangeLog. @@ -1585,6 +1632,116 @@ noce_try_cmove (struct noce_if_info *if_info) return FALSE; } +/* Return true iff the registers that the insns in BB_A set do not + get used in BB_B. */ + +static bool +bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b) +{ + rtx_insn *a_insn; + bitmap bba_sets = BITMAP_ALLOC (reg_obstack); + + FOR_BB_INSNS (bb_a, a_insn) +{ + if (!active_insn_p (a_insn)) + continue; + + rtx sset_a = single_set (a_insn); + + if (!sset_a) + { + BITMAP_FREE (bba_sets); + return false; + } + + rtx dest_reg = SET_DEST (sset_a); + bitmap_set_bit (bba_sets, REGNO (dest_reg)); +} So there's a tight relationship between the implementation of bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then we'd probably be looking to use note_stores and note_uses. With that tight relationship, I think the implementations should be close together in the file and a comment in insn_valid_noce_process_p indicating that if insn_valid_noce_process_p is changed that the implementation of bbs_ok_for_cmove_arith may need to change as well. One things I am a bit worried about is a SET_DEST that is a ZERO_EXTRACT or SUBREG. Those are usually a read and a write. So you'd need to filter them out earlier (noce_operand_ok?) or handle them in the second half of bbs_ok_for_cmove_arith. Jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Hi Jeff, On 27/07/15 17:40, Kyrill Tkachov wrote: On 27/07/15 17:09, Jeff Law wrote: On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. I think it is initialized (I look at df_get_live_out earlier on in the call chain). I suppose what we want is for the live in regs for BB_B to not include any of the set regs in BB_A? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working for the assumption that memoization ought not fail, but it seems that was a bad assumption on my part.So given noce_try_cmove_arith can change the last insn and make it no-recognizable this code seems reasoanble. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. That's the SET_PREV_INSN, SET_NEXT_INSN functions, right? The current way the top-level noce_process_if_block is structured it expects the various ifcvt functions (like noce_try_cmove_arith) to generate a sequence, then it takes it, unshares it and removes the empty basic blocks. If we're to instead move insns around we'd need to further modify noce_process_if_block to handle differently this one case where we move insns instead of re-emitting them. I think this would make that function more convoluted than it needs to be. With the current approach we always call unshare_all_rtl_in_chain on the emitted sequence which should take care of any RTL sharing issues and in practice I don't expect to have more than 3-4 insns in these sequences since they will be guarded by the branch cost. So I would rather argue for re-emitting insns in this case to keep consistent with the dozen or so similar functions in ifcvt.c that already work that way. Here's a respin. I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record the set registers then go over BB_B once and look inside the SET_SRC of each insn for those registers. How does this look? Would you like me to investigate the data-flow infrastructure approach? Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes looking for a MEM with the FOR_EACH_SUBRTX machinery. As I said above, I think moving the insns rather than re-emitting them would make the function more convoluted than I think it needs to be. Bootstrapped and tested on arm, aarch64, x86_64. 2015-07-28 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function.
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working for the assumption that memoization ought not fail, but it seems that was a bad assumption on my part.So given noce_try_cmove_arith can change the last insn and make it no-recognizable this code seems reasoanble. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 27/07/15 17:09, Jeff Law wrote: On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. I think it is initialized (I look at df_get_live_out earlier on in the call chain). I suppose what we want is for the live in regs for BB_B to not include any of the set regs in BB_A? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working for the assumption that memoization ought not fail, but it seems that was a bad assumption on my part.So given noce_try_cmove_arith can change the last insn and make it no-recognizable this code seems reasoanble. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. That's the SET_PREV_INSN, SET_NEXT_INSN functions, right? The current way the top-level noce_process_if_block is structured it expects the various ifcvt functions (like noce_try_cmove_arith) to generate a sequence, then it takes it, unshares it and removes the empty basic blocks. If we're to instead move insns around we'd need to further modify noce_process_if_block to handle differently this one case where we move insns instead of re-emitting them. I think this would make that function more convoluted than it needs to be. With the current approach we always call unshare_all_rtl_in_chain on the emitted sequence which should take care of any RTL sharing issues and in practice I don't expect to have more than 3-4 insns in these sequences since they will be guarded by the branch cost. So I would rather argue for re-emitting insns in this case to keep consistent with the dozen or so similar functions in ifcvt.c that already work that way. Thanks, Kyrill jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Hi Jeff, On 24/07/15 19:43, Jeff Law wrote: On 07/24/2015 03:31 AM, Kyrill Tkachov wrote: Wouldn't it be better to walk BB_A, gathering the set of all the registers modified, then do a single walk through BB testing for uses of those registers? I think so, yes. I'll try that. You might look at resource.c -- I haven't looked at it in a long time, but it might have many of the interfaces you need to do this kind of book keeping. I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working for the assumption that memoization ought not fail, but it seems that was a bad assumption on my part.So given noce_try_cmove_arith can change the last insn and make it no-recognizable this code seems reasoanble. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? Thanks, Kyrill 2. Investigate a more efficient means to find set/use conflicts between the two blocks, possibly using resource.[ch] jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 23/07/15 21:38, Jeff Law wrote: On 07/13/2015 08:03 AM, Kyrill Tkachov wrote: 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill cheers, ifcvt.patch commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd Author: Kyrylo Tkachovkyrylo.tkac...@arm.com Date: Wed Jul 8 15:45:04 2015 +0100 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 31849ee..2f0a228 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info) return FALSE; } +/* Return true iff the registers that the insns in BB_A set do not + get used in BB_B. */ + +static bool +bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b) +{ + rtx_insn *a_insn; + FOR_BB_INSNS (bb_a, a_insn) +{ + if (!active_insn_p (a_insn)) + continue; + + rtx sset_a = single_set (a_insn); + + if (!sset_a) + return false; + + rtx dest_reg = SET_DEST (sset_a); + rtx_insn *b_insn; + + FOR_BB_INSNS (bb_b, b_insn) + { + if (!active_insn_p (b_insn)) + continue; + + rtx sset_b = single_set (b_insn); + + if (!sset_b) + return false; + + if (reg_referenced_p (dest_reg, sset_b)) + return false; + } +} + + return true; +} Doesn't this have the potential to get very expensive since you end up looking at every insn in BB_B for every insn in BB_A? Yes, the saving grace is that bbs_ok_for_cmove_arith is called after the costs checks, so that the costs in practice reject blocks more than a few instructions long. Wouldn't it be better to walk BB_A, gathering the set of all the registers modified, then do a single walk through BB testing for uses of those registers? I think so, yes. I'll try that. Don't you have to be careful with MEMs in both blocks? bb_valid_for_noce_process_p called earlier in the chain should have rejected memory operands already. + +/* Emit copies of all the active instructions in BB except the last. + This is a helper for noce_try_cmove_arith. */ + +static void +noce_emit_all_but_last (basic_block bb) +{ + rtx_insn *last = last_active_insn (bb, FALSE); + rtx_insn *insn; + FOR_BB_INSNS (bb, insn) +{ + if (insn != last active_insn_p (insn)) + { + rtx_insn *to_emit = as_a rtx_insn * (copy_rtx (insn)); + + emit_insn (PATTERN (to_emit)); + } +} +} Won't this create invalid RTL sharing? RTL has strict rules about nodes can and can not be shared and I'm pretty sure this blindly shares everything. All the if-conversion functions end up calling end_ifcvt_sequence at the end which seems to unshare the stuff that needs to be unshared. Is that not enough? (I don't have much experience with this part) Now we may get away with that because you're going to delete all the insns from BB. But that begs the question why not just move the insns from BB to their new location rather than re-emiting them? I'll try it out. + +/* Helper for noce_try_cmove_arith. Emit the pattern TO_EMIT and return + the resulting insn or NULL if it's not a valid insn. */ + +static rtx_insn * +noce_emit_insn (rtx to_emit) +{ + gcc_assert (to_emit); + rtx_insn *insn = emit_insn (to_emit); + + if (recog_memoized (insn) 0) +return NULL; + + return insn; +} + +/* Helper for noce_try_cmove_arith. Emit a copy of the insns up to + and including the penultimate one in BB if it is not simple + (as indicated by SIMPLE). Then emit LAST_INSN as the last + insn in the block. The reason for that is that LAST_INSN may + have been modified by the preparation in noce_try_cmove_arith. */ + +static bool +noce_emit_bb (rtx
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/24/2015 03:31 AM, Kyrill Tkachov wrote: Wouldn't it be better to walk BB_A, gathering the set of all the registers modified, then do a single walk through BB testing for uses of those registers? I think so, yes. I'll try that. You might look at resource.c -- I haven't looked at it in a long time, but it might have many of the interfaces you need to do this kind of book keeping. Don't you have to be careful with MEMs in both blocks? bb_valid_for_noce_process_p called earlier in the chain should have rejected memory operands already. + +/* Helper for noce_try_cmove_arith. Emit the pattern TO_EMIT and return + the resulting insn or NULL if it's not a valid insn. */ + +static rtx_insn * +noce_emit_insn (rtx to_emit) +{ + gcc_assert (to_emit); + rtx_insn *insn = emit_insn (to_emit); + + if (recog_memoized (insn) 0) +return NULL; + + return insn; +} + +/* Helper for noce_try_cmove_arith. Emit a copy of the insns up to + and including the penultimate one in BB if it is not simple + (as indicated by SIMPLE). Then emit LAST_INSN as the last + insn in the block. The reason for that is that LAST_INSN may + have been modified by the preparation in noce_try_cmove_arith. */ + +static bool +noce_emit_bb (rtx last_insn, basic_block bb, bool simple) +{ + if (bb !simple) +noce_emit_all_but_last (bb); + + if (last_insn !noce_emit_insn (last_insn)) +return false; + + return true; +} Under what conditions can noce_emit_insn fail and what happens to the insn stream if it does? It seems to me like the insn stream would be bogus and we should stop compilation. Which argues that rather than returning a bool, we should just assert that the insn is memoized and remove the check in noce_emit_bb. Or is it the case that we're emitting onto a sequence that we can just throw away in the event of a failure? It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working for the assumption that memoization ought not fail, but it seems that was a bad assumption on my part.So given noce_try_cmove_arith can change the last insn and make it no-recognizable this code seems reasoanble. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. 2. Investigate a more efficient means to find set/use conflicts between the two blocks, possibly using resource.[ch] jeff
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/13/2015 08:03 AM, Kyrill Tkachov wrote: 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill cheers, ifcvt.patch commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd Author: Kyrylo Tkachovkyrylo.tkac...@arm.com Date: Wed Jul 8 15:45:04 2015 +0100 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 31849ee..2f0a228 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info) return FALSE; } +/* Return true iff the registers that the insns in BB_A set do not + get used in BB_B. */ + +static bool +bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b) +{ + rtx_insn *a_insn; + FOR_BB_INSNS (bb_a, a_insn) +{ + if (!active_insn_p (a_insn)) + continue; + + rtx sset_a = single_set (a_insn); + + if (!sset_a) + return false; + + rtx dest_reg = SET_DEST (sset_a); + rtx_insn *b_insn; + + FOR_BB_INSNS (bb_b, b_insn) + { + if (!active_insn_p (b_insn)) + continue; + + rtx sset_b = single_set (b_insn); + + if (!sset_b) + return false; + + if (reg_referenced_p (dest_reg, sset_b)) + return false; + } +} + + return true; +} Doesn't this have the potential to get very expensive since you end up looking at every insn in BB_B for every insn in BB_A? Wouldn't it be better to walk BB_A, gathering the set of all the registers modified, then do a single walk through BB testing for uses of those registers? Don't you have to be careful with MEMs in both blocks? + +/* Emit copies of all the active instructions in BB except the last. + This is a helper for noce_try_cmove_arith. */ + +static void +noce_emit_all_but_last (basic_block bb) +{ + rtx_insn *last = last_active_insn (bb, FALSE); + rtx_insn *insn; + FOR_BB_INSNS (bb, insn) +{ + if (insn != last active_insn_p (insn)) + { + rtx_insn *to_emit = as_a rtx_insn * (copy_rtx (insn)); + + emit_insn (PATTERN (to_emit)); + } +} +} Won't this create invalid RTL sharing? RTL has strict rules about nodes can and can not be shared and I'm pretty sure this blindly shares everything. Now we may get away with that because you're going to delete all the insns from BB. But that begs the question why not just move the insns from BB to their new location rather than re-emiting them? + +/* Helper for noce_try_cmove_arith. Emit the pattern TO_EMIT and return + the resulting insn or NULL if it's not a valid insn. */ + +static rtx_insn * +noce_emit_insn (rtx to_emit) +{ + gcc_assert (to_emit); + rtx_insn *insn = emit_insn (to_emit); + + if (recog_memoized (insn) 0) +return NULL; + + return insn; +} + +/* Helper for noce_try_cmove_arith. Emit a copy of the insns up to + and including the penultimate one in BB if it is not simple + (as indicated by SIMPLE). Then emit LAST_INSN as the last + insn in the block. The reason for that is that LAST_INSN may + have been modified by the preparation in noce_try_cmove_arith. */ + +static bool +noce_emit_bb (rtx last_insn, basic_block bb, bool simple) +{ + if (bb !simple) +noce_emit_all_but_last (bb); + + if (last_insn !noce_emit_insn (last_insn)) +return false; + + return true; +} Under what conditions can noce_emit_insn fail and what happens to the insn stream if it does? It seems to me like the insn stream would be bogus and we should stop compilation. Which argues that rather than returning a bool, we should just assert that the insn is memoized and remove the check in noce_emit_bb. Or is it the case that we're emitting onto a sequence that we can just throw away in the
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Ping. https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01047.html The go testsuite passes for me on x86_64-unknown-linux-gnu for me. A third data point on testing would be appreciated... Thanks, Kyrill On 13/07/15 15:03, Kyrill Tkachov wrote: Hi Bernhard, On 13/07/15 10:45, Kyrill Tkachov wrote: PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;) PPS: attached meant to illustrate comments above. Untested. Thanks a lot! This is all very helpful. I'll respin the patch. Here it is. I've expanded the comments in the functions you mentioned, moved the tests to gcc.dg and enabled them for aarch64 and x86 and changed the types of the costs used to unsigned int. Bootstrapped on aarch64 and x86_64. The go testsuite passes on x86_64-unknown-linux-gnu for me... Thanks, Kyrill 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill cheers,
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Hi Bernhard, On 11/07/15 00:00, Bernhard Reutner-Fischer wrote: On 10 July 2015 at 14:31, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch makes if-conversion more aggressive when handling code of the form: if (test) x := a //THEN else x := b //ELSE The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't exceed the branch cost. I don't think that's quite a right calculation. We're going to be executing at least one of the basic blocks anyway. This patch we instead check the *maximum* of the two blocks against the branch cost. This should still catch cases where a high latency instruction appears in one or both of the paths. Shouldn't this maximum also take probability into account? Or maybe not, would have to think about it tomorrow. The branch cost that we test against (recorded in if_info earlier in the ifcvt.c call chain) is either the predictable branch cost or the unpredictable branch cost, depending on what the predictable_edge_p machinery returned. I think checking against that should be enough, but it's an easy thing to experiment with, so I'm open to arguments in any direction. $ contrib/check_GNU_style.sh rtl-ifcvt.00.patch Blocks of 8 spaces should be replaced with tabs. 783:+return FALSE; Generally ifcvt.c (resp. the whole tree) could use a sed -i -e s/\([[:space:]]\)FALSE/\1false/g gcc/ifcvt.c Maybe some of the int predicates could then become bools. Ok, will go over the style in the patch. +/* Return iff the registers that the insns in BB_A set do not + get used in BB_B. */ Return true iff I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if I'll use a normal if here. Did you include go in your testing? I see: Unexpected results in this build (new failures) FAIL: encoding/json FAIL: go/printer FAIL: go/scanner FAIL: html/template FAIL: log FAIL: net/http FAIL: net/http/cgi FAIL: net/http/cookiejar FAIL: os FAIL: text/template Hmmm. I don't see these failures. I double checked right now and they appear as PASS in my configuration. I tested make check-go on x86_64-unknown-linux-gnu configured with --without-isl --disable-multilib --enable-languages=c,c++,fortran,go. Are you sure this is not some other issue in your tree? bbs_ok_for_cmove_arith() looks costly but i guess you looked if there's some pre-existing cleverness you could have used instead? I did have a look, but couldn't find any. The bbs_ok_for_cmove_arith is done after the costs check so I'd hope that the costs check would already discard really long basic-blocks. noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p(). insn_rtx_cost() should return an unsigned int, then_cost, else_cost should thus be unsigned int too. copy_of_a versus copy_of_insn_b; I'd shorten the latter. Thanks, good suggestions. bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param but there is none? Also should document the return value (and should not clobber the OUT params upon failure, no?). bah, I forgot to update the comment once I modified the function during development of the patch. I'll fix those. As for the testcases, it would be nice to have at least a tiny bit for x86_64, too. I can put the testcases in gcc.dg and enable them for x86 as well, but I think a couple of the already pass as is because x86 doesn't need to do an extra zero_extend inside the basic-block. PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;) PPS: attached meant to illustrate comments above. Untested. Thanks a lot! This is all very helpful. I'll respin the patch. Thanks, Kyrill cheers,
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi Bernhard, Did you include go in your testing? I see: Unexpected results in this build (new failures) FAIL: encoding/json FAIL: go/printer FAIL: go/scanner FAIL: html/template FAIL: log FAIL: net/http FAIL: net/http/cgi FAIL: net/http/cookiejar FAIL: os FAIL: text/template Hmmm. I don't see these failures. I double checked right now and they appear as PASS in my configuration. I tested make check-go on x86_64-unknown-linux-gnu configured with --without-isl --disable-multilib --enable-languages=c,c++,fortran,go. Are you sure this is not some other issue in your tree? I have ISL enabled. I do have a couple of local stuff but that tested fine before your patch and should not really have impact on the parts your patch touches. So maybe it's ISL. Thanks,
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 13/07/15 10:45, Kyrill Tkachov wrote: +/* Return iff the registers that the insns in BB_A set do not + get used in BB_B. */ Return true iff I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if I'll use a normal if here. Err, of course you were talking about the missing 'true'. Sorry, early morning. Kyrill
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Hi Bernhard, On 13/07/15 10:45, Kyrill Tkachov wrote: PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;) PPS: attached meant to illustrate comments above. Untested. Thanks a lot! This is all very helpful. I'll respin the patch. Here it is. I've expanded the comments in the functions you mentioned, moved the tests to gcc.dg and enabled them for aarch64 and x86 and changed the types of the costs used to unsigned int. Bootstrapped on aarch64 and x86_64. The go testsuite passes on x86_64-unknown-linux-gnu for me... Thanks, Kyrill 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (struct noce_if_info): Add then_simple, else_simple, then_cost, else_cost fields. Change branch_cost field to unsigned int. (end_ifcvt_sequence): Call set_used_flags on each insn in the sequence. (noce_simple_bbs): New function. (noce_try_move): Bail if basic blocks are not simple. (noce_try_store_flag): Likewise. (noce_try_store_flag_constants): Likewise. (noce_try_addcc): Likewise. (noce_try_store_flag_mask): Likewise. (noce_try_cmove): Likewise. (noce_try_minmax): Likewise. (noce_try_abs): Likewise. (noce_try_sign_mask): Likewise. (noce_try_bitop): Likewise. (bbs_ok_for_cmove_arith): New function. (noce_emit_all_but_last): Likewise. (noce_emit_insn): Likewise. (noce_emit_bb): Likewise. (noce_try_cmove_arith): Handle non-simple basic blocks. (insn_valid_noce_process_p): New function. (bb_valid_for_noce_process_p): Likewise. (noce_process_if_block): Allow non-simple basic blocks where appropriate. 2015-07-13 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.dg/ifcvt-1.c: New test. * gcc.dg/ifcvt-2.c: Likewise. * gcc.dg/ifcvt-3.c: Likewise. Thanks, Kyrill cheers, commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Wed Jul 8 15:45:04 2015 +0100 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 31849ee..2f0a228 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -815,8 +815,17 @@ struct noce_if_info form as well. */ bool then_else_reversed; + /* True if the contents of then_bb and else_bb are a + simple single set instruction. */ + bool then_simple; + bool else_simple; + + /* The total rtx cost of the instructions in then_bb and else_bb. */ + unsigned int then_cost; + unsigned int else_cost; + /* Estimated cost of the particular branch instruction. */ - int branch_cost; + unsigned int branch_cost; }; static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int); @@ -1036,6 +1045,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info) set_used_flags (if_info-cond); set_used_flags (if_info-a); set_used_flags (if_info-b); + + for (insn = seq; insn; insn = NEXT_INSN (insn)) +set_used_flags (insn); + unshare_all_rtl_in_chain (seq); end_sequence (); @@ -1053,6 +1066,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info) return seq; } +/* Return true iff the then and else basic block (if it exists) + consist of a single simple set instruction. */ + +static bool +noce_simple_bbs (struct noce_if_info *if_info) +{ + if (!if_info-then_simple) +return false; + + if (if_info-else_bb) +return if_info-else_simple; + + return true; +} + /* Convert if (a != b) x = a; else x = b into x = a and if (a == b) x = a; else x = b into x = b. */ @@ -1067,6 +1095,9 @@ noce_try_move (struct noce_if_info *if_info) if (code != NE code != EQ) return FALSE; + if (!noce_simple_bbs (if_info)) +return FALSE; + /* This optimization isn't valid if either A or B could be a NaN or a signed zero. */ if (HONOR_NANS (if_info-x) @@ -1115,6 +1146,9 @@ noce_try_store_flag (struct noce_if_info *if_info) rtx target; rtx_insn *seq; + if (!noce_simple_bbs (if_info)) +return FALSE; + if (CONST_INT_P (if_info-b) INTVAL (if_info-b) == STORE_FLAG_VALUE if_info-a == const0_rtx) @@ -1163,6 +1197,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) int normalize, can_reverse; machine_mode mode; + if (!noce_simple_bbs (if_info)) +return FALSE; + if (CONST_INT_P (if_info-a) CONST_INT_P (if_info-b)) { @@ -1291,6 +1328,9 @@ noce_try_addcc (struct noce_if_info *if_info) rtx_insn *seq; int subtract, normalize; + if (!noce_simple_bbs (if_info)) +return FALSE; + if (GET_CODE (if_info-a) == PLUS rtx_equal_p (XEXP (if_info-a, 0), if_info-b) (reversed_comparison_code (if_info-cond, if_info-jump) @@ -1382,6 +1422,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info) rtx_insn *seq; int reversep; + if (!noce_simple_bbs (if_info)) +return FALSE; + reversep = 0; if
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 13/07/15 11:48, Bernhard Reutner-Fischer wrote: On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi Bernhard, Did you include go in your testing? I see: Unexpected results in this build (new failures) FAIL: encoding/json FAIL: go/printer FAIL: go/scanner FAIL: html/template FAIL: log FAIL: net/http FAIL: net/http/cgi FAIL: net/http/cookiejar FAIL: os FAIL: text/template Hmmm. I don't see these failures. I double checked right now and they appear as PASS in my configuration. I tested make check-go on x86_64-unknown-linux-gnu configured with --without-isl --disable-multilib --enable-languages=c,c++,fortran,go. Are you sure this is not some other issue in your tree? I have ISL enabled. I do have a couple of local stuff but that tested fine before your patch and should not really have impact on the parts your patch touches. So maybe it's ISL. I've rebuilt with ISL from scratch and the tests still pass for me. I'm testing Ubuntu on a Haswell machine, don't know if that's relevant. Kyrill Thanks,
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 10/07/15 13:31, Kyrill Tkachov wrote: + to compute a value for x. Put the rtx cost of the insns + in TEST_BB into COST. Record whether TEST_BB is a single simple + set instruction in SIMPLE_P. If the bb is not simple place all insns + except the last insn into SEQ. */ + That last sentence is stale. That function doesn't have a SEQ argument. Consider that comment sentence removed. Thanks, Kyrill
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 10 July 2015 at 14:31, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch makes if-conversion more aggressive when handling code of the form: if (test) x := a //THEN else x := b //ELSE The current code adds the costs of both the THEN and ELSE blocks and proceeds if they don't exceed the branch cost. I don't think that's quite a right calculation. We're going to be executing at least one of the basic blocks anyway. This patch we instead check the *maximum* of the two blocks against the branch cost. This should still catch cases where a high latency instruction appears in one or both of the paths. Shouldn't this maximum also take probability into account? Or maybe not, would have to think about it tomorrow. $ contrib/check_GNU_style.sh rtl-ifcvt.00.patch Blocks of 8 spaces should be replaced with tabs. 783:+return FALSE; Generally ifcvt.c (resp. the whole tree) could use a sed -i -e s/\([[:space:]]\)FALSE/\1false/g gcc/ifcvt.c Maybe some of the int predicates could then become bools. +/* Return iff the registers that the insns in BB_A set do not + get used in BB_B. */ Return true iff Did you include go in your testing? I see: Unexpected results in this build (new failures) FAIL: encoding/json FAIL: go/printer FAIL: go/scanner FAIL: html/template FAIL: log FAIL: net/http FAIL: net/http/cgi FAIL: net/http/cookiejar FAIL: os FAIL: text/template bbs_ok_for_cmove_arith() looks costly but i guess you looked if there's some pre-existing cleverness you could have used instead? noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p(). insn_rtx_cost() should return an unsigned int, then_cost, else_cost should thus be unsigned int too. copy_of_a versus copy_of_insn_b; I'd shorten the latter. bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param but there is none? Also should document the return value (and should not clobber the OUT params upon failure, no?). As for the testcases, it would be nice to have at least a tiny bit for x86_64, too. PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;) PPS: attached meant to illustrate comments above. Untested. cheers, From 1b7d8f9b61eb538cc4338e2073d04a66518f13c2 Mon Sep 17 00:00:00 2001 From: Bernhard Reutner-Fischer rep.dot@gmail.com Date: Fri, 10 Jul 2015 21:25:30 +0200 Subject: [PATCH] rtl-ifcvt typos fix some typos on top of Kyrill's patch. should mv the testcases to common ground. --- gcc/ifcvt.c | 49 ++- gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c |2 +- gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c |2 +- gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c |7 ++-- 4 files changed, 27 insertions(+), 33 deletions(-) diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 3d324257..0bf6645 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1784,8 +1784,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info) /* We're going to execute one of the basic blocks anyway, so bail out if the most expensive of the two blocks is unacceptable. */ - if (MAX (then_cost, else_cost) - COSTS_N_INSNS (if_info-branch_cost)) + if (MAX (then_cost, else_cost) COSTS_N_INSNS (if_info-branch_cost)) return FALSE; /* Possibly rearrange operands to make things come out more natural. */ @@ -2730,28 +2729,26 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, rtx cc = cc_in_cond (cond); if (!insn_valid_noce_process_p (last_insn, cc)) -return FALSE; +return false; last_set = single_set (last_insn); rtx x = SET_DEST (last_set); - rtx_insn *first_insn = first_active_insn (test_bb); rtx first_set = single_set (first_insn); - bool speed_p = optimize_bb_for_speed_p (test_bb); - *cost = insn_rtx_cost (last_set, speed_p); if (!first_set) return false; + /* We have a single simple set, that's okay. */ - else if (first_insn == last_insn) + bool speed_p = optimize_bb_for_speed_p (test_bb); + + if (first_insn == last_insn) { *simple_p = noce_operand_ok (SET_DEST (first_set)); *cost = insn_rtx_cost (first_set, speed_p); return *simple_p; } - *simple_p = false; - rtx_insn *prev_last_insn = PREV_INSN (last_insn); gcc_assert (prev_last_insn); @@ -2764,6 +2761,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, /* The regs that are live out of test_bb. */ bitmap test_bb_live_out = df_get_live_out (test_bb); + int potential_cost = insn_rtx_cost (last_set, speed_p); rtx_insn *insn; FOR_BB_INSNS (test_bb, insn) { @@ -2781,7 +2779,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond, if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset))) goto free_bitmap_and_fail; - *cost += insn_rtx_cost (sset, speed_p); + potential_cost += insn_rtx_cost (sset, speed_p); bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset))); } } @@
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 11 July 2015 at 01:00, Bernhard Reutner-Fischer rep.dot@gmail.com wrote: On 10 July 2015 at 14:31, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;) err, arm and aarch64 have no -mbranch-cost, a couple of prominent other arches do..