Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-09-05 Thread H.J. Lu
On Wed, Sep 2, 2015 at 9:01 AM, Kyrill Tkachov  wrote:
>
> On 02/09/15 16:18, Zamyatin, Igor wrote:
>>>
>>>
>>> On 19/08/15 17:57, Jeff Law wrote:

 On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
>
> 2015-08-10  Kyrylo Tkachov 
>
>* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
>then_cost, else_cost fields.  Change branch_cost field to
> unsigned int.
>(end_ifcvt_sequence): Call set_used_flags on each insn in the
>sequence.
>Include rtl-iter.h.
>(noce_simple_bbs): New function.
>(noce_try_move): Bail if basic blocks are not simple.
>(noce_try_store_flag): Likewise.
>(noce_try_store_flag_constants): Likewise.
>(noce_try_addcc): Likewise.
>(noce_try_store_flag_mask): Likewise.
>(noce_try_cmove): Likewise.
>(noce_try_minmax): Likewise.
>(noce_try_abs): Likewise.
>(noce_try_sign_mask): Likewise.
>(noce_try_bitop): Likewise.
>(bbs_ok_for_cmove_arith): New function.
>(noce_emit_all_but_last): Likewise.
>(noce_emit_insn): Likewise.
>(noce_emit_bb): Likewise.
>(noce_try_cmove_arith): Handle non-simple basic blocks.
>(insn_valid_noce_process_p): New function.
>(contains_mem_rtx_p): Likewise.
>(bb_valid_for_noce_process_p): Likewise.
>(noce_process_if_block): Allow non-simple basic blocks
>where appropriate.
>
> 2015-08-11  Kyrylo Tkachov 
>
>* gcc.dg/ifcvt-1.c: New test.
>* gcc.dg/ifcvt-2.c: Likewise.
>* gcc.dg/ifcvt-3.c: Likewise.
>>
>> Looks like ifcvt-3.c fails on x86_64. I see
>>
>> New failures:
>> FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
>>
>> Could you please take a look?
>
>
> Hmm, these pass for me on x86_64-pc-linux-gnu.
> The test is most probably failing due to branch costs being too low for the
> transformation to kick in. The test passes for me with -mtune=intel and
> -mtune=generic.
> Do you know what the default tuning CPU is used for that failing test?

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462


-- 
H.J.


RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-09-02 Thread Zamyatin, Igor
> 
> 
> On 19/08/15 17:57, Jeff Law wrote:
> > On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:
> >> 2015-08-10  Kyrylo Tkachov 
> >>
> >>   * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
> >>   then_cost, else_cost fields.  Change branch_cost field to
> >> unsigned int.
> >>   (end_ifcvt_sequence): Call set_used_flags on each insn in the
> >>   sequence.
> >>   Include rtl-iter.h.
> >>   (noce_simple_bbs): New function.
> >>   (noce_try_move): Bail if basic blocks are not simple.
> >>   (noce_try_store_flag): Likewise.
> >>   (noce_try_store_flag_constants): Likewise.
> >>   (noce_try_addcc): Likewise.
> >>   (noce_try_store_flag_mask): Likewise.
> >>   (noce_try_cmove): Likewise.
> >>   (noce_try_minmax): Likewise.
> >>   (noce_try_abs): Likewise.
> >>   (noce_try_sign_mask): Likewise.
> >>   (noce_try_bitop): Likewise.
> >>   (bbs_ok_for_cmove_arith): New function.
> >>   (noce_emit_all_but_last): Likewise.
> >>   (noce_emit_insn): Likewise.
> >>   (noce_emit_bb): Likewise.
> >>   (noce_try_cmove_arith): Handle non-simple basic blocks.
> >>   (insn_valid_noce_process_p): New function.
> >>   (contains_mem_rtx_p): Likewise.
> >>   (bb_valid_for_noce_process_p): Likewise.
> >>   (noce_process_if_block): Allow non-simple basic blocks
> >>   where appropriate.
> >>
> >> 2015-08-11  Kyrylo Tkachov 
> >>
> >>   * gcc.dg/ifcvt-1.c: New test.
> >>   * gcc.dg/ifcvt-2.c: Likewise.
> >>   * gcc.dg/ifcvt-3.c: Likewise.

Looks like ifcvt-3.c fails on x86_64. I see

New failures:
FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"

Could you please take a look?

Thanks,
Igor



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-09-02 Thread Kyrill Tkachov


On 02/09/15 16:18, Zamyatin, Igor wrote:


On 19/08/15 17:57, Jeff Law wrote:

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:

2015-08-10  Kyrylo Tkachov 

   * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
   then_cost, else_cost fields.  Change branch_cost field to
unsigned int.
   (end_ifcvt_sequence): Call set_used_flags on each insn in the
   sequence.
   Include rtl-iter.h.
   (noce_simple_bbs): New function.
   (noce_try_move): Bail if basic blocks are not simple.
   (noce_try_store_flag): Likewise.
   (noce_try_store_flag_constants): Likewise.
   (noce_try_addcc): Likewise.
   (noce_try_store_flag_mask): Likewise.
   (noce_try_cmove): Likewise.
   (noce_try_minmax): Likewise.
   (noce_try_abs): Likewise.
   (noce_try_sign_mask): Likewise.
   (noce_try_bitop): Likewise.
   (bbs_ok_for_cmove_arith): New function.
   (noce_emit_all_but_last): Likewise.
   (noce_emit_insn): Likewise.
   (noce_emit_bb): Likewise.
   (noce_try_cmove_arith): Handle non-simple basic blocks.
   (insn_valid_noce_process_p): New function.
   (contains_mem_rtx_p): Likewise.
   (bb_valid_for_noce_process_p): Likewise.
   (noce_process_if_block): Allow non-simple basic blocks
   where appropriate.

2015-08-11  Kyrylo Tkachov 

   * gcc.dg/ifcvt-1.c: New test.
   * gcc.dg/ifcvt-2.c: Likewise.
   * gcc.dg/ifcvt-3.c: Likewise.

Looks like ifcvt-3.c fails on x86_64. I see

New failures:
FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"

Could you please take a look?


Hmm, these pass for me on x86_64-pc-linux-gnu.
The test is most probably failing due to branch costs being too low for the
transformation to kick in. The test passes for me with -mtune=intel and 
-mtune=generic.
Do you know what the default tuning CPU is used for that failing test?

Thanks,
Kyrill



Thanks,
Igor





Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-09-02 Thread Jeff Law

On 09/02/2015 09:18 AM, Zamyatin, Igor wrote:



On 19/08/15 17:57, Jeff Law wrote:

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:

2015-08-10  Kyrylo Tkachov 

   * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
   then_cost, else_cost fields.  Change branch_cost field to
unsigned int.
   (end_ifcvt_sequence): Call set_used_flags on each insn in the
   sequence.
   Include rtl-iter.h.
   (noce_simple_bbs): New function.
   (noce_try_move): Bail if basic blocks are not simple.
   (noce_try_store_flag): Likewise.
   (noce_try_store_flag_constants): Likewise.
   (noce_try_addcc): Likewise.
   (noce_try_store_flag_mask): Likewise.
   (noce_try_cmove): Likewise.
   (noce_try_minmax): Likewise.
   (noce_try_abs): Likewise.
   (noce_try_sign_mask): Likewise.
   (noce_try_bitop): Likewise.
   (bbs_ok_for_cmove_arith): New function.
   (noce_emit_all_but_last): Likewise.
   (noce_emit_insn): Likewise.
   (noce_emit_bb): Likewise.
   (noce_try_cmove_arith): Handle non-simple basic blocks.
   (insn_valid_noce_process_p): New function.
   (contains_mem_rtx_p): Likewise.
   (bb_valid_for_noce_process_p): Likewise.
   (noce_process_if_block): Allow non-simple basic blocks
   where appropriate.

2015-08-11  Kyrylo Tkachov 

   * gcc.dg/ifcvt-1.c: New test.
   * gcc.dg/ifcvt-2.c: Likewise.
   * gcc.dg/ifcvt-3.c: Likewise.


Looks like ifcvt-3.c fails on x86_64. I see

New failures:
FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"

Could you please take a look?

Hmm, FWIW, these are passing in my builds/tests:
[law@localhost c-family]$ grep ifcvt-3 /tmp/gcc.sum
PASS: gcc.dg/ifcvt-3.c (test for excess errors)
PASS: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
PASS: gcc.dg/vect/vect-ifcvt-3.c (test for excess errors)
PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects 
scan-tree-dump-times vect "vectorized 1 loops" 1
PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects (test for 
excess errors)

PASS: gcc.dg/vect/vect-ifcvt-3.c -flto -ffat-lto-objects execution test
PASS: gcc.dg/vect/vect-ifcvt-3.c execution test
PASS: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized 1 
loops" 1

Jeff




RE: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Kyrill Tkachov
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00609.html

Thanks,
Kyrill

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Kyrill Tkachov
 Sent: 12 August 2015 15:32
 To: Jeff Law; Steven Bosscher
 Cc: Bernhard Reutner-Fischer; GCC Patches
 Subject: Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-
 conversion more aggressive
 
 
 On 11/08/15 18:09, Kyrill Tkachov wrote:
  On 11/08/15 18:05, Jeff Law wrote:
  On 08/09/2015 03:20 PM, Steven Bosscher wrote:
  On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote:
  So there's a tight relationship between the implementation of
  bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there
  wasn't, then we'd probably be looking to use note_stores and
 note_uses.
  Perhaps I'm missing something, but what is wrong with using DF here
  instead of note_stores/note_uses? All the info on refs/mods of
  registers is available in the DF caches.
  Nothing inherently wrong with using DF here.
  I have reworked the patch to use FOR_EACH_INSN_DEF and
  FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extracts the
 refs/mods and it seems to work.
  Is that what you mean by DF?
  I'm doing some more testing and hope to post the updated version soon.
 
 Here it is, I've used the FOR_EACH* macros from dataflow to gather the uses
 and sets.
 
 Bootstrapped and tested on x86_64 and aarch64.
 How does this look?
 
 Thanks,
 Kyrill
 
 2015-08-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
  then_cost, else_cost fields.  Change branch_cost field to unsigned int.
  (end_ifcvt_sequence): Call set_used_flags on each insn in the
  sequence.
  Include rtl-iter.h.
  (noce_simple_bbs): New function.
  (noce_try_move): Bail if basic blocks are not simple.
  (noce_try_store_flag): Likewise.
  (noce_try_store_flag_constants): Likewise.
  (noce_try_addcc): Likewise.
  (noce_try_store_flag_mask): Likewise.
  (noce_try_cmove): Likewise.
  (noce_try_minmax): Likewise.
  (noce_try_abs): Likewise.
  (noce_try_sign_mask): Likewise.
  (noce_try_bitop): Likewise.
  (bbs_ok_for_cmove_arith): New function.
  (noce_emit_all_but_last): Likewise.
  (noce_emit_insn): Likewise.
  (noce_emit_bb): Likewise.
  (noce_try_cmove_arith): Handle non-simple basic blocks.
  (insn_valid_noce_process_p): New function.
  (contains_mem_rtx_p): Likewise.
  (bb_valid_for_noce_process_p): Likewise.
  (noce_process_if_block): Allow non-simple basic blocks
  where appropriate.
 
 2015-08-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * gcc.dg/ifcvt-1.c: New test.
  * gcc.dg/ifcvt-2.c: Likewise.
  * gcc.dg/ifcvt-3.c: Likewise.
 
  Thanks,
  Kyrill
 
  jeff
 





Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Kyrill Tkachov


On 19/08/15 17:57, Jeff Law wrote:

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:

2015-08-10  Kyrylo Tkachov kyrylo.tkac...@arm.com

  * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
  then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
  (end_ifcvt_sequence): Call set_used_flags on each insn in the
  sequence.
  Include rtl-iter.h.
  (noce_simple_bbs): New function.
  (noce_try_move): Bail if basic blocks are not simple.
  (noce_try_store_flag): Likewise.
  (noce_try_store_flag_constants): Likewise.
  (noce_try_addcc): Likewise.
  (noce_try_store_flag_mask): Likewise.
  (noce_try_cmove): Likewise.
  (noce_try_minmax): Likewise.
  (noce_try_abs): Likewise.
  (noce_try_sign_mask): Likewise.
  (noce_try_bitop): Likewise.
  (bbs_ok_for_cmove_arith): New function.
  (noce_emit_all_but_last): Likewise.
  (noce_emit_insn): Likewise.
  (noce_emit_bb): Likewise.
  (noce_try_cmove_arith): Handle non-simple basic blocks.
  (insn_valid_noce_process_p): New function.
  (contains_mem_rtx_p): Likewise.
  (bb_valid_for_noce_process_p): Likewise.
  (noce_process_if_block): Allow non-simple basic blocks
  where appropriate.

2015-08-11  Kyrylo Tkachov kyrylo.tkac...@arm.com

  * gcc.dg/ifcvt-1.c: New test.
  * gcc.dg/ifcvt-2.c: Likewise.
  * gcc.dg/ifcvt-3.c: Likewise.

Thanks for pinging -- I thought I'd already approved this a few days
ago!  But I can't find it in my outbox...  So clearly I didn't finish
the final review.




diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
 return FALSE;
   }

+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+  || !NONJUMP_INSN_P (insn)
+  || (cc  set_of (cc, insn)))
+  return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+  || !noce_operand_ok (SET_DEST (sset))
+  || !noce_operand_ok (SET_SRC (sset)))
+return false;
+
+  return true;
+}
+
+
+  /* Make sure this is a REG and not some instance
+of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
+  if (!REG_P (SET_DEST (sset_b)))
+   {
+ BITMAP_FREE (bba_sets);
+ return false;
+   }

BTW, this is overly conservative.  You're working with pseudos here, so
you can just treat a ZERO_EXTRACT or SUBREG as a read of the full
underlying register.  If this comes up in practice you might consider
handling them as a follow-up patch.  I don't think you need to handle
that case immediately though.


I agree, from my testing and investigation this patch strictly
increases the available opportunities, so being conservative here
should not cause any regressions against existing behaviour.
Making it more aggressive can be done as a follow up though, as you said,
I'm not sure how frequently this comes up in practice.



I also can't remember if we discussed what happens if blocks A  B write
to the same register, do we handle that situation correctly?


Hmmm...
The function bb_valid_for_noce_process_p that we call early on
in noce_process_if_block makes sure that the only live reg out
of each basic block is the final common destination ('x' in the
noce_if_info struct definition). Since both basic blocks satisfy
that check I suppose that means that even if A and B do write to
the same intermediate pseudo that should not affect correctness
since the written-to common register would have to be read within
A and B (and nowhere outside A and B), which would cause it to
fail the bbs_ok_for_cmove_arith check that checks that no regs
written in A are read in B (and vice versa).



That's the only issue left in my mind.  If we're handling that case
correctly, then this is Ok for the trunk as-is.  Else we'll need another
iteration.


Does the above explanation look ok to you?
If so, I'll be away for a week from Monday so I'd rather commit
the patch when I get back so I can deal with any fallout...

Thanks for the reviews!
Kyrill



Thanks,
Jeff






Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Jeff Law

On 08/12/2015 08:31 AM, Kyrill Tkachov wrote:


2015-08-10  Kyrylo Tkachov kyrylo.tkac...@arm.com

 * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
 then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
 (end_ifcvt_sequence): Call set_used_flags on each insn in the
 sequence.
 Include rtl-iter.h.
 (noce_simple_bbs): New function.
 (noce_try_move): Bail if basic blocks are not simple.
 (noce_try_store_flag): Likewise.
 (noce_try_store_flag_constants): Likewise.
 (noce_try_addcc): Likewise.
 (noce_try_store_flag_mask): Likewise.
 (noce_try_cmove): Likewise.
 (noce_try_minmax): Likewise.
 (noce_try_abs): Likewise.
 (noce_try_sign_mask): Likewise.
 (noce_try_bitop): Likewise.
 (bbs_ok_for_cmove_arith): New function.
 (noce_emit_all_but_last): Likewise.
 (noce_emit_insn): Likewise.
 (noce_emit_bb): Likewise.
 (noce_try_cmove_arith): Handle non-simple basic blocks.
 (insn_valid_noce_process_p): New function.
 (contains_mem_rtx_p): Likewise.
 (bb_valid_for_noce_process_p): Likewise.
 (noce_process_if_block): Allow non-simple basic blocks
 where appropriate.

2015-08-11  Kyrylo Tkachov kyrylo.tkac...@arm.com

 * gcc.dg/ifcvt-1.c: New test.
 * gcc.dg/ifcvt-2.c: Likewise.
 * gcc.dg/ifcvt-3.c: Likewise.
Thanks for pinging -- I thought I'd already approved this a few days 
ago!  But I can't find it in my outbox...  So clearly I didn't finish 
the final review.





diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1625,6 +1672,152 @@ noce_try_cmove (struct noce_if_info *if_info)
return FALSE;
  }

+/* Helper for bb_valid_for_noce_process_p.  Validate that
+   the rtx insn INSN is a single set that does not set
+   the conditional register CC and is in general valid for
+   if-conversion.  */
+
+static bool
+insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
+{
+  if (!insn
+  || !NONJUMP_INSN_P (insn)
+  || (cc  set_of (cc, insn)))
+  return false;
+
+  rtx sset = single_set (insn);
+
+  /* Currently support only simple single sets in test_bb.  */
+  if (!sset
+  || !noce_operand_ok (SET_DEST (sset))
+  || !noce_operand_ok (SET_SRC (sset)))
+return false;
+
+  return true;
+}
+
+



+  /* Make sure this is a REG and not some instance
+of ZERO_EXTRACT or SUBREG or other dangerous stuff.  */
+  if (!REG_P (SET_DEST (sset_b)))
+   {
+ BITMAP_FREE (bba_sets);
+ return false;
+   }
BTW, this is overly conservative.  You're working with pseudos here, so 
you can just treat a ZERO_EXTRACT or SUBREG as a read of the full 
underlying register.  If this comes up in practice you might consider 
handling them as a follow-up patch.  I don't think you need to handle 
that case immediately though.


I also can't remember if we discussed what happens if blocks A  B write 
to the same register, do we handle that situation correctly?


That's the only issue left in my mind.  If we're handling that case 
correctly, then this is Ok for the trunk as-is.  Else we'll need another 
iteration.


Thanks,
Jeff




Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-19 Thread Jeff Law

On 08/19/2015 11:20 AM, Kyrill Tkachov wrote:


Hmmm...
The function bb_valid_for_noce_process_p that we call early on
in noce_process_if_block makes sure that the only live reg out
of each basic block is the final common destination ('x' in the
noce_if_info struct definition). Since both basic blocks satisfy
that check I suppose that means that even if A and B do write to
the same intermediate pseudo that should not affect correctness
since the written-to common register would have to be read within
A and B (and nowhere outside A and B), which would cause it to
fail the bbs_ok_for_cmove_arith check that checks that no regs
written in A are read in B (and vice versa).
Excellent.  That answers things quite well.  In retrospect, I could have 
figured that out myself :-)






That's the only issue left in my mind.  If we're handling that case
correctly, then this is Ok for the trunk as-is.  Else we'll need another
iteration.


Does the above explanation look ok to you?
If so, I'll be away for a week from Monday so I'd rather commit
the patch when I get back so I can deal with any fallout...

That's fine with me.  Commit at your leisure.

Thanks,
jeff



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-12 Thread Kyrill Tkachov


On 11/08/15 18:09, Kyrill Tkachov wrote:

On 11/08/15 18:05, Jeff Law wrote:

On 08/09/2015 03:20 PM, Steven Bosscher wrote:

On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote:

So there's a tight relationship between the implementation of
bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
we'd probably be looking to use note_stores and note_uses.

Perhaps I'm missing something, but what is wrong with using DF here
instead of note_stores/note_uses? All the info on refs/mods of
registers is available in the DF caches.

Nothing inherently wrong with using DF here.

I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE
in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work.
Is that what you mean by DF?
I'm doing some more testing and hope to post the updated version soon.


Here it is, I've used the FOR_EACH* macros from dataflow
to gather the uses and sets.

Bootstrapped and tested on x86_64 and aarch64.
How does this look?

Thanks,
Kyrill

2015-08-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
then_cost, else_cost fields.  Change branch_cost field to unsigned int.
(end_ifcvt_sequence): Call set_used_flags on each insn in the
sequence.
Include rtl-iter.h.
(noce_simple_bbs): New function.
(noce_try_move): Bail if basic blocks are not simple.
(noce_try_store_flag): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(bbs_ok_for_cmove_arith): New function.
(noce_emit_all_but_last): Likewise.
(noce_emit_insn): Likewise.
(noce_emit_bb): Likewise.
(noce_try_cmove_arith): Handle non-simple basic blocks.
(insn_valid_noce_process_p): New function.
(contains_mem_rtx_p): Likewise.
(bb_valid_for_noce_process_p): Likewise.
(noce_process_if_block): Allow non-simple basic blocks
where appropriate.

2015-08-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.dg/ifcvt-1.c: New test.
* gcc.dg/ifcvt-2.c: Likewise.
* gcc.dg/ifcvt-3.c: Likewise.


Thanks,
Kyrill


jeff



commit 76813a58e40d067bb3e64151617a975581bc81cb
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Wed Jul 8 15:45:04 2015 +0100

[PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -53,6 +53,7 @@
 #include tree-pass.h
 #include dbgcnt.h
 #include shrink-wrap.h
+#include rtl-iter.h
 #include ifcvt.h
 
 #ifndef HAVE_incscc
@@ -816,8 +817,17 @@ struct noce_if_info
  form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+ simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1037,6 +1047,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info-cond);
   set_used_flags (if_info-a);
   set_used_flags (if_info-b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1054,6 +1068,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info-then_simple)
+return false;
+
+  if (if_info-else_bb)
+return if_info-else_simple;
+
+  return true;
+}
+
 /* Convert if (a != b) x = a; else x = b into x = a and
if (a == b) x = a; else x = b into x = b.  */
 
@@ -1068,6 +1097,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE  code != EQ)
 return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
  or a signed zero.  */
   if (HONOR_NANS (if_info-x)
@@ -1116,6 +1148,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   if (CONST_INT_P (if_info-b)
INTVAL (if_info-b) == STORE_FLAG_VALUE
if_info-a == const0_rtx)
@@ -1165,6 +1200,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   bool can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   if (CONST_INT_P (if_info-a)

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-11 Thread Jeff Law

On 08/09/2015 03:20 PM, Steven Bosscher wrote:

On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote:


So there's a tight relationship between the implementation of
bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
we'd probably be looking to use note_stores and note_uses.


Perhaps I'm missing something, but what is wrong with using DF here
instead of note_stores/note_uses? All the info on refs/mods of
registers is available in the DF caches.

Nothing inherently wrong with using DF here.

jeff


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-11 Thread Kyrill Tkachov


On 11/08/15 18:05, Jeff Law wrote:

On 08/09/2015 03:20 PM, Steven Bosscher wrote:

On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote:

So there's a tight relationship between the implementation of
bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
we'd probably be looking to use note_stores and note_uses.

Perhaps I'm missing something, but what is wrong with using DF here
instead of note_stores/note_uses? All the info on refs/mods of
registers is available in the DF caches.

Nothing inherently wrong with using DF here.


I have reworked the patch to use FOR_EACH_INSN_DEF and FOR_EACH_INSN_USE
in bbs_ok_for_cmove_arith to extracts the refs/mods and it seems to work.
Is that what you mean by DF?
I'm doing some more testing and hope to post the updated version soon.

Thanks,
Kyrill


jeff





Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-08-09 Thread Steven Bosscher
On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law l...@redhat.com wrote:

 So there's a tight relationship between the implementation of
 bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, then
 we'd probably be looking to use note_stores and note_uses.

Perhaps I'm missing something, but what is wrong with using DF here
instead of note_stores/note_uses? All the info on refs/mods of
registers is available in the DF caches.

Ciao!
Steven


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-31 Thread Jeff Law

On 07/27/2015 10:40 AM, Kyrill Tkachov wrote:


On 27/07/15 17:09, Jeff Law wrote:

On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:

I experimented with resource.c and the roadblock I hit is that it
seems to have an assumption that it operates on hard regs (in fact
the struct it uses to describe the resources has a HARD_REG_SET for
the regs) and so it triggers various HARD_REGISTER_P asserts when I
try to use the functions there. if-conversion runs before register
allocation, so we're dealing with pseudos here.

Sigh.  resource.c probably isn't going to be useful then.


My other attempt was to go over BB_A and mark the set registers in a
  bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
we return false. This seemed to do the job and testing worked out ok.
That would require one walk over BB_A, one walk over BB_B but I don't
know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?

I think the latter is reasonable.  Ultimately we have to do a full look
at those rtxs, so it's unavoidable to some extent.

The only other possibility would be to use the DF framework.  I'm not
sure if it's even initialized for the ifcvt code.  If it is, then you
might be able to avoid some of the walking of the insns and instead walk
the DF structures.


I think it is initialized (I look at df_get_live_out earlier on
in the call chain). I suppose what we want is for the live in regs for BB_B
to not include any of the set regs in BB_A?
I was thinking more that you'd walk BB_A, which would give you a set of 
registers changed in BB_A.  Then you could walk the uses of those regs 
using the DF structures and see if any are in BB_B.


That may not be any better in practice.  It'd depend on things like how 
many uses exist outside BB_B vs the size of BB_B.  Perhaps it isn't 
worth the trouble and we should stick with a single walk of both blocks?




I don't think we have any good generic machinery for this.  I think
every pass that needs this capability unlinks the insn from the chain
and patches it back in at the new location.


That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?
Right.  Passes would just unlink the insn from the chain, then link it 
back in at a new location.  It's always been fairly ad-hoc at the RTL level.




The current way the top-level noce_process_if_block is structured
it expects the various ifcvt functions (like noce_try_cmove_arith)
to generate a sequence, then it takes it, unshares it and removes
the empty basic blocks.

If we're to instead move insns around we'd need to further modify
  noce_process_if_block to handle differently
  this one case where we move insns instead of re-emitting them.
I think this would make that function more convoluted than it needs to be.
With the current approach we always call unshare_all_rtl_in_chain on the
emitted sequence which should take care of any RTL sharing issues and in
practice I don't expect to have more than 3-4 insns in these sequences
since
they will be guarded by the branch cost.

So I would rather argue for re-emitting insns in this case to keep
consistent
with the dozen or so similar functions in ifcvt.c that already work that
way.
OK, then let's stay with re-emitting since that's the structure 
generally expected elsewhere in ifcvt.c.  I just get worried anytime I 
spot the potential for invalid RTL sharing.



Jeff


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-31 Thread Jeff Law

On 07/28/2015 04:14 AM, Kyrill Tkachov wrote:
[ Snip ]



Here's a respin.
I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record
the set registers then go over BB_B once and look inside the SET_SRC
of each insn for those registers. How does this look? Would you like
me to investigate the data-flow infrastructure approach?

Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes
looking for a MEM with the FOR_EACH_SUBRTX machinery.

As I said above, I think moving the insns rather than re-emitting them
would make the function more convoluted than I think it needs to be.

Bootstrapped and tested on arm, aarch64, x86_64.



2015-07-28  Kyrylo Tkachov kyrylo.tkac...@arm.com

 * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
 then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
 (end_ifcvt_sequence): Call set_used_flags on each insn in the
 sequence.
 (noce_simple_bbs): New function.
 (noce_try_move): Bail if basic blocks are not simple.
 (noce_try_store_flag): Likewise.
 (noce_try_store_flag_constants): Likewise.
 (noce_try_addcc): Likewise.
 (noce_try_store_flag_mask): Likewise.
 (noce_try_cmove): Likewise.
 (noce_try_minmax): Likewise.
 (noce_try_abs): Likewise.
 (noce_try_sign_mask): Likewise.
 (noce_try_bitop): Likewise.
 (bbs_ok_for_cmove_arith): New function.
 (noce_emit_all_but_last): Likewise.
 (noce_emit_insn): Likewise.
 (noce_emit_bb): Likewise.
 (noce_try_cmove_arith): Handle non-simple basic blocks.
 (insn_valid_noce_process_p): New function.
 (contains_mem_rtx_p): Likewise.
 (bb_valid_for_noce_process_p): Likewise.
 (noce_process_if_block): Allow non-simple basic blocks
 where appropriate.

2015-07-28  Kyrylo Tkachov kyrylo.tkac...@arm.com

 * gcc.dg/ifcvt-1.c: New test.
 * gcc.dg/ifcvt-2.c: Likewise.
 * gcc.dg/ifcvt-3.c: Likewise.

Thanks,
Kyrill


jeff




ifcvt.patch


commit a02417f015b70a0e47170ac7c41a5569392085a5
Author: Kyrylo Tkachovkyrylo.tkac...@arm.com
Date:   Wed Jul 8 15:45:04 2015 +0100

 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2d97cc5..dae9b41 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -53,6 +53,7 @@
  #include tree-pass.h
  #include dbgcnt.h
  #include shrink-wrap.h
+#include rtl-iter.h
  #include ifcvt.h

Needs to be mentioned in the ChangeLog.


@@ -1585,6 +1632,116 @@ noce_try_cmove (struct noce_if_info *if_info)
return FALSE;
  }

+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  bitmap bba_sets = BITMAP_ALLOC (reg_obstack);
+
+  FOR_BB_INSNS (bb_a, a_insn)
+{
+  if (!active_insn_p (a_insn))
+   continue;
+
+  rtx sset_a = single_set (a_insn);
+
+  if (!sset_a)
+   {
+ BITMAP_FREE (bba_sets);
+ return false;
+   }
+
+  rtx dest_reg = SET_DEST (sset_a);
+  bitmap_set_bit (bba_sets, REGNO (dest_reg));
+}
So there's a tight relationship between the implementation of 
bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there wasn't, 
then we'd probably be looking to use note_stores and note_uses.


With that tight relationship, I think the implementations should be 
close together in the file and a comment in insn_valid_noce_process_p 
indicating that if insn_valid_noce_process_p is changed that the 
implementation of bbs_ok_for_cmove_arith may need to change as well.


One things I am a bit worried about is a SET_DEST that is a ZERO_EXTRACT 
or SUBREG.  Those are usually a read and a write.  So you'd need to 
filter them out earlier (noce_operand_ok?) or handle them in the second 
half of bbs_ok_for_cmove_arith.


Jeff



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-28 Thread Kyrill Tkachov

Hi Jeff,

On 27/07/15 17:40, Kyrill Tkachov wrote:

On 27/07/15 17:09, Jeff Law wrote:

On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:

I experimented with resource.c and the roadblock I hit is that it
seems to have an assumption that it operates on hard regs (in fact
the struct it uses to describe the resources has a HARD_REG_SET for
the regs) and so it triggers various HARD_REGISTER_P asserts when I
try to use the functions there. if-conversion runs before register
allocation, so we're dealing with pseudos here.

Sigh.  resource.c probably isn't going to be useful then.


My other attempt was to go over BB_A and mark the set registers in a
   bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
we return false. This seemed to do the job and testing worked out ok.
That would require one walk over BB_A, one walk over BB_B but I don't
know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?

I think the latter is reasonable.  Ultimately we have to do a full look
at those rtxs, so it's unavoidable to some extent.

The only other possibility would be to use the DF framework.  I'm not
sure if it's even initialized for the ifcvt code.  If it is, then you
might be able to avoid some of the walking of the insns and instead walk
the DF structures.

I think it is initialized (I look at df_get_live_out earlier on
in the call chain). I suppose what we want is for the live in regs for BB_B
to not include any of the set regs in BB_A?





snip

It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have not seen
it cause any trouble. If it fails then back in noce_try_cmove_arith
we goto end_seq_and_fail which ends the sequence and throws it away
(and cancels if-conversion down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not
fail, but it seems that was a bad assumption on my part.So
given noce_try_cmove_arith can change the last insn and make it
no-recognizable this code seems reasoanble.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.

I'll look into that, but what is the machinery by which one moves
insns?

I don't think we have any good generic machinery for this.  I think
every pass that needs this capability unlinks the insn from the chain
and patches it back in at the new location.

That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?

The current way the top-level noce_process_if_block is structured
it expects the various ifcvt functions (like noce_try_cmove_arith)
to generate a sequence, then it takes it, unshares it and removes
the empty basic blocks.

If we're to instead move insns around we'd need to further modify
   noce_process_if_block to handle differently
   this one case where we move insns instead of re-emitting them.
I think this would make that function more convoluted than it needs to be.
With the current approach we always call unshare_all_rtl_in_chain on the
emitted sequence which should take care of any RTL sharing issues and in
practice I don't expect to have more than 3-4 insns in these sequences since
they will be guarded by the branch cost.

So I would rather argue for re-emitting insns in this case to keep consistent
with the dozen or so similar functions in ifcvt.c that already work that way.


Here's a respin.
I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record
the set registers then go over BB_B once and look inside the SET_SRC
of each insn for those registers. How does this look? Would you like
me to investigate the data-flow infrastructure approach?

Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes
looking for a MEM with the FOR_EACH_SUBRTX machinery.

As I said above, I think moving the insns rather than re-emitting them
would make the function more convoluted than I think it needs to be.

Bootstrapped and tested on arm, aarch64, x86_64.



2015-07-28  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
then_cost, else_cost fields.  Change branch_cost field to unsigned int.
(end_ifcvt_sequence): Call set_used_flags on each insn in the
sequence.
(noce_simple_bbs): New function.
(noce_try_move): Bail if basic blocks are not simple.
(noce_try_store_flag): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(bbs_ok_for_cmove_arith): New function.
(noce_emit_all_but_last): Likewise.
(noce_emit_insn): Likewise.
(noce_emit_bb): Likewise.
(noce_try_cmove_arith): Handle non-simple basic blocks.
(insn_valid_noce_process_p): New function.

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-27 Thread Jeff Law

On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:

I experimented with resource.c and the roadblock I hit is that it
seems to have an assumption that it operates on hard regs (in fact
the struct it uses to describe the resources has a HARD_REG_SET for
the regs) and so it triggers various HARD_REGISTER_P asserts when I
try to use the functions there. if-conversion runs before register
allocation, so we're dealing with pseudos here.

Sigh.  resource.c probably isn't going to be useful then.


My other attempt was to go over BB_A and mark the set registers in a
 bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
we return false. This seemed to do the job and testing worked out ok.
That would require one walk over BB_A, one walk over BB_B but I don't
know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?
I think the latter is reasonable.  Ultimately we have to do a full look 
at those rtxs, so it's unavoidable to some extent.


The only other possibility would be to use the DF framework.  I'm not 
sure if it's even initialized for the ifcvt code.  If it is, then you 
might be able to avoid some of the walking of the insns and instead walk 
the DF structures.





snip

It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have not seen
it cause any trouble. If it fails then back in noce_try_cmove_arith
we goto end_seq_and_fail which ends the sequence and throws it away
(and cancels if-conversion down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not
fail, but it seems that was a bad assumption on my part.So
given noce_try_cmove_arith can change the last insn and make it
no-recognizable this code seems reasoanble.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.


I'll look into that, but what is the machinery by which one moves
insns?
I don't think we have any good generic machinery for this.  I think 
every pass that needs this capability unlinks the insn from the chain 
and patches it back in at the new location.


jeff


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-27 Thread Kyrill Tkachov


On 27/07/15 17:09, Jeff Law wrote:

On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:

I experimented with resource.c and the roadblock I hit is that it
seems to have an assumption that it operates on hard regs (in fact
the struct it uses to describe the resources has a HARD_REG_SET for
the regs) and so it triggers various HARD_REGISTER_P asserts when I
try to use the functions there. if-conversion runs before register
allocation, so we're dealing with pseudos here.

Sigh.  resource.c probably isn't going to be useful then.


My other attempt was to go over BB_A and mark the set registers in a
  bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
we return false. This seemed to do the job and testing worked out ok.
That would require one walk over BB_A, one walk over BB_B but I don't
know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?

I think the latter is reasonable.  Ultimately we have to do a full look
at those rtxs, so it's unavoidable to some extent.

The only other possibility would be to use the DF framework.  I'm not
sure if it's even initialized for the ifcvt code.  If it is, then you
might be able to avoid some of the walking of the insns and instead walk
the DF structures.


I think it is initialized (I look at df_get_live_out earlier on
in the call chain). I suppose what we want is for the live in regs for BB_B
to not include any of the set regs in BB_A?






snip

It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have not seen
it cause any trouble. If it fails then back in noce_try_cmove_arith
we goto end_seq_and_fail which ends the sequence and throws it away
(and cancels if-conversion down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not
fail, but it seems that was a bad assumption on my part.So
given noce_try_cmove_arith can change the last insn and make it
no-recognizable this code seems reasoanble.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.

I'll look into that, but what is the machinery by which one moves
insns?

I don't think we have any good generic machinery for this.  I think
every pass that needs this capability unlinks the insn from the chain
and patches it back in at the new location.


That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?

The current way the top-level noce_process_if_block is structured
it expects the various ifcvt functions (like noce_try_cmove_arith)
to generate a sequence, then it takes it, unshares it and removes
the empty basic blocks.

If we're to instead move insns around we'd need to further modify
 noce_process_if_block to handle differently
 this one case where we move insns instead of re-emitting them.
I think this would make that function more convoluted than it needs to be.
With the current approach we always call unshare_all_rtl_in_chain on the
emitted sequence which should take care of any RTL sharing issues and in
practice I don't expect to have more than 3-4 insns in these sequences since
they will be guarded by the branch cost.

So I would rather argue for re-emitting insns in this case to keep consistent
with the dozen or so similar functions in ifcvt.c that already work that way.

Thanks,
Kyrill



jeff





Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-27 Thread Kyrill Tkachov

Hi Jeff,

On 24/07/15 19:43, Jeff Law wrote:

On 07/24/2015 03:31 AM, Kyrill Tkachov wrote:


Wouldn't it be better to walk BB_A, gathering the set of all the
registers modified, then do a single walk through BB testing for uses of
those registers?

I think so, yes. I'll try that.

You might look at resource.c -- I haven't looked at it in a long time,
but it might have many of the interfaces you need to do this kind of
book keeping.


I experimented with resource.c and the roadblock I hit is that it seems
to have an assumption that it operates on hard regs (in fact the struct it uses 
to
describe the resources has a HARD_REG_SET for the regs) and so it triggers 
various
HARD_REGISTER_P asserts when I try to use the functions there.
if-conversion runs before register allocation, so we're dealing with pseudos 
here.

My other attempt was to go over BB_A and mark the set registers in a bitmap then
go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx 
is
a reg that is set in the bitmap from BB_A we return false. This seemed to do 
the job
and testing worked out ok. That would require one walk over BB_A, one walk over 
BB_B
but I don't know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?

snip

It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have
not seen it cause any trouble. If it fails then back in
noce_try_cmove_arith we goto end_seq_and_fail which ends
the sequence and throws it away (and cancels if-conversion
down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not fail,
but it seems that was a bad assumption on my part.So given
noce_try_cmove_arith can change the last insn and make it
no-recognizable this code seems reasoanble.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.


I'll look into that, but what is the machinery by which one moves insns?

Thanks,
Kyrill



2. Investigate a more efficient means to find set/use conflicts
 between the two blocks, possibly using resource.[ch]

jeff





Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-24 Thread Kyrill Tkachov


On 23/07/15 21:38, Jeff Law wrote:

On 07/13/2015 08:03 AM, Kyrill Tkachov wrote:

2015-07-13  Kyrylo Tkachov kyrylo.tkac...@arm.com

  * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
  then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
  (end_ifcvt_sequence): Call set_used_flags on each insn in the
  sequence.
  (noce_simple_bbs): New function.
  (noce_try_move): Bail if basic blocks are not simple.
  (noce_try_store_flag): Likewise.
  (noce_try_store_flag_constants): Likewise.
  (noce_try_addcc): Likewise.
  (noce_try_store_flag_mask): Likewise.
  (noce_try_cmove): Likewise.
  (noce_try_minmax): Likewise.
  (noce_try_abs): Likewise.
  (noce_try_sign_mask): Likewise.
  (noce_try_bitop): Likewise.
  (bbs_ok_for_cmove_arith): New function.
  (noce_emit_all_but_last): Likewise.
  (noce_emit_insn): Likewise.
  (noce_emit_bb): Likewise.
  (noce_try_cmove_arith): Handle non-simple basic blocks.
  (insn_valid_noce_process_p): New function.
  (bb_valid_for_noce_process_p): Likewise.
  (noce_process_if_block): Allow non-simple basic blocks
  where appropriate.


2015-07-13  Kyrylo Tkachov kyrylo.tkac...@arm.com

  * gcc.dg/ifcvt-1.c: New test.
  * gcc.dg/ifcvt-2.c: Likewise.
  * gcc.dg/ifcvt-3.c: Likewise.





Thanks,
Kyrill



cheers,


ifcvt.patch


commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
Author: Kyrylo Tkachovkyrylo.tkac...@arm.com
Date:   Wed Jul 8 15:45:04 2015 +0100

  [PATCH][ifcvt] Make non-conditional execution if-conversion more 
aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..2f0a228 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info)
 return FALSE;
   }

+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  FOR_BB_INSNS (bb_a, a_insn)
+{
+  if (!active_insn_p (a_insn))
+   continue;
+
+  rtx sset_a = single_set (a_insn);
+
+  if (!sset_a)
+   return false;
+
+  rtx dest_reg = SET_DEST (sset_a);
+  rtx_insn *b_insn;
+
+  FOR_BB_INSNS (bb_b, b_insn)
+   {
+ if (!active_insn_p (b_insn))
+   continue;
+
+ rtx sset_b = single_set (b_insn);
+
+ if (!sset_b)
+   return false;
+
+ if (reg_referenced_p (dest_reg, sset_b))
+   return false;
+   }
+}
+
+  return true;
+}

Doesn't this have the potential to get very expensive since you end up
looking at every insn in BB_B for every insn in BB_A?


Yes, the saving grace is that bbs_ok_for_cmove_arith is called after the costs 
checks,
so that the costs in practice reject blocks more than a few instructions long.



Wouldn't it be better to walk BB_A, gathering the set of all the
registers modified, then do a single walk through BB testing for uses of
those registers?


I think so, yes. I'll try that.


Don't you have to be careful with MEMs in both blocks?


bb_valid_for_noce_process_p called earlier in the chain should have
rejected memory operands already.






+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+{
+  if (insn != last  active_insn_p (insn))
+   {
+ rtx_insn *to_emit = as_a rtx_insn * (copy_rtx (insn));
+
+ emit_insn (PATTERN (to_emit));
+   }
+}
+}

Won't this create invalid RTL sharing?  RTL has strict rules about nodes
can and can not be shared and I'm pretty sure this blindly shares
everything.


All the if-conversion functions end up calling end_ifcvt_sequence
at the end which seems to unshare the stuff that needs to be unshared.
Is that not enough? (I don't have much experience with this part)



Now we may get away with that because you're going to delete all the
insns from BB.  But that begs the question why not just move the insns
from BB to their new location rather than re-emiting them?


I'll try it out.







+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn)  0)
+return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx 

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-24 Thread Jeff Law

On 07/24/2015 03:31 AM, Kyrill Tkachov wrote:


Wouldn't it be better to walk BB_A, gathering the set of all the
registers modified, then do a single walk through BB testing for uses of
those registers?


I think so, yes. I'll try that.
You might look at resource.c -- I haven't looked at it in a long time, 
but it might have many of the interfaces you need to do this kind of 
book keeping.





Don't you have to be careful with MEMs in both blocks?


bb_valid_for_noce_process_p called earlier in the chain should have
rejected memory operands already.












+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and
return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn)  0)
+return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb  !simple)
+noce_emit_all_but_last (bb);
+
+  if (last_insn  !noce_emit_insn (last_insn))
+return false;
+
+  return true;
+}

Under what conditions can noce_emit_insn fail and what happens to the
insn stream if it does?  It seems to me like the insn stream would be
bogus and we should stop compilation.  Which argues that rather than
returning a bool, we should just assert that the insn is memoized and
remove the check in noce_emit_bb.

Or is it the case that we're emitting onto a sequence that we can just
throw away in the event of a failure?


It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have
not seen it cause any trouble. If it fails then back in
noce_try_cmove_arith we goto end_seq_and_fail which ends
the sequence and throws it away (and cancels if-conversion
down that path), so it should be safe.
OK, I was working for the assumption that memoization ought not fail, 
but it seems that was a bad assumption on my part.So given 
noce_try_cmove_arith can change the last insn and make it 
no-recognizable this code seems reasoanble.


So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.

2. Investigate a more efficient means to find set/use conflicts
   between the two blocks, possibly using resource.[ch]

jeff


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-23 Thread Jeff Law

On 07/13/2015 08:03 AM, Kyrill Tkachov wrote:


2015-07-13  Kyrylo Tkachov kyrylo.tkac...@arm.com

 * ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
 then_cost, else_cost fields.  Change branch_cost field to unsigned
int.
 (end_ifcvt_sequence): Call set_used_flags on each insn in the
 sequence.
 (noce_simple_bbs): New function.
 (noce_try_move): Bail if basic blocks are not simple.
 (noce_try_store_flag): Likewise.
 (noce_try_store_flag_constants): Likewise.
 (noce_try_addcc): Likewise.
 (noce_try_store_flag_mask): Likewise.
 (noce_try_cmove): Likewise.
 (noce_try_minmax): Likewise.
 (noce_try_abs): Likewise.
 (noce_try_sign_mask): Likewise.
 (noce_try_bitop): Likewise.
 (bbs_ok_for_cmove_arith): New function.
 (noce_emit_all_but_last): Likewise.
 (noce_emit_insn): Likewise.
 (noce_emit_bb): Likewise.
 (noce_try_cmove_arith): Handle non-simple basic blocks.
 (insn_valid_noce_process_p): New function.
 (bb_valid_for_noce_process_p): Likewise.
 (noce_process_if_block): Allow non-simple basic blocks
 where appropriate.


2015-07-13  Kyrylo Tkachov kyrylo.tkac...@arm.com

 * gcc.dg/ifcvt-1.c: New test.
 * gcc.dg/ifcvt-2.c: Likewise.
 * gcc.dg/ifcvt-3.c: Likewise.






Thanks,
Kyrill



cheers,



ifcvt.patch


commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
Author: Kyrylo Tkachovkyrylo.tkac...@arm.com
Date:   Wed Jul 8 15:45:04 2015 +0100

 [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..2f0a228 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1584,6 +1630,96 @@ noce_try_cmove (struct noce_if_info *if_info)
return FALSE;
  }

+/* Return true iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */
+
+static bool
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+{
+  rtx_insn *a_insn;
+  FOR_BB_INSNS (bb_a, a_insn)
+{
+  if (!active_insn_p (a_insn))
+   continue;
+
+  rtx sset_a = single_set (a_insn);
+
+  if (!sset_a)
+   return false;
+
+  rtx dest_reg = SET_DEST (sset_a);
+  rtx_insn *b_insn;
+
+  FOR_BB_INSNS (bb_b, b_insn)
+   {
+ if (!active_insn_p (b_insn))
+   continue;
+
+ rtx sset_b = single_set (b_insn);
+
+ if (!sset_b)
+   return false;
+
+ if (reg_referenced_p (dest_reg, sset_b))
+   return false;
+   }
+}
+
+  return true;
+}
Doesn't this have the potential to get very expensive since you end up 
looking at every insn in BB_B for every insn in BB_A?


Wouldn't it be better to walk BB_A, gathering the set of all the 
registers modified, then do a single walk through BB testing for uses of 
those registers?


Don't you have to be careful with MEMs in both blocks?




+
+/* Emit copies of all the active instructions in BB except the last.
+   This is a helper for noce_try_cmove_arith.  */
+
+static void
+noce_emit_all_but_last (basic_block bb)
+{
+  rtx_insn *last = last_active_insn (bb, FALSE);
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+{
+  if (insn != last  active_insn_p (insn))
+   {
+ rtx_insn *to_emit = as_a rtx_insn * (copy_rtx (insn));
+
+ emit_insn (PATTERN (to_emit));
+   }
+}
+}
Won't this create invalid RTL sharing?  RTL has strict rules about nodes 
can and can not be shared and I'm pretty sure this blindly shares 
everything.


Now we may get away with that because you're going to delete all the 
insns from BB.  But that begs the question why not just move the insns 
from BB to their new location rather than re-emiting them?






+
+/* Helper for noce_try_cmove_arith.  Emit the pattern TO_EMIT and return
+   the resulting insn or NULL if it's not a valid insn.  */
+
+static rtx_insn *
+noce_emit_insn (rtx to_emit)
+{
+  gcc_assert (to_emit);
+  rtx_insn *insn = emit_insn (to_emit);
+
+  if (recog_memoized (insn)  0)
+return NULL;
+
+  return insn;
+}
+
+/* Helper for noce_try_cmove_arith.  Emit a copy of the insns up to
+   and including the penultimate one in BB if it is not simple
+   (as indicated by SIMPLE).  Then emit LAST_INSN as the last
+   insn in the block.  The reason for that is that LAST_INSN may
+   have been modified by the preparation in noce_try_cmove_arith.  */
+
+static bool
+noce_emit_bb (rtx last_insn, basic_block bb, bool simple)
+{
+  if (bb  !simple)
+noce_emit_all_but_last (bb);
+
+  if (last_insn  !noce_emit_insn (last_insn))
+return false;
+
+  return true;
+}
Under what conditions can noce_emit_insn fail and what happens to the 
insn stream if it does?  It seems to me like the insn stream would be 
bogus and we should stop compilation.  Which argues that rather than 
returning a bool, we should just assert that the insn is memoized and 
remove the check in noce_emit_bb.


Or is it the case that we're emitting onto a sequence that we can just 
throw away in the 

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-20 Thread Kyrill Tkachov

Ping.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01047.html

The go testsuite passes for me on x86_64-unknown-linux-gnu for me.
A third data point on testing would be appreciated...

Thanks,
Kyrill

On 13/07/15 15:03, Kyrill Tkachov wrote:

Hi Bernhard,

On 13/07/15 10:45, Kyrill Tkachov wrote:

PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.

Thanks a lot! This is all very helpful.
I'll respin the patch.

Here it is. I've expanded the comments in the functions you mentioned,
moved the tests to gcc.dg and enabled them for aarch64 and x86 and changed
the types of the costs used to unsigned int.


Bootstrapped on aarch64 and x86_64.
The go testsuite passes on x86_64-unknown-linux-gnu for me...

Thanks,
Kyrill

2015-07-13  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
then_cost, else_cost fields.  Change branch_cost field to unsigned int.
(end_ifcvt_sequence): Call set_used_flags on each insn in the
sequence.
(noce_simple_bbs): New function.
(noce_try_move): Bail if basic blocks are not simple.
(noce_try_store_flag): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(bbs_ok_for_cmove_arith): New function.
(noce_emit_all_but_last): Likewise.
(noce_emit_insn): Likewise.
(noce_emit_bb): Likewise.
(noce_try_cmove_arith): Handle non-simple basic blocks.
(insn_valid_noce_process_p): New function.
(bb_valid_for_noce_process_p): Likewise.
(noce_process_if_block): Allow non-simple basic blocks
where appropriate.


2015-07-13  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.dg/ifcvt-1.c: New test.
* gcc.dg/ifcvt-2.c: Likewise.
* gcc.dg/ifcvt-3.c: Likewise.





Thanks,
Kyrill



cheers,




Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-13 Thread Kyrill Tkachov

Hi Bernhard,

On 11/07/15 00:00, Bernhard Reutner-Fischer wrote:

On 10 July 2015 at 14:31, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This patch makes if-conversion more aggressive when handling code of the
form:
if (test)
   x := a  //THEN
else
   x := b  //ELSE
The current code adds the costs of both the THEN and ELSE blocks and proceeds 
if they don't
exceed the branch cost. I don't think that's quite a right calculation.
We're going to be executing at least one of the basic blocks anyway.
This patch we instead check the *maximum* of the two blocks against the branch 
cost.
This should still catch cases where a high latency instruction appears in one 
or both of
the paths.

Shouldn't this maximum also take probability into account? Or maybe
not, would have to think about it tomorrow.


The branch cost that we test against (recorded in if_info earlier in the ifcvt.c
call chain) is either the predictable branch cost or the unpredictable branch
cost, depending on what the predictable_edge_p machinery returned.
I think checking against that should be enough, but it's an easy thing to 
experiment
with, so I'm open to arguments in any direction.



$ contrib/check_GNU_style.sh rtl-ifcvt.00.patch

Blocks of 8 spaces should be replaced with tabs.
783:+return FALSE;


Generally ifcvt.c (resp. the whole tree) could use a
sed -i -e s/\([[:space:]]\)FALSE/\1false/g gcc/ifcvt.c
Maybe some of the int predicates could then become bools.


Ok, will go over the style in the patch.




+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */

Return true iff


I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if
I'll use a normal if here.




Did you include go in your testing?
I see:
Unexpected results in this build (new failures)
FAIL: encoding/json
FAIL: go/printer
FAIL: go/scanner
FAIL: html/template
FAIL: log
FAIL: net/http
FAIL: net/http/cgi
FAIL: net/http/cookiejar
FAIL: os
FAIL: text/template


Hmmm. I don't see these failures. I double checked right now and they
appear as PASS in my configuration.

I tested make check-go on x86_64-unknown-linux-gnu configured with
--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.

Are you sure this is not some other issue in your tree?



bbs_ok_for_cmove_arith() looks costly but i guess you looked if
there's some pre-existing cleverness you could have used instead?


I did have a look, but couldn't find any.
The bbs_ok_for_cmove_arith is done after the costs check
so I'd hope that the costs check would already discard
really long basic-blocks.



noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().

insn_rtx_cost() should return an unsigned int, then_cost, else_cost
should thus be unsigned int too.

copy_of_a versus copy_of_insn_b; I'd shorten the latter.


Thanks, good suggestions.



bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
but there is none?
Also should document the return value (and should not clobber the OUT
params upon failure, no?).


bah, I forgot to update the comment once I modified the function
during development of the patch. I'll fix those.



As for the testcases, it would be nice to have at least a tiny bit for
x86_64, too.


I can put the testcases in gcc.dg and enable them for x86 as well,
but I think a couple of the already pass as is because x86 doesn't
need to do an extra zero_extend inside the basic-block.



PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.


Thanks a lot! This is all very helpful.
I'll respin the patch.

Thanks,
Kyrill




cheers,




Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-13 Thread Bernhard Reutner-Fischer
On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov kyrylo.tkac...@arm.com 
wrote:
Hi Bernhard,


 Did you include go in your testing?
 I see:
 Unexpected results in this build (new failures)
 FAIL: encoding/json
 FAIL: go/printer
 FAIL: go/scanner
 FAIL: html/template
 FAIL: log
 FAIL: net/http
 FAIL: net/http/cgi
 FAIL: net/http/cookiejar
 FAIL: os
 FAIL: text/template

Hmmm. I don't see these failures. I double checked right now and they
appear as PASS in my configuration.

I tested make check-go on x86_64-unknown-linux-gnu configured with
--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.

Are you sure this is not some other issue in your tree?

I have ISL enabled. I do have a couple of local stuff but that tested fine 
before your patch and should not really have impact on the parts your patch 
touches. So maybe it's ISL.

Thanks,



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-13 Thread Kyrill Tkachov


On 13/07/15 10:45, Kyrill Tkachov wrote:




+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */

Return true iff

I tried to be too formal here ;) https://en.wikipedia.org/wiki/If_and_only_if
I'll use a normal if here.


Err, of course you were talking about the missing 'true'.
Sorry, early morning.

Kyrill




Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-13 Thread Kyrill Tkachov

Hi Bernhard,

On 13/07/15 10:45, Kyrill Tkachov wrote:

PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.

Thanks a lot! This is all very helpful.
I'll respin the patch.


Here it is. I've expanded the comments in the functions you mentioned,
moved the tests to gcc.dg and enabled them for aarch64 and x86 and changed
the types of the costs used to unsigned int.


Bootstrapped on aarch64 and x86_64.
The go testsuite passes on x86_64-unknown-linux-gnu for me...

Thanks,
Kyrill

2015-07-13  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
then_cost, else_cost fields.  Change branch_cost field to unsigned int.
(end_ifcvt_sequence): Call set_used_flags on each insn in the
sequence.
(noce_simple_bbs): New function.
(noce_try_move): Bail if basic blocks are not simple.
(noce_try_store_flag): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(bbs_ok_for_cmove_arith): New function.
(noce_emit_all_but_last): Likewise.
(noce_emit_insn): Likewise.
(noce_emit_bb): Likewise.
(noce_try_cmove_arith): Handle non-simple basic blocks.
(insn_valid_noce_process_p): New function.
(bb_valid_for_noce_process_p): Likewise.
(noce_process_if_block): Allow non-simple basic blocks
where appropriate.


2015-07-13  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.dg/ifcvt-1.c: New test.
* gcc.dg/ifcvt-2.c: Likewise.
* gcc.dg/ifcvt-3.c: Likewise.






Thanks,
Kyrill



cheers,


commit bc62987a2fa3d9dc3de5a1ed8003a745340255bd
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Wed Jul 8 15:45:04 2015 +0100

[PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..2f0a228 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -815,8 +815,17 @@ struct noce_if_info
  form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+ simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1036,6 +1045,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info-cond);
   set_used_flags (if_info-a);
   set_used_flags (if_info-b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1053,6 +1066,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info-then_simple)
+return false;
+
+  if (if_info-else_bb)
+return if_info-else_simple;
+
+  return true;
+}
+
 /* Convert if (a != b) x = a; else x = b into x = a and
if (a == b) x = a; else x = b into x = b.  */
 
@@ -1067,6 +1095,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE  code != EQ)
 return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
  or a signed zero.  */
   if (HONOR_NANS (if_info-x)
@@ -1115,6 +1146,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   if (CONST_INT_P (if_info-b)
INTVAL (if_info-b) == STORE_FLAG_VALUE
if_info-a == const0_rtx)
@@ -1163,6 +1197,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   int normalize, can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   if (CONST_INT_P (if_info-a)
CONST_INT_P (if_info-b))
 {
@@ -1291,6 +1328,9 @@ noce_try_addcc (struct noce_if_info *if_info)
   rtx_insn *seq;
   int subtract, normalize;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   if (GET_CODE (if_info-a) == PLUS
rtx_equal_p (XEXP (if_info-a, 0), if_info-b)
(reversed_comparison_code (if_info-cond, if_info-jump)
@@ -1382,6 +1422,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
   rtx_insn *seq;
   int reversep;
 
+  if (!noce_simple_bbs (if_info))
+return FALSE;
+
   reversep = 0;
   if 

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-13 Thread Kyrill Tkachov


On 13/07/15 11:48, Bernhard Reutner-Fischer wrote:

On July 13, 2015 11:45:55 AM GMT+02:00, Kyrill Tkachov kyrylo.tkac...@arm.com 
wrote:

Hi Bernhard,


Did you include go in your testing?
I see:
Unexpected results in this build (new failures)
FAIL: encoding/json
FAIL: go/printer
FAIL: go/scanner
FAIL: html/template
FAIL: log
FAIL: net/http
FAIL: net/http/cgi
FAIL: net/http/cookiejar
FAIL: os
FAIL: text/template

Hmmm. I don't see these failures. I double checked right now and they
appear as PASS in my configuration.

I tested make check-go on x86_64-unknown-linux-gnu configured with
--without-isl --disable-multilib --enable-languages=c,c++,fortran,go.

Are you sure this is not some other issue in your tree?

I have ISL enabled. I do have a couple of local stuff but that tested fine 
before your patch and should not really have impact on the parts your patch 
touches. So maybe it's ISL.


I've rebuilt with ISL from scratch and the tests still pass for me.
I'm testing Ubuntu on a Haswell machine, don't know if that's relevant.

Kyrill



Thanks,





Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Kyrill Tkachov


On 10/07/15 13:31, Kyrill Tkachov wrote:

+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  If the bb is not simple place all insns
+   except the last insn into SEQ.  */
+


That last sentence is stale. That function doesn't have a SEQ argument.
Consider that comment sentence removed.

Thanks,
Kyrill



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Bernhard Reutner-Fischer
On 10 July 2015 at 14:31, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This patch makes if-conversion more aggressive when handling code of the
 form:
 if (test)
   x := a  //THEN
 else
   x := b  //ELSE

 The current code adds the costs of both the THEN and ELSE blocks and proceeds 
 if they don't
 exceed the branch cost. I don't think that's quite a right calculation.
 We're going to be executing at least one of the basic blocks anyway.
 This patch we instead check the *maximum* of the two blocks against the 
 branch cost.
 This should still catch cases where a high latency instruction appears in one 
 or both of
 the paths.

Shouldn't this maximum also take probability into account? Or maybe
not, would have to think about it tomorrow.

$ contrib/check_GNU_style.sh rtl-ifcvt.00.patch

Blocks of 8 spaces should be replaced with tabs.
783:+return FALSE;


Generally ifcvt.c (resp. the whole tree) could use a
sed -i -e s/\([[:space:]]\)FALSE/\1false/g gcc/ifcvt.c
Maybe some of the int predicates could then become bools.


+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */

Return true iff


Did you include go in your testing?
I see:
Unexpected results in this build (new failures)
FAIL: encoding/json
FAIL: go/printer
FAIL: go/scanner
FAIL: html/template
FAIL: log
FAIL: net/http
FAIL: net/http/cgi
FAIL: net/http/cookiejar
FAIL: os
FAIL: text/template


bbs_ok_for_cmove_arith() looks costly but i guess you looked if
there's some pre-existing cleverness you could have used instead?

noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().

insn_rtx_cost() should return an unsigned int, then_cost, else_cost
should thus be unsigned int too.

copy_of_a versus copy_of_insn_b; I'd shorten the latter.

bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
but there is none?
Also should document the return value (and should not clobber the OUT
params upon failure, no?).

As for the testcases, it would be nice to have at least a tiny bit for
x86_64, too.

PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.

cheers,
From 1b7d8f9b61eb538cc4338e2073d04a66518f13c2 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer rep.dot@gmail.com
Date: Fri, 10 Jul 2015 21:25:30 +0200
Subject: [PATCH] rtl-ifcvt typos

fix some typos on top of Kyrill's patch.
should mv the testcases to common ground.
---
 gcc/ifcvt.c |   49 ++-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c |2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c |2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c |7 ++--
 4 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3d324257..0bf6645 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1784,8 +1784,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
  bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost)
-   COSTS_N_INSNS (if_info-branch_cost))
+  if (MAX (then_cost, else_cost)  COSTS_N_INSNS (if_info-branch_cost))
 return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -2730,28 +2729,26 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   rtx cc = cc_in_cond (cond);
 
   if (!insn_valid_noce_process_p (last_insn, cc))
-return FALSE;
+return false;
   last_set = single_set (last_insn);
 
   rtx x = SET_DEST (last_set);
-
   rtx_insn *first_insn = first_active_insn (test_bb);
   rtx first_set = single_set (first_insn);
-  bool speed_p = optimize_bb_for_speed_p (test_bb);
 
-  *cost = insn_rtx_cost (last_set, speed_p);
   if (!first_set)
 return false;
+
   /* We have a single simple set, that's okay.  */
-  else if (first_insn == last_insn)
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
 {
   *simple_p = noce_operand_ok (SET_DEST (first_set));
   *cost = insn_rtx_cost (first_set, speed_p);
   return *simple_p;
 }
 
-  *simple_p = false;
-
   rtx_insn *prev_last_insn = PREV_INSN (last_insn);
   gcc_assert (prev_last_insn);
 
@@ -2764,6 +2761,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   /* The regs that are live out of test_bb.  */
   bitmap test_bb_live_out = df_get_live_out (test_bb);
 
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
   rtx_insn *insn;
   FOR_BB_INSNS (test_bb, insn)
 {
@@ -2781,7 +2779,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
 	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
 	goto free_bitmap_and_fail;
 
-	  *cost += insn_rtx_cost (sset, speed_p);
+	  potential_cost += insn_rtx_cost (sset, speed_p);
 	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
 	}
 }
@@ 

Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Bernhard Reutner-Fischer
On 11 July 2015 at 01:00, Bernhard Reutner-Fischer
rep.dot@gmail.com wrote:
 On 10 July 2015 at 14:31, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either 
 ;)

err, arm and aarch64 have no -mbranch-cost, a couple of prominent
other arches do..