Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-30 Thread Jeff Law


On 11/29/2015 09:24 AM, Ajit Kumar Agarwal wrote:


> I agree with the above.  In addition, we only need to compute the set of
> objects (SSA_NAMEs) that are live at the birth or header of the loop.
> We don't need to compute what is live through the loop from the live-in and
> live-out sets of all the basic blocks of the loop.  This is because the set
> of objects (SSA_NAMEs) that are live-in at the birth or header of the loop
> will be live-in at every node in the loop.
>
> If a variable v is live-in at the header of the loop, then v is live-in at
> every node in the loop.  To prove this, consider a loop L with header h such
> that the variable v, defined at d, is live-in at h.  Since v is live-in at h,
> d is not part of L.  This follows from the dominance property, i.e. h is
> strictly dominated by d.  Furthermore, there exists a path from h to a use of
> v which does not go through d.  For every node p of the loop, since the loop
> is a strongly connected component of the CFG, there exists a path from p to h
> consisting only of nodes of L, and that path does not go through d because d
> is not part of L.  Concatenating those two paths proves that v is live-in and
> live-out of p.
>
> On top of the live-in set at the birth or header of the loop, as proven
> above, we compute the live-out set of each exit block of the loop and the
> live-in set at the destination of each exit edge of the loop.  This accounts
> for the liveness outside of the loop.
>
> The above two cases form the basis of a better estimator for register
> pressure as far as LICM is concerned.
>
> If you agree with the above, I will add it to the patch for the
> register_used estimates, for a better estimate of register pressure for LICM.

Yes, I think we're in agreement.

jeff

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-29 Thread Ajit Kumar Agarwal

Hello Jeff:

-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Tuesday, November 17, 2015 4:30 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 11/16/2015 10:36 AM, Ajit Kumar Agarwal wrote:
>> For Induction variable optimization on tree SSA representation, the 
>> register used logic is based on the number of phi nodes at the loop 
>> header to represent the liveness at the loop.  Current Logic used 
>> only the number of phi nodes at the loop header.  Changes are made to 
>> represent the phi operands also live at the loop. Thus number of phi 
>> operands also gets incremented in the number of registers used.
>>But my question is why is the # of PHI operands useful here.  You'd have a
>>stronger argument if it was the number of unique operands in each PHI.
>>While I don't doubt this patch improved things, I think it's just putting a
>>band-aid over the problem.

>>I think anything that just looks at PHIs or at register liveness at loop
>>boundaries is inherently underestimating the register pressure implications
>>of code motion from inside to outside a loop.

>>If an object is pulled out of the loop, then it's going to conflict with
>>nearly every object that births in the loop (because the object being moved
>>now has to live throughout the loop).  There's exceptions, but I doubt they
>>matter in practice.  The object is also going to conflict with anything else
>>that is live through the loop.  At least that's how it seems to me at first
>>thought.

>>So build the set of objects (SSA_NAMEs) that either birth or are live through
>>the loop that have the same type class as the object we want to hoist out of
>>the loop (scalar, floating point, vector).  Use that set of objects to
>>estimate register pressure.

I agree with the above.  In addition, we only need to compute the set of
objects (SSA_NAMEs) that are live at the birth or header of the loop.
We don't need to compute what is live through the loop from the live-in and
live-out sets of all the basic blocks of the loop.  This is because the set
of objects (SSA_NAMEs) that are live-in at the birth or header of the loop
will be live-in at every node in the loop.

If a variable v is live-in at the header of the loop, then v is live-in at
every node in the loop.  To prove this, consider a loop L with header h such
that the variable v, defined at d, is live-in at h.  Since v is live-in at h,
d is not part of L.  This follows from the dominance property, i.e. h is
strictly dominated by d.  Furthermore, there exists a path from h to a use of
v which does not go through d.  For every node p of the loop, since the loop
is a strongly connected component of the CFG, there exists a path from p to h
consisting only of nodes of L, and that path does not go through d because d
is not part of L.  Concatenating those two paths proves that v is live-in and
live-out of p.
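
As a minimal sketch of the argument above (this is not GCC code; the toy CFG,
the two variables and every name below are hypothetical), a classic backward
liveness computation on a five-block CFG shows the property: the variable
defined before the loop and used after it is live-in at the header and at
every other loop block.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 5
#define MAXSUCC 2

/* Two toy variables, one bit each (hypothetical names).  */
#define VAR_V (1u << 0)   /* defined in the preheader, used after the loop */
#define VAR_T (1u << 1)   /* born in the loop body, used in the latch */

/* Toy CFG: 0 = preheader, 1 = header, 2 = body, 3 = latch, 4 = exit.
   Edges: 0->1, 1->2, 1->4, 2->3, 3->1.  -1 marks "no successor".  */
static const int succs[NBLOCKS][MAXSUCC] = {
  {1, -1}, {2, 4}, {3, -1}, {1, -1}, {-1, -1}
};
static const uint32_t def_set[NBLOCKS] = {VAR_V, 0, VAR_T, 0, 0};
static const uint32_t use_set[NBLOCKS] = {0, 0, 0, VAR_T, VAR_V};

/* Classic backward liveness, iterated to a fixed point:
     out[b] = union of in[s] over the successors s of b
     in[b]  = use[b] | (out[b] & ~def[b])  */
static void
compute_liveness (uint32_t in[NBLOCKS], uint32_t out[NBLOCKS])
{
  for (int b = 0; b < NBLOCKS; b++)
    in[b] = out[b] = 0;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = NBLOCKS - 1; b >= 0; b--)
        {
          uint32_t new_out = 0;
          for (int s = 0; s < MAXSUCC; s++)
            if (succs[b][s] >= 0)
              new_out |= in[succs[b][s]];
          uint32_t new_in = use_set[b] | (new_out & ~def_set[b]);
          if (new_in != in[b] || new_out != out[b])
            {
              in[b] = new_in;
              out[b] = new_out;
              changed = true;
            }
        }
    }
}

/* Return true if every name that is live-in at the header (block 1) is
   also live-in at every block of the loop (blocks 1..3).  */
static bool
header_live_in_covers_loop (const uint32_t in[NBLOCKS])
{
  for (int b = 1; b <= 3; b++)
    if ((in[1] & in[b]) != in[1])
      return false;
  return true;
}
```

Calling compute_liveness and then header_live_in_covers_loop returns true for
this CFG.  Note that VAR_T, which is born inside the loop, is live-in at the
latch but not at the header, so the header set by itself does not see names
that birth in the loop.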

On top of the live-in set at the birth or header of the loop, as proven
above, we compute the live-out set of each exit block of the loop and the
live-in set at the destination of each exit edge of the loop.  This accounts
for the liveness outside of the loop.

The above two cases form the basis of a better estimator for register
pressure as far as LICM is concerned.

If you agree with the above, I will add it to the patch for the
register_used estimates, for a better estimate of register pressure for LICM.

Thanks & Regards
Ajit

>>It won't be exact because some of those objects could end up coloring the 
>>same.  But it's probably still considerably more accurate than what we have 
>>now.

>>I suspect that would be a better estimator for register pressure as far as 
>>LICM is concerned.

jeff

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Bin.Cheng

On Tue, Nov 17, 2015 at 1:56 AM, Ajit Kumar Agarwal
 wrote:
>
> Sorry, I missed some of the points in my earlier mail; they are given below.
>
> -Original Message-
> From: Ajit Kumar Agarwal
> Sent: Monday, November 16, 2015 11:07 PM
> To: 'Jeff Law'; GCC Patches
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop 
> for LICM and IVOPTS.
>
>
>
> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, November 13, 2015 11:44 AM
> To: Ajit Kumar Agarwal; GCC Patches
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
> for LICM and IVOPTS.
>
> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>
>>
>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>
>>
>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00
>> 2001
>> From: Ajit Kumar Agarwal
>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>>   loop for LICM and IVOPTS.
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>> Induction variable optimization based on SSA representation. The
>> current logic used in LICM for register used inside the loops is
>> changed. The Live Out of the loop latch node and the Live in of the
>> destination of the exit nodes is used to set the Loops Liveness at the exit 
>> of the Loop.
>> The register used is the number of live variables at the exit of the
>> Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register used logic is based on the number of phi nodes at the loop
>> header to represent the liveness at the loop.  Current Logic used only
>> the number of phi nodes at the loop header.  Changes are made to
>> represent the phi operands also live at the loop. Thus number of phi
>> operands also gets incremented in the number of registers used.
>>
>> ChangeLog:
>> 2015-10-09  Ajit Agarwal
>>
>>   * loop-invariant.c (compute_loop_liveness): New.
>>   (determine_regs_used): New.
>>   (find_invariants_to_move): Use of determine_regs_used.
>>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>>   arguments for register used.
>>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>>loop-invariant change is still pending, right?
>
>
>>
>> Signed-off-by:Ajit agarwalajit...@xilinx.com
>> ---
>>   gcc/loop-invariant.c   | 72 
>> +-
>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index
>> 52c8ae8..e4291c9 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>   }
>>   }
>>
>> +static int
>> +determine_regs_used()
>> +{
>> +  unsigned int j;
>> +  unsigned int reg_used = 2;
>> +  bitmap_iterator bi;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>> +(reg_used) ++;
>> +
>> +  return reg_used;
>> +}
>>>Isn't this just bitmap_count_bits (regs_live) + 2?
>
>
>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>   }
>>   }
>>
>> -
>> +static void
>> +calculate_loop_liveness (void)
>>>Needs a function comment.
>
> I will incorporate the above comments.
>> +{
>> +  basic_block bb;
>> +  struct loop *loop;
>>
>> -/* Move the invariants out of the loops.  */
>> +  FOR_EACH_LOOP (loop, 0)
>> +if (loop->aux == NULL)
>> +  {
>> +loop->aux = xcalloc (1, sizeof (struct loop_data));
>> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>> + }
>> +
>> +  FOR_EACH_BB_FN (bb, cfun)
>>>Why loop over blocks here?  Why not just iterate through all the loops
>>>in the loop structure.  Order isn't particularly important AFAICT for
>>>this code.
>
> Iterating over the loop structure is enough.  We don't need to iterate over
> the basic blocks.
>
>> +   {
>> + int  i;
>> + edge e;
>> + vec edges;
>>

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Jeff Law


On 11/16/2015 10:36 AM, Ajit Kumar Agarwal wrote:

> For Induction variable optimization on tree SSA representation, the
> register used logic is based on the number of phi nodes at the loop
> header to represent the liveness at the loop.  Current Logic used only
> the number of phi nodes at the loop header.  Changes are made to
> represent the phi operands also live at the loop. Thus number of phi
> operands also gets incremented in the number of registers used.

But my question is why is the # of PHI operands useful here.  You'd have
a stronger argument if it was the number of unique operands in each PHI.
While I don't doubt this patch improved things, I think it's just
putting a band-aid over the problem.
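
To illustrate the difference between counting PHI operands and counting
unique operands, here is a small standalone sketch.  It does not use GCC's
gimple PHI API; `toy_phi` and `count_unique_args` are hypothetical names, and
SSA names are modelled as plain integers.

```c
#include <stddef.h>

/* Hypothetical PHI node: an array of SSA version numbers, one per
   incoming edge.  */
struct toy_phi
{
  const int *args;
  size_t nargs;
};

/* Count the distinct SSA versions among the PHI arguments.  Quadratic,
   which is fine for the handful of arguments a PHI usually has.  */
static size_t
count_unique_args (const struct toy_phi *phi)
{
  size_t unique = 0;
  for (size_t i = 0; i < phi->nargs; i++)
    {
      size_t j;
      for (j = 0; j < i; j++)
        if (phi->args[j] == phi->args[i])
          break;
      if (j == i)   /* No earlier argument matched: first occurrence.  */
        unique++;
    }
  return unique;
}
```

For a PHI whose arguments are versions 3, 3 and 7, nargs is 3 while the
unique count is 2, so an estimate based on nargs counts the same value twice.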


I think anything that just looks at PHIs or at register liveness at loop 
boundaries is inherently underestimating the register pressure 
implications of code motion from inside to outside a loop.


If an object is pulled out of the loop, then it's going to conflict with 
nearly every object that births in the loop (because the object being 
moved now has to live throughout the loop).  There's exceptions, but I 
doubt they matter in practice.  The object is also going to conflict 
with anything else that is live through the loop.  At least that's how 
it seems to me at first thought.


So build the set of objects (SSA_NAMEs) that either birth or are live 
through the loop that have the same type class as the object we want to 
hoist out of the loop (scalar, floating point, vector).  Use that set of 
objects to estimate register pressure.
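
A rough standalone sketch of that estimator, under the assumption that each
object already carries its type class and two precomputed flags.  None of
this is GCC code; `toy_name`, `toy_class` and `count_conflicting_names` are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type classes, following the scalar / floating point /
   vector split mentioned above.  */
enum toy_class { TOY_SCALAR, TOY_FLOAT, TOY_VECTOR };

/* Hypothetical summary of one SSA_NAME with respect to a given loop.  */
struct toy_name
{
  enum toy_class cls;
  bool born_in_loop;        /* defined inside the loop */
  bool live_through_loop;   /* live across the whole loop */
};

/* Count the names that would conflict with an object of class HOISTED
   once it is moved out of the loop: those of the same class that either
   birth in the loop or are live through it.  */
static unsigned
count_conflicting_names (const struct toy_name *names, size_t n,
                         enum toy_class hoisted)
{
  unsigned count = 0;
  for (size_t i = 0; i < n; i++)
    if (names[i].cls == hoisted
        && (names[i].born_in_loop || names[i].live_through_loop))
      count++;
  return count;
}
```

The result is an upper bound in the sense described next: two counted names
may still end up sharing a register.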


It won't be exact because some of those objects could end up coloring 
the same.  But it's probably still considerably more accurate than what 
we have now.


I suspect that would be a better estimator for register pressure as far 
as LICM is concerned.


jeff

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Ajit Kumar Agarwal


Sorry, I missed some of the points in my earlier mail; they are given below.

-Original Message-
From: Ajit Kumar Agarwal 
Sent: Monday, November 16, 2015 11:07 PM
To: 'Jeff Law'; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.



-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00
> 2001
> From: Ajit Kumar Agarwal
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal
>
>   * loop-invariant.c (compute_loop_liveness): New.
>   (determine_regs_used): New.
>   (find_invariants_to_move): Use of determine_regs_used.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>   arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/loop-invariant.c   | 72 
> +-
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 
> 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>   }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +(reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>   }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> + }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the loop structure is enough.  We don't need to iterate over
the basic blocks.

> +   {
> + int  i;
> + edge e;
> + vec edges;
> + edges = get_loop_exit_edges (loop);
> + FOR_EACH_VEC_ELT (edges, i, e)
> + {
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, 
> + DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines: DF_LR_OUT 
>>(e->src) and DF_LR_IN (e->dest).

I will incorporate this.

> + }
> +  }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks lik

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Ajit Kumar Agarwal



-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 
> 2001
> From: Ajit Kumar Agarwal
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal
>
>   * loop-invariant.c (compute_loop_liveness): New.
>   (determine_regs_used): New.
>   (find_invariants_to_move): Use of determine_regs_used.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>   arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/loop-invariant.c   | 72 
> +-
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>   }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +(reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>   }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> + }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the loop structure is enough.  We don't need to iterate over
the basic blocks.

> +   {
> + int  i;
> + edge e;
> + vec edges;
> + edges = get_loop_exit_edges (loop);
> + FOR_EACH_VEC_ELT (edges, i, e)
> + {
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines:
>>DF_LR_OUT (e->src) and DF_LR_IN (e->dest).

I will incorporate this.

> + }
> +  }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used 
>>computation helping generate better code. 

We need to measure the test case with the scenario where the new variable 
created for loop invariant increases the register pressure and
the cost with respect to reg_used and new_regs increases that lead to spill and 
fetch and drop the i

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-13 Thread Richard Biener

On Fri, Nov 13, 2015 at 7:31 AM, Bin.Cheng  wrote:
> On Fri, Nov 13, 2015 at 2:13 PM, Jeff Law  wrote:
>> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>>
>>>
>>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>>
>>>
>>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
>>> From: Ajit Kumar Agarwal
>>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used
>>> inside
>>>   loop for LICM and IVOPTS.
>>>
>>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>>> Induction variable optimization based on SSA representation. The current
>>> logic used in LICM for register used inside the loops is changed. The
>>> Live Out of the loop latch node and the Live in of the destination of
>>> the exit nodes is used to set the Loops Liveness at the exit of the Loop.
>>> The register used is the number of live variables at the exit of the
>>> Loop calculated above.
>>>
>>> For Induction variable optimization on tree SSA representation, the
>>> register
>>> used logic is based on the number of phi nodes at the loop header to
>>> represent
>>> the liveness at the loop.  Current Logic used only the number of phi nodes
>>> at
>>> the loop header.  Changes are made to represent the phi operands also live
>>> at
>>> the loop. Thus number of phi operands also gets incremented in the number
>>> of
>>> registers used.
>>>
>>> ChangeLog:
>>> 2015-10-09  Ajit Agarwal
>>>
>>> * loop-invariant.c (compute_loop_liveness): New.
>>> (determine_regs_used): New.
>>> (find_invariants_to_move): Use of determine_regs_used.
>>> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>>> arguments for register used.
>>
>> I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
>> loop-invariant change is still pending, right?
> Ah, reject is a strong word, I am just being dumb and don't yet understand
> why it's a generally better estimate.
> Maybe Richard has some input here?

Not really.  I agree with Bin that the change doesn't look like an improvement
by design (might be one by accident for some benchmarks).

Richard.

> Thanks,
> bin
>>
>>
>>>
>>> Signed-off-by:Ajit agarwalajit...@xilinx.com
>>> ---
>>>   gcc/loop-invariant.c   | 72
>>> +-
>>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 52c8ae8..e4291c9 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>>   }
>>>   }
>>>
>>> +static int
>>> +determine_regs_used()
>>> +{
>>> +  unsigned int j;
>>> +  unsigned int reg_used = 2;
>>> +  bitmap_iterator bi;
>>> +
>>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>>> +(reg_used) ++;
>>> +
>>> +  return reg_used;
>>> +}
>>
>> Isn't this just bitmap_count_bits (regs_live) + 2?
>>
>>
>>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>>   }
>>>   }
>>>
>>> -
>>> +static void
>>> +calculate_loop_liveness (void)
>>
>> Needs a function comment.
>>
>>
>>> +{
>>> +  basic_block bb;
>>> +  struct loop *loop;
>>>
>>> -/* Move the invariants out of the loops.  */
>>> +  FOR_EACH_LOOP (loop, 0)
>>> +if (loop->aux == NULL)
>>> +  {
>>> +loop->aux = xcalloc (1, sizeof (struct loop_data));
>>> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>>> + }
>>> +
>>> +  FOR_EACH_BB_FN (bb, cfun)
>>
>> Why loop over blocks here?  Why not just iterate through all the loops in
>> the loop structure.  Order isn't particularly important AFAICT for this
>> code.
>>
>>
>>
>>> +   {
>>> + int  i;
>>> + edge e;
>>> + vec edges;
>>> + edges = get_loop_exit_edges (loop);
>>> + FOR_EACH_VEC_ELT (edges, i, e)
>>> + {
>>> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>>> DF_LR_OUT(e->src));
>>> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>>> DF_LR_IN(e->dest));
>>
>> Space before the open-paren in the previous two lines:
>> DF_LR_OUT (e->src) and DF_LR_IN (e->dest).
>>
>>
>>> + }
>>> +  }
>>> +  }
>>> +}
>>> +
>>> +/* Move the invariants  ut of the loops.  */
>>
>> Looks like you introduced a typo.
>>
>> I'd like to see testcases which show the change in # regs used computation
>> helping generate better code.
>>
>> And  I'd also like to see some background information on why you think this
>> is a more accurate measure for the number of registers used in the loop.
>> regs_used AFAICT is supposed to be an estimate of the registers live around
>> the loop.  So ISTM that you get that value by live-out set on the backedge
>> of the loop.  I guess you get something similar by looking at the exit
>> edge's source block's live-out set.  But I don't see any value  in in

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-12 Thread Bin.Cheng

On Fri, Nov 13, 2015 at 2:13 PM, Jeff Law  wrote:
> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>
>>
>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>
>>
>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
>> From: Ajit Kumar Agarwal
>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used
>> inside
>>   loop for LICM and IVOPTS.
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>> Induction variable optimization based on SSA representation. The current
>> logic used in LICM for register used inside the loops is changed. The
>> Live Out of the loop latch node and the Live in of the destination of
>> the exit nodes is used to set the Loops Liveness at the exit of the Loop.
>> The register used is the number of live variables at the exit of the
>> Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register
>> used logic is based on the number of phi nodes at the loop header to
>> represent
>> the liveness at the loop.  Current Logic used only the number of phi nodes
>> at
>> the loop header.  Changes are made to represent the phi operands also live
>> at
>> the loop. Thus number of phi operands also gets incremented in the number
>> of
>> registers used.
>>
>> ChangeLog:
>> 2015-10-09  Ajit Agarwal
>>
>> * loop-invariant.c (compute_loop_liveness): New.
>> (determine_regs_used): New.
>> (find_invariants_to_move): Use of determine_regs_used.
>> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>> arguments for register used.
>
> I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
> loop-invariant change is still pending, right?
Ah, reject is a strong word, I am just being dumb and don't yet understand
why it's a generally better estimate.
Maybe Richard has some input here?

Thanks,
bin
>
>
>>
>> Signed-off-by:Ajit agarwalajit...@xilinx.com
>> ---
>>   gcc/loop-invariant.c   | 72
>> +-
>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index 52c8ae8..e4291c9 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>   }
>>   }
>>
>> +static int
>> +determine_regs_used()
>> +{
>> +  unsigned int j;
>> +  unsigned int reg_used = 2;
>> +  bitmap_iterator bi;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
>> +(reg_used) ++;
>> +
>> +  return reg_used;
>> +}
>
> Isn't this just bitmap_count_bits (regs_live) + 2?
>
>
>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>   }
>>   }
>>
>> -
>> +static void
>> +calculate_loop_liveness (void)
>
> Needs a function comment.
>
>
>> +{
>> +  basic_block bb;
>> +  struct loop *loop;
>>
>> -/* Move the invariants out of the loops.  */
>> +  FOR_EACH_LOOP (loop, 0)
>> +if (loop->aux == NULL)
>> +  {
>> +loop->aux = xcalloc (1, sizeof (struct loop_data));
>> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
>> + }
>> +
>> +  FOR_EACH_BB_FN (bb, cfun)
>
> Why loop over blocks here?  Why not just iterate through all the loops in
> the loop structure.  Order isn't particularly important AFAICT for this
> code.
>
>
>
>> +   {
>> + int  i;
>> + edge e;
>> + vec edges;
>> + edges = get_loop_exit_edges (loop);
>> + FOR_EACH_VEC_ELT (edges, i, e)
>> + {
>> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>> DF_LR_OUT(e->src));
>> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live,
>> DF_LR_IN(e->dest));
>
> Space before the open-paren in the previous two lines:
> DF_LR_OUT (e->src) and DF_LR_IN (e->dest).
>
>
>> + }
>> +  }
>> +  }
>> +}
>> +
>> +/* Move the invariants  ut of the loops.  */
>
> Looks like you introduced a typo.
>
> I'd like to see testcases which show the change in # regs used computation
> helping generate better code.
>
> And  I'd also like to see some background information on why you think this
> is a more accurate measure for the number of registers used in the loop.
> regs_used AFAICT is supposed to be an estimate of the registers live around
> the loop.  So ISTM that you get that value by live-out set on the backedge
> of the loop.  I guess you get something similar by looking at the exit
> edge's source block's live-out set.  But I don't see any value in including
> stuff live at the block outside the loop.
>
> It also seems fairly non-intuitive.  Get the block's latch and use its
> live-out set.  That seems more intuitive.
>

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-12 Thread Jeff Law


On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:



0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch


 From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
From: Ajit Kumar Agarwal
Date: Wed, 7 Oct 2015 20:50:40 +0200
Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
  loop for LICM and IVOPTS.

Changes are done in the Loop Invariant(LICM) at RTL level and also the
Induction variable optimization based on SSA representation. The current
logic used in LICM for register used inside the loops is changed. The
Live Out of the loop latch node and the Live in of the destination of
the exit nodes is used to set the Loops Liveness at the exit of the Loop.
The register used is the number of live variables at the exit of the
Loop calculated above.

For Induction variable optimization on tree SSA representation, the register
used logic is based on the number of phi nodes at the loop header to represent
the liveness at the loop.  Current Logic used only the number of phi nodes at
the loop header.  Changes are made to represent the phi operands also live at
the loop. Thus number of phi operands also gets incremented in the number of
registers used.

ChangeLog:
2015-10-09  Ajit Agarwal

* loop-invariant.c (compute_loop_liveness): New.
(determine_regs_used): New.
(find_invariants_to_move): Use of determine_regs_used.
* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
arguments for register used.

I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
loop-invariant change is still pending, right?





Signed-off-by: Ajit Agarwal ajit...@xilinx.com
---
  gcc/loop-invariant.c   | 72 +-
  gcc/tree-ssa-loop-ivopts.c |  4 +--
  2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..e4291c9 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
  }
  }

+static int
+determine_regs_used()
+{
+  unsigned int j;
+  unsigned int reg_used = 2;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
+(reg_used) ++;
+
+  return reg_used;
+}

Isn't this just bitmap_count_bits (regs_live) + 2?
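
The equivalence behind this question can be checked on a toy model. The sketch below is not GCC code: a 64-bit mask stands in for GCC's `bitmap`, and `toy_set_bit`, `toy_count_bits` and the two `toy_regs_used_*` functions are hypothetical names invented for illustration. It only shows that starting at 2 and incrementing once per set bit gives the same result as a population count plus 2.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for GCC's bitmap: bit N set means pseudo N is live.
   Hypothetical type, for illustration only.  */
typedef uint64_t toy_bitmap;

/* Return B with bit REG set.  */
static toy_bitmap
toy_set_bit (toy_bitmap b, unsigned int reg)
{
  assert (reg < 64);
  return b | ((toy_bitmap) 1 << reg);
}

/* Mirrors the loop in the patch: start at 2, add one per set bit.  */
static unsigned int
toy_regs_used_loop (toy_bitmap regs_live)
{
  unsigned int reg_used = 2;
  for (unsigned int j = 0; j < 64; j++)
    if (regs_live & ((toy_bitmap) 1 << j))
      reg_used++;
  return reg_used;
}

/* Hypothetical stand-in for bitmap_count_bits: clear the lowest set
   bit until the mask is empty, counting the steps.  */
static unsigned int
toy_count_bits (toy_bitmap b)
{
  unsigned int n = 0;
  while (b)
    {
      b &= b - 1;
      n++;
    }
  return n;
}

/* The shorter form suggested in the review.  */
static unsigned int
toy_regs_used_count (toy_bitmap regs_live)
{
  return toy_count_bits (regs_live) + 2;
}
```

With three live registers both functions return 5, and with an empty set both return 2, so the explicit iteration adds nothing over the count.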



@@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
  }
  }

-
+static void
+calculate_loop_liveness (void)

Needs a function comment.



+{
+  basic_block bb;
+  struct loop *loop;

-/* Move the invariants out of the loops.  */
+  FOR_EACH_LOOP (loop, 0)
+if (loop->aux == NULL)
+  {
+loop->aux = xcalloc (1, sizeof (struct loop_data));
+bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
+ }
+
+  FOR_EACH_BB_FN (bb, cfun)

Why loop over blocks here?  Why not just iterate through all the loops 
in the loop structure?  Order isn't particularly important AFAICT for 
this code.





+   {
+ int  i;
+ edge e;
+ vec<edge> edges;
+ edges = get_loop_exit_edges (loop);
+ FOR_EACH_VEC_ELT (edges, i, e)
+ {
+   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
+   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));

Space before the open-paren in the previous two lines:
DF_LR_OUT (e->src) and DF_LR_IN (e->dest).



+ }
+  }
+  }
+}
+
+/* Move the invariants  ut of the loops.  */

Looks like you introduced a typo.

I'd like to see testcases which show the change in # regs used 
computation helping generate better code.


And I'd also like to see some background information on why you think 
this is a more accurate measure for the number of registers used in the 
loop.  regs_used AFAICT is supposed to be an estimate of the registers 
live around the loop.  So ISTM that you get that value by live-out set 
on the backedge of the loop.  I guess you get something similar by 
looking at the exit edge's source block's live-out set.  But I don't see 
any value in including stuff live at the block outside the loop.


It also seems fairly non-intuitive.  Get the loop's latch and use its 
live-out set.  That seems more intuitive.
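
To make the comparison concrete, here is a toy sketch, not GCC code. It solves the textbook backward liveness equations on a four-block CFG for a loop like `while (i < n) { tmp = base[i]; s += tmp; i++; }` followed by a use of `s`. All names (`toy_bb`, `cfg`, `toy_solve_liveness`, the `R_*` register bits) are hypothetical. Under these equations the live-out set of a block is the union of the live-in sets of its successors, so the live-in set of an exit edge's destination is already contained in the live-out set of the edge's source. GCC's real DF_LR problem has extra rules that this model ignores.

```c
#include <assert.h>
#include <stdint.h>

enum { NBLOCKS = 4, PREHEADER = 0, HEADER = 1, LATCH = 2, EXIT_BB = 3 };
enum { R_I = 1 << 0, R_N = 1 << 1, R_BASE = 1 << 2, R_S = 1 << 3,
       R_TMP = 1 << 4 };

struct toy_bb
{
  uint32_t use;   /* Registers read before any write in the block.  */
  uint32_t def;   /* Registers written in the block.  */
  uint32_t succs; /* Bit B set: there is an edge to block B.  */
  uint32_t in;    /* Live-in set, computed below.  */
  uint32_t out;   /* Live-out set, computed below.  */
};

/* HEADER tests i < n and is the source of the only exit edge;
   LATCH holds the loop body and the back edge.  */
static struct toy_bb cfg[NBLOCKS] = {
  [PREHEADER] = { 0, R_I | R_N | R_BASE | R_S, 1u << HEADER, 0, 0 },
  [HEADER] = { R_I | R_N, 0, (1u << LATCH) | (1u << EXIT_BB), 0, 0 },
  [LATCH] = { R_BASE | R_I | R_S, R_TMP | R_S | R_I, 1u << HEADER, 0, 0 },
  [EXIT_BB] = { R_S, 0, 0, 0, 0 },
};

/* Iterate out(b) = union of in(s) over successors s and
   in(b) = use(b) | (out(b) & ~def(b)) until nothing changes.  */
static void
toy_solve_liveness (void)
{
  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int b = NBLOCKS - 1; b >= 0; b--)
        {
          uint32_t out = 0;
          for (int s = 0; s < NBLOCKS; s++)
            if (cfg[b].succs & (1u << s))
              out |= cfg[s].in;
          uint32_t in = cfg[b].use | (out & ~cfg[b].def);
          /* Sanity check: live-in sets only grow while iterating.  */
          assert ((cfg[b].in & ~in) == 0);
          if (in != cfg[b].in || out != cfg[b].out)
            changed = 1;
          cfg[b].in = in;
          cfg[b].out = out;
        }
    }
}

/* Number of set bits in M.  */
static unsigned int
toy_popcount (uint32_t m)
{
  unsigned int n = 0;
  for (; m; m &= m - 1)
    n++;
  return n;
}
```

In this model the latch's live-out set is `{i, n, base, s}`, four registers, and OR-ing in the live-in set of the exit destination (`{s}`) on top of the exit source's live-out set changes nothing. `tmp` never appears because it is not live across a block boundary.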

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-11 Thread Ajit Kumar Agarwal



-----Original Message-----
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Friday, October 09, 2015 8:15 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On Thu, Oct 8, 2015 at 1:53 PM, Ajit Kumar Agarwal 
 wrote:
>
>
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: Thursday, October 08, 2015 10:29 AM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
> Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
> for LICM and IVOPTS.
>
> On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal 
>  wrote:
>> Following Proposed:
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
>> Induction variable optimization based on SSA representation.
>> The current logic used in LICM for register used inside the loops is 
>> changed. The Live Out of the loop latch node and the Live in of the 
>> destination of the exit nodes is used to set the Loops Liveness at the exit 
>> of the Loop. The register used is the number of live variables at the exit 
>> of the Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the 
>> register used logic is based on the number of phi nodes at the loop 
>> header to represent the liveness at the loop. Current Logic used only the 
>> number of phi nodes at the loop header. I have made changes  to represent 
>> the phi operands also live at the loop. Thus number of phi operands also 
>> gets incremented in the number of registers used.
> Hi,
>>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  
>>>IMHO, IVO fails to restrict iv number in some complex cases, your change 
>>>tries to rectify that by increasing register pressure irrespective to 
>>>out-of-ssa and coalescing.  I think the original code models reg-pressure 
>>>better, what needs to be changed is how we compute cost from register 
>>>pressure and use that to restrict iv number.
>
> Considering the liveness with respect to all the phi arguments will 
> not increase the register pressure. It improves the heuristics for 
> restricting The IV that increases the register pressure. The cost 
> model uses regs_used and modelling the
>>I think register pressure is increased along with regs_needed, doesn't matter 
>>if it will be canceled in estimate_reg_pressure_cost for both ends of cost 
>>comparison.
> Liveness with respect to the phi arguments measures
> Better register pressure.
>>I agree IV number should be controlled for some cases, but not by increasing 
>>`n' using phi argument number unconditionally.  Considering summary 
>>reduction as an example, most likely the ssa names will be coalesced and 
>>held in single register.  Furthermore, there is no reason to count phi 
>>node/arg number for floating point phi nodes.

>
> Number of phi nodes in the loop header is not only the criteria for 
> regs_used, but the number of liveness with respect to loop should be Criteria 
> to measure appropriate register pressure.
>>IMHO, it's hard to accurately track liveness info on SSA(PHI), because of 
>>coalescing etc.  So could you give some examples/proof for this?

I agree with you that it is hard to predict the exact mapping from SSA to the 
actual register allocation because of coalescing and out-of-SSA.

Interference between phi arguments and results is an important criterion for 
register pressure on SSA. In conventional SSA the phi arguments do not 
interfere, but most current compilers do not keep the SSA form conventional. 
In non-conventional SSA the phi arguments may interfere. The SSA becomes 
non-conventional when copy propagation of SSA names makes the phi arguments 
interfere, and in that case the phi arguments should be counted in the 
registers used. I interpret the registers used as the number of interfering 
live ranges, which is what makes register pressure go up or down.

On top of the above, out-of-SSA (SSA name coalescing) is quite simple for 
conventional SSA: each phi node is assigned a new variable, the defs and uses 
are replaced with that variable so that a single register can be assigned, and 
the corresponding phi node is then removed. For non-conventional SSA, however, 
out-of-SSA first makes the SSA conventional by inserting copies in each
of the predecessor node and assigned it to new va

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-08 Thread Bin.Cheng

On Thu, Oct 8, 2015 at 1:53 PM, Ajit Kumar Agarwal
 wrote:
>
>
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: Thursday, October 08, 2015 10:29 AM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
> Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
> for LICM and IVOPTS.
>
> On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal 
>  wrote:
>> Following Proposed:
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
>> Induction variable optimization based on SSA representation.
>> The current logic used in LICM for register used inside the loops is
>> changed. The Live Out of the loop latch node and the Live in of the
>> destination of the exit nodes is used to set the Loops Liveness at the exit 
>> of the Loop. The register used is the number of live variables at the exit 
>> of the Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register used logic is based on the number of phi nodes at the loop
>> header to represent the liveness at the loop. Current Logic used only the 
>> number of phi nodes at the loop header. I have made changes  to represent 
>> the phi operands also live at the loop. Thus number of phi operands also 
>> gets incremented in the number of registers used.
> Hi,
>>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  
>>>IMHO, IVO fails to restrict iv number in some complex cases, your change 
>>>tries to rectify that by increasing register pressure irrespective to 
>>>out-of-ssa and coalescing.  I think the original code models reg-pressure 
>>>better, what needs to be changed is how we compute cost from register 
>>>pressure and use that to restrict iv number.
>
> Considering the liveness with respect to all the phi arguments will not 
> increase the register pressure. It improves the heuristics for restricting
> The IV that increases the register pressure. The cost model uses regs_used 
> and modelling the
I think register pressure is increased along with regs_needed, doesn't
matter if it will be canceled in estimate_reg_pressure_cost for both
ends of cost comparison.
> Liveness with respect to the phi arguments measures
> Better register pressure.
I agree IV number should be controlled for some cases, but not by
increasing `n' using phi argument number unconditionally.  Considering
summary reduction as an example, most likely the ssa names will be
coalesced and held in single register.  Furthermore, there is no
reason to count phi node/arg number for floating point phi nodes.
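
As an illustration of the reduction case mentioned above, here is a minimal sketch. The function `toy_sum` and the SSA names in the comment are hypothetical and written by hand, not taken from a GCC dump.

```c
#include <assert.h>
#include <stddef.h>

/* Sum reduction.  In SSA form the loop header carries a phi for the
   accumulator, roughly:

     sum_1 = PHI <sum_0 (preheader), sum_2 (latch)>
     sum_2 = sum_1 + a[i_1]

   In this simple loop sum_0, sum_1 and sum_2 are never live at the
   same time, so out-of-SSA can usually coalesce them into a single
   register.  Counting the phi result plus both phi arguments would
   charge three registers for what is likely to end up as one.  */
long
toy_sum (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

The induction variable `i` has the same shape: one phi in the header whose result and arguments normally share a register after coalescing.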

>
> Number of phi nodes in the loop header is not only the criteria for 
> regs_used, but the number of liveness with respect to loop should be
> Criteria to measure appropriate register pressure.
IMHO, it's hard to accurately track liveness info on SSA(PHI), because
of coalescing etc.  So could you give some examples/proof for this?

Thanks,
bin
>
> Thanks & Regards
> Ajit

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-07 Thread Ajit Kumar Agarwal



-----Original Message-----
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Thursday, October 08, 2015 10:29 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal 
 wrote:
> Following Proposed:
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation.
> The current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop. The register used is the number of live variables at the exit of 
> the Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop. Current Logic used only the 
> number of phi nodes at the loop header. I have made changes  to represent the 
> phi operands also live at the loop. Thus number of phi operands also gets 
> incremented in the number of registers used.
Hi,
>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  
>>IMHO, IVO fails to restrict iv number in some complex cases, your change 
>>tries to rectify that by increasing register pressure irrespective to 
>>out-of-ssa and coalescing.  I think the original code models reg-pressure 
>>better, what needs to be changed is how we compute cost from register 
>>pressure and use that to restrict iv number.

Considering liveness with respect to all the phi arguments will not 
increase the register pressure. It improves the heuristic for restricting 
the IVs that increase register pressure. The cost model uses regs_used, and 
modelling liveness with respect to the phi arguments gives a better measure 
of register pressure.

The number of phi nodes in the loop header is not the only criterion for 
regs_used; the number of values live with respect to the loop should be the 
criterion for measuring register pressure appropriately.

Thanks & Regards
Ajit
>>As for the specific function determine_set_costs, I think one change is 
>>necessary to rule out all floating point phi nodes, because they do not have 
>>impact on IVO register pressure.  Actually this change will further reduce 
>>register pressure for fp related cases.


Thanks,
bin
>
> Performance runs:
>
> Bootstrapping with i386 goes through fine. The spec cpu 2000 
> benchmarks is run and following performance runs and the code size for
>  i386 target seen.
>
> Ratio with the above optimization changes vs ratio without above 
> optimizations for INT benchmarks (3785.261 vs 3783.064).
> Ratio with the above optimization changes vs ratio without above optimization 
> for FP benchmarks ( 4676.763189 vs 4676.072428 ).
>
> Code size reduction for INT benchmarks : 2324 instructions.
> Code size reduction for FP benchmarks : 1283 instructions.
>
> For Microblaze target the Mibench and EEMBC benchmarks is run and the 
> following improvements is seen.
>
> (qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), 
> security_rijndael_e(1.4%))
>
> Code Size reduction for Mibench  = 16164 instructions.
> Code Size reduction for EEMBC = 98 instructions.
>
> Patch ChangeLog:
>
> [PATCH] [RFC, Patch]: Optimized changes in the register used inside loop for 
> LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The current 
> logic used in LICM for register used inside the loops is changed.
> The Live Out of the loop latch node and the Live in of the destination 
> of the exit nodes is used to set the  Loops Liveness at the exit of 
> the Loop. The register used is the number of live variables at the exit of 
> the  Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the  number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only  
> the number of phi nodes at the loop header.  Changes are made to represent 
> the phi operands also live  at the loop. Thus number of phi operands also 
> gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal  
>
> * loop-invariant.c (compute_loop_liveness): New.
> (determine_regs_used): New.
> (find_invariants_to_move): Use of determine_regs_used.
> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
> arguments for register used.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> Thanks & Regards
> Ajit

Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-07 Thread Bin.Cheng

On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal
 wrote:
> Following Proposed:
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation.
> The current logic used in LICM for register used inside the loops is changed. 
> The Live Out of the loop latch node and the Live in of the
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop. The register used is the number of live variables
> at the exit of the Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the register 
> used logic is based on the number of phi nodes at the loop
> header to represent the liveness at the loop. Current Logic used only the 
> number of phi nodes at the loop header. I have made changes
>  to represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
Hi,
For the GIMPLE IVO part, I don't think the change is reasonable
enough.  IMHO, IVO fails to restrict iv number in some complex cases,
your change tries to rectify that by increasing register pressure
irrespective to out-of-ssa and coalescing.  I think the original code
models reg-pressure better, what needs to be changed is how we compute
cost from register pressure and use that to restrict iv number.
As for the specific function determine_set_costs, I think one change
is necessary to rule out all floating point phi nodes, because they do
not have impact on IVO register pressure.  Actually this change will
further reduce register pressure for fp related cases.

Thanks,
bin
>
> Performance runs:
>
> Bootstrapping with i386 goes through fine. The spec cpu 2000 benchmarks is 
> run and following performance runs and the code size for
>  i386 target seen.
>
> Ratio with the above optimization changes vs ratio without above 
> optimizations for INT benchmarks (3785.261 vs 3783.064).
> Ratio with the above optimization changes vs ratio without above optimization 
> for FP benchmarks ( 4676.763189 vs 4676.072428 ).
>
> Code size reduction for INT benchmarks : 2324 instructions.
> Code size reduction for FP benchmarks : 1283 instructions.
>
> For Microblaze target the Mibench and EEMBC benchmarks is run and the 
> following improvements is seen.
>
> (qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), 
> security_rijndael_e(1.4%))
>
> Code Size reduction for Mibench  = 16164 instructions.
> Code Size reduction for EEMBC = 98 instructions.
>
> Patch ChangeLog:
>
> [PATCH] [RFC, Patch]: Optimized changes in the register used inside loop for 
> LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization
> based on SSA representation. The current logic used in LICM for register used 
> inside the loops is changed.
> The Live Out of the loop latch node and the Live in of the destination of the 
> exit nodes is used to set the
>  Loops Liveness at the exit of the Loop. The register used is the number of 
> live variables at the exit of the
>  Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the register 
> used logic is based on the
>  number of phi nodes at the loop header to represent the liveness at the 
> loop.  Current Logic used only
>  the number of phi nodes at the loop header.  Changes are made to represent 
> the phi operands also live
>  at the loop. Thus number of phi operands also gets incremented in the number 
> of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal  
>
> * loop-invariant.c (compute_loop_liveness): New.
> (determine_regs_used): New.
> (find_invariants_to_move): Use of determine_regs_used.
> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
> arguments for register used.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> Thanks & Regards
> Ajit
