RE: [Patch,microblaze]: Better register allocation to minimize the spill and fetch.

2016-02-01 Thread Ajit Kumar Agarwal


-Original Message-
From: Mike Stump [mailto:mikest...@comcast.net] 
Sent: Tuesday, February 02, 2016 12:12 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Better register allocation to minimize the 
spill and fetch.

On Jan 29, 2016, at 2:31 AM, Ajit Kumar Agarwal <ajit.kumar.agar...@xilinx.com> 
wrote:
> 
> This patch improves the allocation of registers in the given function.

>>Is it just me, or, would it be even better to change the abi and make 
>>MB_ABI_ASM_TEMP_REGNUM be allocated by the register allocator?

Yes, it would be even better to make r18 (MB_ABI_ASM_TEMP_REGNUM) allocatable
by the register allocator for the given function. Currently r18 is marked in
FIXED_REGISTERS and cannot be allocated in the given function; it is only used
for temporaries whose liveness is limited. r18 is used in some of the shift
patterns and also for conditional-branch temporaries.

I have made some ABI changes in my local branch that remove r18 from
FIXED_REGISTERS, so the given function can be allocated r18. This reduces the
spill and fetch to a great extent, and I am seeing a good amount of gain with
this change on the Mibench/EEMBC benchmarks.

Since the ABI specifies that r18 is reserved for assembler temporaries, a lot
of kernel code and glibc code uses r18 as a temporary in inline assembly.
Changing the ABI would require a lot of changes in the MicroBlaze-specific
kernel code, where r18 is used as a temporary without being stored and fetched
after its scope is gone.
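
To make the risk concrete, here is a hypothetical illustration (not taken from
the kernel sources) of the idiom in question: inline assembly that scratches
r18 without saving it and without listing it as a clobber, relying on the ABI
guarantee that r18 is the assembler temporary. If the compiler started
allocating r18, a value kept live in r18 across such an asm statement would be
silently corrupted.

    /* Hypothetical MicroBlaze inline asm: r18 used as a scratch register,
       not listed in the clobber list because the ABI reserves it.  */
    static inline unsigned int add3 (unsigned int x)
    {
      unsigned int out;
      __asm__ ("addik\tr18, %1, 3\n\t"   /* r18 = x + 3 */
               "addk\t%0, r18, r0"       /* out = r18 + 0 */
               : "=r" (out)
               : "r" (x));               /* r18 deliberately not clobbered */
      return out;
    }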

Currently we don't have plans to change the ABI, as that might break the
existing kernel and glibc code. Changing the ABI would require a good amount
of work in the kernel and glibc code for MicroBlaze.

Thanks & Regards
Ajit


RE: [Patch,microblaze]: Better register allocation to minimize the spill and fetch.

2016-02-01 Thread Ajit Kumar Agarwal


-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Friday, January 29, 2016 11:33 PM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Better register allocation to minimize the 
spill and fetch.

On 01/29/2016 02:31 AM, Ajit Kumar Agarwal wrote:
>
> This patch improves the allocation of registers in the given function.
> The allocation is optimized for conditional branches. The temporary
> register used in conditional branches to store the comparison result
> is allocated to the fixed register r18.
>
> Currently such temporaries are allocated from the free registers of
> the given function. Because of this, one of the free registers is
> reserved for the temporaries and the given function is left with fewer
> registers. This is suboptimal for MicroBlaze. In MicroBlaze r18 is
> marked as fixed and cannot be allocated to pseudos in the given
> function; instead, r18 can be used as a temporary for conditional
> branches with compare and branch. Using r18 as a temporary for
> conditional branches saves one of the free registers for allocation.
> The freed register can be used for other pseudos, and hence the better
> register allocation.
>
> The usage of r18 as above reduces the spill and fetch, because one of
> the free registers becomes available to other pseudos instead of being
> used for conditional temporaries.
>
> The advantage of the above is that the scope of the temporaries is
> limited to the conditional branches, and hence the usage of r18 as the
> temporary for such conditional branches is optimized and preserves the
> functionality of the function.
>
> Regtested for Microblaze target.
>
> Performance runs are done with Mibench/EEMBC benchmarks.
>
> Following gains are achieved.
>
> Benchmarks              Gains
>
> automotive_qsort1       1.630730524%
> network_dijkstra        1.527506256%
> office_stringsearch1    1.81356288%
> security_rijndael_d     3.26129357%
> basefp01_lite           4.465120185%
> a2time01_lite           1.893862857%
> cjpeg_lite              3.286496675%
> djpeg_lite              3.120150612%
> qos_lite                2.63964381%
> office_ispell           1.531340405%
>
> Code Size improvements:
>
> Reduction in number of instructions for Mibench: 12927.
> Reduction in number of instructions for EEMBC:     212.
>
> ChangeLog:
> 2016-01-29  Ajit Agarwal  <ajit...@xilinx.com>
>
>   * config/microblaze/microblaze.c
>   (microblaze_expand_conditional_branch): Use of MB_ABI_ASM_TEMP_REGNUM
>   for temporary conditional branch.
>   (microblaze_expand_conditional_branch_reg): Use of 
> MB_ABI_ASM_TEMP_REGNUM
>   for temporary conditional branch.
>   (microblaze_expand_conditional_branch_sf): Use of MB_ABI_ASM_TEMP_REGNUM
>   for temporary conditional branch.

>>You can combine these ChangeLog entries:

>> * config/microblaze/microblaze.c
>> (microblaze_expand_conditional_branch,
>> microblaze_expand_conditional_branch_reg,
>> microblaze_expand_conditional_branch_sf): Use MB_ABI_ASM_TEMP_REGNUM
>> for temp reg.

Attached is the patch with the updated ChangeLog.  The modified ChangeLog is
given below.

ChangeLog:
2016-01-29  Ajit Agarwal  <ajit...@xilinx.com>

* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch, 
microblaze_expand_conditional_branch_reg,
microblaze_expand_conditional_branch_sf): Use MB_ABI_ASM_TEMP_REGNUM 
for temp reg.

Thanks & Regards
Ajit

>>Otherwise, OK.

>
> Signed-off-by: Ajit Agarwal ajit...@xilinx.com
> ---
>   gcc/config/microblaze/microblaze.c |6 +++---
>   1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/microblaze/microblaze.c b/gcc/config/microblaze/microblaze.c
> index baff67a..b4277ad 100644
> --- a/gcc/config/microblaze/microblaze.c
> +++ b/gcc/config/microblaze/microblaze.c
> @@ -3402,7 +3402,7 @@ microblaze_expand_conditional_branch (machine_mode mode, rtx operands[])
>    rtx cmp_op0 = operands[1];
>    rtx cmp_op1 = operands[2];
>    rtx label1 = operands[3];
> -  rtx comp_reg = gen_reg_rtx (SImode);
> +  rtx comp_reg =  gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
>    rtx condition;
>
>    gcc_assert ((GET_CODE (cmp_op0) == REG) || (GET_CODE (cmp_op0) == SUBREG));
> @@ -3439,7 +3439,7 @@ microblaze_expand_conditional_branch_reg (enum machine_mode mode,
>    rtx cmp_op0 = operands[1];
>    rtx cmp_op1 = operands[2];

[Patch,microblaze]: Better register allocation to minimize the spill and fetch.

2016-01-29 Thread Ajit Kumar Agarwal

This patch improves the allocation of registers in the given function. The
allocation is optimized for conditional branches. The temporary register used
in conditional branches to store the comparison result is allocated to the
fixed register r18.

Currently such temporaries are allocated from the free registers of the given
function. Because of this, one of the free registers is reserved for the
temporaries and the given function is left with fewer registers. This is
suboptimal for MicroBlaze. In MicroBlaze r18 is marked as fixed and cannot be
allocated to pseudos in the given function; instead, r18 can be used as a
temporary for conditional branches with compare and branch. Using r18 as a
temporary for conditional branches saves one of the free registers for
allocation. The freed register can be used for other pseudos, and hence the
better register allocation.

The usage of r18 as above reduces the spill and fetch, because one of the free
registers becomes available to other pseudos instead of being used for
conditional temporaries.

The advantage of the above is that the scope of the temporaries is limited to
the conditional branches, and hence the usage of r18 as the temporary for such
conditional branches is optimized and preserves the functionality of the
function.
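
For reference, the mechanical difference in the patch below is between
requesting a fresh pseudo and naming a hard register directly. A minimal
sketch using the same GCC RTL calls as the patch (illustrative only):

    /* gen_reg_rtx creates a new pseudo that the register allocator must
       later assign a hard register (possibly spilling another value),
       while gen_rtx_REG names a hard register directly, so no pseudo is
       consumed.  */
    rtx pseudo_tmp = gen_reg_rtx (SImode);                          /* fresh pseudo */
    rtx fixed_tmp  = gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);  /* hard reg r18 */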

Regtested for Microblaze target.

Performance runs are done with Mibench/EEMBC benchmarks.

Following gains are achieved.

Benchmarks              Gains

automotive_qsort1       1.630730524%
network_dijkstra        1.527506256%
office_stringsearch1    1.81356288%
security_rijndael_d     3.26129357%
basefp01_lite           4.465120185%
a2time01_lite           1.893862857%
cjpeg_lite              3.286496675%
djpeg_lite              3.120150612%
qos_lite                2.63964381%
office_ispell           1.531340405%

Code Size improvements:

Reduction in number of instructions for Mibench: 12927.
Reduction in number of instructions for EEMBC:     212.

ChangeLog:
2016-01-29  Ajit Agarwal  <ajit...@xilinx.com>

* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch): Use of MB_ABI_ASM_TEMP_REGNUM
for temporary conditional branch.
(microblaze_expand_conditional_branch_reg): Use of 
MB_ABI_ASM_TEMP_REGNUM
for temporary conditional branch.
(microblaze_expand_conditional_branch_sf): Use of MB_ABI_ASM_TEMP_REGNUM
for temporary conditional branch.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com
---
 gcc/config/microblaze/microblaze.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.c 
b/gcc/config/microblaze/microblaze.c
index baff67a..b4277ad 100644
--- a/gcc/config/microblaze/microblaze.c
+++ b/gcc/config/microblaze/microblaze.c
@@ -3402,7 +3402,7 @@ microblaze_expand_conditional_branch (machine_mode mode, 
rtx operands[])
   rtx cmp_op0 = operands[1];
   rtx cmp_op1 = operands[2];
   rtx label1 = operands[3];
-  rtx comp_reg = gen_reg_rtx (SImode);
+  rtx comp_reg =  gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
   rtx condition;
 
   gcc_assert ((GET_CODE (cmp_op0) == REG) || (GET_CODE (cmp_op0) == SUBREG));
@@ -3439,7 +3439,7 @@ microblaze_expand_conditional_branch_reg (enum 
machine_mode mode,
   rtx cmp_op0 = operands[1];
   rtx cmp_op1 = operands[2];
   rtx label1 = operands[3];
-  rtx comp_reg = gen_reg_rtx (SImode);
+  rtx comp_reg =  gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
   rtx condition;
 
   gcc_assert ((GET_CODE (cmp_op0) == REG)
@@ -3483,7 +3483,7 @@ microblaze_expand_conditional_branch_sf (rtx operands[])
   rtx condition;
   rtx cmp_op0 = XEXP (operands[0], 0);
   rtx cmp_op1 = XEXP (operands[0], 1);
-  rtx comp_reg = gen_reg_rtx (SImode);
+  rtx comp_reg =  gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
 
   emit_insn (gen_cstoresf4 (comp_reg, operands[0], cmp_op0, cmp_op1));
   condition = gen_rtx_NE (SImode, comp_reg, const0_rtx);
-- 
1.7.1

Thanks & Regards
Ajit






RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-27 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Wednesday, January 27, 2016 12:48 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 01/18/2016 11:27 AM, Ajit Kumar Agarwal wrote:

>>
>> Ajit, can you confirm which of adpcm_code or adpcm_decode where path 
>> splitting is showing a gain?  I suspect it's the former but would 
>> like to make sure so that I can adjust the heuristics properly.
>>> I'd still like to have this answered when you can Ajit, just to be 
>>> 100% that it's the path splitting in adpcm_code that's responsible 
>>> for the improvements you're seeing in adpcm.
>
> The adpcm_coder gets optimized with path splitting whereas the
> adpcm_decoder is not optimized further with path splitting. In
> adpcm_decoder the join node is duplicated into its predecessors, and
> with the duplication of the join node the code is not optimized further.
>>Right.  Just wanted to make sure my analysis corresponded with what you were 
>>seeing in your benchmarking -- and it does.

>>I suspect that if we looked at this problem from the angle of isolating paths
>>based on how constant PHI arguments feed into and allow simplifications in
>>later blocks that we might get better long term results -- including
>>improving adpcm_decoder which has the same idiom as adpcm_coder -- it's just
>>in the wrong spot in the CFG.
>>But that's obviously gcc-7 material.

Can I look into it?

Thanks & Regards
Ajit

Jeff



[patch,ira]: Improve on updated memory cost in coloring pass of integrated register allocator.

2016-01-23 Thread Ajit Kumar Agarwal

This patch improves the updated memory cost in the coloring pass of the
integrated register allocator. Only the enter_freq of the loop is considered
in the updated memory cost in the coloring pass. Considering only the
enter_freq is based on the concept that anything live-out at the entry or
header of the loop is live-in and live-out throughout the loop. The exit_freq
is ignored in the updated memory cost in the coloring pass.

This increases the updated memory cost, giving more chances of reducing the
spill and fetch and a better assignment.

The concept that a value live-out at the header of the loop is live-in and
live-out throughout the loop is based on the following.

If a variable v is live-out at the header of the loop, then the variable is
live-in at every node in the loop. To prove this, consider a loop L with
header h such that the variable v defined at d is live-in at h. Since v is
live at h, d is not part of L. This follows from the dominance property, i.e.
h is strictly dominated by d. Furthermore, there exists a path from h to a use
of v which does not go through d. For every node p in the loop, since the loop
is a strongly connected component of the CFG, there exists a path, consisting
only of nodes of L, from p to h. Concatenating these two paths proves that v
is live-in and live-out at p.
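
Stated compactly in LaTeX notation (a restatement of the argument above,
nothing new): for a loop $L$ with header $h$ and a variable $v$ defined at a
node $d \notin L$,

$$v \in \mathrm{LiveIn}(h) \implies \forall p \in L:\ v \in \mathrm{LiveIn}(p) \cap \mathrm{LiveOut}(p).$$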

Bootstrapped on X86_64.

Performance runs are done on SPEC CPU2000 benchmarks and the following are the
results.

SPEC INT benchmarks
(Mean score with this patch vs mean score without this patch = 3729.777 vs
3717.083).

Benchmarks        Gains
186.crafty      = 2.78%
176.gcc         = 0.7%
253.perlbmk     = 0.75%
255.vortex      = 0.82%

SPEC FP benchmarks
(Mean score with this patch vs mean score without this patch = 4774.65 vs
4751.838).

Benchmarks        Gains
168.wupwise     = 0.77%
171.swim        = 1.5%
177.mesa        = 1.2%
200.sixtrack    = 1.2%
178.galgel      = 0.6%
179.art         = 0.6%
183.equake      = 0.5%
187.facerec     = 0.7%

ChangeLog:
2016-01-23  Ajit Agarwal  <ajit...@xilinx.com>

* ira-color.c
(color_pass): Consider only the enter_freq in the calculation
of the updated memory cost.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com
---
 gcc/ira-color.c |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 1e4c64f..201017c 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -3173,7 +3173,7 @@ static void
 color_pass (ira_loop_tree_node_t loop_tree_node)
 {
   int regno, hard_regno, index = -1, n;
-  int cost, exit_freq, enter_freq;
+  int cost, enter_freq;
   unsigned int j;
   bitmap_iterator bi;
   machine_mode mode;
@@ -3297,7 +3297,6 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 		}
 	      continue;
 	    }
-	  exit_freq = ira_loop_edge_freq (subloop_node, regno, true);
 	  enter_freq = ira_loop_edge_freq (subloop_node, regno, false);
 	  ira_assert (regno < ira_reg_equiv_len);
 	  if (ira_equiv_no_lvalue_p (regno))
@@ -3315,15 +3314,14 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 	  else if (hard_regno < 0)
 	    {
 	      ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
-		-= ((ira_memory_move_cost[mode][rclass][1] * enter_freq)
-		    + (ira_memory_move_cost[mode][rclass][0] * exit_freq));
+		-= ((ira_memory_move_cost[mode][rclass][1] * enter_freq));
 	    }
 	  else
 	    {
 	      aclass = ALLOCNO_CLASS (subloop_allocno);
 	      ira_init_register_move_cost_if_necessary (mode);
 	      cost = (ira_register_move_cost[mode][rclass][rclass]
-		      * (exit_freq + enter_freq));
+		      * (enter_freq));
 	      ira_allocate_and_set_or_copy_costs
 		(&ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno), aclass,
 		 ALLOCNO_UPDATED_CLASS_COST (subloop_allocno),
@@ -3339,8 +3337,8 @@ color_pass (ira_loop_tree_node_t loop_tree_node)
 	    ALLOCNO_UPDATED_CLASS_COST (subloop_allocno)
 	      = ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno)[index];
 	  ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
-	    += (ira_memory_move_cost[mode][rclass][0] * enter_freq
-		+ ira_memory_move_cost[mode][rclass][1] * exit_freq);
+	    += (ira_memory_move_cost[mode][rclass][0] * enter_freq);
+
 	    }
 	}
     }
-- 
1.7.1

Thanks & Regards
Ajit










RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-18 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Saturday, January 16, 2016 12:03 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:
>
>
> -Original Message- From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 23, 2015 12:06 PM To: Ajit Kumar Agarwal; 
> Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
> Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re:
> [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
> representation
>
> On 12/11/2015 02:11 AM, Ajit Kumar Agarwal wrote:
>>
>> Mibench/EEMBC benchmarks (Target Microblaze)
>>
>> Automotive_qsort1(4.03%), Office_ispell(4.29%), 
>> Office_stringsearch1(3.5%). Telecom_adpcm_d( 1.37%), 
>> ospfv2_lite(1.35%).
>>> I'm having a real tough time reproducing any of these results.
>>> In fact, I'm having a tough time seeing cases where path splitting 
>>> even applies to the Mibench/EEMBC benchmarks
>>> >>mentioned above.
>
>>> In the very few cases where split-paths might apply, the net 
>>> resulting assembly code I get is the same with and without 
>>> split-paths.
>
>>> How consistent are these results?
>
> I am consistently getting the gains for office_ispell,
> office_stringsearch1 and telecom_adpcm_d. I ran it again today and we
> see gains in the same benchmark tests with the split path changes.
>
>>> What functions are being affected that in turn impact performance?
>
> For office_ispell: The function are Function "linit (linit, 
> funcdef_no=0, decl_uid=2535, cgraph_uid=0, symbol_order=2) for 
> lookup.c file". "Function checkfile (checkfile, funcdef_no=1, 
> decl_uid=2478, cgraph_uid=1, symbol_order=4)" " Function correct 
> (correct, funcdef_no=2, decl_uid=2503, cgraph_uid=2, symbol_order=5)" 
> " Function askmode (askmode, funcdef_no=24, decl_uid=2464, 
> cgraph_uid=24, symbol_order=27)" for correct.c file.
>
> For office_stringsearch1: The function is Function "bmhi_search 
> (bmhi_search, funcdef_no=1, decl_uid=2178, cgraph_uid=1, 
> symbol_order=5)" for bmhisrch.c file.
>>Can you send me the pre-processed lookup.c, correct.c and bmhi_search.c?

>>I generated mine using x86 and that may be affecting my ability to reproduce
>>your results on the microblaze target.  Looking specifically at bmhi_search.c
>>and correct.c, I see they are going to be sensitive to the target headers.
>>If (for example) they use FORTIFY_SOURCE or macros for toupper.

>>In the bmhi_search I'm looking at, I don't see any opportunities for the path
>>splitter to do anything.  The CFG just doesn't have the right shape.  Again,
>>that may be an artifact of how toupper is implemented in the system header
>>files -- hence my request for the cpp output on each of the important files.

Would you like me to send the above files and functions pre-processed with the
-E option flag?

Thanks & Regards
Ajit
Jeff


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-18 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Saturday, January 16, 2016 4:33 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 01/14/2016 01:55 AM, Jeff Law wrote:
[ Replying to myself again, mostly to make sure we've got these thoughts in the 
archives. ]
>
> Anyway, going back to adpcm_decode, we do end up splitting this path:
>
>   # vpdiff_12 = PHI <vpdiff_11(12), vpdiff_50(13)>
>   if (sign_41 != 0)
>     goto <bb 15>;
>   else
>     goto <bb 16>;
> ;;    succ:       15
> ;;                16
>
> ;;   basic block 15, loop depth 1
> ;;    pred:       14
>   valpred_51 = valpred_76 - vpdiff_12;
>   goto <bb 17>;
> ;;    succ:       17
>
> ;;   basic block 16, loop depth 1
> ;;    pred:       14
>   valpred_52 = vpdiff_12 + valpred_76;
> ;;    succ:       17
>
> ;;   basic block 17, loop depth 1
> ;;    pred:       15
> ;;                16
>   # valpred_7 = PHI <valpred_51(15), valpred_52(16)>
>   _85 = MAX_EXPR <valpred_7, -32768>;
>   valpred_13 = MIN_EXPR <_85, 32767>;
>   step_53 = stepsizeTable[index_62];
>   outp_54 = outp_69 + 2;
>   _55 = (short int) valpred_13;
>   MEM[base: outp_54, offset: -2B] = _55;
>   if (outp_54 != _74)
>     goto ;
>   else
>     goto ;
>
> This doesn't result in anything particularly interesting/good AFAICT. 
> We propagate valpred_51/52 into the use in the MAX_EXPR in the 
> duplicate paths, but that doesn't allow any further simplification.
>>So with the heuristic I'm poking at, this gets rejected.  Essentially it
>>doesn't think it's likely to expose CSE/DCE opportunities (and it's correct).
>>The number of statements in predecessor blocks that feed operands in the
>>to-be-copied-block is too small relative to the size of the
>>to-be-copied-block.


>
> Ajit, can you confirm which of adpcm_code or adpcm_decode where path 
> splitting is showing a gain?  I suspect it's the former but would like 
> to make sure so that I can adjust the heuristics properly.
>>I'd still like to have this answered when you can Ajit, just to be 100%
>>that it's the path splitting in adpcm_code that's responsible for the
>>improvements you're seeing in adpcm.

The adpcm_coder gets optimized with path splitting, whereas the adpcm_decoder
is not optimized further with path splitting. In adpcm_decoder the join node
is duplicated into its predecessors, and with the duplication of the join node
the code is not optimized further.

In adpcm_coder, the following optimization is triggered with path splitting.

1. /* Output last step, if needed */
   if ( !bufferstep )
     *outp++ = outputbuffer;

The IF-THEN inside the loop is triggered when bufferstep is 1. Then the flip
happens and bufferstep becomes 0. On the exit branch, when bufferstep is 1,
the flip converts it to 0 and the IF-THEN above generates the store assigning
outputbuffer to outp.

This sequence is optimized with path splitting: if bufferstep is 1, the exit
branch of the loop branches directly to the above store. This does not require
flipping bufferstep with an xor with immediate 1. With this optimization there
is one level of exit branch for the bufferstep == 1 path, which allows
scheduling the exit branch to the store with a meaningful instruction instead
of the xor with immediate 1.

Without path splitting, if bufferstep is 1, the exit branch of the loop
branches to the code flipping it to zero, and the IF-THEN above, outside the
loop, does the store assigning outputbuffer to outp. Thus without path
splitting there are two levels of branching on the exit path where bufferstep
is 1 inside the loop, generating unoptimized code; the two-level exit branch
of the loop is also scheduled with the xor with immediate 1. A sketch of the
source shape follows.

 Thanks & Regards
Ajit

jeff


[Patch,microblaze]: Optimized register reorganization for Microblaze.

2016-01-12 Thread Ajit Kumar Agarwal
The patch contains changes to the FIXED_REGISTERS and CALL_USED_REGISTERS
macros. Earlier, register r21 was marked as fixed and also marked as 1 in
CALL_USED_REGISTERS; on top of that, r21 was not assigned to any of the
temporaries in RTL insns.

This makes the use of register r21 in callee functions impossible, wasting one
register that could be allocated in the callee function. The change makes
register r21 allocatable in the callee function, and the optimized use of r21
there reduces spills and fetches, due to the availability of r21 in the callee
function. Register r21 is made available by marking it as not fixed and
marking its CALL_USED_REGISTERS entry as 0. Register r20 was likewise marked
as fixed; the change stops marking it as fixed, allowing the register to be
used and reducing spills and fetches. (In the macros below, the
FIXED_REGISTERS entries for r20 and r21 change from 1 to 0, and the
CALL_USED_REGISTERS entry for r21 changes from 1 to 0.)

Regtested for Microblaze.

Performance runs are made on Mibench/EEMBC benchmarks for MicroBlaze. The
following benchmarks show gains:

Benchmarks            Gains
automotive_qsort1   =  3.96%
automotive_susan_c  =  7.68%
consumer_mad        =  9.6%
security_rijndael_d = 19.57%
telecom_CRC32       =  7.66%
bitmnp01_lite       = 10.61%
a2time01_lite       =  6.97%

ChangeLog:
2016-01-12  Ajit Agarwal  <ajit...@xilinx.com>

* config/microblaze/microblaze.h
(FIXED_REGISTERS): Update in macro.
(CALL_USED_REGISTERS): Update in macro.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com
---
 gcc/config/microblaze/microblaze.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.h 
b/gcc/config/microblaze/microblaze.h
index e115c42..dbfb652 100644
--- a/gcc/config/microblaze/microblaze.h
+++ b/gcc/config/microblaze/microblaze.h
@@ -253,14 +253,14 @@ extern enum pipeline_type microblaze_pipe;
 #define FIXED_REGISTERS						\
 {									\
   1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,  \
-  1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
+  1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
   1, 1, 1, 1   \
 }
 
 #define CALL_USED_REGISTERS						\
 {									\
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  \
-  1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
+  1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
   1, 1, 1, 1   \
 }
 #define GP_REG_FIRST    0
-- 
1.7.1

Thanks & Regards
Ajit





RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-04 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Wednesday, December 23, 2015 12:06 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 12/11/2015 02:11 AM, Ajit Kumar Agarwal wrote:
>
> Mibench/EEMBC benchmarks (Target Microblaze)
>
> Automotive_qsort1(4.03%), Office_ispell(4.29%), Office_stringsearch1(3.5%). 
> Telecom_adpcm_d( 1.37%), ospfv2_lite(1.35%).
>>I'm having a real tough time reproducing any of these results.  In fact, I'm
>>having a tough time seeing cases where path splitting even applies to the
>>Mibench/EEMBC benchmarks mentioned above.

>>In the very few cases where split-paths might apply, the net resulting 
>>assembly code I get is the same with and without split-paths.

>>How consistent are these results?

I am consistently getting the gains for office_ispell, office_stringsearch1
and telecom_adpcm_d. I ran it again today and we see gains in the same
benchmark tests with the split path changes.

>>What functions are being affected that in turn impact performance?

For office_ispell: The function are Function "linit (linit, funcdef_no=0, 
decl_uid=2535, cgraph_uid=0, symbol_order=2) for lookup.c file".
   "Function checkfile (checkfile, 
funcdef_no=1, decl_uid=2478, cgraph_uid=1, symbol_order=4)"
   " Function correct (correct, funcdef_no=2, 
decl_uid=2503, cgraph_uid=2, symbol_order=5)"
   " Function askmode (askmode, funcdef_no=24, 
decl_uid=2464, cgraph_uid=24, symbol_order=27)"
   for correct.c file.
  
For office_stringsearch1: The function is Function "bmhi_search (bmhi_search, 
funcdef_no=1, decl_uid=2178, cgraph_uid=1, symbol_order=5)"
for bmhisrch.c file.

>>What options are you using to compile the benchmarks?  I'm trying with
>>-O2 -fsplit-paths and -O3 in my attempts to trigger the transformation so 
>>that I can look more closely at possible heuristics.

I am using the following flags:

-O3 -mlittle-endian -mxl-barrel-shift -mno-xl-soft-div -mhard-float
-mxl-float-convert -mxl-float-sqrt -mno-xl-soft-mul -mxl-multiply-high
-mxl-pattern-compare

To disable split paths -fno-split-paths is used on top of the above flags.

>>Is this with the standard microblaze-elf target?  Or with some other target?

I am using the --target=microblaze-xilinx-elf to build the microblaze target.

Thanks & Regards
Ajit

jeff




RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-25 Thread Ajit Kumar Agarwal
Hello Jeff:

I am out on vacation till 3rd Jan 2016.
Is it okay if I respond to the below once I am back in the office?

Thanks & Regards
Ajit

-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Wednesday, December 23, 2015 12:06 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 12/11/2015 02:11 AM, Ajit Kumar Agarwal wrote:
>
> Mibench/EEMBC benchmarks (Target Microblaze)
>
> Automotive_qsort1(4.03%), Office_ispell(4.29%), Office_stringsearch1(3.5%). 
> Telecom_adpcm_d( 1.37%), ospfv2_lite(1.35%).
I'm having a real tough time reproducing any of these results.  In fact, I'm 
having a tough time seeing cases where path splitting even applies to the 
Mibench/EEMBC benchmarks mentioned above.

In the very few cases where split-paths might apply, the net resulting assembly 
code I get is the same with and without split-paths.

How consistent are these results?

What functions are being affected that in turn impact performance?

What options are you using to compile the benchmarks?  I'm trying with
-O2 -fsplit-paths and -O3 in my attempts to trigger the transformation so that 
I can look more closely at possible heuristics.

Is this with the standard microblaze-elf target?  Or with some other target?

jeff




RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-17 Thread Ajit Kumar Agarwal
Hello Jeff and Richard:

Here is the summary of the FDO (Feedback Directed Optimization) performance
results.

SPEC CPU2000 INT benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
 Geomean Score = 3907.751673.
b) FDO + No Splitting Paths + tracer enabled
 Geomean Score = 3895.191536.

SPEC CPU2000 FP benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
 Geomean Score = 4793.321963
b) FDO + No Splitting Paths + tracer enabled
 Geomean Score = 4770.855467

The gains are maximum with Split Paths enabled + tracer enabled, as compared
to No Split Paths + tracer enabled. The Split Paths pass is very much
required.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, December 16, 2015 3:44 PM
To: Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation



-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
> Geomean Score =  4749.726.
> b) Path Splitting enabled + tracer enabled.
> Geomean Score =  4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains. 
> I think we need to have Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
> Geomean Score =  3745.193.
> b) No Path Splitting + tracer enabled.
> Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
> Geomean Score = 3742.833.

>>I suppose with SPEC you mean SPEC CPU 2006?

The performance data is with respect to SPEC CPU 2000 benchmarks.

>>Can you disclose the architecture you did the measurements on and the compile 
>>flags you used otherwise?

Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
cpu cores   : 10
cache size  : 25600 KB

I have used -O3 and enabled the tracer with -ftracer.

Thanks & Regards
Ajit
>>Note that tracer does a very good job only when paired with FDO so can you 
>>re-run SPEC with FDO and compare with path-splitting enabled on top of that?


Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting as compared to 
> tracer. With both Path Splitting and tracer enabled we are also getting  
> gains.
> I think we should have Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple 
> exits with Splitting paths through duplication. My observation is,  in 
> tracer pass also there is a creation of multiple exits through duplication. I 
> don’t think that’s an issue with the practicality considering the gains we 
> are getting with Splitting paths with more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya 
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on 
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path 
>>> splitting will turn out to be useful!  It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer 
>> is enabled with -fprofile-use (but it is also properly driven to only 
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.  But as I mentioned, I really want to 
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on 
>> unless you duplicate the exit condition into that new block creatin

[RFC, rtl optimization]: Better heuristics for estimate_reg_pressure_cost in presence of call for LICM.

2015-12-16 Thread Ajit Kumar Agarwal

The estimate of target_clobbered_registers based on the call_used arrays is
not correct: it is a worst-case heuristic for target_clobbered_registers. This
disables many of the loop invariant code motion opportunities in the presence
of a call. Instead of considering the spill cost, we aggressively consider
only the target_reg_cost.
With this change, keeping the old logic used in regs_used, the following gains
were observed.

diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -373,15 +373,23 @@ estimate_reg_pressure_cost (unsigned n_new, unsigned n_old, bool speed,
 
   /* If there is a call in the loop body, the call-clobbered registers
      are not available for loop invariants.  */
+
   if (call_p)
     available_regs = available_regs - target_clobbered_regs;
-
+
   /* If we have enough registers, we should use them and not restrict
      the transformations unnecessarily.  */
   if (regs_needed + target_res_regs <= available_regs)
     return 0;
 
-  if (regs_needed <= available_regs)
+  /* Estimation of target_clobbered register is based on the call_used
+     arrays which is not the right estimate for the clobbered register
+     used in called function.  Instead of considering the spill cost we
+     consider only the reg_cost aggressively.  */
+
+  if ((regs_needed <= available_regs)
+      || (call_p && (regs_needed <=
+		     (available_regs + target_clobbered_regs))))
     /* If we are close to running out of registers, try to preserve
        them.  */
     cost = target_reg_cost [speed] * n_new;
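
A hypothetical illustration of when the relaxed test changes the outcome
(register counts invented for the example): suppose the target has 14
allocatable registers, of which target_clobbered_regs = 8; with call_p true,
available_regs becomes 14 - 8 = 6. For an invariant needing regs_needed = 10,
the old test (10 <= 6) fails, so the spill cost is charged and the motion is
usually rejected; the new test (10 <= 6 + 8) succeeds, so only
target_reg_cost[speed] * n_new is charged and the invariant can still be
moved.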

SPEC CPU 2000 INT benchmarks
(Geomean score without the above change vs geomean score with the above change
= 3745.193 vs 3752.717)

SPEC CPU 2000 FP benchmarks
(Geomean score without the above change vs geomean score with the above change
= 4741.825 vs 4792.085)

Thanks & Regards
Ajit



RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-16 Thread Ajit Kumar Agarwal


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
> Geomean Score =  4749.726.
> b) Path Splitting enabled + tracer enabled.
> Geomean Score =  4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains. 
> I think we need to have Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
> Geomean Score =  3745.193.
> b) No Path Splitting + tracer enabled.
> Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
> Geomean Score = 3742.833.

>>I suppose with SPEC you mean SPEC CPU 2006?

The performance data is with respect to SPEC CPU 2000 benchmarks.

>>Can you disclose the architecture you did the measurements on and the compile 
>>flags you used otherwise?

Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
cpu cores   : 10
cache size  : 25600 KB

I have used -O3 and enabled the tracer with -ftracer.

Thanks & Regards
Ajit
>>Note that tracer does a very good job only when paired with FDO so can you 
>>re-run SPEC with FDO and compare with path-splitting enabled on top of that?


Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting as compared to 
> tracer. With both Path Splitting and tracer enabled we are also getting  
> gains.
> I think we should have Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple 
> exits with Splitting paths through duplication. My observation is,  in 
> tracer pass also there is a creation of multiple exits through duplication. I 
> don’t think that’s an issue with the practicality considering the gains we 
> are getting with Splitting paths with more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya 
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on 
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path 
>> splitting will turn out to be useful!  It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer 
>> is enabled with -fprofile-use (but it is also properly driven to only 
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.  But as I mentioned, I really want to 
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on 
>> unless you duplicate the exit condition into that new block creating 
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem.  Then again, 
> this all runs after the tree loop optimizer, so I'm not sure how big of an 
> issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit 
>>> that I came across Honza's tracer implementation, which may in fact 
>>> be retargettable to these loops and do a better job.  I haven't 
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually 
>> increasing scheduling opportunities for architectures where it matters).
> Right.  They're both building superblocks, which has the effect of larger 
> windows for 

RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-15 Thread Ajit Kumar Agarwal
The patch is modified based on the review comments from Richard, Bernd and
David.

The following changes are done to incorporate the comments received on the
previous mail.

1. With this patch the liveness of the loop is not stored in the LOOPDATA
   structures. The liveness is calculated for the current loop for which the
   invariants are checked and regs_used is calculated.
2. Memory leaks are fixed.
3. Reworked the comments section based on Bernd's comments.

Bootstrapped and regtested for i386 and MicroBlaze targets.

SPEC CPU 2000 benchmarks are run on the i386 target and the following is the
summary of the results.

SPEC CPU 2000 INT benchmarks.

(Geomean score without the change vs geomean score with the reg pressure
change = 3745.193 vs 3745.328)

SPEC CPU 2000 FP benchmarks.

(Geomean score without the change vs geomean score with the reg pressure
change = 4741.825 vs 4748.364).


[Patch,rtl Optimization]: Better register pressure estimate for loop
invariant code motion

Calculate the loop liveness used for regs_used in the register pressure
calculation of the cost estimation. Loop liveness is based on the following
properties: we only need to find the set of objects that are live at the
birth or header of the loop; we do not need to calculate liveness through
the loop by considering the live-in and live-out of all the basic blocks of
the loop. This is based on the point that the set of objects that are
live-in at the birth or header of the loop will be live-in at every node in
the loop.

If a variable v is live-out at the header of the loop, then the variable is
live-in at every node in the loop. To prove this, consider a loop L with
header h such that the variable v defined at d is live-in at h. Since v is
live at h, d is not part of L. This follows from the dominance property,
i.e. h is strictly dominated by d. Furthermore, there exists a path from h
to a use of v which does not go through d. For every node p in the loop,
since the loop is a strongly connected component of the CFG, there exists a
path, consisting only of nodes of L, from p to h. Concatenating these two
paths proves that v is live-in and live-out at p.

Calculate the live-out and live-in for the exit edge of the loop. This
patch considers liveness not only for the loop latch but also outside the
loop.
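
A minimal sketch of the idea (an illustration under the stated properties,
not the submitted patch; the function name is hypothetical, the df liveness
problem is assumed to be computed, and a single-exit loop is assumed for
brevity):

    /* Approximate the registers live anywhere in LOOP by those live-in at
       its header -- by the dominance argument above these are live
       throughout the loop -- plus those live across the exit edge.  */
    static unsigned
    estimate_loop_regs_used (struct loop *loop)
    {
      bitmap_head live;
      unsigned n;
      edge exit = single_exit (loop);   /* the sketch assumes one exit */

      bitmap_initialize (&live, &reg_obstack);
      bitmap_copy (&live, df_get_live_in (loop->header));
      if (exit)
        bitmap_ior_into (&live, df_get_live_out (exit->src));
      n = bitmap_count_bits (&live);
      bitmap_clear (&live);
      return n;
    }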

ChangeLog:
2015-12-15  Ajit Agarwal  <ajit...@xilinx.com>

* loop-invariant.c
(find_invariants_to_move): Add the logic of regs_used based
on liveness.
* cfgloopanal.c
(estimate_reg_pressure_cost): Update the heuristics in presence
of call_p.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit

-Original Message-
From: Bernd Schmidt [mailto:bschm...@redhat.com] 
Sent: Wednesday, December 09, 2015 7:34 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for 
Loop Invariant Code Motion.

On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote:
>
> This is because the available_regs = 6 and the regs_needed = 1 and
> new_regs = 0 and the regs_used = 10.  As the regs_used that are based
> on the liveness given above is greater than the available_regs, then
> it's a candidate for spill and the estimate_register_pressure calculates
> the spill cost. This spill cost is greater than inv_cost and gain
> comes to be negative. This disables the loop invariant for the above
> testcase.

As far as I can tell this loop does not lead to a spill. Hence, failure of this 
testcase would suggest there is something wrong with your idea.

>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)  {

Formatting.

>> +/* Loop Liveness is based on the following proprties.

"properties"

>> +   we only require to calculate the set of objects that are live at
>> +   the birth or the header of the loop.
>> +   We don't need to calculate the live through the Loop considering
>> +   Live in and Live out of all the basic blocks of the Loop. This is
>> +   because the set of objects. That are live-in at the birth or header
>> +   of the loop will be live-in at every node in the Loop.
>> +   If a v live out at the header of the loop then the variable is 
>> live-in
>> +   at every node in the Loop. To prove this, Consider a Loop L with 
>> header
>> +   h such that The variable v defined at d is live-in at h. Since v is 
>> live
>> +   at h, d is not part of L. This follows from the dominance property, 
>> i.e.
>> +   h is strictly dominated by d. Furthermore, there exists a path from 
>> h to
>> +   a use of v whic

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-15 Thread Ajit Kumar Agarwal
Hello Jeff:

Here is more of the data you asked for.

SPEC FP benchmarks.
a) No Path Splitting + tracer enabled 
Geomean Score =  4749.726.
b) Path Splitting enabled + tracer enabled.
Geomean Score =  4781.655.

Conclusion: With both Path Splitting and tracer enabled we got the maximum
gains. I think we need to have the Path Splitting pass.

SPEC INT benchmarks.
a) Path Splitting enabled + tracer not enabled.
Geomean Score =  3745.193.
b) No Path Splitting + tracer enabled.
Geomean Score = 3738.558.
c) Path Splitting enabled + tracer enabled.
Geomean Score = 3742.833.

Conclusion: We are getting more gains with Path Splitting as compared to
tracer. With both Path Splitting and tracer enabled we are also getting gains.
I think we should have the Path Splitting pass.

One more observation: Richard's concern is the creation of multiple exits when
splitting paths through duplication. My observation is that in the tracer pass
there is also a creation of multiple exits through duplication. I don't think
that's an issue in practice, considering the gains we are getting with
splitting paths through more PRE, CSE and DCE.

Thanks & Regards
Ajit 




-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Wednesday, December 16, 2015 5:20 AM
To: Richard Biener
Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 12/11/2015 03:05 AM, Richard Biener wrote:
> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>
>>> This pass is now enabled by default with -Os but has no limits on 
>>> the amount of stmts it copies.
>>
>> The more statements it copies, the more likely it is that the path 
>> splitting will turn out to be useful!  It's counter-intuitive.
>
> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer 
> is enabled with -fprofile-use (but it is also properly driven to only 
> trace hot paths) and otherwise not by default at any optimization level.
Definitely not appropriate for -Os.  But as I mentioned, I really want to look 
at the tracer code as it may totally subsume path splitting.

>
> Don't see how this would work for the CFG pattern it operates on 
> unless you duplicate the exit condition into that new block creating 
> an even more obfuscated CFG.
Agreed, I don't see any way to fix the multiple exit problem.  Then again, this 
all runs after the tree loop optimizer, so I'm not sure how big of an issue it 
is in practice.


>> It was only after I approved this code after twiddling it for Ajit 
>> that I came across Honza's tracer implementation, which may in fact 
>> be retargettable to these loops and do a better job.  I haven't 
>> experimented with that.
>
> Well, I originally suggested to merge this with the tracer pass...
I missed that, or it didn't sink into my brain.

>> Again, the more statements it copies the more likely it is to be profitable.
>> Think superblocks to expose CSE, DCE and the like.
>
> Ok, so similar to tracer (where I think the main benefit is actually 
> increasing scheduling opportunities for architectures where it matters).
Right.  They're both building superblocks, which has the effect of larger 
windows for scheduling, DCE, CSE, etc.


>
> Note that both passes are placed quite late and thus won't see much
> of the GIMPLE optimizations (DOM mainly).  I wonder why they were
> not placed adjacent to each other.
Ajit had it fairly early, but that didn't play well with if-conversion.
I just pushed it past if-conversion and vectorization, but before the
last DOM pass.  That turns out to be where tracer lives too, as you noted.

>>
>> I wouldn't lose any sleep if we disabled by default or removed, particularly
>> if we can repurpose Honza's code.  In fact, I might strongly support the
>> former until we hear back from Ajit on performance data.
>
> See above for what we do with -ftracer.  path-splitting should at _least_
> restrict itself to operate on optimize_loop_for_speed_p () loops.
I think we need to decide if we want the code at all, particularly given 
the multiple-exit problem.

The difficulty is I think Ajit posted some recent data that shows it's 
helping.  So maybe the thing to do is ask Ajit to try the tracer 
independent of path splitting and take the obvious actions based on 
Ajit's data.


>
> It should also (even if counter-intuitive) limit the amount of stmt copying
> it does - after all there is sth like an instruction cache size which 
> exceeeding
> for loops will never be a good idea (and even smaller special loop caches on
> some archs).
Yup.

>
> Note that a better heuristic t

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-11 Thread Ajit Kumar Agarwal
Hello Jeff:

Sorry for the delay in sending the benchmark runs with the Split-Path change.

Here is the Summary of the results.

SPEC CPU 2000 INT benchmarks ( Target i386)
( Geomean Score without Split-Paths changes vs Geomean Score with Split-Path 
changes  =  3740.789 vs 3745.193).

SPEC CPU 2000 FP benchmarks. ( Target i386)
( Geomean Score without Split-Paths changes vs Geomean Score with Split-Path 
changes  =  4721.655 vs 4741.825).

Mibench/EEMBC benchmarks (Target Microblaze)

Automotive_qsort1 (4.03%), Office_ispell (4.29%), Office_stringsearch1 (3.5%),
Telecom_adpcm_d (1.37%), ospfv2_lite (1.35%).

We are seeing minor negative gains that are mainly noise (less than 0.5%).

Thanks & Regards
Ajit
-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Friday, December 11, 2015 1:39 AM
To: Richard Biener
Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 12/03/2015 07:38 AM, Richard Biener wrote:
> This pass is now enabled by default with -Os but has no limits on the 
> amount of stmts it copies.
The more statements it copies, the more likely it is that the path splitting
will turn out to be useful!  It's counter-intuitive.

The primary benefit AFAICT with path splitting is that it exposes additional 
CSE, DCE, etc opportunities.

IIRC Ajit posited that it could help with live/conflict analysis; I never saw
that, and with the changes to push splitting deeper into the pipeline I doubt
it would further help live/conflict analysis, since that work also involved
preserving the single latch property.



> It also will make all loops with this shape have at least two
> exits (if the resulting loop will be disambiguated the inner loop will
> have two exits).
> Having more than one exit will disable almost all loop optimizations after it.
Hmmm, the updated code keeps the single latch property, but I'm pretty sure it 
won't keep a single exit policy.

To keep a single exit policy would require keeping an additional block around.  
Each of the split paths would unconditionally transfer to this new block.  The 
new block would then either transfer to the latch block or out of the loop.


>
> The pass itself documents the transform it does but does zero to motivate it.
>
> What's the benefit of this pass (apart from disrupting further optimizations)?
It's essentially building superblocks in a special case to enable additional 
CSE, DCE and the like.

Unfortunately what is is missing is heuristics and de-duplication.  The former 
to drive cases when it's not useful and the latter to reduce codesize for any 
statements that did not participate in optimizations when they were duplicated.

The de-duplication is the "sink-statements-through-phi", cross-jumping,
tail-merging and the like class of problems.

It was only after I approved this code after twiddling it for Ajit that I came 
across Honza's tracer implementation, which may in fact be retargettable to 
these loops and do a better job.  I haven't experimented with that.



>
> I can see a _single_ case where duplicating the latch will allow 
> threading one of the paths through the loop header to eliminate the 
> original exit.  Then disambiguation may create a nice nested loop out 
> of this.  Of course that is only profitable again if you know the 
> remaining single exit of the inner loop (exiting to the outer one) is 
> executed infrequently (thus the inner loop actually loops).
It wasn't ever about threading.

>
> But no checks other than on the CFG shape exist (oh, it checks it will 
> at _least_ copy two stmts!).
Again, the more statements it copies the more likely it is to be profitable.  
Think superblocks to expose CSE, DCE and the like.

>
> Given the profitability constraints above (well, correct me if I am 
> wrong on these) it looks like the whole transform should be done 
> within the FSM threading code which might be able to compute whether 
> there will be an inner loop with a single exit only.
While it shares some concepts with jump threading, I don't think the 
transformation belongs in jump threading.

>
> I'm inclined to request the pass to be removed again or at least 
> disabled by default.
I wouldn't lose any sleep if we disabled by default or removed, particularly if 
we can repurpose Honza's code.  In fact, I might strongly support the former 
until we hear back from Ajit on performance data.

I also keep coming back to Click's paper on code motion -- in that context, 
copying statements would be a way to break dependencies and give the global 
code motion algorithm more freedom.  The advantage of doing it in a framework 
like Click's is it's got a built-in sinking step.


>
> What closed source benchmark was this transform invented for?
I think it was EEMBC or C

RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Wednesday, December 09, 2015 4:06 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for 
Loop Invariant Code Motion.

On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
> Based on the comments on RFC patch this patch incorporates all the comments 
> from Jeff. Thanks Jeff for the valuable feedback.
>
> This patch enables a better register pressure estimate for loop 
> invariant code motion. It calculates the loop liveness used for 
> regs_used, which in turn is used for the register pressure in the cost 
> estimation.
>
> Loop liveness is based on the following properties: we only need to 
> calculate the set of objects that are live at the birth, i.e. the header, 
> of the loop.  We don't need to calculate liveness through the loop by 
> considering the live-in and live-out of all the basic blocks of the loop, 
> because the set of objects that are live-in at the birth or header of the 
> loop will be live-in at every node in the loop.
>
> If a variable v is live at the header of the loop, then it is live-in 
> at every node in the loop. To prove this, consider a loop L with header h 
> such that a variable v defined at d is live-in at h. Since v is live at h, 
> d is not part of L; this follows from the dominance property, i.e. h is 
> strictly dominated by d. Furthermore, there exists a path from h to a use 
> of v which does not go through d. For every node p of the loop, since the 
> loop is a strongly connected component of the CFG, there exists a path 
> consisting only of nodes of L from p to h. Concatenating those two paths 
> proves that v is live-in and live-out at p.
>
> Also Calculate the Live-Out and Live-In for the exit edge of the loop. This 
> considers liveness for not only the Loop latch but also the liveness outside 
> the Loops.
>
> Bootstrapped and Regtested for x86_64 and microblaze target.
>
> There is an extra failure for the testcase gcc.dg/loop-invariant.c with the 
> change, which looks correct to me.
>
> This is because available_regs = 6, regs_needed = 1, new_regs = 0 and 
> regs_used = 10.  As regs_used, which is based on the liveness given 
> above, is greater than available_regs, the invariant becomes a spill 
> candidate and estimate_register_pressure calculates the spill cost. This 
> spill cost is greater than inv_cost, so the gain comes out negative. 
> This disables the loop invariant for the above testcase.
>
> Disabling the loop invariant for the testcase loop-invariant.c with 
> this patch looks correct to me, assuming the calculation of available_regs 
> in cfgloopanal.c is correct.

>>You keep a lot of data (bitmaps) live just to use the number of bits set in 
>>the end.

>>I'll note that this also increases the complexity of the pass which is 
>>enabled at -O1+ where
>>-O1 should limit itself to O(n*log n) algorithms but live is quadratic.

>>So I don't think doing this unconditionally is a good idea (and we have 
>>-fira-loop-pressure after all).

>>Please watch not making -O1 worse again after we spent so much time making it 
>>suitable for all the weird autogenerated code.

I can also implement this without keeping the bitmap liveness data live, which 
would not require counting the set bits at the end.  I am interested in the 
liveness at the loop header and the loop exits, which I can calculate at the 
point where I set regs_used for the curr_loop.

This avoids setting and keeping the liveness bitmaps while still making the 
information available for the invariant candidates of the curr_loop.
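
A minimal sketch of what I mean, assuming the df LR (live registers) problem 
is up to date (the names follow the posted patch, but this is illustrative, 
not the patch itself):

  static unsigned int
  estimate_regs_used (struct loop *loop)
  {
    unsigned int count;
    unsigned int i;
    edge e;
    bitmap_head live;
    vec<edge> edges = get_loop_exit_edges (loop);

    bitmap_initialize (&live, &reg_obstack);

    /* Everything live into the header is live at every node of the loop.  */
    bitmap_ior_into (&live, DF_LR_IN (loop->header));

    /* Also account for liveness across the exit edges of the loop.  */
    FOR_EACH_VEC_ELT (edges, i, e)
      {
        bitmap_ior_into (&live, DF_LR_OUT (e->src));
        bitmap_ior_into (&live, DF_LR_IN (e->dest));
      }

    /* The +2 is the initial bound for induction variables etc., as in the
       existing code.  */
    count = bitmap_count_bits (&live) + 2;

    bitmap_clear (&live);
    edges.release ();
    return count;
  }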

Thanks & Regards
Ajit

Richard.

>  ChangeLog:
>  2015-12-09  Ajit Agarwal  <ajit...@xilinx.com>
>
> * loop-invariant.c
> (calculate_loop_liveness): New.
> (move_loop_invariants): Use of calculate_loop_liveness.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> ---
>  gcc/loop-invariant.c |   77 
> ++
>  1 files changed, 65 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 
> 53d1377..ac08594 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p)
> registers used; we put some initial bound here to stand for
> induction variables etc.  that we do not detect.  */
>  {
> -  unsigned int n_regs = DF_REG_SIZE (df);
> -
> -  regs_used = 2;
> -
> -  for (i = 0; i < 

RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread Ajit Kumar Agarwal


-Original Message-
From: Bernd Schmidt [mailto:bschm...@redhat.com] 
Sent: Wednesday, December 09, 2015 7:34 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for 
Loop Invariant Code Motion.

On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote:
>
> This is because the available_regs = 6 and the regs_needed = 1 and 
> new_regs = 0 and the regs_used = 10.  As the regs_used that is based 
> on the liveness given above is greater than the available_regs, the 
> invariant becomes a spill candidate and estimate_register_pressure 
> calculates the spill cost. This spill cost is greater than inv_cost and 
> the gain comes out negative. This disables the loop invariant for the 
> above testcase.

>>As far as I can tell this loop does not lead to a spill. Hence, failure of 
>>this testcase would suggest there is something wrong with your idea.

As far as I can see there is nothing wrong with the idea; the existing 
calculation of available_regs yields 6, which is not right for this 
testcase. The estimate of regs_used based on the liveness (the idea behind 
this patch) is correct, but the incorrect estimate of available_regs (6) 
results in the gain being negative.

>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)  {

>>Formatting.

>> +/* Loop Liveness is based on the following proprties.

>>"properties"

I will incorporate the change.

>> +   we only require to calculate the set of objects that are live at
>> +   the birth or the header of the loop.
>> +   We don't need to calculate the live through the Loop considering
>> +   Live in and Live out of all the basic blocks of the Loop. This is
>> +   because the set of objects. That are live-in at the birth or header
>> +   of the loop will be live-in at every node in the Loop.
>> +   If a v live out at the header of the loop then the variable is 
>> live-in
>> +   at every node in the Loop. To prove this, Consider a Loop L with 
>> header
>> +   h such that The variable v defined at d is live-in at h. Since v is 
>> live
>> +   at h, d is not part of L. This follows from the dominance property, 
>> i.e.
>> +   h is strictly dominated by d. Furthermore, there exists a path from 
>> h to
>> +   a use of v which does not go through d. For every node of the loop, 
>> p,
>> +   since the loop is strongly connected Component of the CFG, there 
>> exists
>> +   a path, consisting only of nodes of L from p to h. Concatenating 
>> those
>> +   two paths prove that v is live-in and live-out Of p.  */

>>Please Refrain From Randomly Capitalizing Words, and please also fix the 
>>grammar ("set of objects. That are live-in"). These problems make this 
>>comment (and also your emails) >>extremely hard to read.

>>Partly for that reason, I can't quite make up my mind whether this patch 
>>makes things better or just different. The testcase failure makes me think 
>>it's probably not an improvement.

I'll correct it. I am seeing gains for the Mibench/EEMBC benchmarks on the 
Microblaze target with this patch. I will run the SPEC CPU 2000 benchmarks 
for i386 with this patch.

>> +bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN (loop->header));
>> +bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT (loop->header));

>>Formatting.

I'll incorporate the change.

Thanks & Regards
Ajit

Bernd


[Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-08 Thread Ajit Kumar Agarwal
Based on the comments on the RFC patch, this patch incorporates all the 
comments from Jeff. Thanks Jeff for the valuable feedback.

This patch enables a better register pressure estimate for loop invariant 
code motion. It calculates the loop liveness used for regs_used, which in 
turn is used for the register pressure in the cost estimation.

Loop liveness is based on the following properties: we only need to calculate 
the set of objects that are live at the birth, i.e. the header, of the loop.  
We don't need to calculate liveness through the loop by considering the 
live-in and live-out of all the basic blocks of the loop, because the set of 
objects that are live-in at the birth or header of the loop will be live-in 
at every node in the loop.

If a variable v is live at the header of the loop, then it is live-in at every 
node in the loop. To prove this, consider a loop L with header h such that a 
variable v defined at d is live-in at h. Since v is live at h, d is not part 
of L; this follows from the dominance property, i.e. h is strictly dominated 
by d. Furthermore, there exists a path from h to a use of v which does not go 
through d. For every node p of the loop, since the loop is a strongly 
connected component of the CFG, there exists a path consisting only of nodes 
of L from p to h. Concatenating those two paths proves that v is live-in and 
live-out at p.

Also Calculate the Live-Out and Live-In for the exit edge of the loop. This 
considers liveness for not only the Loop latch but also the liveness outside 
the Loops.

Bootstrapped and Regtested for x86_64 and microblaze target.

There is an extra failure for the testcase gcc.dg/loop-invariant.c with the 
change, which looks correct to me. 

This is because the available_regs = 6 and the regs_needed = 1 and new_regs = 0 
and the regs_used = 10.  As the regs_used that is based on the liveness 
given above is greater than the available_regs, the invariant becomes a spill 
candidate and estimate_register_pressure calculates the spill cost. This spill 
cost is greater than inv_cost, so the gain comes out negative. This disables 
the loop invariant for the above testcase. 

Disabling the loop invariant for the testcase loop-invariant.c with this 
patch looks correct to me, assuming the calculation of available_regs in 
cfgloopanal.c is correct.

 ChangeLog:
 2015-12-09  Ajit Agarwal  

* loop-invariant.c
(calculate_loop_liveness): New.
(move_loop_invariants): Use of calculate_loop_liveness.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

---
 gcc/loop-invariant.c |   77 ++
 1 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 53d1377..ac08594 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p)
registers used; we put some initial bound here to stand for
induction variables etc.  that we do not detect.  */
 {
-  unsigned int n_regs = DF_REG_SIZE (df);
-
-  regs_used = 2;
-
-  for (i = 0; i < n_regs; i++)
-   {
- if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
-   {
- /* This is a value that is used but not changed inside loop.  */
- regs_used++;
-   }
-   }
+  regs_used = bitmap_count_bits (&LOOP_DATA (curr_loop)->regs_live) + 2;
 }
 
   if (! flag_ira_loop_pressure)
@@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x)
   }
 }
 
+/* Calculate the Loop Liveness used for regs_used used in 
+   heuristics to calculate the register pressure.  */
+
+static void
+calculate_loop_liveness (void)
+{
+  struct loop *loop;
+
+  FOR_EACH_LOOP (loop, 0)
+    if (loop->aux == NULL)
+      {
+        loop->aux = xcalloc (1, sizeof (struct loop_data));
+        bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
+      }
+
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+  {
+    int i;
+    edge e;
+    vec<edge> edges;
+    edges = get_loop_exit_edges (loop);
+
+/* Loop Liveness is based on the following proprties.
+   we only require to calculate the set of objects that are live at
+   the birth or the header of the loop.
+   We don't need to calculate the live through the Loop considering
+   Live in and Live out of all the basic blocks of the Loop. This is
+   because the set of objects. That are live-in at the birth or header
+   of the loop will be live-in at every node in the Loop.
+
+   If a v live out at the header of the loop then the variable is live-in
+   at every node in the Loop. To prove this, Consider a Loop L with header
+   h such that The variable v defined at d is live-in at h. Since v is live
+   at h, d is not part of L. This follows from the dominance property, i.e.
+   h is strictly dominated by d. Furthermore, there exists a path from h to
+   a use of v which 

RE: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-07 Thread Ajit Kumar Agarwal


-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Thursday, December 03, 2015 7:27 PM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Instruction prefetch optimization for 
microblaze.

On 12/01/2015 12:49 AM, Ajit Kumar Agarwal wrote:
> The changes are made in this patch for the instruction prefetch optimizations 
> for Microblaze.
>
> Reg tested for Microblaze target.
>
> The changes implement the instruction prefetch optimization for 
> Microblaze. The "wic" Microblaze instruction is the instruction 
> prefetch instruction. The optimization generates the iprefetch 
> instruction on the fall-through path of a call site. 
> It is enabled with the Microblaze target flag mxl-prefetch. The 
> purpose of adding the flag is that selection of the "wic" instruction must 
> be enabled in the reconfigurable design; the selection is not enabled by 
> default.
>
> ChangeLog:
> 2015-12-01  Ajit Agarwal  <ajit...@xilinx.com>
>
>   * config/microblaze/microblaze.c
>   (get_branch_target): New.
>   (insert_wic_for_ilb_runout): New.
>   (insert_wic): New.
>   (microblaze_machine_dependent_reorg): New.
>   (TARGET_MACHINE_DEPENDENT_REORG): Define macro.
>   * config/microblaze/microblaze.md
>   (UNSPEC_IPREFETCH): Define.
>   (iprefetch): New pattern
>   * config/microblaze/microblaze.opt
>   (mxl-prefetch): New flag.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
>
> Thanks & Regards
> Ajit
>

>>+  rtx_insn *insn, *before_4 = 0, *before_16 = 0;
>>+  int addr = 0, length, first_addr = -1;
>>+  int wic_addr0 = 128 * 4, wic_addr1 = 128 * 4;

>>Especially when there are initializers, I prefer to see each variable 
>>declared on a separate line.  If the meaning of a variable is not clear (and 
>>most of these are not), include a comment >>before the declaration.

>>+if (first_addr == -1)
>>+  first_addr = INSN_ADDRESSES (INSN_UID (insn));

>>Can be moved to initialize first_addr.

>>+addr = INSN_ADDRESSES (INSN_UID (insn)) - first_addr;

>>Is "addr" and address or offset?  If the latter, use a more descriptive name.


>>+if (before_4 == 0 && addr + length >= 4 * 4)
>>+  before_4 = insn;
...

>>Please add comments to describe what you are doing here.  What are before_4 
>>and before_16?  What are all these conditions testing?


>>+  loop_optimizer_finalize();

>>Space before parens.

All the above comments are incorporated. The updated patch is attached.

Regtested for the Microblaze target. 

Mibench/EEMBC benchmarks were run on the hardware with mxl-prefetch enabled, 
and the run goes through fine with the generation of the "wic" instruction.

[Patch,microblaze]: Instruction prefetch optimization for microblaze.

The changes implement the instruction prefetch optimization for Microblaze. 
The "wic" Microblaze instruction is the instruction prefetch instruction. The 
optimization generates the iprefetch instruction on the fall-through path of 
a call site. It is enabled with the Microblaze target flag mxl-prefetch. The 
purpose of adding the flag is that selection of the "wic" instruction must be 
enabled in the reconfigurable design; the selection is not enabled by default.
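
A rough sketch of how the pieces named in the ChangeLog fit together (the 
flag test name here is hypothetical; the real code is in the attached 
iprefetch.patch):

  /* Machine-dependent reorg pass: walk the insns and insert "wic"
     instruction prefetches on the fall-through paths of call sites.  */
  static void
  microblaze_machine_dependent_reorg (void)
  {
    if (TARGET_PREFETCH)  /* hypothetical name for the -mxl-prefetch flag */
      insert_wic ();
  }

  #undef TARGET_MACHINE_DEPENDENT_REORG
  #define TARGET_MACHINE_DEPENDENT_REORG microblaze_machine_dependent_reorg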

ChangeLog:
2015-12-07  Ajit Agarwal  <ajit...@xilinx.com>

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit

-- 
Michael Eager  ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


iprefetch.patch
Description: iprefetch.patch


[Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-01 Thread Ajit Kumar Agarwal
The changes are made in this patch for the instruction prefetch optimizations 
for Microblaze.

Reg tested for Microblaze target.

The changes implement the instruction prefetch optimization for Microblaze. 
The "wic" Microblaze instruction is the instruction prefetch instruction. The 
instruction prefetch optimization generates the iprefetch instruction on the 
fall-through path of a call site. It is enabled with the Microblaze target 
flag mxl-prefetch. The purpose of adding the flag is that selection of the 
"wic" instruction must be enabled in the reconfigurable design; the selection 
is not enabled by default.

ChangeLog:
2015-12-01  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit


iprefetch.patch
Description: iprefetch.patch


RE: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-01 Thread Ajit Kumar Agarwal
Moreover, this patch was tested and run on hardware with the Mibench/EEMBC 
benchmarks for the Microblaze target. The reconfigurable design was enabled 
with the "wic" instruction prefetch instruction selected, and the above 
benchmarks were compiled with the -mxl-prefetch flag.

Thanks & Regards
Ajit
-Original Message-----
From: Ajit Kumar Agarwal 
Sent: Tuesday, December 01, 2015 2:19 PM
To: GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

The changes are made in this patch for the instruction prefetch optimizations 
for Microblaze.

Reg tested for Microblaze target.

The changes implement the instruction prefetch optimization for Microblaze. 
The "wic" Microblaze instruction is the instruction prefetch instruction. The 
instruction prefetch optimization generates the iprefetch instruction on the 
fall-through path of a call site. It is enabled with the Microblaze target 
flag mxl-prefetch. The purpose of adding the flag is that selection of the 
"wic" instruction must be enabled in the reconfigurable design; the selection 
is not enabled by default.

ChangeLog:
2015-12-01  Ajit Agarwal  <ajit...@xilinx.com>

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit


[Patch,SLP]: Correction in the comment for SLP vectorization profitable case.

2015-11-30 Thread Ajit Kumar Agarwal
This patch corrects the comment for the SLP profitable vectorization case.

The comment contradicted the condition 
vec_outside_cost + vec_inside_cost > scalar_cost.
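
For reference, this is the shape of the check the comment describes (a sketch 
based on the condition quoted above, not the exact source):

  /* Vectorization is not profitable when the total vector cost is not
     below the scalar cost.  */
  if (vec_outside_cost + vec_inside_cost > scalar_cost)
    return false;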

ChangeLog:
2015-11-30  Ajit Agarwal  

* tree-vect-slp.c
(vect_bb_vectorization_profitable_p): Correction in the comment.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


comment-slp.patch
Description: comment-slp.patch


RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-29 Thread Ajit Kumar Agarwal
Hello Jeff:

-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Tuesday, November 17, 2015 4:30 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 11/16/2015 10:36 AM, Ajit Kumar Agarwal wrote:
>> For Induction variable optimization on tree SSA representation, the 
>> register used logic is based on the number of phi nodes at the loop 
>> header to represent the liveness at the loop.  Current Logic used 
>> only the number of phi nodes at the loop header.  Changes are made to 
>> represent the phi operands also live at the loop. Thus number of phi 
>> operands also gets incremented in the number of registers used.
>>But my question is why is the # of PHI operands useful here.  You'd have a 
>>stronger argument if it was the number of unique operands in each PHI. 
>>While I don't doubt this patch improved things, I think it's just putting a 
>>band-aid over the problem.

>>I think anything that just looks at PHIs or at register liveness at loop 
>>boundaries is inherently underestimating the register pressure implications 
>>of code motion from inside to outside a loop.

>>If an object is pulled out of the loop, then it's going to conflict with 
>>nearly every object that births in the loop (because the object being moved 
>>now has to live throughout the loop).  There are exceptions, but I doubt they 
>>matter in practice.  The object is also going to conflict with anything else 
>>that is live through the loop.  At least that's how it seems to me at first 
>>thought.

>>So build the set of objects (SSA_NAMEs) that either birth or are live through 
>>the loop that have the same type class as the object we want to hoist out of 
>>the loop (scalar, floating point, vector).  Use that set of objects to 
>>estimate register pressure.

I agree with the above.  To add to it, we only need to calculate the set of 
objects (SSA_NAMEs) that are live at the birth, i.e. the header, of the loop.
We don't need to calculate liveness through the loop by considering the 
live-in and live-out of all the basic blocks of the loop, because the set of 
objects (SSA_NAMEs) that are live-in at the birth or header of the loop will 
be live-in at every node in the loop.

If a variable v is live at the header of the loop, then it is live-in at every 
node in the loop. To prove this, consider a loop L with header h such that a 
variable v defined at d is live-in at h. Since v is live at h, d is not part 
of L; this follows from the dominance property, i.e. h is strictly dominated 
by d. Furthermore, there exists a path from h to a use of v which does not go 
through d. For every node p of the loop, since the loop is a strongly 
connected component of the CFG, there exists a path consisting only of nodes 
of L from p to h. Concatenating those two paths proves that v is live-in and 
live-out at p.
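
A trivial illustration of the property (hypothetical code, just to make the 
two concatenated paths concrete):

  int v = f ();             /* def d: strictly dominates the loop header h */
  for (i = 0; i < n; i++)   /* v is live-in at h ...                       */
    a[i] = a[i] + i;        /* ... so v is live through every block of L   */
  return v;                 /* the use reached from h without going
                               through d */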

On top of the live-in set at the birth or header of the loop as proven above, 
we also calculate the live-out of the loop's exit blocks and the live-in at 
the destinations of the exit edges. This accounts for liveness outside of the 
loop.

The above two cases form the basis of a better estimator for register pressure 
as far as LICM is concerned.

If you agree with the above, I will add it to the patch for the regs_used 
estimate, for a better estimate of register pressure for LICM.

Thanks & Regards
Ajit

>>It won't be exact because some of those objects could end up coloring the 
>>same.  But it's probably still considerably more accurate than what we have 
>>now.

>>I suspect that would be a better estimator for register pressure as far as 
>>LICM is concerned.

jeff




RE: [Patch,testsuite]: Fix the tree-ssa/split-path-1.c testcase

2015-11-18 Thread Ajit Kumar Agarwal

Hello Jeff:

Please ignore my previous mails as they bounced back. Sorry for that.

I have fixed the problem with the testcase; the path splitting optimization 
remains intact.
Attached is the patch.

The problem was with the testcase itself: the loop bound goes beyond the 
malloced array.
There was also a problem with accessing the elements of EritePtr.

ChangeLog:
2015-11-18  Ajit Agarwal  

    * gcc.dg/tree-ssa/split-path-1.c: Fix the testcase.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


testcase.patch
Description: testcase.patch


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-18 Thread Ajit Kumar Agarwal


-Original Message-
From: Tom de Vries [mailto:tom_devr...@mentor.com] 
Sent: Wednesday, November 18, 2015 1:14 PM
To: Jeff Law; Richard Biener
Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 14/11/15 00:35, Jeff Law wrote:
> Anyway, bootstrapped and regression tested on x86_64-linux-gnu.
> Installed on the trunk.

>  [Patch,tree-optimization]: Add new path Splitting pass on tree ssa
>  representation
>
>   * Makefile.in (OBJS): Add gimple-ssa-split-paths.o
>   * common.opt (-fsplit-paths): New flag controlling path splitting.
>   * doc/invoke.texi (fsplit-paths): Document.
>   * opts.c (default_options_table): Add -fsplit-paths to -O2.
>   * passes.def: Add split_paths pass.
>   * timevar.def (TV_SPLIT_PATHS): New timevar.
>   * tracer.c: Include "tracer.h"
>   (ignore_bb_p): No longer static.
>   (transform_duplicate): New function, broken out of tail_duplicate.
>   (tail_duplicate): Use transform_duplicate.
>   * tracer.h (ignore_bb_p): Declare
>   (transform_duplicate): Likewise.
>   * tree-pass.h (make_pass_split_paths): Declare.
>   * gimple-ssa-split-paths.c: New file.
>
>   * gcc.dg/tree-ssa/split-path-1.c: New test.

>>I've filed PR68402 - FAIL: gcc.dg/tree-ssa/split-path-1.c execution test with 
>>-m32.

I have fixed the above PR and the patch is submitted.

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02217.html

Thanks & Regards
Ajit

Thanks,
- Tom



RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 
> 2001
> From: Ajit Kumar Agarwal<ajit...@xilix.com>
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal<ajit...@xilinx.com>
>
>   * loop-invariant.c (compute_loop_liveness): New.
>   (determine_regs_used): New.
>   (find_invariants_to_move): Use of determine_regs_used.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>   arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/loop-invariant.c   | 72 
> +-
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>   }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +(reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>   }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> + }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the Loop structure is enough. We don't need iterating over the 
basic blocks.

> +   {
> + int  i;
> + edge e;
> + vec<edge> edges;
> + edges = get_loop_exit_edges (loop);
> + FOR_EACH_VEC_ELT (edges, i, e)
> + {
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines
>>DF_LR_OUT (e->src) and DF_LR_IN (e->dest)

I will incorporate this.

> + }
> +  }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used 
>>computation helping generate better code. 

We need to measure a test case with the scenario where the new variable 
created for the loop invariant increases the register pressure, and the cost 
with respect to regs_used and new_regs increases, leading to spill and fetch 
and dropping the invarian

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Ajit Kumar Agarwal

Sorry, I missed some of the points in the earlier mail; they are given below.

-Original Message-
From: Ajit Kumar Agarwal 
Sent: Monday, November 16, 2015 11:07 PM
To: 'Jeff Law'; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.



-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00
> 2001
> From: Ajit Kumar Agarwal<ajit...@xilix.com>
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal<ajit...@xilinx.com>
>
>   * loop-invariant.c (compute_loop_liveness): New.
>   (determine_regs_used): New.
>   (find_invariants_to_move): Use of determine_regs_used.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>   arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/loop-invariant.c   | 72 
> +-
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 
> 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>   }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +(reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>   }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> + }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the Loop structure is enough. We don't need iterating over the 
basic blocks.

> +   {
> + int  i;
> + edge e;
> + vec<edge> edges;
> + edges = get_loop_exit_edges (loop);
> + FOR_EACH_VEC_ELT (edges, i, e)
> + {
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines DF_LR_OUT 
>>(e->src) and DF_LR_IN (e->dest)

I will incorporate this.

> + }
> +  }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-13 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Friday, November 13, 2015 3:28 AM
To: Richard Biener
Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 11/12/2015 11:32 AM, Jeff Law wrote:
> On 11/12/2015 10:05 AM, Jeff Law wrote:
>>> But IIRC you mentioned it should enable vectorization or so?  In 
>>> this case that's obviously too late.
>> The opposite.  Path splitting interferes with if-conversion & 
>> vectorization.  Path splitting mucks up the CFG enough that 
>> if-conversion won't fire and as a result vectorization is inhibited.  
>> It also creates multi-latch loops, which isn't a great situation either.
>>
>> It *may* be the case that dropping it that far down in the pipeline 
>> and making the modifications necessary to handle simple latches may 
>> in turn make the path splitting code play better with if-conversion 
>> and vectorization and avoid creation of multi-latch loops.  At least 
>> that's how it looks on paper when I draw out the CFG manipulations.
>>
>> I'll do some experiments.
> It doesn't look too terrible to revamp the recognition code to work 
> later in the pipeline with simple latches.  Sadly that doesn't seem to 
> have fixed the bad interactions with if-conversion.
>
> *But* that does open up the possibility of moving the path splitting 
> pass even deeper in the pipeline -- in particular we can move it past 
> the vectorizer.  Which is may be a win.
>
> So the big question is whether or not we'll still see enough benefits 
> from having it so late in the pipeline.  It's still early enough that 
> we get DOM, VRP, reassoc, forwprop, phiopt, etc.
>
> Ajit, I'll pass along an updated patch after doing some more testing.

Hello Jeff:
>>So here's what I'm working with.  It runs after the vectorizer now.

>>Ajit, if you could benchmark this it would be greatly appreciated.  I know 
>>you saw significant improvements on one or more benchmarks in the past.  It'd 
>>be good to know that the >>updated placement of the pass doesn't invalidate 
>>the gains you saw.

>>With the updated pass placement, we don't have to worry about switching the 
>>pass on/off based on whether or not the vectorizer & if-conversion are 
>>enabled.  So that hackery is gone.

>>I think I've beefed up the test to identify the diamond patterns we want so 
>>that it's stricter in what we accept.  The call to ignore_bb_p is a part of 
>>that test so that we're actually looking at the right block in a world 
>>where we're doing this transformation with simple latches.

>>I've also put a graphical comment before perform_path_splitting which 
>>hopefully shows the CFG transformation we're making a bit clearer.

>>This bootstraps and regression tests cleanly on x86_64-linux-gnu.

Thank you for the inputs. I will build the compiler, run the SPEC CPU 2000 
benchmarks for the x86 target, and respond as soon as the run is done.
I will also run the EEMBC/Mibench benchmarks for the Microblaze target.
 
I will let you know the results at the earliest.

Thanks & Regards
Ajit



RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-10-20 Thread Ajit Kumar Agarwal
Hello Jeff:

Did you get a chance to look at the response below? Please let me know your 
opinion on it.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Saturday, September 12, 2015 4:09 PM
To: Jeff Law; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation



-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Thursday, September 10, 2015 3:10 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/26/2015 11:29 PM, Ajit Kumar Agarwal wrote:
>
> Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c
>
> void dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
> {
>   int i, level, qmul, qadd;
>
>   qadd = (qscale - 1) | 1;
>   qmul = qscale << 1;
>
>   for (i = 0; i <= nCoeffs; i++)
>     {
>       level = block[i];
>       if (level < 0)
>         level = level * qmul - qadd;
>       else
>         level = level * qmul + qadd;
>       block[i] = level;
>     }
> }
>
> The above loop is a candidate for path splitting, as the IF blocks merge 
> at the latch of the loop, and path splitting duplicates the latch of the 
> loop (the statement block[i] = level) into its predecessor THEN and ELSE 
> blocks.
>
> Due to the above path splitting, if-conversion is disabled, the 
> IF-THEN-ELSE is not if-converted, and the test case fails.
>>So I think the question then becomes which of the two styles generally 
>>results in better code?  The path-split version or the older if-converted 
>>version.

>>If the latter, then this may suggest that we've got the path splitting code 
>>at the wrong stage in the optimizer pipeline or that we need better 
>>heuristics for when to avoid applying path splitting.

The code generated by path splitting is useful when it exposes DCE, PRE and 
CCP candidates, whereas if-conversion is useful when it exposes vectorization 
candidates. If if-conversion doesn't expose vectorization and path splitting 
doesn't expose DCE/PRE redundancy candidates, it's hard to predict which is 
better. If if-conversion does not expose vectorization while path splitting 
exposes DCE, PRE and CCP redundancy candidates, then path splitting is 
useful. Path splitting also increases the granularity of the THEN and ELSE 
paths, which enables better register allocation and code scheduling.

The suggestion of keeping path splitting later in the pipeline, after 
if-conversion and vectorization, is useful, as it doesn't break the existing 
DejaGNU tests. It is also useful because path splitting can always duplicate 
the merge node into its predecessors after the if-conversion and 
vectorization passes when those passes are not applicable to the loops. But 
this suppresses the CCP and PRE candidates that appear earlier in the 
optimization pipeline.


>
> There were following review comments from the above patch.
>
> +/* This function performs the feasibility tests for path splitting
>> +   to perform. Return false if the feasibility for path splitting
>> +   is not done and returns true if the feasibility for path
>> splitting +   is done. Following feasibility tests are performed.
>> + +   1. Return false if the join block has rhs casting for assign
>> +  gimple statements.
>
> Comments from Jeff:
>
>>> These seem totally arbitrary.  What's the reason behind each of 
>>> these restrictions?  None should be a correctness requirement 
>>> AFAICT.
>
> In the above patch I added the check given in point 1 on the loop 
> latch; path splitting is then disabled, if-conversion happens, 
> and the test case passes.
>>That sounds more like a work-around/hack.  There's nothing inherent with a 
>>type conversion that should disable path splitting.

I have sent the patch with this change and I will remove the above check from 
the patch.

>>What happens if we delay path splitting to a point after if-conversion is 
>>complete?

This is a better suggestion, as explained above; but the gains achieved 
through path splitting when it sits earlier in the pipeline (before 
if-conversion, tree-vectorization and tree-vrp) are suppressed if the 
optimizations that follow path splitting are not applicable to the given 
loops.

I have made the above changes and the existing set up doesn't break 

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-11 Thread Ajit Kumar Agarwal


-Original Message-
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Friday, October 09, 2015 8:15 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On Thu, Oct 8, 2015 at 1:53 PM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
>
>
> -Original Message-
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: Thursday, October 08, 2015 10:29 AM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
> Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
> for LICM and IVOPTS.
>
> On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal 
> <ajit.kumar.agar...@xilinx.com> wrote:
>> Following Proposed:
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
>> Induction variable optimization based on SSA representation.
>> The current logic used in LICM for register used inside the loops is 
>> changed. The Live Out of the loop latch node and the Live in of the 
>> destination of the exit nodes is used to set the Loops Liveness at the exit 
>> of the Loop. The register used is the number of live variables at the exit 
>> of the Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the 
>> register used logic is based on the number of phi nodes at the loop 
>> header to represent the liveness at the loop. Current Logic used only the 
>> number of phi nodes at the loop header. I have made changes  to represent 
>> the phi operands also live at the loop. Thus number of phi operands also 
>> gets incremented in the number of registers used.
> Hi,
>>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  
>>>IMHO, IVO fails to restrict iv number in some complex cases, your change 
>>>tries to rectify that by increasing register pressure irrespective to 
>>>out-of-ssa and coalescing.  I think the original code models reg-pressure 
>>>better, what needs to be changed is how we compute cost from register 
>>>pressure and use that to restrict iv number.
>
> Considering the liveness with respect to all the phi arguments will 
> not increase the register pressure. It improves the heuristics for 
> restricting The IV that increases the register pressure. The cost 
> model uses regs_used and modelling the
>>I think register pressure is increased along with regs_needed, doesn't matter 
>>if it will be canceled in estimate_reg_pressure_cost for both ends of cost 
>>comparison.
> Liveness with respect to the phi arguments measures
> Better register pressure.
>>I agree IV number should be controlled for some cases, but not by increasing 
>>`n' using phi argument number unconditionally.  Considering summary 
>>reduction as an example, most likely the ssa names will be coalesced and 
>>held in single register.  Furthermore, there is no reason to count phi 
>>node/arg number for floating point phi nodes.

>
> Number of phi nodes in the loop header is not only the criteria for 
> regs_used, but the number of liveness with respect to loop should be Criteria 
> to measure appropriate register pressure.
>>IMHO, it's hard to accurately track liveness info on SSA(PHI), because of 
>>coalescing etc.  So could you give some examples/proof for this?

I agree with you that it is hard to predict the exact mapping from SSA to the 
actual register allocation due to coalescing and out-of-SSA.

The interference of phi arguments and results is an important criterion for 
register pressure on SSA. In conventional SSA the phi arguments don't 
interfere, but most current compilers don't maintain conventional SSA; in 
non-conventional SSA there are chances that the phi arguments interfere. 
Non-conventional SSA arises because copy propagation of SSA names can make 
the phi arguments interfere, so they should be considered for the registers 
used. I interpret the registers used as the number of interfering live 
ranges, which leads to an increase or decrease in register pressure. 
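
A small illustration of the interference (hypothetical GIMPLE-like 
pseudocode, just to make the point concrete):

  /* After copy propagation the argument x_1 stays live past the phi,
     so the phi result x_3 and its argument x_1 interfere and cannot
     trivially share one register.  */
  x_1 = ...;
  loop:
    x_3 = PHI <x_1, x_4>;
    use (x_1);      /* keeps x_1 live while x_3 is live */
    x_4 = x_3 + 1;
    goto loop;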

On top of the above, out-of-SSA (SSA name coalescing) for conventional SSA is 
quite simple: each phi node is assigned a new variable, the defs and uses are 
replaced with the new variable, the case reduces to assigning a single 
register, and the corresponding phi node is removed. But for the 
non-conventional nature of SSA, out of SSA makes the SSA conventional by 
inserting copying to each

[RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-07 Thread Ajit Kumar Agarwal
Following Proposed:

Changes are done in the loop invariant (LICM) pass at RTL level and also in 
the induction variable optimization based on the SSA representation. 
The logic used in LICM for registers used inside loops is changed: the 
live-out of the loop latch node and the live-in of the destinations of the 
exit edges are used to compute the loop's liveness at the exit of the loop. 
The registers used is the number of live variables at the exit of the loop 
calculated above.

For induction variable optimization on the tree SSA representation, the 
registers-used logic is based on the number of phi nodes at the loop header, 
which represents the liveness at the loop. The current logic uses only the 
number of phi nodes at the loop header. I have made changes to treat the phi 
operands as also live in the loop, so the number of phi operands is added to 
the number of registers used.
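
For illustration (a hypothetical loop header, with numbers only to show the 
counting):

  i_1 = PHI <i_0(preheader), i_2(latch)>;
  s_1 = PHI <s_0(preheader), s_2(latch)>;

With the old logic the estimate counts the two phi nodes, giving regs_used = 
2; with this change the four phi operands are also counted, giving regs_used 
= 2 + 4 = 6.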

Performance runs:

Bootstrapping with i386 goes through fine. The SPEC CPU 2000 benchmarks were 
run, and the following performance and code-size results are seen for the 
i386 target.

Ratio with the above optimization changes vs ratio without above optimizations 
for INT benchmarks (3785.261 vs 3783.064).
Ratio with the above optimization changes vs ratio without above optimization 
for FP benchmarks ( 4676.763189 vs 4676.072428 ).

Code size reduction for INT benchmarks : 2324 instructions.
Code size reduction for FP benchmarks : 1283 instructions.

For Microblaze target the Mibench and EEMBC benchmarks is run and the following 
improvements is seen.

(qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), 
security_rijndael_e(1.4%))

Code Size reduction for Mibench  = 16164 instructions.
Code Size reduction for EEMBC = 98 instructions.

Patch ChangeLog:

[PATCH] [RFC, Patch]: Optimized changes in the register used inside loop for 
LICM and IVOPTS.

Changes are done in the loop invariant (LICM) pass at RTL level and also in 
the induction variable optimization based on the SSA representation. The 
logic used in LICM for registers used inside loops is changed: the live-out 
of the loop latch node and the live-in of the destinations of the exit edges 
are used to compute the loop's liveness at the exit of the loop. The 
registers used is the number of live variables at the exit of the loop 
calculated above.

For induction variable optimization on the tree SSA representation, the 
registers-used logic is based on the number of phi nodes at the loop header, 
which represents the liveness at the loop. The current logic uses only the 
number of phi nodes at the loop header. Changes are made to treat the phi 
operands as also live in the loop, so the number of phi operands is added to 
the number of registers used.

ChangeLog:
2015-10-09  Ajit Agarwal  

* loop-invariant.c (compute_loop_liveness): New.
(determine_regs_used): New.
(find_invariants_to_move): Use of determine_regs_used.
* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
arguments for register used.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
Description: 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch


RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-10-07 Thread Ajit Kumar Agarwal


-Original Message-
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Thursday, October 08, 2015 10:29 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On Thu, Oct 8, 2015 at 12:32 PM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
> Following Proposed:
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation.
> The current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop. The register used is the number of live variables at the exit of 
> the Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop. Current Logic used only the 
> number of phi nodes at the loop header. I have made changes  to represent the 
> phi operands also live at the loop. Thus number of phi operands also gets 
> incremented in the number of registers used.
Hi,
>>For the GIMPLE IVO part, I don't think the change is reasonable enough.  
>>IMHO, IVO fails to restrict iv number in some complex cases, your change 
>>tries to rectify that by increasing register pressure irrespective to 
>>out-of-ssa and coalescing.  I think the original code models reg-pressure 
>>better, what needs to be changed is how we compute cost from register 
>>pressure and use that to restrict iv number.

Considering the liveness with respect to all the phi arguments will not 
increase the register pressure; it improves the heuristics for restricting 
the IVs that increase the register pressure. The cost model uses regs_used, 
and modelling the liveness with respect to the phi arguments measures 
register pressure better.

The number of phi nodes in the loop header is not the only criterion for 
regs_used; the liveness with respect to the loop should be the criterion 
for measuring the register pressure appropriately.

Thanks & Regards
Ajit
>>As for the specific function determine_set_costs, I think one change is 
>>necessary to rule out all floating point phi nodes, because they do not have 
>>impact on IVO register pressure.  Actually this change will further reduce 
>>register pressure for fp related cases.


Thanks,
bin
>
> Performance runs:
>
> Bootstrapping with i386 goes through fine. The spec cpu 2000 
> benchmarks is run and following performance runs and the code size for
>  i386 target seen.
>
> Ratio with the above optimization changes vs ratio without above 
> optimizations for INT benchmarks (3785.261 vs 3783.064).
> Ratio with the above optimization changes vs ratio without above optimization 
> for FP benchmarks ( 4676.763189 vs 4676.072428 ).
>
> Code size reduction for INT benchmarks : 2324 instructions.
> Code size reduction for FP benchmarks : 1283 instructions.
>
> For Microblaze target the Mibench and EEMBC benchmarks is run and the 
> following improvements is seen.
>
> (qos_lite(5.3%), consumer_jpeg_c(1.34%), security_rijndael_d(1.8%), 
> security_rijndael_e(1.4%))
>
> Code Size reduction for Mibench  = 16164 instructions.
> Code Size reduction for EEMBC = 98 instructions.
>
> Patch ChangeLog:
>
> PATCH] [RFC, Patch]: Optimized changes in the register used inside  loop for 
> LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The current 
> logic used in LICM for register used inside the loops is changed.
> The Live Out of the loop latch node and the Live in of the destination 
> of the exit nodes is used to set the  Loops Liveness at the exit of 
> the Loop. The register used is the number of live variables at the exit of 
> the  Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the  number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only  
> the number of phi nodes at the loop header.  Changes are made to represent 
> the phi operands also live  at the loop. Thus number of phi operands also 
> gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal  <ajit...@xilinx.com>
>
> * loop-invariant.c (compute_loop_liveness): New.
> (determine_regs_used): New.
> (find_invariants_to_move): Use of determine_regs_used.
> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
> arguments for register used.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> Thanks & Regards
> Ajit


RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-29 Thread Ajit Kumar Agarwal


-Original Message-
From: Aaron Sawdey [mailto:acsaw...@linux.vnet.ibm.com] 
Sent: Monday, September 28, 2015 11:55 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sat, 2015-09-26 at 04:51 +, Ajit Kumar Agarwal wrote:
> I have made the following changes in the estimate_reg_pressure_cost 
> function used by the loop invariant and IVOPTS.
> 
> Earlier the estimate_reg_pressure_cost function used only the cost of the
> n_new variables that are generated by loop invariant motion and IVOPTS.
> That is not sufficient for the register pressure calculation. The register
> pressure cost calculation should use n_new + n_old to compute the cost.
> n_old is the number of registers used inside the loops, and the effect of
> the n_new new variables generated by loop invariant motion and IVOPTS on
> register pressure depends on how the new variables impact the registers
> used inside the loops. The increase or decrease in register pressure is
> due to the impact of the new variables on the registers used inside the
> loops. The register-register move cost or the spill cost should consider
> the cost associated with the registers used and the new variables
> generated. The movement of new variables increases or decreases the
> register pressure, which is based on the overall cost of the n_new + n_old
> variables.
>
> The increase and decrease in register pressure is based on the overall
> cost of n_new + n_old, as the changes in register pressure caused by the
> new variables depend on how those changes behave with respect to the
> registers used in the loops.
>
> Thus the register pressure caused by the new variables is based on the new
> variables and their impact on the registers used inside the loops, and so
> the overall cost of n_new + n_old should be considered.
> 
> Bootstrap for i386 and reg tested on i386 with the change is fine.
> 
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance and code size.
> 
> ratio with the optimization vs ratio without optimization for INT 
> benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP 
> benchmarks ( 4668.743 vs 4778.741)
> 
> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
> 
> Number of instruction with optimization = 1094117 Number of 
> instruction without optimization = 1094659
> 
> Reduction in number of instruction with the optimization = 542 instruction.
> 
> [Patch,optimization]: Optimized changes in the estimate  register 
> pressure cost.
> 
> Earlier the estimate_reg_pressure_cost function used only the cost of the
> n_new variables that are generated by loop invariant motion and IVOPTS.
> That is not sufficient for the register pressure calculation. The register
> pressure cost calculation should use n_new + n_old to compute the cost.
> n_old is the number of registers used inside the loops, and the effect of
> the n_new new variables generated by loop invariant motion and IVOPTS on
> register pressure depends on how the new variables impact the registers
> used inside the loops.
> 
> ChangeLog:
> 2015-09-26  Ajit Agarwal  <ajit...@xilinx.com>
> 
>   * cfgloopanal.c (estimate_reg_pressure_cost) : Add changes
>   to consider the n_new plus n_old in the register pressure
>   cost.
> 
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com

>>Ajit,

>>It looks to me like your change doesn't do anything at all inside the
>>loop-invariant.c code. There it's doing a difference between two
>>estimate_reg_pressure_cost calls so adding n_old (regs_used) to both is
>>canceled out.
>>
>>  size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
>>                                           regs_used, speed, call_p)
>>               - estimate_reg_pressure_cost (new_regs[0],
>>                                             regs_used, speed, call_p));

>>I'm not quite sure I understand the "why" of the heuristic you've added here 
>>-- can you explain your reasoning further?

Aaron:

Extract from function estimate_reg_pressure_cost() where the changes are made.

  if (regs_needed <= available_regs)
    /* If we are close to running out of registers, try to preserve
       them.  */
    /* Case 1 */
    cost = target_reg_cost [speed] * regs_needed;
  else
    /* If we run out of registers, it is very expensive to add another
       one.  */
    /* Case 2 */
    cost = target_spill_cost [speed] * regs_needed;

If the first estimate_reg_pressure_cost call falls into the category of Case 1
or Case 2, and the second estimate_reg_pressure_cost call falls into the same
category for C

RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Ajit Kumar Agarwal


-Original Message-
From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] 
Sent: Sunday, September 27, 2015 7:49 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance and code size.
> 
> ratio with the optimization vs ratio without optimization for INT 
> benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP 
> benchmarks ( 4668.743 vs 4778.741)

>>Did you swap these?  You're saying FP got significantly worse?

Sorry for the typo.  Please find the corrected numbers.

Ratio with the optimization vs ratio without optimization for FP benchmarks
(4668.743 vs 4668.741). With the optimization, FP performance is slightly
better.

Thanks & Regards
Ajit

Segher


RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Ajit Kumar Agarwal


-Original Message-
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Monday, September 28, 2015 7:05 AM
To: Ajit Kumar Agarwal
Cc: Segher Boessenkool; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sun, Sep 27, 2015 at 11:13 PM, Ajit Kumar Agarwal 
<ajit.kumar.agar...@xilinx.com> wrote:
>
>
> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Sunday, September 27, 2015 7:49 PM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
> Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
> pressure cost.
>
> On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the 
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT 
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP 
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
> Sorry for the typo error.  Please find the corrected one.
>
> Ratio  with the optimization vs ratio without optimization for FP  
> benchmarks ( 4668.743 vs 4668.741). With the optimization FP is slightly 
> better performance.
>>Did you mis-type the number again?  Or this must be noise.  Now I remember
>>why I didn't get perf improvement from this.  Changing reg_new to reg_new +
>>reg_old doesn't have big impact because it just increased the starting number
>>for each scenarios.  Maybe it still makes sense for cases on the verge of
>>exceeding target's available register number.  I will try to collect
>>benchmark data on ARM, but it may take some time.

This is the correct one.

Thanks & Regards
Ajit

Thanks,
bin
>
> Thanks & Regards
> Ajit
>
> Segher


[Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-25 Thread Ajit Kumar Agarwal
I have made the following changes in the estimate_reg_pressure_cost function 
used 
by the loop invariant and IVOPTS. 

Earlier the estimate_reg_pressure_cost function used only the cost of the
n_new variables that are generated by loop invariant motion and IVOPTS. That
is not sufficient for the register pressure calculation. The register
pressure cost calculation should use n_new + n_old to compute the cost. n_old
is the number of registers used inside the loops, and the effect of the n_new
new variables generated by loop invariant motion and IVOPTS on register
pressure depends on how the new variables impact the registers used inside
the loops. The increase or decrease in register pressure is due to the impact
of the new variables on the registers used inside the loops. The
register-register move cost or the spill cost should consider the cost
associated with the registers used and the new variables generated. The
movement of new variables increases or decreases the register pressure, which
is based on the overall cost of the n_new + n_old variables.

The increase and decrease in register pressure is based on the overall cost
of n_new + n_old, as the changes in register pressure caused by the new
variables depend on how those changes behave with respect to the registers
used in the loops.

Thus the register pressure caused by the new variables is based on the new
variables and their impact on the registers used inside the loops, and so the
overall cost of n_new + n_old should be considered.
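A small sketch of the change as described (hypothetical, for illustration
only; it follows the shape of estimate_reg_pressure_cost in cfgloopanal.c
and the variable names used above):

  /* Cost the combined pressure of the registers already used in the
     loop (n_old) plus the newly created ones (n_new), instead of
     costing n_new alone.  */
  unsigned regs_needed = n_new + n_old;
  unsigned cost;

  if (regs_needed <= available_regs)
    /* Close to running out of registers; try to preserve them.  */
    cost = target_reg_cost [speed] * regs_needed;
  else
    /* Out of registers; adding another one is very expensive.  */
    cost = target_spill_cost [speed] * regs_needed;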

Bootstrap for i386 and reg tested on i386 with the change is fine.

SPEC CPU 2000 benchmarks were run, with the following impact on performance
and code size.

ratio with the optimization vs ratio without optimization for INT benchmarks
(3807.632 vs 3804.661)

ratio with the optimization vs ratio without optimization for FP benchmarks
( 4668.743 vs 4778.741)

Code size reduction with respect to FP SPEC CPU 2000 benchmarks

Number of instruction with optimization = 1094117
Number of instruction without optimization = 1094659

Reduction in number of instruction with the optimization = 542 instruction.

[Patch,optimization]: Optimized changes in the estimate
 register pressure cost.

Earlier the estimate_reg_pressure_cost function used only the cost of the
n_new variables that are generated by loop invariant motion and IVOPTS. That
is not sufficient for the register pressure calculation. The register
pressure cost calculation should use n_new + n_old to compute the cost. n_old
is the number of registers used inside the loops, and the effect of the n_new
new variables generated by loop invariant motion and IVOPTS on register
pressure depends on how the new variables impact the registers used inside
the loops.

ChangeLog:
2015-09-26  Ajit Agarwal  

* cfgloopanal.c (estimate_reg_pressure_cost): Add changes
to consider the n_new plus n_old in the register pressure
cost.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit



0001-Patch-optimization-Optimized-changes-in-the-estimate.patch
Description: 0001-Patch-optimization-Optimized-changes-in-the-estimate.patch


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-09-12 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, September 10, 2015 3:10 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/26/2015 11:29 PM, Ajit Kumar Agarwal wrote:
>
> Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c
>
> void
> dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
> {
>   int i, level, qmul, qadd;
>
>   qadd = (qscale - 1) | 1;
>   qmul = qscale << 1;
>
>   for (i = 0; i <= nCoeffs; i++)
>     {
>       level = block[i];
>       if (level < 0)
>         level = level * qmul - qadd;
>       else
>         level = level * qmul + qadd;
>       block[i] = level;
>     }
> }
>
> The above loop is a candidate for path splitting, as the IF block merges
> at the latch of the loop, and path splitting duplicates the latch
> of the loop, which is the statement block[i] = level, into its
> predecessor THEN and ELSE blocks.
>
> Due to the above path splitting, if-conversion is disabled: the above
> IF-THEN-ELSE is not if-converted, and the test case fails.
>>So I think the question then becomes which of the two styles generally
>>results in better code?  The path-split version or the older if-converted
>>version.

>>If the latter, then this may suggest that we've got the path splitting code
>>at the wrong stage in the optimizer pipeline or that we need better
>>heuristics for when to avoid applying path splitting.

The code generated by path splitting is useful when it exposes DCE, PRE and
CCP candidates, whereas if-conversion is useful when it exposes vectorization
candidates. If if-conversion doesn't expose vectorization and path splitting
doesn't expose DCE/PRE redundancy candidates, it's hard to predict which is
better. If if-conversion does not expose vectorization while path splitting
exposes DCE, PRE and CCP redundancy candidates, then path splitting is
useful. Path splitting also increases the granularity of the THEN and ELSE
paths, enabling better register allocation and code scheduling.

The suggestion of keeping path splitting later in the pipeline, after
if-conversion and vectorization, is useful as it doesn't break the existing
Deja GNU tests. It is also useful because path splitting can always duplicate
the merge node into its predecessors after the if-conversion and
vectorization passes, if those passes are not applicable to the loops. But
this suppresses the CCP and PRE candidates, which run earlier in the
optimization pipeline.


>
> There were following review comments from the above patch.
>
> +/* This function performs the feasibility tests for path splitting
> +   to perform. Return false if the feasibility for path splitting
> +   is not done and returns true if the feasibility for path splitting
> +   is done. Following feasibility tests are performed.
> +
> +   1. Return false if the join block has rhs casting for assign
> +      gimple statements.
>
> Comments from Jeff:
>
>>> These seem totally arbitrary.  What's the reason behind each of 
>>> these restrictions?  None should be a correctness requirement 
>>> AFAICT.
>
> In the above patch I have added the check given in point 1 on the loop
> latch; path splitting is then disabled, if-conversion happens,
> and the test case passes.
>>That sounds more like a work-around/hack.  There's nothing inherent with a 
>>type conversion that should disable path splitting.

I have sent the patch with this change and I will remove the above check from 
the patch.

>>What happens if we delay path splitting to a point after if-conversion is 
>>complete?

This is a better suggestion, as explained above, but the gains achieved
through path splitting by keeping it earlier in the pipeline, before
if-conversion, tree-vectorization and tree-vrp, are suppressed if the
optimizations that follow path splitting are not applicable to the given
loops.

I have made the above changes and the existing setup doesn't break, but the
gains achieved in benchmarks like rgbcmy_lite (EEMBC) are suppressed. Path
splitting gives gains of 9% for that EEMBC benchmark, and for such loops
if-conversion and vectorization are not applicable, exposing the gain from
the path splitting optimization.

>>Alternately, could if-conversion export a routine which indicates if a
>>particular sub-graph is likely to be if-convertable?  The path splitting pass
>>could then use that routine to help determine if the path ough

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-09-04 Thread Ajit Kumar Agarwal
All:

Thanks Jeff for the review comments.

The patch attached incorporate all the review comments given below.

Bootstrapped on i386 and Microblaze, and the Deja GNU test results for
Microblaze look fine.

[Patch,tree-optimization]: Add new path Splitting pass on
 tree ssa representation.

Added a new pass for path splitting on the tree SSA representation. The path
splitting optimization does a CFG transformation: when the two execution
paths of an IF-THEN-ELSE merge at the latch node of a loop, it duplicates
the merge node into the two paths, preserving the SSA semantics.

ChangeLog:
2015-09-05  Ajit Agarwal  <ajit...@xilinx.com>

* Makefile.in (OBJS): Add tree-ssa-path-split.o
* common.opt (ftree-path-split): Add the new flag.
* opts.c (default_options_table) : Add an entry for
Path splitting optimization at -O2 and above.
* passes.def (path_split): Add new path splitting pass.
* timevar.def (TV_TREE_PATH_SPLIT): New.
* tree-pass.h (make_pass_path_split): New declaration.
* tree-ssa-path-split.c: New file.
* tracer.c (transform_duplicate): New function.
* tracer.h: New header file.
* doc/invoke.texi (ftree-path-split): Document.
(fdump-tree-path_split): Document.
* testsuite/gcc.dg/path-split-1.c: New.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit
-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, August 20, 2015 3:16 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:
>
>
>  From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 
> 2001
> From: Ajit Kumar Agarwal<ajit...@xilix.com>
> Date: Sat, 15 Aug 2015 18:19:14 +0200
> Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on
>   tree ssa representation.
>
> Added a new pass on path splitting on tree SSA representation. The 
> path splitting optimization does the CFG transformation of join block 
> of the if-then-else same as the loop latch node is moved and merged 
> with the predecessor blocks after preserving the SSA representation.
>
> ChangeLog:
> 2015-08-15  Ajit Agarwal<ajit...@xilinx.com>
>
>   * gcc/Makefile.in: Add the build of the new file
>   tree-ssa-path-split.c
Instead:

* Makefile.in (OBJS): Add tree-ssa-path-split.o.


>   * gcc/opts.c (OPT_ftree_path_split) : Add an entry for
>   Path splitting pass with optimization flag greater and
>   equal to O2.

* opts.c (default_options_table): Add entry for path splitting
optimization at -O2 and above.



>   * gcc/passes.def (path_split): add new path splitting pass.
Capitalize "add".




>   * gcc/tree-ssa-path-split.c: New.
Use "New file".

>   * gcc/tracer.c (transform_duplicate): New.
Use "New function".

>   * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
>   * gcc/testsuite/gcc.dg/path-split-1.c: New.
These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite" 
prefix.

>   * gcc/doc/invoke.texi
>   (ftree-path-split): Document.
>   (fdump-tree-path_split): Document.
Should just be two lines instead of three.

And more generally, there's no need to prefix ChangeLog entries with "gcc/".

Now that the ChangeLog nits are out of the way, let's get to stuff that's more 
interesting.



>
> Signed-off-by: Ajit agarwal <ajit...@xilinx.com>
> ---
>   gcc/Makefile.in  |   1 +
>   gcc/common.opt   |   4 +
>   gcc/doc/invoke.texi  |  16 +-
>   gcc/opts.c   |   1 +
>   gcc/passes.def   |   1 +
>   gcc/testsuite/gcc.dg/path-split-1.c  |  65 ++
>   gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  60 +
>   gcc/timevar.def  |   1 +
>   gcc/tracer.c |  37 +--
>   gcc/tree-pass.h  |   1 +
>   gcc/tree-ssa-path-split.c| 330 
> +++
>   11 files changed, 503 insertions(+), 14 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
>   create mode 100644 gcc/tree-ssa-path-split.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e80eadf..1d02582 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2378,6 +2378,10 @@ ftree-vrp
>   Common Report Var(flag_tree_vrp) Init(0) Optimization
>   Perform 

RE: [PATCH GCC][rework]Improve loop bound info by simplifying conversions in iv base

2015-08-27 Thread Ajit Kumar Agarwal


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Bin Cheng
Sent: Thursday, August 27, 2015 3:12 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH GCC][rework]Improve loop bound info by simplifying conversions 
in iv base

Hi,
This is a rework for
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02335.html, with review 
comments addressed.  For now, SCEV may compute iv base in the form of 
(signed T)((unsigned T)base + step)).  This complicates other 
optimizations/analysis depending on SCEV because it's hard to dive into type 
conversions.  This kind of type conversions can be simplified with 
additional range information implied by loop initial conditions.  This patch 
does such simplification.
With simplified iv base, loop niter analysis can compute more accurate bound 
information since sensible value range can be derived for base+step.  For 
example, accurate loop boundmay_be_zero information is computed for cases 
added by this patch.

The code is actually moved from loop_exits_before_overflow.  After this 
patch, the corresponding code in loop_exits_before_overflow will be never 
executed, so I removed that part code.  The patch also includes some code 
format changes.

Bootstrap and test on x86_64.  Is it OK?

The scalar evolution calculates the chrec (base, +, step) based on the chain
of recurrences through induction variable expressions, propagating the values
in the SSA representation to arrive at the above chrec. If the value assigned
to the base is unsigned while the declaration of the base is signed, then the
above chrec is derived with a conversion from unsigned to signed. Can such
type conversions be ignored for the calculation of the iteration bound, as
they cannot overflow in any case? Is that what the below patch aims at?

Thanks & Regards
Ajit

Thanks,
bin

2015-08-27  Bin Cheng  bin.ch...@arm.com

* tree-ssa-loop-niter.c (tree_simplify_using_condition_1): Support
new parameter.
(tree_simplify_using_condition): Ditto.
(simplify_using_initial_conditions): Ditto.
(loop_exits_before_overflow): Pass new argument to function
simplify_using_initial_conditions.  Remove case for type conversions
simplification.
* tree-ssa-loop-niter.h (simplify_using_initial_conditions): New
parameter.
* tree-scalar-evolution.c (simple_iv): Simplify type conversions
in iv base using loop initial conditions.

gcc/testsuite/ChangeLog
2015-08-27  Bin Cheng  bin.ch...@arm.com

* gcc.dg/tree-ssa/loop-bound-2.c: New test.
* gcc.dg/tree-ssa/loop-bound-4.c: New test.
* gcc.dg/tree-ssa/loop-bound-6.c: New test.


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-26 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, August 20, 2015 9:19 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/20/2015 09:38 AM, Ajit Kumar Agarwal wrote:


 Bootstrapping with i386 and Microblaze target works fine. No 
 regression is seen in Deja GNU tests for Microblaze. There are lesser 
 failures. Mibench/EEMBC benchmarks were run for Microblaze target and 
 the gain of 9.3% is seen in rgbcmy_lite the EEMBC benchmarks.
 What do you mean by there are lesser failures?  Are you saying there are 
 cases where path splitting generates incorrect code, or cases where path 
 splitting produces code that is less efficient, or something else?

 I meant that more Deja GNU testcases pass with the path splitting
 changes.
Ah, in that case, that's definitely good news!

Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c

void
dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
{
  int i, level, qmul, qadd;

  qadd = (qscale - 1) | 1;
  qmul = qscale << 1;

  for (i = 0; i <= nCoeffs; i++)
{
  level = block[i];
      if (level < 0)
level = level * qmul - qadd;
  else
level = level * qmul + qadd;
  block[i] = level;
}
}

The above loop is a candidate for path splitting, as the IF block merges at
the latch of the loop, and path splitting duplicates the latch of the loop,
which is the statement block[i] = level, into its predecessor THEN and ELSE
blocks, as illustrated below.
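A hand-written illustration of the transformed loop body (illustrative, not
compiler output):

  for (i = 0; i <= nCoeffs; i++)
    {
      level = block[i];
      if (level < 0)
        {
          level = level * qmul - qadd;
          block[i] = level;   /* duplicated latch statement */
        }
      else
        {
          level = level * qmul + qadd;
          block[i] = level;   /* duplicated latch statement */
        }
    }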

Due to the above path splitting, if-conversion is disabled: the above
IF-THEN-ELSE is not if-converted, and the test case fails.

There were following review comments from the above patch.

+/* This function performs the feasibility tests for path splitting
 +   to perform. Return false if the feasibility for path splitting
 +   is not done and returns true if the feasibility for path splitting
 +   is done. Following feasibility tests are performed.
 +
 +   1. Return false if the join block has rhs casting for assign
 +  gimple statements.

Comments from Jeff:

These seem totally arbitrary.  What's the reason behind each of these 
restrictions?  None should be a correctness requirement AFAICT.  

In the above patch I have added the check given in point 1 on the loop latch;
path splitting is then disabled, if-conversion happens, and the test case
passes.

I have incorporated the above review comment by not doing the feasibility
check of point 1; the above test case then goes through path splitting, and
due to path splitting the if-cvt does not happen, so the test case fails
(it expects the pattern "Applying if conversion" to be present). With the
patch given for review, the feasibility check for a cast assign in the latch
of the loop, as given in point 1, disables path splitting, if-cvt happens,
and the above test case passes.

Please let me know whether to keep the above feasibility check as given in
point 1, or what more appropriate changes are required for the above test
case scenario of path splitting vs if-conversion.

Thanks & Regards
Ajit


jeff



RE: [RFC]: Vectorization cost benefit changes.

2015-08-21 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Friday, August 21, 2015 2:03 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; g...@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC]: Vectorization cost benefit changes.

On Fri, Aug 21, 2015 at 7:18 AM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:
 All:

 I have done the vectorization cost changes as given below. I have considered
 only the cost associated with the inside of the loop instead of the outside.
 Only the inside scalar and vector costs are considered, as the inner costs
 are more significant for the cost model than the outside costs.

I think you are confused about what the variables cost are associated to.  
You are changing a place that computes also the cost for 
non-outer-loop-vectorization so your patch is clearly not applicable.

vec_outside_cost is the cost of setting up invariants for example.
All costs apply to the outer loop - if there is a nested loop inside that 
loop its costs are folded into the outer loop cost already at this stage.

So I think your analysis is simply wrong and thus your patch.

You need to find another place to fix inner loop cost.

Thanks for your valuable suggestions and feedback. I will certainly look into 
it.

Thanks & Regards
Ajit
Richard.

  min_profitable_iters = ((scalar_single_iter_cost
 - vec_inside_cost) *vf);

 The scalar_single_iter_cost uses the hardcoded value 50, which is
 used for most of the targets, and the scalar cost is multiplied by
 50. The vector cost is subtracted from this scalar cost, and as the scalar
 cost increases, the chances of vectorization are higher with the same
 vectorization factor, and more loops will be vectorized.

 I have not changed the iteration count, which is hardcoded as 50; I will
 make the changes to replace the 50 with static estimates of the iteration
 count if you agree upon the below changes.

 I have run the SPEC CPU 2000 benchmarks with the below changes for
 i386 targets, and significant gains are achieved for the INT and
 FP benchmarks.

 Here is the data.

 Ratio of vectorization cost changes (FP benchmarks) vs ratio without
 vectorization cost changes (FP benchmarks) = 4640.102 vs 4583.379.
 Ratio of vectorization cost changes (INT benchmarks) vs ratio without
 vectorization cost changes (INT benchmarks) = 3812.883 vs 3778.558

 Please give your feedback on the below changes for vectorization cost benefit.

 diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 
 422b883..35d538f 100644
 --- a/gcc/tree-vect-loop.c
 +++ b/gcc/tree-vect-loop.c
 @@ -2987,11 +2987,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
 loop_vinfo,
  min_profitable_iters = 1;
else
  {
 -  min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * 
 vf
 - - vec_inside_cost * peel_iters_prologue
 -  - vec_inside_cost * peel_iters_epilogue)
 - / ((scalar_single_iter_cost * vf)
 -- vec_inside_cost);
 +  min_profitable_iters = ((scalar_single_iter_cost
 +- vec_inside_cost) *vf);

if ((scalar_single_iter_cost * vf * min_profitable_iters)
<= (((int) vec_inside_cost * min_profitable_iters)

 Thanks & Regards
 Ajit


[RFC]: Vectorization cost benefit changes.

2015-08-20 Thread Ajit Kumar Agarwal
All:

I have done the vectorization cost changes as given below. I have considered
only the cost associated with the inside of the loop instead of the outside.
Only the inside scalar and vector costs are considered, as the inner costs
are more significant for the cost model than the outside costs.

 min_profitable_iters = ((scalar_single_iter_cost
- vec_inside_cost) *vf);

The scalar_single_iter_cost uses the hardcoded value 50, which is used for
most of the targets, and the scalar cost is multiplied by 50. The vector cost
is subtracted from this scalar cost, and as the scalar cost increases, the
chances of vectorization are higher with the same vectorization factor, and
more loops will be vectorized.

I have not changed the iteration count, which is hardcoded as 50; I will make
the changes to replace the 50 with static estimates of the iteration count if
you agree upon the below changes.

I have run the SPEC CPU 2000 benchmarks with the below changes for i386
targets, and significant gains are achieved for the INT and FP benchmarks.

Here is the data.

Ratio of vectorization cost changes (FP benchmarks) vs ratio without
vectorization cost changes (FP benchmarks) = 4640.102 vs 4583.379.
Ratio of vectorization cost changes (INT benchmarks) vs ratio without
vectorization cost changes (INT benchmarks) = 3812.883 vs 3778.558

Please give your feedback on the below changes for vectorization cost benefit.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 422b883..35d538f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2987,11 +2987,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 min_profitable_iters = 1;
   else
 {
-  min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
- - vec_inside_cost * peel_iters_prologue
-  - vec_inside_cost * peel_iters_epilogue)
- / ((scalar_single_iter_cost * vf)
-- vec_inside_cost);
+  min_profitable_iters = ((scalar_single_iter_cost
+- vec_inside_cost) *vf);

   if ((scalar_single_iter_cost * vf * min_profitable_iters)
   <= (((int) vec_inside_cost * min_profitable_iters)

Thanks & Regards
Ajit


vect.diff
Description: vect.diff


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-20 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, August 20, 2015 1:13 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:
 All:

 Please find the updated patch with suggestion and feedback 
 incorporated.

 Thanks Jeff and Richard for the review comments.

 Following changes were done based on the feedback on RFC comments.
 and the review for the previous patch.

 1. Both tracer and path splitting pass are separate passes so  that 
 two instances of the pass will run in the end, one doing path 
 splitting and one doing  tracing, at different times in the 
 optimization pipeline.
I'll have to think about this.  I'm not sure I agree totally with Richi's 
assertion that we should share code with the tracer pass, but I'll give it a 
good looksie.



 2. Transform code is shared between the tracer and path splitting passes.
 The common code is extracted into a function transform_duplicate, placed
 in tracer.c, and the path splitting pass uses this transform code.
OK.  I'll take a good look at that.


 3. The analysis for populating the basic blocks and traversing them using
 the Fibonacci heap is common. This cannot be factored out into a new
 function, as the tracer pass does more analysis based on the profile, and
 different heuristics are used in the tracer and path splitting passes.
Understood.


 4. The included headers are minimal: only what is required for the path
 splitting pass is present.
THanks.


 5. The earlier patch did the SSA updating with a replace function, to
 preserve the SSA representation required to move the loop latch node (the
 same as the join block) into its predecessors, leaving the loop latch node
 as just a forwarding block. Such a replace function is not required, as
 suggested by Jeff; it goes away with this patch, and the transformed code
 is factored into a function which is shared between the tracer and path
 splitting passes.
Sounds good.


 Bootstrapping with i386 and Microblaze target works fine. No 
 regression is seen in Deja GNU tests for Microblaze. There are lesser 
 failures. Mibench/EEMBC benchmarks were run for Microblaze target and 
 the gain of 9.3% is seen in rgbcmy_lite the EEMBC benchmarks.
What do you mean by there are lesser failures?  Are you saying there are 
cases where path splitting generates incorrect code, or cases where path 
splitting produces code that is less efficient, or something else?

I meant that more Deja GNU testcases pass with the path splitting changes.


 SPEC 2000 benchmarks were run with i386 target and the following
 performance number is achieved.

 INT benchmarks with path splitting(ratio) Vs INT benchmarks without
 path splitting(ratio) = 3661.225091 vs 3621.520572
That's an impressive improvement.

Anyway, I'll start taking a close look at this momentarily.

Thanks & Regards
Ajit

Jeff


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-20 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, August 20, 2015 3:16 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote:


  From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 
 2001
 From: Ajit Kumar Agarwal <ajit...@xilix.com>
 Date: Sat, 15 Aug 2015 18:19:14 +0200
 Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on
   tree ssa representation.

 Added a new pass on path splitting on tree SSA representation. The 
 path splitting optimization does the CFG transformation of join block 
 of the if-then-else same as the loop latch node is moved and merged 
 with the predecessor blocks after preserving the SSA representation.

 ChangeLog:
 2015-08-15  Ajit Agarwal  <ajit...@xilinx.com>

   * gcc/Makefile.in: Add the build of the new file
   tree-ssa-path-split.c
Instead:

  * Makefile.in (OBJS): Add tree-ssa-path-split.o.


   * gcc/opts.c (OPT_ftree_path_split) : Add an entry for
   Path splitting pass with optimization flag greater and
   equal to O2.

  * opts.c (default_options_table): Add entry for path splitting
  optimization at -O2 and above.



   * gcc/passes.def (path_split): add new path splitting pass.
Capitalize "add".




   * gcc/tree-ssa-path-split.c: New.
Use "New file".

   * gcc/tracer.c (transform_duplicate): New.
Use "New function".

   * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
   * gcc/testsuite/gcc.dg/path-split-1.c: New.
These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite"
prefix.

   * gcc/doc/invoke.texi
   (ftree-path-split): Document.
   (fdump-tree-path_split): Document.
Should just be two lines instead of three.

And more generally, there's no need to prefix ChangeLog entries with "gcc/".

Now that the ChangeLog nits are out of the way, let's get to stuff that's 
more interesting.

I will incorporate all the above changes  in the upcoming patches.


 Signed-off-by: Ajit agarwal <ajit...@xilinx.com>
 ---
   gcc/Makefile.in  |   1 +
   gcc/common.opt   |   4 +
   gcc/doc/invoke.texi  |  16 +-
   gcc/opts.c   |   1 +
   gcc/passes.def   |   1 +
   gcc/testsuite/gcc.dg/path-split-1.c  |  65 ++
   gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  60 +
   gcc/timevar.def  |   1 +
   gcc/tracer.c |  37 +--
   gcc/tree-pass.h  |   1 +
   gcc/tree-ssa-path-split.c| 330 
 +++
   11 files changed, 503 insertions(+), 14 deletions(-)
   create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
   create mode 100644 gcc/tree-ssa-path-split.c

 diff --git a/gcc/common.opt b/gcc/common.opt
 index e80eadf..1d02582 100644
 --- a/gcc/common.opt
 +++ b/gcc/common.opt
 @@ -2378,6 +2378,10 @@ ftree-vrp
   Common Report Var(flag_tree_vrp) Init(0) Optimization
   Perform Value Range Propagation on trees

 +ftree-path-split
 +Common Report Var(flag_tree_path_split) Init(0) Optimization
 +Perform Path Splitting
Maybe "Perform Path Splitting for loop backedges" or something which is
a little more descriptive.  The above isn't exactly right, so don't use
it as-is.



 @@ -9068,6 +9075,13 @@ enabled by default at @option{-O2} and higher.  Null 
 pointer check
   elimination is only done if @option{-fdelete-null-pointer-checks} is
   enabled.

 +@item -ftree-path-split
 +@opindex ftree-path-split
 +Perform Path Splitting  on trees.  The join blocks of IF-THEN-ELSE same
 +as loop latch node is moved to its predecessor and the loop latch node
 +will be forwarding block.  This is enabled by default at @option{-O2}
 +and higher.
Needs some work.  Maybe something along the lines of

When two paths of execution merge immediately before a loop latch node, 
try to duplicate the merge node into the two paths.

I will incorporate all the above changes.

 diff --git a/gcc/passes.def b/gcc/passes.def
 index 6b66f8f..20ddf3d 100644
 --- a/gcc/passes.def
 +++ b/gcc/passes.def
 @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 NEXT_PASS (pass_ccp);
 /* After CCP we rewrite no longer addressed locals into SSA
form if possible.  */
 +  NEXT_PASS (pass_path_split);
 NEXT_PASS (pass_forwprop);
 NEXT_PASS (pass_sra_early);
I can't recall if we've discussed the location of the pass at all.  I'm 
not objecting to this location, but would like to hear why you chose 
this particular location in the optimization pipeline.

I have placed the path

RE: [PATCH GCC]Improve bound information in loop niter analysis

2015-08-18 Thread Ajit Kumar Agarwal


-Original Message-
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Tuesday, August 18, 2015 1:08 PM
To: Ajit Kumar Agarwal
Cc: Richard Biener; Bin Cheng; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [PATCH GCC]Improve bound information in loop niter analysis

On Mon, Aug 17, 2015 at 6:49 PM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:
 All:

 Does the logic to calculate the loop bound information through value
 range analysis use the post-dominator and dominator info? The iteration
 branches, instead of the loop exit condition, can be calculated through
 post-dominator info.
 If a node in the loop has two successors and the two successors
 post-dominate the loop header, then that node can be an iteration branch.

 For All the nodes L in the Loop B
 If (L1, L2 belongs to successors of (L) && L1, L2 belongs to
 PosDom(Header of Loop)) {
   I = I union L1
 }

 Thus I will have the set of all iteration branches. This will handle
 more cases of loop bound information, which will be accurate through the
 exact iteration count for the known cases, along with value range
 information, where the condition is not at the loop exit but at other
 nodes in the loop.

I don't quite follow your words here.  Could you please give a simple example 
about it?  Especially I don't know how post-dom helps the loop bound 
analysis.  Seems your pseudo code is collecting some comparison basic block 
of loop?

The algorithm I have given above is based on post-dominator info. It helps to
calculate the iteration branches. The iteration branches are the branches
that determine the loop exit condition: based on the condition, the branch
either goes to the header of the loop, or to a block dominated by the header,
or exits the loop. The above algorithm finds such iteration branches and thus
decides on the loop bound or iteration count. Such iteration branches need
not be at the back edge node; they may be nodes inside the loop guarded by
some conditions. Finding such iteration branches can be done through the
post-dominator info using the above algorithm. Based on the iteration
branches, the conditions can be analyzed, and that helps in finding the
iteration bound for the known cases. Known cases are the cases where the loop
bound can be determined at compile time.

One example would be multi-exit loops, where the loop exit condition can be
at the back edge, or at a block inside the loop that breaks out based on IF
conditions, thus giving multiple exits. Such iteration branches can be found
using the above algorithm, as sketched below.
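A rough sketch, using GCC's dominance API, of the collection step described
above (a hypothetical helper, for illustration only):

  /* Collect the "iteration branches" of LOOP: blocks with two
     successors where both successors post-dominate the loop header.  */
  static void
  collect_iteration_branches (struct loop *loop, vec<basic_block> *branches)
  {
    basic_block *bbs = get_loop_body (loop);

    calculate_dominance_info (CDI_POST_DOMINATORS);
    for (unsigned i = 0; i < loop->num_nodes; i++)
      {
        basic_block bb = bbs[i];
        if (EDGE_COUNT (bb->succs) == 2
            && dominated_by_p (CDI_POST_DOMINATORS, loop->header,
                               EDGE_SUCC (bb, 0)->dest)
            && dominated_by_p (CDI_POST_DOMINATORS, loop->header,
                               EDGE_SUCC (bb, 1)->dest))
          branches->safe_push (bb);
      }
    free (bbs);
  }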

Thanks & Regards
Ajit 

Thanks,
bin

 Thanks & Regards
 Ajit


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org 
 [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Bin.Cheng
 Sent: Monday, August 17, 2015 3:32 PM
 To: Richard Biener
 Cc: Bin Cheng; GCC Patches
 Subject: Re: [PATCH GCC]Improve bound information in loop niter 
 analysis

 Thanks for all your reviews.

 On Fri, Aug 14, 2015 at 4:17 PM, Richard Biener richard.guent...@gmail.com 
 wrote:
 On Tue, Jul 28, 2015 at 11:36 AM, Bin Cheng bin.ch...@arm.com wrote:
 Hi,
 Loop niter computes inaccurate bound information for different loops.
 This patch is to improve it by using loop initial condition in 
 determine_value_range.  Generally, loop niter is computed by 
 subtracting start var from end var in loop exit condition.  
 Moreover, loop bound is computed using value range information of both 
 start and end variables.
 Basic idea of this patch is to check if loop initial condition 
 implies more range information for both start/end variables.  If 
 yes, we refine range information and use that to compute loop bound.
 With this improvement, more accurate loop bound information is 
 computed for test cases added by this patch.

 +  c0 = fold_convert (type, c0);
 +  c1 = fold_convert (type, c1);
 +
 +  if (operand_equal_p (var, c0, 0))

 I believe if c0 is not already of type type operand-equal_p will never 
 succeed.
It's quite a specific case targeting comparison between var and its range
 bounds.  Given c0 is in form of var + offc0, then the comparison var + 
 offc0 != range bounds doesn't have any useful information.  Maybe useless 
 type conversion can be handled here though, it might be even corner case.


 (side-note: we should get rid of the GMP use, that's expensive and 
 now we have wide-int available which should do the trick as well)

 + /* Case of comparing with the bounds of the type.  */
 + if (TYPE_MIN_VALUE (type)
 +  && operand_equal_p (c1, TYPE_MIN_VALUE (type), 0))
 +   cmp = GT_EXPR;
 + if (TYPE_MAX_VALUE (type)
 +  && operand_equal_p (c1, TYPE_MAX_VALUE (type), 0))
 +   cmp = LT_EXPR;

 don't use TYPE_MIN/MAX_VALUE.  Instead use the types precision and 
 all wide_int operations (see match.pd wi::max_value use).
 Done

RE: [PATCH GCC]Improve bound information in loop niter analysis

2015-08-17 Thread Ajit Kumar Agarwal
All:

Does the logic to calculate the loop bound information through value range
analysis use the post-dominator and dominator info? The iteration branches,
instead of the loop exit condition, can be calculated through post-dominator
info. If a node in the loop has two successors and the two successors
post-dominate the loop header, then that node can be an iteration branch.

For All the nodes L in the Loop B
If (L1, L2 belongs to successors of (L) && L1, L2 belongs to PosDom(Header of
Loop))
{
  I = I union L1
}

Thus I will have the set of all iteration branches. This will handle more
cases of loop bound information, which will be accurate through the exact
iteration count for the known cases, along with value range information,
where the condition is not at the loop exit but at other nodes in the loop.

Thanks & Regards
Ajit
 

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Bin.Cheng
Sent: Monday, August 17, 2015 3:32 PM
To: Richard Biener
Cc: Bin Cheng; GCC Patches
Subject: Re: [PATCH GCC]Improve bound information in loop niter analysis

Thanks for all your reviews.

On Fri, Aug 14, 2015 at 4:17 PM, Richard Biener richard.guent...@gmail.com 
wrote:
 On Tue, Jul 28, 2015 at 11:36 AM, Bin Cheng bin.ch...@arm.com wrote:
 Hi,
 Loop niter computes inaccurate bound information for different loops.  
 This patch is to improve it by using loop initial condition in 
 determine_value_range.  Generally, loop niter is computed by 
 subtracting start var from end var in loop exit condition.  Moreover, 
 loop bound is computed using value range information of both start and end 
 variables.
 Basic idea of this patch is to check if loop initial condition 
 implies more range information for both start/end variables.  If yes, 
 we refine range information and use that to compute loop bound.
 With this improvement, more accurate loop bound information is 
 computed for test cases added by this patch.

 +  c0 = fold_convert (type, c0);
 +  c1 = fold_convert (type, c1);
 +
 +  if (operand_equal_p (var, c0, 0))

 I believe if c0 is not already of type type operand-equal_p will never 
 succeed.
It's quite a specific case targeting comparison between var and its range
bounds.  Given c0 is in form of var + offc0, then the comparison var + offc0 
!= range bounds doesn't have any useful information.  Maybe useless type 
conversion can be handled here though, it might be even corner case.


 (side-note: we should get rid of the GMP use, that's expensive and now 
 we have wide-int available which should do the trick as well)

 + /* Case of comparing with the bounds of the type.  */
 + if (TYPE_MIN_VALUE (type)
 +  && operand_equal_p (c1, TYPE_MIN_VALUE (type), 0))
 +   cmp = GT_EXPR;
 + if (TYPE_MAX_VALUE (type)
 +  && operand_equal_p (c1, TYPE_MAX_VALUE (type), 0))
 +   cmp = LT_EXPR;

 don't use TYPE_MIN/MAX_VALUE.  Instead use the types precision and all 
 wide_int operations (see match.pd wi::max_value use).
Done.


 +  else if (!operand_equal_p (var, varc0, 0))
 +goto end_2;

 ick - goto.  We need sth like a auto_mpz class with a destructor.
Label end_2 removed.


 struct auto_mpz
 {
   auto_mpz () { mpz_init (m_val); }
   ~auto_mpz () { mpz_clear (m_val); }
   mpz operator() { return m_val; }
   mpz m_val;
 };

 Is it OK?

 I see the code follows existing practice in niter analysis even though 
 my overall plan was to transition its copying of value-range related 
 optimizations to use VRP infrastructure.
Yes, I think it's easy to push it to VRP infrastructure.  Actually from the 
name of the function, it's more vrp related.  For now, the function is called 
only by bound_difference, not so many as vrp queries.  We need cache facility 
in vrp otherwise it would be expensive.


 I'm still ok with improving the existing code on the basis that I 
 won't get to that for GCC 6.

 So - ok with the TYPE_MIN/MAX_VALUE change suggested above.

 Refactoring with auto_mpz welcome.
That will be an independent patch, so I skipped it in this one.

New version attached.  Bootstrap and test on x86_64.

Thanks,
bin

 Thanks,
 RIchard.

 Thanks,
 bin

 2015-07-28  Bin Cheng  bin.ch...@arm.com

 * tree-ssa-loop-niter.c (refine_value_range_using_guard): New.
 (determine_value_range): Call refine_value_range_using_guard for
 each loop initial condition to improve value range.

 gcc/testsuite/ChangeLog
 2015-07-28  Bin Cheng  bin.ch...@arm.com

 * gcc.dg/tree-ssa/loop-bound-1.c: New test.
 * gcc.dg/tree-ssa/loop-bound-3.c: New test.
 * gcc.dg/tree-ssa/loop-bound-5.c: New test.


RE: [PATCH GCC]Improve bound information in loop niter analysis

2015-08-17 Thread Ajit Kumar Agarwal
Oops, there was a typo: instead of L it was typed as L1.
Here is the corrected one.

For All the nodes L in the Loop B
If (L1, L2 belongs to successors of (L) && L1, L2 belongs to PosDom(Header of
Loop)) {
  I = I union L;
}

Thanks & Regards
Ajit
-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Monday, August 17, 2015 4:19 PM
To: Bin.Cheng; Richard Biener
Cc: Bin Cheng; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: RE: [PATCH GCC]Improve bound information in loop niter analysis

All:

Does the logic to calculate the loop bound information through value range
analysis use the post-dominator and dominator info? The iteration branches,
instead of the loop exit condition, can be calculated through post-dominator info.
If a node in the loop has two successors and the two successors
post-dominate the loop header, then that node can be an iteration branch.

For All the nodes L in the Loop B
If (L1, L2 belongs to successors of (L) && L1, L2 belongs to PosDom(Header of
Loop)) {
  I = I union L1
}

Thus I will have the set of all iteration branches. This will handle more
cases of loop bound information, which will be accurate through the exact
iteration count for the known cases, along with value range information,
where the condition is not at the loop exit but at other nodes in the loop.

Thanks & Regards
Ajit
 

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Bin.Cheng
Sent: Monday, August 17, 2015 3:32 PM
To: Richard Biener
Cc: Bin Cheng; GCC Patches
Subject: Re: [PATCH GCC]Improve bound information in loop niter analysis

Thanks for all your reviews.

On Fri, Aug 14, 2015 at 4:17 PM, Richard Biener richard.guent...@gmail.com 
wrote:
 On Tue, Jul 28, 2015 at 11:36 AM, Bin Cheng bin.ch...@arm.com wrote:
 Hi,
 Loop niter computes inaccurate bound information for different loops.  
 This patch is to improve it by using loop initial condition in 
 determine_value_range.  Generally, loop niter is computed by 
 subtracting start var from end var in loop exit condition.  Moreover, 
 loop bound is computed using value range information of both start and end 
 variables.
 Basic idea of this patch is to check if loop initial condition 
 implies more range information for both start/end variables.  If yes, 
 we refine range information and use that to compute loop bound.
 With this improvement, more accurate loop bound information is 
 computed for test cases added by this patch.

 +  c0 = fold_convert (type, c0);
 +  c1 = fold_convert (type, c1);
 +
 +  if (operand_equal_p (var, c0, 0))

 I believe if c0 is not already of type type operand-equal_p will never 
 succeed.
It's quite a specific case targeting comparison between var and its range
bounds.  Given c0 is in form of var + offc0, then the comparison var + offc0 
!= range bounds doesn't have any useful information.  Maybe useless type 
conversion can be handled here though, it might be even corner case.


 (side-note: we should get rid of the GMP use, that's expensive and now 
 we have wide-int available which should do the trick as well)

 + /* Case of comparing with the bounds of the type.  */
 + if (TYPE_MIN_VALUE (type)
 +  && operand_equal_p (c1, TYPE_MIN_VALUE (type), 0))
 +   cmp = GT_EXPR;
 + if (TYPE_MAX_VALUE (type)
 +  && operand_equal_p (c1, TYPE_MAX_VALUE (type), 0))
 +   cmp = LT_EXPR;

 don't use TYPE_MIN/MAX_VALUE.  Instead use the types precision and all 
 wide_int operations (see match.pd wi::max_value use).
Done.


 +  else if (!operand_equal_p (var, varc0, 0))
 +goto end_2;

 ick - goto.  We need sth like a auto_mpz class with a destructor.
Label end_2 removed.


 struct auto_mpz
 {
   auto_mpz () { mpz_init (m_val); }
   ~auto_mpz () { mpz_clear (m_val); }
   mpz operator() { return m_val; }
   mpz m_val;
 };

 Is it OK?

 I see the code follows existing practice in niter analysis even though 
 my overall plan was to transition its copying of value-range related 
 optimizations to use VRP infrastructure.
Yes, I think it's easy to push it to VRP infrastructure.  Actually from the 
name of the function, it's more vrp related.  For now, the function is called 
only by bound_difference, not so many as vrp queries.  We need cache facility 
in vrp otherwise it would be expensive.


 I'm still ok with improving the existing code on the basis that I 
 won't get to that for GCC 6.

 So - ok with the TYPE_MIN/MAX_VALUE change suggested above.

 Refactoring with auto_mpz welcome.
That will be an independent patch, so I skipped it in this one.

New version attached.  Bootstrap and test on x86_64.

Thanks,
bin

 Thanks,
 RIchard.

 Thanks,
 bin

 2015-07-28  Bin Cheng  bin.ch...@arm.com

 * tree-ssa-loop-niter.c (refine_value_range_using_guard): New.
 (determine_value_range

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-08-15 Thread Ajit Kumar Agarwal
All:

Please find the updated patch with the suggestions and feedback incorporated.

Thanks Jeff and Richard for the review comments.

The following changes were done based on the feedback on the RFC comments and
the review of the previous patch.

1. Both the tracer and path splitting passes are separate passes, so that two
instances of the pass run in the end, one doing path splitting and one doing
tracing, at different times in the optimization pipeline.
2. Transform code is shared between the tracer and path splitting passes. The
common code is extracted into a function transform_duplicate, placed in
tracer.c, and the path splitting pass uses this transform code (see the
sketch after this list).
3. The analysis for populating the basic blocks and traversing them using the
Fibonacci heap is common. This cannot be factored out into a new function, as
the tracer pass does more analysis based on the profile, and different
heuristics are used in the tracer and path splitting passes.
4. The included headers are minimal: only what is required for the path
splitting pass is present.
5. The earlier patch did the SSA updating with a replace function, to
preserve the SSA representation required to move the loop latch node (the
same as the join block) into its predecessors, leaving the loop latch node as
just a forwarding block. Such a replace function is not required, as
suggested by Jeff; it goes away with this patch, and the transformed code is
factored into a function which is shared between the tracer and path
splitting passes.
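A sketch of what such a shared helper might look like, built on GCC's block
duplication helpers (illustrative; the exact signature in the patch may
differ):

  /* Duplicate block BB2, which is reached from BB over the edge
     between them, into the path ending at BB, and fix up the SSA
     form afterwards.  */
  static basic_block
  transform_duplicate (basic_block bb, basic_block bb2)
  {
    edge e = find_edge (bb, bb2);
    basic_block copy = duplicate_block (bb2, e, bb);

    /* Materialize the pending PHI-argument changes created by
       duplicate_block and add PHI arguments for the copy.  */
    flush_pending_stmts (e);
    add_phi_args_after_copy (&copy, 1, NULL);

    return copy;
  }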

Bootstrapping with the i386 and Microblaze targets works fine. No regression
is seen in the Deja GNU tests for Microblaze. There are lesser failures.
Mibench/EEMBC benchmarks were run for the Microblaze target, and a gain of
9.3% is seen in rgbcmy_lite of the EEMBC benchmarks.

SPEC 2000 benchmarks were run with the i386 target and the following
performance numbers were achieved.

INT benchmarks with path splitting(ratio) Vs INT benchmarks without path 
splitting(ratio) = 3661.225091 vs 3621.520572
FP benchmarks with path splitting(ratio) Vs FP benchmarks without path 
splitting(ratio )  =  4339.986209 vs 4339.775527

The maximum gain achieved is 9.03%, with the 252.eon INT benchmark.

ChangeLog:
2015-08-15  Ajit Agarwal  <ajit...@xilinx.com>

* gcc/Makefile.in: Add the build of the new file
tree-ssa-path-split.c.
* gcc/common.opt (ftree-path-split): Add the new flag.
* gcc/opts.c (OPT_ftree_path_split): Add an entry for the
path splitting pass with optimization flags greater than or
equal to O2.
* gcc/passes.def (path_split): Add the new path splitting pass.
* gcc/timevar.def (TV_TREE_PATH_SPLIT): New.
* gcc/tree-pass.h (make_pass_path_split): New declaration.
* gcc/tree-ssa-path-split.c: New.
* gcc/tracer.c (transform_duplicate): New.
* gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New.
* gcc/testsuite/gcc.dg/path-split-1.c: New.
* gcc/doc/invoke.texi (ftree-path-split): Document.
(fdump-tree-path_split): Document.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, July 29, 2015 10:13 AM
To: Richard Biener; Jeff Law
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation



-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Thursday, July 16, 2015 4:30 PM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Tue, Jul 7, 2015 at 3:22 PM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, July 07, 2015 2:21 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
 tree ssa representation

 On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, June 30, 2015 4:42 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass 
 on tree ssa representation

 On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:
 All:

 The below patch added a new path Splitting optimization pass on SSA 
 representation. The Path Splitting optimization Pass moves

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-07-28 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Thursday, July 16, 2015 4:30 PM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Tue, Jul 7, 2015 at 3:22 PM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, July 07, 2015 2:21 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
 tree ssa representation

 On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, June 30, 2015 4:42 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass 
 on tree ssa representation

 On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:
 All:

 The below patch added a new path Splitting optimization pass on SSA 
 representation. The Path Splitting optimization Pass moves the join 
 block of if-then-else same as loop latch to its predecessors and get merged 
 with the predecessors Preserving the SSA representation.

 The patch is tested for Microblaze and i386 target. The 
 EEMBC/Mibench benchmarks is run with the Microblaze target And the 
 performance gain of 9.15% and rgbcmy01_lite(EEMBC benchmarks). The Deja GNU 
 tests is run for Mircroblaze Target and no regression is seen for 
 Microblaze target and the new testcase attached are passed.

 For i386 bootstrapping goes through fine and the Spec cpu2000 
 benchmarks is run with this patch. Following observation were seen with 
 spec cpu2000 benchmarks.

 Ratio of path splitting change vs Ratio of not having path splitting change 
 is 3653.353 vs 3652.14 for INT benchmarks.
 Ratio of path splitting change vs Ratio of not having path splitting change 
 is  4353.812 vs 4345.351 for FP benchmarks.

 Based on comments from RFC patch following changes were done.

 1. Added a new pass for path splitting changes.
 2. Placed the new path  Splitting Optimization pass before the copy 
 propagation pass.
 3. The join block same as the Loop latch is wired into its 
 predecessors so that the CFG Cleanup pass will merge the blocks Wired 
 together.
 4. Copy propagation routines added for path splitting changes is not 
 needed as suggested by Jeff. They are removed in the patch as The copy 
 propagation in the copied join blocks will be done by the existing copy 
 propagation pass and the update ssa pass.
 5. Only the propagation of phi results of the join block with the 
 phi argument is done which will not be done by the existing update_ssa Or 
 copy propagation pass on tree ssa representation.
 6. Added 2 tests.
 a) compilation check  tests.
b) execution tests.
 7. Refactoring of the code for the feasibility check and finding the join 
 block same as loop latch node.

 [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
 representation.

 Added a new pass on path splitting on tree SSA representation. The path
 splitting optimization does the CFG transformation of join block of the
 if-then-else same as the loop latch node is moved and merged with the
 predecessor blocks after preserving the SSA representation.

 ChangeLog:
 2015-06-30  Ajit Agarwal  ajit...@xilinx.com

 * gcc/Makefile.in: Add the build of the new file
 tree-ssa-path-split.c
 * gcc/common.opt: Add the new flag ftree-path-split.
 * gcc/opts.c: Add an entry for Path splitting pass
 with optimization flag greater and equal to O2.
 * gcc/passes.def: Enable and add new pass path splitting.
 * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
 * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
 * gcc/tree-ssa-path-split.c: New file for path splitting pass.
 * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
 * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

I'm not 100% sure I understand the transform but what I see from the 
testcases it tail-duplicates from a conditional up to a loop latch block 
(not sure if it includes it and thus ends up creating a loop nest or not).

An observation I have is that the pass should at least share the transform 
stage to some extent with the existing tracer pass (tracer.c) which 
essentially does the same but not restricted to loops in any way.

 The following piece of code from tracer.c can be shared

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-07-07 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, July 07, 2015 2:21 PM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, June 30, 2015 4:42 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
 tree ssa representation

 On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:
 All:

 The below patch added a new path Splitting optimization pass on SSA 
 representation. The Path Splitting optimization Pass moves the join 
 block of if-then-else same as loop latch to its predecessors and get merged 
 with the predecessors Preserving the SSA representation.

 The patch is tested for Microblaze and i386 target. The EEMBC/Mibench 
 benchmarks is run with the Microblaze target And the performance gain 
 of 9.15% and rgbcmy01_lite(EEMBC benchmarks). The Deja GNU tests is run for 
 Mircroblaze Target and no regression is seen for Microblaze target and the 
 new testcase attached are passed.

 For i386 bootstrapping goes through fine and the Spec cpu2000 
 benchmarks is run with this patch. Following observation were seen with spec 
 cpu2000 benchmarks.

 Ratio of path splitting change vs Ratio of not having path splitting change 
 is 3653.353 vs 3652.14 for INT benchmarks.
 Ratio of path splitting change vs Ratio of not having path splitting change 
 is  4353.812 vs 4345.351 for FP benchmarks.

 Based on comments from RFC patch following changes were done.

 1. Added a new pass for path splitting changes.
 2. Placed the new path  Splitting Optimization pass before the copy 
 propagation pass.
 3. The join block same as the Loop latch is wired into its 
 predecessors so that the CFG Cleanup pass will merge the blocks Wired 
 together.
 4. Copy propagation routines added for path splitting changes is not 
 needed as suggested by Jeff. They are removed in the patch as The copy 
 propagation in the copied join blocks will be done by the existing copy 
 propagation pass and the update ssa pass.
 5. Only the propagation of phi results of the join block with the phi 
 argument is done which will not be done by the existing update_ssa Or copy 
 propagation pass on tree ssa representation.
 6. Added 2 tests.
 a) compilation check  tests.
b) execution tests.
 7. Refactoring of the code for the feasibility check and finding the join 
 block same as loop latch node.

 [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
 representation.

 Added a new pass on path splitting on tree SSA representation. The path
 splitting optimization does the CFG transformation of join block of the
 if-then-else same as the loop latch node is moved and merged with the
 predecessor blocks after preserving the SSA representation.

 ChangeLog:
 2015-06-30  Ajit Agarwal  ajit...@xilinx.com

 * gcc/Makefile.in: Add the build of the new file
 tree-ssa-path-split.c
 * gcc/common.opt: Add the new flag ftree-path-split.
 * gcc/opts.c: Add an entry for Path splitting pass
 with optimization flag greater and equal to O2.
 * gcc/passes.def: Enable and add new pass path splitting.
 * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
 * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
 * gcc/tree-ssa-path-split.c: New file for path splitting pass.
 * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
 * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

I'm not 100% sure I understand the transform but what I see from the 
testcases it tail-duplicates from a conditional up to a loop latch block 
(not sure if it includes it and thus ends up creating a loop nest or not).

An observation I have is that the pass should at least share the transform 
stage to some extent with the existing tracer pass (tracer.c) which 
essentially does the same but not restricted to loops in any way.

 The following piece of code from tracer.c can be shared with the existing 
 path splitting pass.

 {
   e = find_edge (bb, bb2);

   copy = duplicate_block (bb2, e, bb);
   flush_pending_stmts (e);

   add_phi_args_after_copy (&copy, 1, NULL);
 }

 Sharing the above code of the transform stage of tracer.c with the path 
 splitting pass has the following limitation.

 1. The duplicated loop latch node is wired to its predecessors and the 
 existing phi node in the loop

[Patch,microblaze]: Optimized Instruction prefetch with the generation of wic

2015-07-07 Thread Ajit Kumar Agarwal
All:

 Please find the patch for the optimized usage of instruction prefetch with the
generation of the microblaze instruction wic.
No regressions are seen in the Deja GNU tests for microblaze.

[Patch,microblaze]: Optimized Instruction prefetch with the generation of wic
instruction.

The changes in this patch make optimized use of instruction prefetch by
generating the wic microblaze instruction. The wic microblaze
instruction
is the instruction prefetch instruction that optimizes instruction
prefetch.
The wic instruction is generated on the fall-through path of a call site and is
enabled with the flag mxl-prefetch. The purpose of adding the flag is that the
wic instruction is selected for particular FPGA designs and is not enabled by
default.
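
A usage sketch (the cross-compiler driver name below is hypothetical;
-mxl-prefetch is the flag this patch adds in microblaze.opt):

/* Compile with the new flag to enable wic generation:
     microblaze-xilinx-elf-gcc -O2 -mxl-prefetch test.c
   Without -mxl-prefetch, no wic instructions are emitted.  */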

ChangeLog:
2015-07-07  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com.

---
 gcc/config/microblaze/microblaze.c   |  139 ++
 gcc/config/microblaze/microblaze.md  |   14 
 gcc/config/microblaze/microblaze.opt |4 +
 3 files changed, 157 insertions(+), 0 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.c 
b/gcc/config/microblaze/microblaze.c
index 566b78c..eea2f67 100644
--- a/gcc/config/microblaze/microblaze.c
+++ b/gcc/config/microblaze/microblaze.c
@@ -71,6 +71,7 @@
 #include "cgraph.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "cfgloop.h"
 
 #define MICROBLAZE_VERSION_COMPARE(VA,VB) strcasecmp (VA, VB)
 
@@ -3593,6 +3594,141 @@ microblaze_legitimate_constant_p (machine_mode mode 
ATTRIBUTE_UNUSED, rtx x)
   return true;
 }
 
+static rtx
+get_branch_target (rtx branch)
+{
+  if (CALL_P (branch))
+    {
+      rtx call;
+
+      call = XVECEXP (PATTERN (branch), 0, 0);
+      if (GET_CODE (call) == SET)
+        call = SET_SRC (call);
+      if (GET_CODE (call) != CALL)
+        abort ();
+      return XEXP (XEXP (call, 0), 0);
+    }
+}
+
+/* Heuristics to identify where to insert at the
+   fall through path of the caller function. If there
+   is a call after the caller branch delay slot then
+   we don't generate the instruction prefetch instruction.  */
+
+static void
+insert_wic_for_ilb_runout (rtx_insn *first)
+{
+  rtx_insn *insn, *before_4 = 0, *before_16 = 0;
+  int addr = 0, length, first_addr = -1;
+  int wic_addr0 = 128 * 4, wic_addr1 = 128 * 4;
+  int insert_lnop_after = 0;
+
+  for (insn = first; insn; insn = NEXT_INSN (insn))
+    if (INSN_P (insn))
+      {
+        if (first_addr == -1)
+          first_addr = INSN_ADDRESSES (INSN_UID (insn));
+
+        addr = INSN_ADDRESSES (INSN_UID (insn)) - first_addr;
+        length = get_attr_length (insn);
+
+        if (before_4 == 0 && addr + length >= 4 * 4)
+          before_4 = insn;
+
+        if (JUMP_P (insn))
+          return;
+        if (before_16 == 0 && addr + length >= 14 * 4)
+          before_16 = insn;
+        if (CALL_P (insn) || tablejump_p (insn, 0, 0))
+          return;
+        if (addr + length >= 32 * 4)
+          {
+            gcc_assert (before_4 && before_16);
+            if (wic_addr0 > 4 * 4)
+              {
+                insn =
+                  emit_insn_before (gen_iprefetch
+                                    (gen_int_mode (addr, SImode)),
+                                    before_4);
+                recog_memoized (insn);
+                INSN_LOCATION (insn) = INSN_LOCATION (before_4);
+                INSN_ADDRESSES_NEW (insn,
+                                    INSN_ADDRESSES (INSN_UID (before_4)));
+                return;
+              }
+          }
+      }
+}
+
+/* Insert instruction prefetch instruction at the fall
+   through path of the function call.  */
+
+static void
+insert_wic (void)
+{
+  rtx_insn *insn;
+  int i, j;
+  basic_block bb, prev = 0;
+  rtx branch_target = 0;
+
+  shorten_branches (get_insns ());
+
+  for (i = 0; i < n_basic_blocks_for_fn (cfun) - 1; i++)
+    {
+      edge e;
+      edge_iterator ei;
+      bool simple_loop = false;
+
+      bb = BASIC_BLOCK_FOR_FN (cfun, i);
+
+      if (bb == NULL)
+        continue;
+
+      if ((prev != 0) && (prev != bb))
+        continue;
+      else
+        prev = 0;
+
+      FOR_EACH_EDGE (e, ei, bb->preds)
+        if (e->src == bb)
+          {
+            simple_loop = true;
+            prev = e->dest;
+            break;
+          }
+
+      for (insn = BB_END (bb); insn; insn = PREV_INSN (insn))
+        {
+          if (INSN_P (insn) && !simple_loop
+              && CALL_P (insn))
+            {
+              if ((branch_target = 

[Patch,microblaze]: Optimized usage of reserved stack space for function arguments.

2015-07-06 Thread Ajit Kumar Agarwal
All:

The below patch optimizes the usage of the reserved stack space for function
arguments. The stack space is reserved if the function is a libcall, takes a
variable number of arguments, takes aggregate data types, or passes some
parameters in registers and some on the stack. In addition to the above
conditions, the stack space is not reserved if no arguments are passed. No
regressions are seen in Deja GNU tests for microblaze.

[Patch,microblaze]: Optimized usage of reserved stack space for function 
arguments.

The changes are made in the patch for optimized usage of
reserved stack space for arguments. The stack space is
reserved if the function is a libcall, takes a variable number
of arguments, takes aggregate data types, or passes some
parameters in registers and some on the stack. In addition to
the above conditions, the stack space is not reserved if no
arguments are passed.
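
For illustration (hypothetical prototypes, not from the patch), the conditions
above classify functions as follows:

/* Fixed arity, all arguments in registers: no stack space reserved.  */
int add3 (int a, int b, int c);

/* Variable number of arguments: stack space is reserved.  */
int sum_all (int n, ...);

/* Aggregate argument: stack space is reserved.  */
struct big { int v[16]; };
int consume (struct big b);

/* No arguments at all: no stack space reserved.  */
int get_id (void);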

ChangeLog:
2015-07-06  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.c
(microblaze_parm_needs_stack): New.
(microblaze_function_parms_need_stack): New.
(microblaze_reg_parm_stack_space): New.
* config/microblaze/microblaze.h
(REG_PARM_STACK_SPACE): Modify the macro.
* config/microblaze/microblaze-protos.h
(microblaze_reg_parm_stack_space): Declare.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

---
 gcc/config/microblaze/microblaze-protos.h |1 +
 gcc/config/microblaze/microblaze.c|  140 +
 gcc/config/microblaze/microblaze.h|2 +-
 3 files changed, 142 insertions(+), 1 deletions(-)

diff --git a/gcc/config/microblaze/microblaze-protos.h 
b/gcc/config/microblaze/microblaze-protos.h
index 57879b1..d27d3e1 100644
--- a/gcc/config/microblaze/microblaze-protos.h
+++ b/gcc/config/microblaze/microblaze-protos.h
@@ -56,6 +56,7 @@ extern bool microblaze_tls_referenced_p (rtx);
 extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
 extern bool microblaze_cannot_force_const_mem (machine_mode, rtx);
+extern int  microblaze_reg_parm_stack_space(tree fun);
 #endif  /* RTX_CODE */
 
 /* Declare functions in microblaze-c.c.  */
diff --git a/gcc/config/microblaze/microblaze.c 
b/gcc/config/microblaze/microblaze.c
index 566b78c..0eae4cd 100644
--- a/gcc/config/microblaze/microblaze.c
+++ b/gcc/config/microblaze/microblaze.c
@@ -3592,7 +3592,147 @@ microblaze_legitimate_constant_p (machine_mode mode 
ATTRIBUTE_UNUSED, rtx x)
 
   return true;
 }
+/* Heuristics and criteria for whether a param needs stack.  */
 
+static bool
+microblaze_parm_needs_stack (cumulative_args_t args_so_far, tree type)
+{
+  enum machine_mode mode;
+  int unsignedp;
+  rtx entry_parm;
+
+  /* Catch errors.  */
+  if (type == NULL || type == error_mark_node)
+return true;
+
+  /* Handle types with no storage requirement.  */
+  if (TYPE_MODE (type) == VOIDmode)
+return false;
+
+   /* Handle complex types.  */
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+return (microblaze_parm_needs_stack (args_so_far, TREE_TYPE (type))
+ || microblaze_parm_needs_stack (args_so_far, TREE_TYPE (type)));
+
+  /* Handle transparent aggregates.  */
+  if ((TREE_CODE (type) == UNION_TYPE || TREE_CODE (type) == RECORD_TYPE)
+      && TYPE_TRANSPARENT_AGGR (type))
+type = TREE_TYPE (first_field (type));
+
+  /* See if this arg was passed by invisible reference.  */
+  if (pass_by_reference (get_cumulative_args (args_so_far),
+ TYPE_MODE (type), type, true))
+type = build_pointer_type (type);
+
+  /* Find mode as it is passed by the ABI.  */
+  unsignedp = TYPE_UNSIGNED (type);
+  mode = promote_mode (type, TYPE_MODE (type), unsignedp);
+
+  /* If there is no incoming register, we need a stack.  */
+  entry_parm = microblaze_function_arg (args_so_far, mode, type, true);
+
+  if (entry_parm == NULL)
+return true;
+
+  /* Likewise if we need to pass both in registers and on the stack.  */
+  if (GET_CODE (entry_parm) == PARALLEL
+      && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
+return true;
+
+  /* Also true if we're partially in registers and partially not.  */
+  if (function_arg_partial_bytes (args_so_far, mode, type, true) != 0)
+return true;
+
+  /* Update info on where next arg arrives in registers.  */
+  microblaze_function_arg_advance (args_so_far, mode, type, true);
+
+  return false;
+}
+
+/* A function needs stack space for params if:
+   1. The function is a libcall.
+   2. It takes a variable number of arguments.
+   3. A param is an aggregate data type.
+   4. Some params are partially in registers and partially on the stack.  */
+
+static bool
+microblaze_function_parms_need_stack (tree fun, bool incoming)
+{
+  tree fntype, result;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  int num_of_args = 0;
+
+  /* Must be a libcall, all of which only use reg parms.  */
+  if (!fun)
+ 

RE: [Patch,microblaze]: Optimized usage of reserved stack space for function arguments.

2015-07-06 Thread Ajit Kumar Agarwal


-Original Message-
From: Oleg Endo [mailto:oleg.e...@t-online.de] 
Sent: Monday, July 06, 2015 7:07 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Optimized usage of reserved stack space for 
function arguments.

Hi,

Just some general comments...

Thanks.

On 06 Jul 2015, at 22:05, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com 
wrote:

 +static bool
 +microblaze_parm_needs_stack (cumulative_args_t args_so_far, tree 
 +type) {
 +  enum machine_mode mode;
 
 'enum' is not required in C++, please omit it.
 We've been trying to remove unnecessary 'struct' and 'enum' after the
 switch to C++.  Although there are still some of them around, please
 don't add new ones.
 Sure.

 +  int unsignedp;
 +  rtx entry_parm;
 
 Please declare variables at their first use.
 (there are other such cases in your patch)

I have declared the above variables in the scope of their first use. Sorry,
it's not clear to me: did you mean to declare them in the following way?

   int unsignedp = TYPE_UNSIGNED (type);
   rtx entry_parm = microblaze_function_arg (args_so_far, mode, type, true);

Thanks & Regards
Ajit

Cheers,
Oleg


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-07-04 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, June 30, 2015 4:42 PM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:
 All:

 The below patch added a new path Splitting optimization pass on SSA 
 representation. The Path Splitting optimization Pass moves the join 
 block of if-then-else same as loop latch to its predecessors and get merged 
 with the predecessors Preserving the SSA representation.

 The patch is tested for Microblaze and i386 target. The EEMBC/Mibench 
 benchmarks is run with the Microblaze target And the performance gain 
 of 9.15% and rgbcmy01_lite(EEMBC benchmarks). The Deja GNU tests is run for 
 Mircroblaze Target and no regression is seen for Microblaze target and the 
 new testcase attached are passed.

 For i386 bootstrapping goes through fine and the Spec cpu2000 
 benchmarks is run with this patch. Following observation were seen with spec 
 cpu2000 benchmarks.

 Ratio of path splitting change vs Ratio of not having path splitting change 
 is 3653.353 vs 3652.14 for INT benchmarks.
 Ratio of path splitting change vs Ratio of not having path splitting change 
 is  4353.812 vs 4345.351 for FP benchmarks.

 Based on comments from RFC patch following changes were done.

 1. Added a new pass for path splitting changes.
 2. Placed the new path  Splitting Optimization pass before the copy 
 propagation pass.
 3. The join block same as the Loop latch is wired into its 
 predecessors so that the CFG Cleanup pass will merge the blocks Wired 
 together.
 4. Copy propagation routines added for path splitting changes is not 
 needed as suggested by Jeff. They are removed in the patch as The copy 
 propagation in the copied join blocks will be done by the existing copy 
 propagation pass and the update ssa pass.
 5. Only the propagation of phi results of the join block with the phi 
 argument is done which will not be done by the existing update_ssa Or copy 
 propagation pass on tree ssa representation.
 6. Added 2 tests.
 a) compilation check  tests.
b) execution tests.
 7. Refactoring of the code for the feasibility check and finding the join 
 block same as loop latch node.

 [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
 representation.

 Added a new pass on path splitting on tree SSA representation. The path
 splitting optimization does the CFG transformation of join block of the
 if-then-else same as the loop latch node is moved and merged with the
 predecessor blocks after preserving the SSA representation.

 ChangeLog:
 2015-06-30  Ajit Agarwal  ajit...@xilinx.com

 * gcc/Makefile.in: Add the build of the new file
 tree-ssa-path-split.c
 * gcc/common.opt: Add the new flag ftree-path-split.
 * gcc/opts.c: Add an entry for Path splitting pass
 with optimization flag greater and equal to O2.
 * gcc/passes.def: Enable and add new pass path splitting.
 * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
 * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
 * gcc/tree-ssa-path-split.c: New file for path splitting pass.
 * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
 * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

I'm not 100% sure I understand the transform but what I see from the 
testcases it tail-duplicates from a conditional up to a loop latch block (not 
sure if it includes it and thus ends up creating a loop nest or not).

An observation I have is that the pass should at least share the transform 
stage to some extent with the existing tracer pass (tracer.c) which 
essentially does the same but not restricted to loops in any way.  

The following piece of code from tracer.c can be shared with the existing path 
splitting pass.

{
  e = find_edge (bb, bb2);

  copy = duplicate_block (bb2, e, bb);
  flush_pending_stmts (e);

  add_phi_args_after_copy (&copy, 1, NULL);
}

Sharing the above code of the transform stage of tracer.c with the path 
splitting pass has the following limitation.

1. The duplicated loop latch node is wired into its predecessors, and the
existing phi nodes in the loop latch node, with the phi arguments from the
corresponding predecessors, are moved to the duplicated loop latch nodes wired
into the predecessors. Because of this, the duplicated loop latch nodes wired
into the predecessors will not be merged with the original predecessors by the
CFG cleanup phase.

 So I wonder if your pass could be simply another heuristic to compute paths 
 to trace in the existing tracer pass.

Sorry, I am not very clear when you say

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-07-01 Thread Ajit Kumar Agarwal


-Original Message-
From: Joseph Myers [mailto:jos...@codesourcery.com] 
Sent: Wednesday, July 01, 2015 3:48 AM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Tue, 30 Jun 2015, Ajit Kumar Agarwal wrote:

 * gcc/common.opt: Add the new flag ftree-path-split.

All options need documenting in invoke.texi.

Sure.
 +#include "tm.h"

Why?  Does some other header depend on this, or are you using a target macro?

I am not using any target macro. There are many header files that include
tm.h, and there are also many tree-ssa optimization files that include tm.h,
for example tree-ssa-threadupdate.c, tree-vrp.c, and tree-ssa-threadedge.c.

Thanks & Regards
Ajit
--
Joseph S. Myers
jos...@codesourcery.com


[Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-06-30 Thread Ajit Kumar Agarwal
 *make_pass_build_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-path-split.c b/gcc/tree-ssa-path-split.c
new file mode 100644
index 000..3da7791
--- /dev/null
+++ b/gcc/tree-ssa-path-split.c
@@ -0,0 +1,462 @@
+/* Support routines for Path Splitting.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Ajit Kumar Agarwal ajit...@xilinx.com.
+ 
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "flags.h"
+#include "tree.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "predict.h"
+#include "vec.h"
+#include "hashtab.h"
+#include "hash-set.h"
+#include "machmode.h"
+#include "hard-reg-set.h"
+#include "input.h"
+#include "function.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "cfganal.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gimple-ssa.h"
+#include "tree-cfg.h"
+#include "tree-phinodes.h"
+#include "ssa-iterators.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-ssa-loop.h"
+#include "tree-into-ssa.h"
+#include "tree-ssa.h"
+#include "tree-pass.h"
+#include "tree-dump.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "intl.h"
+#include "cfgloop.h"
+#include "tree-scalar-evolution.h"
+#include "tree-ssa-propagate.h"
+#include "tree-chrec.h"
+#include "tree-ssa-threadupdate.h"
+#include "expr.h"
+#include "insn-codes.h"
+#include "optabs.h"
+#include "tree-ssa-threadedge.h"
+#include "wide-int.h"
+
+/* replace_uses_phi propagates the phi results, via the first phi
+   argument, into each of the copied join blocks wired into the
+   predecessors.  This function is called from replace_uses_phi to
+   replace the uses of the first phi argument with the second phi
+   argument in the next copy of the join block.  */
+
+static void
+replace_use_phi_operand1_with_operand2 (basic_block b,
+tree use1,
+tree use2)
+{
+  use_operand_p use;
+  ssa_op_iter iter;
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (b); !gsi_end_p (gsi);)
+    {
+      gimple stmt = gsi_stmt (gsi);
+      FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)
+        {
+          tree tuse = USE_FROM_PTR (use);
+          if (use1 == tuse || use1 == NULL_TREE)
+            {
+              propagate_value (use, use2);
+              update_stmt (stmt);
+            }
+        }
+      gsi_next (&gsi);
+    }
+}
+
+/* This function propagates the phi result into the use points with
+   the phi arguments.  The join block is copied and wired into the
+   predecessors.  Since the use points of the phi result are the same
+   in each of the copies of the join block in the predecessors, it
+   propagates the phi arguments in the copy of the join block wired
+   into its predecessor.  */
+
+static void
+replace_uses_phi (basic_block b, basic_block temp_bb)
+{
+  gimple_seq phis = phi_nodes (b);
+  gimple phi = gimple_seq_first_stmt (phis);
+  tree def = gimple_phi_result (phi), use = gimple_phi_arg_def (phi, 0);
+  tree use2 = gimple_phi_arg_def (phi, 1);
+
+  if (virtual_operand_p (def))
+{
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  gimple stmt;
+
+  FOR_EACH_IMM_USE_STMT (stmt, iter, def)
+FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+  SET_USE (use_p, use);
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (def))
+SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use) = 1;
+}
+   else
+ replace_uses_by (def, use);
+   replace_use_phi_operand1_with_operand2 (temp_bb, use, use2);
+}
+
+/* Returns true if the block bb has label or call statements.
+   Otherwise return false.  */
+
+static bool
+is_block_has_label_call (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
+ {
+   gimple stmt = gsi_stmt(gsi);
+   if (dyn_cast <glabel *> (stmt))
+ {
+   return true;
+ }
+   if (is_gimple_call (stmt))
+ return true;
+ }
+  return false;
+}
+
+/* This function performs the feasibility tests for path splitting

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-06-30 Thread Ajit Kumar Agarwal
I forgot to attach the link to the RFC comments from Jeff for reference.

https://gcc.gnu.org/ml/gcc/2015-05/msg00302.html

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Tuesday, June 30, 2015 1:46 PM
To: l...@redhat.com; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

All:

The below patch adds a new path splitting optimization pass on the SSA
representation. The path splitting optimization pass moves the join block of an
if-then-else that is the same as the loop latch to its predecessors, where it
gets merged with the predecessors, preserving the SSA representation.
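
As an illustrative example (hand-written, not from the attached testcases):
when the statements after the if-then-else form a join block that is also the
loop latch, that block is duplicated into both arms, e.g. for

int
example (int *a, int *b, int n)
{
  int i = 0, x, sum = 0;
  do {
    if (a[i] > b[i])   /* then block */
      x = a[i];
    else               /* else block */
      x = b[i];
    sum += x;          /* join block == loop latch; the pass     */
    i++;               /* duplicates it into both predecessors.  */
  } while (i < n);
  return sum;
}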

The patch is tested for the Microblaze and i386 targets. The EEMBC/Mibench
benchmarks were run with the Microblaze target, and a performance gain of 9.15%
is seen in rgbcmy01_lite (EEMBC benchmarks). The Deja GNU tests were run for the
Microblaze target; no regression is seen for the Microblaze target, and the new
testcases attached pass.

For i386, bootstrapping goes through fine and the SPEC cpu2000 benchmarks were
run with this patch. The following observations were seen with the spec cpu2000
benchmarks.

Ratio of the path splitting change vs ratio of not having the path splitting
change is 3653.353 vs 3652.14 for INT benchmarks.
Ratio of the path splitting change vs ratio of not having the path splitting
change is 4353.812 vs 4345.351 for FP benchmarks.

Based on comments from the RFC patch, the following changes were done.

1. Added a new pass for the path splitting changes.
2. Placed the new path splitting optimization pass before the copy propagation
pass.
3. The join block that is the same as the loop latch is wired into its
predecessors so that the CFG cleanup pass will merge the blocks wired together.
4. The copy propagation routines added for the path splitting changes are not
needed, as suggested by Jeff. They are removed in the patch, as the copy
propagation in the copied join blocks will be done by the existing copy
propagation pass and the update-ssa pass.
5. Only the propagation of the phi results of the join block with the phi
argument is done, which will not be done by the existing update_ssa or copy
propagation pass on the tree SSA representation.
6. Added 2 tests.
a) compilation check tests.
   b) execution tests.
7. Refactoring of the code for the feasibility check and finding the join block
that is the same as the loop latch node.

[Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation.

Added a new pass on path splitting on tree SSA representation. The path
splitting optimization does the CFG transformation of join block of the
if-then-else same as the loop latch node is moved and merged with the
predecessor blocks after preserving the SSA representation.

ChangeLog:
2015-06-30  Ajit Agarwal  ajit...@xilinx.com

* gcc/Makefile.in: Add the build of the new file
tree-ssa-path-split.c
* gcc/common.opt: Add the new flag ftree-path-split.
* gcc/opts.c: Add an entry for the path splitting pass
with optimization flags greater than or equal to O2.
* gcc/passes.def: Enable and add new pass path splitting.
* gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
* gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
* gcc/tree-ssa-path-split.c: New file for path splitting pass.
* gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
* gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com.

 gcc/Makefile.in                              |   1 +
 gcc/common.opt                               |   4 +
 gcc/opts.c                                   |   1 +
 gcc/passes.def                               |   1 +
 gcc/testsuite/gcc.dg/path-split-1.c          |  65 
 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c |  62 
 gcc/timevar.def                              |   1 +
 gcc/tree-pass.h                              |   1 +
 gcc/tree-ssa-path-split.c                    | 462 +++
 9 files changed, 598 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c
 create mode 100644 gcc/tree-ssa-path-split.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5f9261f..35ac363 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1476,6 +1476,7 @@ OBJS = \
tree-vect-slp.o \
tree-vectorizer.o \
tree-vrp.o \
+tree-ssa-path-split.o \
tree.o \
valtrack.o \
value-prof.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index e104269..c63b100 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2328,6 +2328,10 @@ ftree-vrp
 Common Report Var(flag_tree_vrp) Init(0) Optimization
 Perform Value Range Propagation on trees
 
+ftree-path-split

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-06-30 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, June 30, 2015 4:42 PM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:
 All:

 The below patch added a new path Splitting optimization pass on SSA 
 representation. The Path Splitting optimization Pass moves the join 
 block of if-then-else same as loop latch to its predecessors and get merged 
 with the predecessors Preserving the SSA representation.

 The patch is tested for Microblaze and i386 target. The EEMBC/Mibench 
 benchmarks is run with the Microblaze target And the performance gain 
 of 9.15% and rgbcmy01_lite(EEMBC benchmarks). The Deja GNU tests is run for 
 Mircroblaze Target and no regression is seen for Microblaze target and the 
 new testcase attached are passed.

 For i386 bootstrapping goes through fine and the Spec cpu2000 
 benchmarks is run with this patch. Following observation were seen with spec 
 cpu2000 benchmarks.

 Ratio of path splitting change vs Ratio of not having path splitting change 
 is 3653.353 vs 3652.14 for INT benchmarks.
 Ratio of path splitting change vs Ratio of not having path splitting change 
 is  4353.812 vs 4345.351 for FP benchmarks.

 Based on comments from RFC patch following changes were done.

 1. Added a new pass for path splitting changes.
 2. Placed the new path  Splitting Optimization pass before the copy 
 propagation pass.
 3. The join block same as the Loop latch is wired into its 
 predecessors so that the CFG Cleanup pass will merge the blocks Wired 
 together.
 4. Copy propagation routines added for path splitting changes is not 
 needed as suggested by Jeff. They are removed in the patch as The copy 
 propagation in the copied join blocks will be done by the existing copy 
 propagation pass and the update ssa pass.
 5. Only the propagation of phi results of the join block with the phi 
 argument is done which will not be done by the existing update_ssa Or copy 
 propagation pass on tree ssa representation.
 6. Added 2 tests.
 a) compilation check  tests.
b) execution tests.
 7. Refactoring of the code for the feasibility check and finding the join 
 block same as loop latch node.

 [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
 representation.

 Added a new pass on path splitting on tree SSA representation. The path
 splitting optimization does the CFG transformation of join block of the
 if-then-else same as the loop latch node is moved and merged with the
 predecessor blocks after preserving the SSA representation.

 ChangeLog:
 2015-06-30  Ajit Agarwal  ajit...@xilinx.com

 * gcc/Makefile.in: Add the build of the new file
 tree-ssa-path-split.c
 * gcc/common.opt: Add the new flag ftree-path-split.
 * gcc/opts.c: Add an entry for Path splitting pass
 with optimization flag greater and equal to O2.
 * gcc/passes.def: Enable and add new pass path splitting.
 * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
 * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
 * gcc/tree-ssa-path-split.c: New file for path splitting pass.
 * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
 * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

I'm not 100% sure I understand the transform but what I see from the 
testcases it tail-duplicates from a conditional up to a loop latch block (not 
sure if it includes it and thus ends up creating a loop nest or not).

The path splitting pass wires the duplicated basic block of the loop latch
block into both of its predecessor paths, if the loop latch block
is the same as the join block. The CFG cleanup phase of the path splitting
transformation merges the basic blocks wired with the original
predecessors, thus making the loop latch block just a forwarding block of the
predecessors: the sequential statements of the
loop latch block are set to NULL, leaving only the phi nodes, and the same
loop semantics with respect to the loop latch edge are preserved.
The SSA updates are also preserved.

Thanks & Regards
Ajit

An observation I have is that the pass should at least share the transform 
stage to some extent with the existing tracer pass (tracer.c) which 
essentially does the same but not restricted to loops in any way.  So I 
wonder if your pass could be simply another heuristic to compute paths to 
trace in the existing tracer pass.


Thanks,
Richard.

 Signed-off-by:Ajit Agarwal ajit...@xilinx.com.

 gcc/Makefile.in  |   1 +
  gcc/common.opt   |   4 +
  gcc/opts.c   |   1

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-06-30 Thread Ajit Kumar Agarwal


-Original Message-
From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] 
Sent: Tuesday, June 30, 2015 3:57 PM
To: Ajit Kumar Agarwal; l...@redhat.com; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On June 30, 2015 10:16:01 AM GMT+02:00, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:
All:

The below patch added a new path Splitting optimization pass on SSA 
representation. The Path Splitting optimization Pass moves the join 
block of if-then-else same as loop latch to its predecessors and get 
merged with the predecessors Preserving the SSA representation.

The patch is tested for Microblaze and i386 target. The EEMBC/Mibench 
benchmarks is run with the Microblaze target And the performance gain 
of 9.15% and rgbcmy01_lite(EEMBC benchmarks).
The Deja GNU tests is run for Mircroblaze Target and no regression is 
seen for Microblaze target and the new testcase attached are passed.

For i386 bootstrapping goes through fine and the Spec cpu2000 
benchmarks is run with this patch. Following observation were seen with 
spec cpu2000 benchmarks.

Ratio of path splitting change vs Ratio of not having path splitting 
change is 3653.353 vs 3652.14 for INT benchmarks.
Ratio of path splitting change vs Ratio of not having path splitting 
change is  4353.812 vs 4345.351 for FP benchmarks.

Based on comments from RFC patch following changes were done.

1. Added a new pass for path splitting changes.
2. Placed the new path  Splitting Optimization pass before the copy 
propagation pass.
3. The join block same as the Loop latch is wired into its predecessors 
so that the CFG Cleanup pass will merge the blocks Wired together.
4. Copy propagation routines added for path splitting changes is not 
needed as suggested by Jeff. They are removed in the patch as The copy 
propagation in the copied join blocks will be done by the existing copy 
propagation pass and the update ssa pass.
5. Only the propagation of phi results of the join block with the phi 
argument is done which will not be done by the existing update_ssa Or 
copy propagation pass on tree ssa representation.
6. Added 2 tests.
a) compilation check  tests.
   b) execution tests.

The 2 tests seem to be identical, so why do you have both?
Also, please remove cleanup-tree-dump, this is now done automatically.

The testcase path-split-1.c is there to check execution and is present in the
gcc.dg top directory. The one
present in gcc.dg/tree-ssa/path-split-2.c is there to check compilation, as
the action item is compilation. For the
execution test path-split-1.c the action is compile and run.

Thanks & Regards
Ajit

Thanks,

7. Refactoring of the code for the feasibility check and finding the 
join block same as loop latch node.




RE: [Patch] OPT: Update heuristics for loop-invariant for address arithmetic.

2015-04-24 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Sandiford [mailto:rdsandif...@googlemail.com] 
Sent: Friday, April 24, 2015 12:40 AM
To: Ajit Kumar Agarwal
Cc: vmaka...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch] OPT: Update heuristics for loop-invariant for address 
arithmetic.

Very delayed answer, sorry...

Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com writes:
 Hello All:

 The changes are made in the patch to update the heuristics for loop 
 invariant for address arithemetic at RTL Level.  The heuristics are 
 updated with the consideration of single def and use for register 
 pressure calculation instead Of ignoring it and also to update the 
 estimated register pressure cost along with the check of actual uses 
 with Address uses.

 With the above change, gains are seen in the Geomean for Mibench/EEMBC 
 benchmarks for microblaze target. No Regression is seen in deja GNU 
 regressions tests for microblaze.

Since this patch is basically removing code, were you able to analyse why that
code was having a detrimental effect?  I assume it benefited some target
originally.

This patch modified the estimated register pressure cost for non ira based 
register pressure(flag_ira_loop_pressure is not set).
Following changes were made in the estimated register pressure cost.

size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
                                         regs_used, speed, call_p)
             - estimate_reg_pressure_cost (new_regs[0],
                                           regs_used, speed, call_p));

is changed to 

size_cost = estimate_reg_pressure_cost (regs_needed[0],
                                        regs_used, speed, call_p);
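
For illustration, with invented numbers, writing cost (n) for
estimate_reg_pressure_cost (n, regs_used, speed, call_p):

/* old: size_cost = cost (new_regs[0] + regs_needed[0]) - cost (new_regs[0]),
        e.g. cost (2 + 3) - cost (2): the invariant's registers are costed
        as a delta on top of the new registers already created.
   new: size_cost = cost (regs_needed[0]),
        e.g. cost (3): the registers the invariant itself needs are costed
        directly, independent of previously created new registers.  */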

This looks like a reasonable change for the estimated_reg_pressure_cost
calculation. The other change I have made: for a single use of a given
def, the current code does not include such invariants in the register
pressure calculation, and I have enabled including the single use of a given
def in the register pressure calculation. Though the comment in the code says
that there won't be a new register for a single use after moving,
moving such invariants outside the loop does affect register pressure, as the
span of the live range after moving out of the loop differs from
the one in the original loop. Since the live range spans vary, such cases
should surely affect register pressure.

The above explanation looks reasonable, and the code that does not include
such invariants in the register pressure is removed in the patch.

I don't see any background or past patches that explicitly made the above
check part of any performance improvement or bug fix.

Thanks & Regards
Ajit

 



Thanks,
Richard


RE: [Patch] IRA: Update heuristics for optimal coalescing

2015-03-23 Thread Ajit Kumar Agarwal
Hello Vladimir:

Did you get a chance to look at the below patch.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Friday, February 27, 2015 11:25 AM
To: vmaka...@redhat.com; Jeff Law; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: [Patch] IRA: Update heuristics for optimal coalescing

Hello Vladimir:

The changes in the patch are made in the frequency heuristics for optimal
coalescing. The loop back edge frequencies are taken instead of the block
frequency for optimal coalescing. The frequency is ignored for an allocno that
has no references inside the loops but spans the loops and is live at the exit
block of the loop. Another similar change is made not to consider the allocno
frequency in the cost calculation but to consider the loop back edge
frequencies for allocnos having references, spanning the loop, and live at the
exit of the block.

We have tested the changes with the MIBench and EEMBC benchmarks, and there is
a gain in the geomean for the overall benchmarks for the Microblaze target.
Also, no regressions are seen in the deja GNU tests run for microblaze.

Please let us know with your feedbacks.

commit e6a2edd3794080a973695f80e77df3e7de55452d
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Fri Feb 27 11:15:48 2015 +0530

IRA: Update heuristics for optimal coalescing.

The changes are made in the frequency heuristics for optimal coalescing.
The loop back edge frequencies are taken instead of the block frequency
for optimal coalescing. The frequency is ignored for an allocno that has
no references in the loop but spans the loop and is live at the exit block
of the loop. Another similar change is not to consider the allocno
frequency in the cost calculation but to consider the loop back edge
frequencies for allocnos having references, spanning the loop, and live
at the exit of the block.

ChangeLog:
2015-02-27  Ajit Agarwal  ajit...@xilinx.com

* ira-color.c (ira_loop_back_edge_freq): New.
(coalesce_allocnos): Use of ira_loop_back_edge_freq to update
the back edge frequencies.
(setup_coalesced_allocno_costs_and_nums): Use of
ira_loop_back_edge_freq to update the cost.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


RE: [Patch] OPT: Update heuristics for loop-invariant for address arithmetic.

2015-03-23 Thread Ajit Kumar Agarwal
Hello All:

Did you get a chance to look at the below patch.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, March 04, 2015 3:57 PM
To: vmaka...@redhat.com; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: [Patch] OPT: Update heuristics for loop-invariant for address 
arithmetic.

Hello All:

The changes are made in the patch to update the heuristics for loop invariants
for address arithmetic at the RTL level.
The heuristics are updated to consider a single def and use in the register
pressure calculation instead of ignoring it, and also to update the
estimated register pressure cost, along with the check of actual uses with
address uses.

With the above change, gains are seen in the Geomean for Mibench/EEMBC 
benchmarks for microblaze target. No Regression is seen in deja GNU regressions 
tests for microblaze.

Please let us know your feedback.

commit 039b95028c93f99fc1da7fa255f9b5fff4e17223
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Wed Mar 4 15:46:45 2015 +0530

[Patch] OPT: Update heuristics for loop-invariant for address arithmetic.

The changes are made in the patch to update the heuristics
for loop invariant for address arithmetic. The heuristics is
changed to incorporate the single def and use in the register
pressure calculation in order to move the invariant out of
loop. The heuristics are further changed not to use the
check for addr uses with actual uses. Also changes are done in
the heuristics of estimated register pressure cost.

ChangeLog:
2015-03-04  Ajit Agarwal  ajit...@xilinx.com

* loop-invariant.c (gain_for_invariant): update the
heuristics for estimate_reg_pressure_cost.
(get_inv_cost): Remove the check for addr uses with
actual uses. Add the single def and use in the register
pressure for invariants.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


RE: [Patch,microblaze]: Optimized usage of pcmp conditional instruction.

2015-03-05 Thread Ajit Kumar Agarwal


-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Thursday, February 26, 2015 4:29 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Optimized usage of pcmp conditional 
instruction.

On 02/25/15 02:19, Ajit Kumar Agarwal wrote:
 Hello All:

 Please find the patch for the optimized usage of pcmp instructions in
 microblaze. No regressions are seen in the deja GNU tests. There are many
 testcases already in deja GNU that check the generation of
 pcmpne/pcmpeq instructions, and they are used to check validity.

 commit b74acf44ce4286649e5be7cff7518d814cb2491f
 Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
 Date:   Wed Feb 25 15:33:02 2015 +0530

  [Patch,microblaze]: Optimized usage of pcmp conditional instruction.

  The changes are made in the patch for optimized usage of pcmpne/pcmpeq
  instructions. The xor with register to register is replaced with pcmpeq
  /pcmpne instructions and for immediate check still the xori will be used.
  The purpose of the change is to achieve the aggressive usage of pcmpne
  /pcmpeq instructions instead of xor being used for comparison.

  ChangeLog:
  2015-02-25  Ajit Agarwal  ajit...@xilinx.com

  * config/microblaze/microblaze.md (cbranchsi4): Added immediate
  constraints.
  (cbranchsi4_reg): New.
  * config/microblaze/microblaze.c
  (microblaze_expand_conditional_branch_reg): New.
  * config/microblaze/microblaze-protos.h
  (microblaze_expand_conditional_branch_reg): New prototype.

+  if (cmp_op1 == const0_rtx)
+{
+  comp_reg = cmp_op0;
+  condition = gen_rtx_fmt_ee (signed_condition (code),
+  SImode, comp_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (condition, label1));
+}
+
+  else if (code == EQ || code == NE)
+{
+  if (code == NE)
+{
+  emit_insn (gen_sne_internal_pat (comp_reg, cmp_op0,
+   cmp_op1));
+  condition = gen_rtx_NE (SImode, comp_reg, const0_rtx);
+}
+  else
+{
+  emit_insn (gen_seq_internal_pat (comp_reg,
+   cmp_op0, cmp_op1));
+  condition = gen_rtx_EQ (SImode, comp_reg, const0_rtx);
+}
+  emit_jump_insn (gen_condjump (condition, label1));
+}
+  else
+{
...

No blank line between end brace of if and else.

Replace with
+  else if (code == EQ)
+{
+   emit_insn (gen_seq_internal_pat (comp_reg, cmp_op0, cmp_op1));
+   condition = gen_rtx_EQ (SImode, comp_reg, const0_rtx);
+   emit_jump_insn (gen_condjump (condition, label1));
+}
+  else if (code == NE)
+{
+  emit_insn (gen_sne_internal_pat (comp_reg, cmp_op0, cmp_op1));
+  condition = gen_rtx_NE (SImode, comp_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (condition, label1));
+}
+  else
+{
...

--

The changes are incorporated. Please find the log of the updated patch.

commit 91f275c144165320850ddf18e3a1e059a66c
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Fri Mar 6 09:55:11 2015 +0530

[Patch,microblaze]: Optimized usage of pcmp conditional instruction.

The changes are made in the patch for optimized usage of pcmpne/pcmpeq
instructions. The register-to-register xor is replaced with pcmpeq/pcmpne
instructions; for immediate checks, xori is still used. The purpose of the
change is to achieve aggressive usage of pcmpne/pcmpeq instructions
instead of xor for comparisons.

ChangeLog:
2015-03-06  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.md (cbranchsi4): Added immediate
constraints.
(cbranchsi4_reg): New.
* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch_reg): New.
* config/microblaze/microblaze-protos.h
(microblaze_expand_conditional_branch_reg): New prototype.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit
 
Michael Eager   ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


0001-Patch-microblaze-Optimized-usage-of-pcmp-conditional.patch
Description: 0001-Patch-microblaze-Optimized-usage-of-pcmp-conditional.patch


[Patch] OPT: Update heuristics for loop-invariant for address arithmetic.

2015-03-04 Thread Ajit Kumar Agarwal
Hello All:

The changes are made in the patch to update the heuristics for loop 
invariants for address arithmetic at the RTL level. The heuristics now 
take single-def, single-use invariants into account in the register 
pressure calculation instead of ignoring them, update the estimated 
register pressure cost, and drop the check of actual uses against 
address uses.

With the above change, gains are seen in the geomean for Mibench/EEMBC 
benchmarks for the microblaze target. No regressions are seen in the 
deja GNU regression tests for microblaze.

Please let us know your feedback.
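
To make the intent concrete, here is a minimal sketch of the reworked gain
calculation -- illustrative only, not the verbatim patch (the actual change
is in the attached patch below):

  /* Sketch of gain_for_invariant: the gain is the saved computation
     minus the register-pressure cost of the extra live register.  */
  size_cost = estimate_reg_pressure_cost (new_regs + regs_needed,
                                          regs_used, speed, call_p)
              - estimate_reg_pressure_cost (new_regs, regs_used,
                                            speed, call_p);
  gain = comp_cost - size_cost;

  /* In get_inv_cost, a single-def, single-use invariant now contributes
     to regs_needed instead of being ignored (sketch).  */
  if (inv->def->n_uses == 1)
    regs_needed++;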

commit 039b95028c93f99fc1da7fa255f9b5fff4e17223
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Wed Mar 4 15:46:45 2015 +0530

[Patch] OPT: Update heuristics for loop-invariant for address arithmetic.

The changes are made in the patch to update the heuristics
for loop invariants for address arithmetic. The register
pressure calculation now incorporates single-def, single-use
invariants so that such invariants can be moved out of the
loop. The heuristics further drop the check of address uses
against actual uses. The heuristics of the estimated register
pressure cost are also changed.

ChangeLog:
2015-03-04  Ajit Agarwal  ajit...@xilinx.com

* loop-invariant.c (gain_for_invariant): Update the
heuristics for estimate_reg_pressure_cost.
(get_inv_cost): Remove the check for addr uses with
actual uses. Add the single def and use in the register
pressure for invariants.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


0001-Patch-OPT-Update-heuristics-for-loop-invariant-for-a.patch
Description: 0001-Patch-OPT-Update-heuristics-for-loop-invariant-for-a.patch


RE: [Patch,microblaze]: Optimized usage of fint instruction.

2015-03-04 Thread Ajit Kumar Agarwal


-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Thursday, February 26, 2015 4:33 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Optimized usage of fint instruction.

On 02/25/15 02:20, Ajit Kumar Agarwal wrote:
 Hello All:

 Please find the patch for the optimized usage of the fint instruction. 
 No regressions are seen in the deja GNU tests.

 commit ed4dc0b96bf43c200cacad97f73a98ab7048e51b
 Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
 Date:   Wed Feb 25 15:36:29 2015 +0530

  [Patch,microblaze]: Optimized usage of fint instruction.

 The changes are made in the patch for optimized usage of the fint
 instruction. The fint/cond_branch sequence is replaced with
 fcmp/cond_branch. The fint instruction takes 6/7 cycles, whereas the
 fcmp instruction takes 1 cycle. The conversion from float to int with
 fint is not required; the values can be compared directly with fcmp,
 taking 1 cycle instead of 6/7.

  ChangeLog:
  2015-02-25  Ajit Agarwal  ajit...@xilinx.com

  * config/microblaze/microblaze.md (peephole2): New.


+emit_insn (gen_cstoresf4 (comp_reg, operands[2],
+  gen_rtx_REG(SFmode,REGNO(cmp_op0)),
+  gen_rtx_REG(SFmode,REGNO(cmp_op1))));

Spaces before left parens and after comma in last two lines.

Changes are incorporated. Please find the log for the updated patch.

commit 492b0d0b67a5b12d2dc239de3215630c8838edea
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Wed Mar 4 17:15:16 2015 +0530

[Patch,microblaze]: Optimized usage of fint instruction.

The changes are made in the patch for optimized usage of the fint
instruction. The fint/cond_branch sequence is replaced with
fcmp/cond_branch. The fint instruction takes 6/7 cycles, whereas the
fcmp instruction takes 1 cycle. The conversion from float to int with
fint is not required; the values can be compared directly with fcmp,
taking 1 cycle instead of 6/7.

ChangeLog:
2015-03-04  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.md (peephole2): New.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit

Michael Eager   ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


0001-Patch-microblaze-Optimized-usage-of-fint-instruction.patch
Description: 0001-Patch-microblaze-Optimized-usage-of-fint-instruction.patch


[Patch] IRA: Update heuristics for optimal coalescing

2015-02-26 Thread Ajit Kumar Agarwal
Hello Vladimir:

The changes in the patch are made in the frequency heuristics for optimal 
coalescing. Loop back-edge frequencies are used instead of the block 
frequency. The loop frequency is ignored for an allocno that has no 
references inside the loop but spans the loop and is live at the loop's 
exit block. A similar change is made in the cost calculation: instead of 
the allocno frequency, the loop back-edge frequencies are used for 
allocnos that have references, span the loop, and are live at the loop's 
exit block.

We have tested the changes with MiBench and EEMBC benchmarks, and there is 
a gain in the geomean over the benchmarks as a whole for the Microblaze 
target. No regressions are seen in the deja GNU tests run for microblaze.

Please let us know your feedback.
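
For illustration, a minimal sketch of the new helper named in the ChangeLog
below -- hypothetical, under the assumption that it sums the frequencies of
the back edges into the loop header (the posted patch is in the attachment):

  static int
  ira_loop_back_edge_freq (struct loop *loop)
  {
    edge e;
    edge_iterator ei;
    int freq = 0;

    /* A back edge of a natural loop enters the header from inside
       the loop; accumulate the frequency of each such edge.  */
    FOR_EACH_EDGE (e, ei, loop->header->preds)
      if (flow_bb_inside_loop_p (loop, e->src))
        freq += EDGE_FREQUENCY (e);

    return freq;
  }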

commit e6a2edd3794080a973695f80e77df3e7de55452d
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Fri Feb 27 11:15:48 2015 +0530

IRA: Update heuristics for optimal coalescing.

The changes are made in the frequency heuristics for optimal coalescing.
Loop back-edge frequencies are used instead of the block frequency. The
loop frequency is ignored for an allocno that has no references inside
the loop but spans the loop and is live at the loop's exit block. A
similar change is made in the cost calculation: the loop back-edge
frequencies are used, instead of the allocno frequency, for allocnos that
have references, span the loop, and are live at the loop's exit block.

ChangeLog:
2015-02-27  Ajit Agarwal  ajit...@xilinx.com

* ira-color.c (ira_loop_back_edge_freq): New.
(coalesce_allocnos): Use ira_loop_back_edge_freq to update
the back edge frequencies.
(setup_coalesced_allocno_costs_and_nums): Use
ira_loop_back_edge_freq to update the cost.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


0001-IRA-Update-heuristics-for-optimal-coalescing.patch
Description: 0001-IRA-Update-heuristics-for-optimal-coalescing.patch


[Patch,microblaze]: Optimized usage of pcmp conditional instruction.

2015-02-25 Thread Ajit Kumar Agarwal
Hello All:

Please find the patch for the optimized usage of pcmp instructions in 
microblaze. No regressions are seen in deja GNU tests. Many testcases 
already present in deja GNU check the generation of pcmpne/pcmpeq 
instructions and are used to check their validity.
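
As an illustration (a hypothetical example, not taken from the testsuite),
a register-register equality test is the kind of source the change targets:

  /* Compare two registers and branch on the result.  */
  int f (int a, int b)
  {
    if (a == b)
      return 1;
    return 0;
  }

A sketch of the expected difference in code generation, with made-up
register numbers (pcmpeq sets rD to 1 when rA == rB, so the branch sense
flips relative to xor):

  before:  xor    r3,r5,r6      # r3 == 0 iff a == b
           beqid  r3,$Ltrue
  after:   pcmpeq r3,r5,r6      # r3 == 1 iff a == b
           bneid  r3,$Ltrue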

commit b74acf44ce4286649e5be7cff7518d814cb2491f
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Wed Feb 25 15:33:02 2015 +0530

[Patch,microblaze]: Optimized usage of pcmp conditional instruction.

The changes are made in the patch for optimized usage of pcmpne/pcmpeq
instructions. The register-to-register xor is replaced with pcmpeq/pcmpne
instructions; for immediate checks, xori is still used. The purpose of the
change is to achieve aggressive usage of pcmpne/pcmpeq instructions
instead of xor for comparisons.

ChangeLog:
2015-02-25  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.md (cbranchsi4): Added immediate
constraints.
(cbranchsi4_reg): New.
* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch_reg): New.
* config/microblaze/microblaze-protos.h
(microblaze_expand_conditional_branch_reg): New prototype.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


0001-Patch-microblaze-Optimized-usage-of-pcmp-conditional.patch
Description: 0001-Patch-microblaze-Optimized-usage-of-pcmp-conditional.patch


[Patch,microblaze]: Optimized usage of fint instruction.

2015-02-25 Thread Ajit Kumar Agarwal
Hello All:

Please find the patch for the optimized usage of the fint instruction. 
No regressions are seen in the deja GNU tests.
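
As an illustration (a hypothetical example), the kind of sequence the
peephole targets is a float-to-int conversion that exists only to feed a
comparison and branch; whether a particular case is rewritten depends on
the peephole's own conditions:

  /* Each (int) conversion would cost a 6/7-cycle fint; where the
     peephole's conditions allow, the comparison is done directly on
     the floats with a 1-cycle fcmp instead.  */
  int g (float a, float b)
  {
    if ((int) a == (int) b)
      return 1;
    return 0;
  }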

commit ed4dc0b96bf43c200cacad97f73a98ab7048e51b
Author: Ajit Kumar Agarwal ajitkum@xhdspdgnu.(none)
Date:   Wed Feb 25 15:36:29 2015 +0530

[Patch,microblaze]: Optimized usage of fint instruction.

The changes are made in the patch for optimized usage of the fint
instruction. The fint/cond_branch sequence is replaced with
fcmp/cond_branch. The fint instruction takes 6/7 cycles, whereas the
fcmp instruction takes 1 cycle. The conversion from float to int with
fint is not required; the values can be compared directly with fcmp,
taking 1 cycle instead of 6/7.

ChangeLog:
2015-02-25  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.md (peephole2): New.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


0001-Patch-microblaze-Optimized-usage-of-fint-instruction.patch
Description: 0001-Patch-microblaze-Optimized-usage-of-fint-instruction.patch


RE: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition

2015-02-10 Thread Ajit Kumar Agarwal


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Jeff Law
Sent: Tuesday, February 10, 2015 5:03 AM
To: Bin.Cheng
Cc: Alex Velenko; Felix Yang; Yangfei (Felix); Marcus Shawcroft; GCC Patches; 
vmaka...@redhat.com
Subject: Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for 
pseudo with more than one definition

On 02/03/15 20:03, Bin.Cheng wrote:
 I looked into the test and can confirm the previous compilation is correct.
 The cover letter of this patch said IRA mis-handled REG_EQUIV before, 
 but in this case it is REG_EQUAL that is lost.  The full dump (without 
 this patch) after IRA is like:
Right, but a REG_EQUIV is generated based on the incoming REG_EQUAL notes in 
the insn stream.  Basically, update_equiv_regs will scan the insn stream and 
some REG_EQUAL notes will be promoted to REG_EQUIV notes.

The REG_EQUIV is a function-wide equivalence, meaning that one could substitute 
the REG_EQUIV note for any use of the destination register and still have a 
valid representation of the program.

REG_EQUAL's validity is limited to the point after the insn in which it appears 
and before the next insn.
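
An illustration in RTL terms (a hypothetical insn, modeled on the r110 case
from the dump; details such as the insn code are omitted):

  ;; The note documents the value only at this point in the stream:
  (insn 9 8 10 8 (set (reg:SI 110) (const_int -1))
       (expr_list:REG_EQUAL (const_int -1) (nil)))
  ;; Promoting it to REG_EQUIV would assert r110 == -1 at every use in
  ;; the function -- invalid here, because BB5 sets r110 to 0.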


 Before r216169 (with REG_EQUAL in insn9), jumps from basic blocks 6/7/8
 -> 9 could be merged because r110 equals -1 afterwards.  But with the
 patch, the information that r110 == -1 in basic block 8 is lost.  As 
 a result, the jump from 8 -> 9 can't be merged and two additional 
 instructions are generated.
 
 I suppose the REG_EQUAL note is correct in insn9?  According to 
 GCCint, it only means r110 set by insn9 will be equal to the value at 
 run time at the end of this insn but not necessarily elsewhere in the 
 function.
If you previously got a REG_EQUIV note on any of those insns it was wrong and 
this is the precise kind of situation that the change was trying to fix.

R110 can have the value -1 (BB6, BB7, BB8) or 0 (BB5).  Thus there is no 
single value across the entire function that one can validly use for r110.

Should the value of r110 not change across all callee paths from the given 
caller function? If the value is aliased, how do the call side effects 
ensure that the value remains the same across every callee chain reached 
from the caller? I am curious how a value that remains constant throughout 
the function is identified in the presence of aliasing and in the 
interprocedural case.

I think you could mark this as a missed optimization, but not a regression, 
since the desired output was relying on a bug in the compiler.

If I were to start looking at this, my first thought would be to look at why 
we have multiple sets of r110, particularly if there are lifetimes that are 
disjoint.



 I also found another problem (or mis-leading?) with the document:
 Thus, compiler passes prior to register allocation need only check 
 for REG_EQUAL notes and passes subsequent to register allocation need 
 only check for REG_EQUIV notes.  This seems not true now as in this 
 example, passes after register allocation do take advantage of 
 REG_EQUAL in optimization and we can't achieve that by using 
 REG_EQUIV.
I think that's a long standing (and incorrect) generalization.  IIRC we can 
get a REG_EQUIV note earlier for certain argument setup situations. 
  And I think it's been the case for a long time that a pass after reload 
could try to exploit REG_EQUAL notes.

Thanks & Regards
Ajit
jeff


RE: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Ajit Kumar Agarwal


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Sandiford
Sent: Monday, September 22, 2014 12:54 PM
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Jeff Law l...@redhat.com writes:
 On 09/19/14 01:23, Richard Sandiford wrote:
 Jeff Law l...@redhat.com writes:
 On 09/18/14 04:07, Richard Sandiford wrote:
 This series is a cleaned-up version of:

   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on 
 the word size.  You can't have a subreg for byte 2 of a 4-byte 
 word, say, but you can have a subreg for word 2 of a 4-word value 
 (as well as lowpart subregs of that word, etc.).  This causes 
 problems when an architecture has wider-than-word registers, since 
 the addressability of a word can then depend on which register 
 class is used.

 The register allocators need to fix up cases where a subreg turns 
 out to be invalid for a particular class.  This is really an 
 extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
 I thought we fixed these problems long ago with the change to subreg_byte?!?

 No, that was fixing something else.  (I'm just about old enough to 
 remember that too!)  The problem here is that (say):

  (subreg:SI (reg:DI X) 4)

 is independently addressable on little-endian AArch32 if X assigned 
 to a GPR, but not if X is assigned to a vector register.  We need to 
 allow these kinds of subreg on pseudos in order to decompose 
 multiword arithmetic.  It's then up to the RA to realise that a 
 reload would be needed if X were assigned to a vector register, since 
 the upper half of a vector register cannot be independently accessed.

 Note that you could write this example even with the old word-style 
 offsets and IIRC the effect would have been the same.
 OK.  So I kept thinking in terms of the byte offset stuff.  But what 
 you're tackling is related to the mess around the mode of the subreg 
 having a different meaning if its smaller than a word vs word-sized or 
 greater.

 Right?

Yeah, that's right.  Addressability is based on words, which is inconvenient 
when your registers are bigger than a word.

For an architecture like Microblaze, which doesn't support 1-byte or 2-byte 
registers, what should be returned in this scenario when SUBREG_WORD is used?

Thanks,
Richard



RE: [PATCH 0/5] Fix handling of word subregs of wide registers

2014-09-22 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@arm.com] 
Sent: Monday, September 22, 2014 4:56 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 0/5] Fix handling of word subregs of wide registers

Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com writes:
 Jeff Law l...@redhat.com writes:
 On 09/19/14 01:23, Richard Sandiford wrote:
 Jeff Law l...@redhat.com writes:
 On 09/18/14 04:07, Richard Sandiford wrote:
 This series is a cleaned-up version of:

   https://gcc.gnu.org/ml/gcc/2014-03/msg00163.html

 The underlying problem is that the semantics of subregs depend on 
 the word size.  You can't have a subreg for byte 2 of a 4-byte 
 word, say, but you can have a subreg for word 2 of a 4-word value 
 (as well as lowpart subregs of that word, etc.).  This causes 
 problems when an architecture has wider-than-word registers, since 
 the addressability of a word can then depend on which register 
 class is used.

 The register allocators need to fix up cases where a subreg turns 
 out to be invalid for a particular class.  This is really an 
 extension of what we need to do for CANNOT_CHANGE_MODE_CLASS.

 Tested on x86_64-linux-gnu, powerpc64-linux-gnu and aarch64_be-elf.
 I thought we fixed these problems long ago with the change to 
 subreg_byte?!?

 No, that was fixing something else.  (I'm just about old enough to 
 remember that too!)  The problem here is that (say):

  (subreg:SI (reg:DI X) 4)

 is independently addressable on little-endian AArch32 if X assigned 
 to a GPR, but not if X is assigned to a vector register.  We need to 
 allow these kinds of subreg on pseudos in order to decompose 
 multiword arithmetic.  It's then up to the RA to realise that a 
 reload would be needed if X were assigned to a vector register, 
 since the upper half of a vector register cannot be independently accessed.

 Note that you could write this example even with the old word-style 
 offsets and IIRC the effect would have been the same.
 OK.  So I kept thinking in terms of the byte offset stuff.  But what 
 you're tackling is related to the mess around the mode of the subreg 
 having a different meaning if its smaller than a word vs word-sized 
 or greater.

 Right?

Yeah, that's right.  Addressability is based on words, which is  
inconvenient when your registers are bigger than a word.

 For an architecture like Microblaze, which doesn't support 1-byte or 
 2-byte registers, what should be returned in this scenario when 
 SUBREG_WORD is used?

I don't understand the question, sorry.  Subreg offsets are still represented 
as bytes rather than words.  The patch doesn't change the way that subregs 
are represented or the rules about which subregs are valid.

Both before and after the patch, the semantics of subregs say that if you 
have 4-byte words, only one of:

(subreg:QI (reg:SI X) 0)
(subreg:QI (reg:SI X) 1)
(subreg:QI (reg:SI X) 2)
(subreg:QI (reg:SI X) 3)

is ever valid (0 for little-endian, 3 for big-endian).  Writing to that one 
valid subreg will change the whole of X, unless the subreg is wrapped in a 
strict_lowpart.  In other words, subregs are defined so that individual 
parts of a word are not independently addressable.

However, individual words of a multiword register _are_ addressable.  I.e.:

   (subreg:SI (reg:DI Y) 0)
   (subreg:SI (reg:DI Y) 4)

are both valid.  Writing to one does not change the other.

The problem the patch was trying to solve was that you can have targets with 
4-byte words but some 8-byte registers.  In those cases, it's still possible 
to form both of the Y subregs above if Y is allocated to a word-sized 
register, but not if Y is allocated to a doubleword-sized register.
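
A concrete (hypothetical) illustration: 64-bit addition on such a target is
decomposed through word subregs of the DImode pseudo:

  unsigned long long
  add64 (unsigned long long a, unsigned long long b)
  {
    /* Lowering produces accesses like
         (subreg:SI (reg:DI Y) 0)   ; low word
         (subreg:SI (reg:DI Y) 4)   ; high word
       which are fine if Y is allocated to a pair of word-sized GPRs,
       but need fixing up by the RA if Y lands in a single 8-byte
       register whose halves are not independently addressable.  */
    return a + b;
  }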

Thanks Richard for the explanation. 

Thanks,
Richard



[Patch, microblaze]: Add Init_priority support.

2014-07-28 Thread Ajit Kumar Agarwal
Please find the following patch for init_priority support for microblaze. 
Testing done: no regressions seen in the gcc and g++ regression testsuites.

   [Patch, microblaze]: Add Init_priority support.

Added TARGET_ASM_CONSTRUCTOR and TARGET_ASM_DESTRUCTOR macros. These
macros allow users to control the order of initialization of objects
defined at namespace scope with the init_priority attribute by
specifying a relative priority.
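
For context, a minimal sketch of what such a constructor hook typically
does on ELF targets, modeled on GCC's default named-section handling (an
illustration, not necessarily the attached patch):

  static void
  microblaze_elf_asm_constructor (rtx symbol, int priority)
  {
    char buf[16];

    if (priority == DEFAULT_INIT_PRIORITY)
      strcpy (buf, ".ctors");
    else
      /* Invert the numbering so the linker's increasing-order section
         sort runs lower init_priority values first.  */
      sprintf (buf, ".ctors.%.5u", MAX_INIT_PRIORITY - priority);

    switch_to_section (get_section (buf, SECTION_WRITE, NULL));
    assemble_align (POINTER_SIZE);
    assemble_integer (symbol, POINTER_SIZE / BITS_PER_UNIT,
                      POINTER_SIZE, 1);
  }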

ChangeLog:
2014-07-28  Ajit Agarwal  ajit...@xilinx.com

* config/microblaze/microblaze.c (microblaze_elf_asm_cdtor): New.
(microblaze_elf_asm_constructor,microblaze_elf_asm_destructor): New.
* config/microblaze/microblaze.h
(TARGET_ASM_CONSTRUCTOR, TARGET_ASM_DESTRUCTOR): New macros.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


0001-Patch-microblaze-Add-Init_priority-support.patch
Description: 0001-Patch-microblaze-Add-Init_priority-support.patch


RE: [RFC][ARM]: Fix reload spill failure (PR 60617)

2014-06-16 Thread Ajit Kumar Agarwal
You can assign the same register to more than one operand based on liveness. 
It becomes tricky if, based on liveness, no available register is found. In 
that case you need to spill one of the operands and reuse the register 
assigned to the next operand. This forms the basis of spilling one of the 
values so that the same register can be assigned to another operand.

This pattern is very specific to LO_REGS, and I expect low register pressure 
for such operands. Is this pattern used with dedicated registers for the 
operands?

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Venkataramanan Kumar
Sent: Monday, June 16, 2014 6:23 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; Marcus 
Shawcroft; Patch Tracking; Maxim Kuvyrkov
Subject: [RFC][ARM]: Fix reload spill failure (PR 60617)

Hi Maintainers,

This patch fixes PR 60617, which occurs when we turn on the reload pass in 
thumb2 mode.

It occurs for the pattern *ior_scc_scc that gets generated for the third 
argument of the below function call.

JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));


(snip---)
(insn 634 633 635 27 (parallel [
(set (reg:SI 3 r3)
(ior:SI (eq:SI (reg/v:SI 110 [ dst ])  <== this operand
                                           gets register r5 assigned
(reg/v:SI 112 [ op2 ]))
(eq:SI (reg/v:SI 110 [ dst ])  <== this operand
(reg/v:SI 111 [ op1 ])))
(clobber (reg:CC 100 cc))
]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 
{*ior_scc_scc
(snip---)

The issue here is that the above pattern demands 5 registers (LO_REGS).

But when we are in reload, register r0 is used for the pointer to the class, 
r1 and r2 for the first and second arguments, and r7 for the stack pointer.

So we are left with r3, r4, r5 and r6. But the above pattern needs five 
LO_REGS. Hence we get a spill failure when processing the last register 
operand in that pattern.
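
A reduced (hypothetical) form of the problem is two register-register
equality tests combined with ||, which is the shape that matches
*ior_scc_scc:

  int
  f (int dst, int op1, int op2)
  {
    /* The *ior_scc_scc pattern has five register operands (one output
       and four comparison inputs), all wanting LO_REGS.  */
    return op1 == dst || op2 == dst;
  }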

In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1, and for 
Thumb-2 mode the comment below mentions the use of LO_REGS:

Care should be taken to avoid adding thumb-2 patterns that require many low 
registers

So conservative fix is not to allow this pattern for Thumb-2 mode.

I allowed these patterns for Thumb2 when we have constant operands for 
comparison. That makes the target tests arm/thum2-cond-cmp-1.c to 
thum2-cond-cmp-4.c pass.

Regression tested with the gcc 4.9 branch, since in trunk this bug is masked 
by revision 209897.

Please provide your suggestions on this patch.

regards,
Venkat.


RE: [Patch,Microblaze]: Added Break Handler Support

2014-05-14 Thread Ajit Kumar Agarwal
 ))]
   
-  {
-if (microblaze_is_interrupt_variant ())
-return "rtid\tr14,0 \;%#";
+  {
+if (microblaze_is_break_handler ())
+return "rtbd\tr16,8\;%#";
+else if (microblaze_is_interrupt_variant ()
+  && (!microblaze_is_break_handler ()))
+return "rtid\tr14,0 \;%#";
 else
 return "rtsd\tr15,8 \;%#";
   }
   }
@@ -2068,8 +2074,14 @@
 register rtx target2 = gen_rtx_REG (Pmode,
  GP_REG_FIRST + MB_ABI_SUB_RETURN_ADDR_REGNUM);
 if (GET_CODE (target) == SYMBOL_REF) {
-gen_rtx_CLOBBER (VOIDmode, target2);
-return "brlid\tr15,%0\;%#";
+if (microblaze_break_function_p (SYMBOL_REF_DECL (target))) {
+gen_rtx_CLOBBER (VOIDmode, target2);
+return "brki\tr16,%0\;%#";
+}
+else {
+gen_rtx_CLOBBER (VOIDmode, target2);
+return "brlid\tr15,%0\;%#";
+}
 } else if (GET_CODE (target) == CONST_INT)
 return "la\t%@,r0,%0\;brald\tr15,%@\;%#";
 else if (GET_CODE (target) == REG)
 else if (GET_CODE (target) == REG)
@@ -2173,13 +2185,15 @@
 if (GET_CODE (target) == SYMBOL_REF)
 {
   gen_rtx_CLOBBER (VOIDmode,target2);
-  if (SYMBOL_REF_FLAGS (target) & SYMBOL_FLAG_FUNCTION)
+  if (microblaze_break_function_p (SYMBOL_REF_DECL (target)))
+return "brki\tr16,%1\;%#";
+  else if (SYMBOL_REF_FLAGS (target) & SYMBOL_FLAG_FUNCTION)
 {
  return "brlid\tr15,%1\;%#";
 }
   else
 {
- return "bralid\tr15,%1\;%#";
+   return "bralid\tr15,%1\;%#";
 }
 }
 }
 else if (GET_CODE (target) == CONST_INT)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 9780d92..adb6410 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3776,6 +3776,18 @@ registers) are saved in the function prologue.  If the 
function is a leaf
 function, only volatiles used by the function are saved.  A normal function
 return is generated instead of a return from interrupt.
 
+@item break_handler 
+@cindex break handler functions
+Use this attribute on the MicroBlaze ports to indicate that
+the specified function is a break handler.  The compiler generates function 
+entry and exit sequences suitable for use in a break handler when this 
+attribute is present.  The return from @code{break_handler} is done through
+the @code{rtbd} instead of @code{rtsd}.
+
+@smallexample
+void f () __attribute__ ((break_handler));
+@end smallexample
+
 @item section (@var{section-name})
 @cindex @code{section} function attribute
 Normally, the compiler places the code it generates in the @code{text} section.
diff --git a/gcc/testsuite/gcc.target/microblaze/others/break_handler.c 
b/gcc/testsuite/gcc.target/microblaze/others/break_handler.c
new file mode 100644
index 000..1ccafd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/microblaze/others/break_handler.c
@@ -0,0 +1,15 @@
+int func () __attribute__ ((break_handler));
+volatile int intr_occurred;
+
+int func ()
+{
+
+  /* { dg-final { scan-assembler "rtbd\tr(\[0-9]\|\[1-2]\[0-9]\|3\[0-1]),8" } } */
+intr_occurred += 1;
+}
+int main()
+{
+/* { dg-final { scan-assembler "brki\tr16" } } */
+func();
+return 0;
+}
-- 
1.7.1

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Michael Eager
Sent: Wednesday, May 14, 2014 3:33 AM
To: Ajit Kumar Agarwal; gcc-patches@gcc.gnu.org
Cc: Vinod Kathail; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,Microblaze]: Added Break Handler Support

On 05/13/14 14:42, Ajit Kumar Agarwal wrote:
 Hello Michael:

 Resubmitting the Patch with documentation for _break_handler in the 
 config/microblaze/microblaze.h.

Please put everything together in one place.
When you resubmit a patch, include the ChangeLog.

I'm not sure what you changed, but there are no changes to gcc/doc/extend.texi 
in your patch.


-- 
Michael Eager   ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


0001-Patch-MicroBlaze-Add-break-Handler-Support.patch
Description: 0001-Patch-MicroBlaze-Add-break-Handler-Support.patch


RE: [Patch,Microblaze]: Added Break Handler Support

2014-05-13 Thread Ajit Kumar Agarwal
Hello Michael:

The following patch handles software and hardware breaks in the Microblaze 
architecture.
The deja GNU testsuite shows no regressions, and the attached testcase 
passes.
Review comments are incorporated.

Okay for trunk?

Thanks & Regards
Ajit

From 15dfaee8feef37430745d3dbc58f74bed876aabb Mon Sep 17 00:00:00 2001
From: Ajit Kumar Agarwal ajit...@xilinx.com
Date: Tue, 13 May 2014 13:25:52 +0530
Subject: [PATCH] [Patch, microblaze] Added Break Handler support

Added break handler support to incorporate hardware and software breaks. The 
break handler routine returns with the rtbd instruction. At the call point, 
software breaks are generated with the brki instruction using register r16 
as the operand.
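
Based on this description, the expected instruction shapes are (a sketch,
not actual compiler output):

        brki    r16,func        # call site: software break, link in r16
        ...
  func: ...                     # break handler body
        rtbd    r16,8           # return from break via r16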

2014-05-13 Ajit Agarwal ajit...@xilinx.com

* config/microblaze/microblaze.c
   (microblaze_break_function_p,microblaze_is_break_handler) : New

* config/microblaze/microblaze.h (BREAK_HANDLER_NAME) : New macro

* config/microblaze/microblaze.md :
  Extended support for generation of brki instruction and rtbd instruction.

* config/microblaze/microblaze-protos.h
   (microblaze_break_function_p,microblaze_is_break_handler) : New Declaration.

* testsuite/gcc.target/microblaze/others/break_handler.c : New.

Signed-off-by: Nagaraju nmek...@xilinx.com
---
gcc/config/microblaze/microblaze-protos.h  |4 +-
gcc/config/microblaze/microblaze.c |   47 +---
gcc/config/microblaze/microblaze.h |2 +-
gcc/config/microblaze/microblaze.md|   34 ++
.../gcc.target/microblaze/others/break_handler.c   |   15 ++
5 files changed, 83 insertions(+), 19 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/microblaze/others/break_handler.c

diff --git a/gcc/config/microblaze/microblaze-protos.h 
b/gcc/config/microblaze/microblaze-protos.h
index b03e9e1..f3cc099 100644
--- a/gcc/config/microblaze/microblaze-protos.h
+++ b/gcc/config/microblaze/microblaze-protos.h
@@ -40,10 +40,12 @@ extern void print_operand_address (FILE *, rtx);
extern void init_cumulative_args (CUMULATIVE_ARGS *,tree, rtx);
extern bool microblaze_legitimate_address_p (enum machine_mode, rtx, bool);
extern int microblaze_is_interrupt_variant (void);
+extern int microblaze_is_break_handler (void);
+extern int microblaze_break_function_p (tree func);
extern rtx microblaze_return_addr (int, rtx);
extern int simple_memory_operand (rtx, enum machine_mode);
extern int double_memory_operand (rtx, enum machine_mode);
-
+extern void microblaze_order_regs_for_local_alloc (void);
extern int microblaze_regno_ok_for_base_p (int, int);
extern HOST_WIDE_INT microblaze_initial_elimination_offset (int, int);
extern void microblaze_declare_object (FILE *, const char *, const char *,
diff --git a/gcc/config/microblaze/microblaze.c 
b/gcc/config/microblaze/microblaze.c
index ba8109b..fc458a5 100644
--- a/gcc/config/microblaze/microblaze.c
+++ b/gcc/config/microblaze/microblaze.c
@@ -209,6 +209,7 @@ enum reg_class microblaze_regno_to_class[] =
 and epilogue and use appropriate interrupt return.
save_volatiles- Similar to interrupt handler, but use normal return.  */
int interrupt_handler;
+int break_handler;
int fast_interrupt;
int save_volatiles;

@@ -217,6 +218,8 @@ const struct attribute_spec microblaze_attribute_table[] = {
  affects_type_identity */
   {interrupt_handler, 0,   0, true,false,   false,NULL,
false },
+  {break_handler, 0,   0, true,false,   false,NULL,
+false }, 
   {fast_interrupt,0,   0, true,false,   false,NULL,
 false },
   {save_volatiles   , 0,   0, true,false,   false,NULL,
@@ -1866,7 +1869,18 @@ microblaze_fast_interrupt_function_p (tree func)
   a = lookup_attribute (fast_interrupt, DECL_ATTRIBUTES (func));
   return a != NULL_TREE;
}
+int
+microblaze_break_function_p (tree func)
+{
+  tree a;
+  if (!func) 
+return 0;
+  if (TREE_CODE (func) != FUNCTION_DECL)
+return 0;

+  a = lookup_attribute (break_handler, DECL_ATTRIBUTES (func));
+  return a != NULL_TREE;
+} 
 /* Return true if FUNC is an interrupt function which uses
normal return, indicated by the save_volatiles attribute.  */

@@ -1891,6 +1905,13 @@ microblaze_is_interrupt_variant (void)
{
   return (interrupt_handler || fast_interrupt);
}
+int 
+microblaze_is_break_handler (void) 
+{ 
+  return break_handler; 
+} 
+
+ 
 
 /* Determine of register must be saved/restored in call.  */
static int
@@ -1994,9 +2015,14 @@ compute_frame_size (HOST_WIDE_INT size)

   interrupt_handler =
 microblaze_interrupt_function_p (current_function_decl);
+  break_handler = 
+microblaze_break_function_p (current_function_decl); 
+
   fast_interrupt =
 microblaze_fast_interrupt_function_p (current_function_decl);
   save_volatiles = microblaze_save_volatiles (current_function_decl);
+  if (break_handler)
+interrupt_handler

RE: [Patch,Microblaze]: Added Break Handler Support

2014-05-13 Thread Ajit Kumar Agarwal
Hello Michael:

Thanks for the comments on the ChangeLog. The modified ChangeLog is inlined below.

2014-05-13 Ajit Agarwal ajit...@xilinx.com

 * config/microblaze/microblaze.c
   (break_handler): New declaration.
   (microblaze_break_function_p, microblaze_is_break_handler): New functions.
   (compute_frame_size): Use microblaze_break_function_p.  Add the test
of break_handler.
   (microblaze_function_prologue, microblaze_function_epilogue): Add the
test of the variable break_handler.
   (microblaze_globalize_label): Add the test of break_handler.

 * config/microblaze/microblaze.h (BREAK_HANDLER_NAME): New macro.

 * config/microblaze/microblaze.md:
   (*optab, optab_internal): Add microblaze_is_break_handler () test.
   (call_internal1, call_value_intern): Use microblaze_break_function_p.
   Use SYMBOL_REF_DECL.

 * config/microblaze/microblaze-protos.h
   (microblaze_break_function_p, microblaze_is_break_handler): New
declarations.

 * testsuite/gcc.target/microblaze/others/break_handler.c: New.

Thanks & Regards
Ajit
-Original Message-
From: Michael Eager [mailto:ea...@eagercon.com] 
Sent: Tuesday, May 13, 2014 10:30 PM
To: Ajit Kumar Agarwal; gcc-patches@gcc.gnu.org
Cc: Vinod Kathail; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,Microblaze]: Added Break Handler Support

On 05/13/14 02:14, Ajit Kumar Agarwal wrote:
 Hello Michael:

 The following patch is to handle Software and Hardware breaks in Microblaze 
 Architecture.
 Deja GNU testcase does not have any regressions and the testcase attached 
 passes through.
 Review comments are incorporated.

 Okay for trunk?

Just saying OK would only be appropriate if you had write access to the 
repository.

 Added break handler support to incorporate hardware and software breaks.
 The break handler routine returns with the rtbd instruction. At the call
 point, software breaks are generated with the brki instruction using
 register r16 as the operand.

 2014-05-13 Ajit Agarwal ajit...@xilinx.com

 * config/microblaze/microblaze.c
 (microblaze_break_function_p,microblaze_is_break_handler) : New

 * config/microblaze/microblaze.h (BREAK_HANDLER_NAME) : New macro

 * config/microblaze/microblaze.md :
Extended support for generation of brki instruction and rtbd instruction.

A better ChangeLog entry is
* config/microblaze/microblaze.md (*optab,optab_internal):
 Add microblaze_is_break_handler () test.

Give specifics, naming functions, rather than making general comments.
As the ChangeLog standard says:
   It’s important to name the changed function or variable in full.
   Don’t abbreviate function or variable names, and don’t combine them.
   Subsequent maintainers will often search for a function name to find
   all the change log entries that pertain to it; if you abbreviate the
   name, they won’t find it when they search.

Mention each place where there are changes.  There should be a ChangeLog entry 
for each non-trivial change.

Your patch made four significant changes to microblaze.md.
There appear to be several changes in microblaze.c, not just the definition of 
the new functions as shown in your entry.


 * config/microblaze/microblaze-protos.h
 (microblaze_break_function_p,microblaze_is_break_handler) : New 
 Declaration.

 * testsuite/gcc.target/microblaze/others/break_handler.c : New.

Thanks for the test case.

As mentioned previously, add documentation for _break_handler.

 diff --git a/gcc/config/microblaze/microblaze-protos.h 
 b/gcc/config/microblaze/microblaze-protos.h
 index b03e9e1..f3cc099 100644
 --- a/gcc/config/microblaze/microblaze-protos.h
 +++ b/gcc/config/microblaze/microblaze-protos.h

Please include the patch only once, not both inline and again as an attachment.

-- 
Michael Eager   ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077



RE: [Patch,Microblaze]: Added Break Handler Support

2014-05-13 Thread Ajit Kumar Agarwal
Hello Michael:

Resubmitting the patch with documentation for _break_handler in 
config/microblaze/microblaze.h.

Thanks & Regards
Ajit

-Original Message-
From: Michael Eager [mailto:ea...@eagercon.com] 
Sent: Wednesday, May 14, 2014 12:55 AM
To: Ajit Kumar Agarwal; gcc-patches@gcc.gnu.org
Cc: Vinod Kathail; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,Microblaze]: Added Break Handler Support

On 05/13/14 12:15, Ajit Kumar Agarwal wrote:
 Hello Michael:

 Thanks for the comments on ChangeLog. Modified ChangeLog is inlined below.

Please resubmit the patch with documentation for _break_handler.

-- 
Michael Eager   ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


0001-Patch-microblaze-Added-Break-Handler-support.patch
Description: 0001-Patch-microblaze-Added-Break-Handler-support.patch