[PATCH] ira: Consider save/restore costs of callee-save registers [PR110071]

2023-09-14 Thread Surya Kumari Jangala via Gcc-patches
ira: Consider save/restore costs of callee-save registers [PR110071]

In the improve_allocation() routine, IRA checks, for each allocno, whether
spilling any conflicting allocnos can improve the allocation of this allocno.
The routine computes the cost improvement for using each profitable hard
register for a given allocno. The existing code in improve_allocation() does
not consider the save/restore costs of callee-save registers while computing
this cost improvement.

This can result in a callee-save register being assigned to a pseudo that is
live across the entire function and across a call, overriding a
non-callee-save register assigned to the pseudo by graph coloring. The
callee-save register then has to be saved in the prologue of the entry basic
block, which causes shrink-wrapping to fail.

2023-09-14  Surya Kumari Jangala  

gcc/
PR rtl-optimization/110071
* ira-color.cc (improve_allocation): Consider cost of callee
save registers.

gcc/testsuite/
PR rtl-optimization/110071
* gcc.target/powerpc/pr110071.c: New test.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 5807d6d26f6..f2e8ea34152 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3150,13 +3150,15 @@ improve_allocation (void)
   int j, k, n, hregno, conflict_hregno, base_cost, class_size, word, nwords;
   int check, spill_cost, min_cost, nregs, conflict_nregs, r, best;
   bool try_p;
-  enum reg_class aclass;
+  enum reg_class aclass, rclass;
   machine_mode mode;
   int *allocno_costs;
   int costs[FIRST_PSEUDO_REGISTER];
   HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
   ira_allocno_t a;
   bitmap_iterator bi;
+  int saved_nregs;
+  int add_cost;
 
   /* Don't bother to optimize the code with static chain pointer and
  non-local goto in order not to spill the chain pointer
@@ -3194,6 +3196,7 @@ improve_allocation (void)
  conflicting_regs,
  &profitable_hard_regs);
   class_size = ira_class_hard_regs_num[aclass];
+  mode = ALLOCNO_MODE (a);
   /* Set up cost improvement for usage of each profitable hard
 register for allocno A.  */
   for (j = 0; j < class_size; j++)
@@ -3207,6 +3210,22 @@ improve_allocation (void)
  costs[hregno] = (allocno_costs == NULL
   ? ALLOCNO_UPDATED_CLASS_COST (a) : allocno_costs[k]);
  costs[hregno] -= allocno_copy_cost_saving (a, hregno);
+
+   if ((saved_nregs = calculate_saved_nregs (hregno, mode)) != 0)
+     {
+       /* We need to save/restore the hard register in
+          epilogue/prologue.  Therefore we increase the cost.
+          Since the prolog is placed in the entry BB, the frequency
+          of the entry BB is considered while computing the cost.  */
+       rclass = REGNO_REG_CLASS (hregno);
+       add_cost = ((ira_memory_move_cost[mode][rclass][0]
+                    + ira_memory_move_cost[mode][rclass][1])
+                   * saved_nregs / hard_regno_nregs (hregno, mode) - 1)
+                  * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+       costs[hregno] += add_cost;
+     }
+
  costs[hregno] -= base_cost;
  if (costs[hregno] < 0)
try_p = true;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110071.c 
b/gcc/testsuite/gcc.target/powerpc/pr110071.c
new file mode 100644
index 000..ec03fecfb15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110071.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+void bar ();
+long
+foo (long i, long cond)
+{
+  if (cond)
+bar ();
+  return i+1;
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */


[PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-09-10 Thread Surya Kumari Jangala via Gcc-patches
swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable
instructions are not optimized, i.e., the webs do not contain any
permuting load/store instructions along with the associated register
swap instructions. Doing special handling in such webs will result in
the extracted lane being adjusted unnecessarily for vec_extract.

Another issue is that existing code treats non-permuting loads/stores
as special swappables. Non-permuting loads/stores (that have not yet
been split into a permuting load/store and a swap) are handled by
converting them into a permuting load/store (which effectively removes
the swap). As a result, if special swappables are handled only in webs
containing permuting loads/stores, then non-optimal code is generated
for non-permuting loads/stores.

Hence, in this patch, all webs containing either permuting loads/
stores or non-permuting loads/stores are marked as requiring special
handling of swappables. Swaps associated with permuting loads/stores
are marked for removal, and non-permuting loads/stores are converted to
permuting loads/stores. Then the special swappables in the webs are
fixed up.

Another issue with always handling swappable instructions is that it is
incorrect to do so in webs where loads/stores on quad word aligned
addresses are changed to lvx/stvx. Similarly, in webs where
swap(load(vector constant)) instructions are replaced with
load(swapped vector constant), the swappable instructions should not be
modified.
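
As a rough, self-contained sketch of the web-marking flow described above
(the struct fields and helpers here are hypothetical stand-ins, not the
rs6000-p8swap.cc types; the real code is in the hunks below):

#include <stdbool.h>

struct web  { bool not_optimizable; bool requires_special_handling; };
struct insn { struct web *w; bool permuting_mem_and_swap;
              bool non_permuting_mem; bool order_sensitive; };

/* First walk: in optimizable webs, mark swaps of permuting loads/stores for
   removal and convert non-permuting loads/stores, flagging the web.  */
static void
mark_webs (struct insn *insns, int n)
{
  for (int i = 0; i < n; i++)
    {
      struct web *w = insns[i].w;
      if (w->not_optimizable)
        continue;
      if (insns[i].permuting_mem_and_swap    /* swap marked for removal  */
          || insns[i].non_permuting_mem)     /* converted to permuting   */
        w->requires_special_handling = true;
    }
}

/* Second walk: fix up element-order-sensitive insns, but only in webs
   flagged above.  */
static void
fix_swappables (struct insn *insns, int n)
{
  for (int i = 0; i < n; i++)
    if (insns[i].order_sensitive && insns[i].w->requires_special_handling)
      {
        /* e.g. remap the lane of a vec_extract here.  */
      }
}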

2023-09-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/106770
* config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
function.
(handle_non_permuting_mem_insn): New function.
(rs6000_analyze_swaps): Handle swappable instructions only in
certain webs.
(web_requires_special_handling): New instance variable.
(handle_special_swappables): Remove handling of non-permuting
load/store instructions.

gcc/testsuite/
PR rtl-optimization/106770
* gcc.target/powerpc/pr106770.c: New test.
---

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
b/gcc/config/rs6000/rs6000-p8swap.cc
index 0388b9bd736..3a695aa1318 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the swappable insns in the web represented by this entry
+ have to be fixed. Swappable insns have to be fixed in :
+   - webs containing permuting loads/stores and the swap insns
+in such webs have been marked for removal
+   - webs where non-permuting loads/stores have been converted
+to permuting loads/stores  */
+  unsigned int web_requires_special_handling : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, 
unsigned i)
   if (dump_file)
fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
   break;
-case SH_NOSWAP_LD:
-  /* Convert a non-permuting load to a permuting one.  */
-  permute_load (insn);
-  break;
-case SH_NOSWAP_ST:
-  /* Convert a non-permuting store to a permuting one.  */
-  permute_store (insn);
-  break;
 case SH_EXTRACT:
   /* Change the lane on an extract operation.  */
   adjust_extract (insn);
@@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
   free (to_delete);
 }
 
+/* Return true if insn is a non-permuting load/store.  */
+static bool
+non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
+ insn_entry[i].special_handling == SH_NOSWAP_ST);
+}
+
+/* Convert a non-permuting load/store insn to a permuting one.  */
+static void
+handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  rtx_insn *insn = insn_entry[i].insn;
+  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
+permute_load (insn);
+  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
+permute_store (insn);
+}
+
 /* Main entry point for this pass.  */
 unsigned int
 rs6000_analyze_swaps (function *fun)
@@ -2624,25 +2642,56 @@ rs6000_analyze_swaps (function *fun)
   dump_swap_insn_table (insn_entry);
 }
 
-  /* For each load and store in an optimizable web (which implies
- the loads and stores are permuting), find the associated
- register swaps and mark them for removal.  Due to various
- optimizations we may mark the same swap more than once.  Also
- perform special handling for swappable insns that require it.  */
+  /* There are two kinds of optimizations 

[PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-07-31 Thread Surya Kumari Jangala via Gcc-patches
Ping

On 21/07/23 3:43 pm, Surya Kumari Jangala via Gcc-patches wrote:
> The improve_allocation() routine does not update the
> allocated_hardreg_p[] array after an allocno is assigned a register.
> 
> If the register chosen in improve_allocation() is one that already has
> been assigned to a conflicting allocno, then allocated_hardreg_p[]
> already has the corresponding bit set to TRUE, so nothing needs to be
> done.
> 
> But improve_allocation() can also choose a register that has not been
> assigned to a conflicting allocno, and also has not been assigned to any
> other allocno. In this case, allocated_hardreg_p[] has to be updated.
> 
> 2023-07-21  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/110254
>   * ira-color.cc (improve_allocation): Update allocated_hardreg_p.
> ---
> 
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 1fb2958bddd..5807d6d26f6 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -3340,6 +3340,10 @@ improve_allocation (void)
>   }
>/* Assign the best chosen hard register to A.  */
>ALLOCNO_HARD_REGNO (a) = best;
> +
> +  for (j = nregs - 1; j >= 0; j--)
> + allocated_hardreg_p[best + j] = true;
> +
>if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
>   fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
>best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));


[PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-07-21 Thread Surya Kumari Jangala via Gcc-patches
The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

2023-07-21  Surya Kumari Jangala  

gcc/
PR rtl-optimization/110254
* ira-color.cc (improve_allocation): Update allocated_hardreg_p.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 1fb2958bddd..5807d6d26f6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3340,6 +3340,10 @@ improve_allocation (void)
}
   /* Assign the best chosen hard register to A.  */
   ALLOCNO_HARD_REGNO (a) = best;
+
+  for (j = nregs - 1; j >= 0; j--)
+   allocated_hardreg_p[best + j] = true;
+
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
 best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));


Re: [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-06-14 Thread Surya Kumari Jangala via Gcc-patches



On 25/02/23 3:20 pm, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results shows that fmr is better in power9 and 
> power10 architectures whereas xxlor is better in power7 and
> power 8 architectures. fmr is the only option before p7.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu
> 
> Thanks & Regards
> Ajit
> 
>   rs6000: Use xxlor instead of fmr where possible
> 
>   Replaces fmr with xxlor instruction for power7 and power8
>   architectures whereas for power9 and power10 keep fmr
>   instruction.
> 
>   Perf measurement results:
> 
>   Power9 fmr:  201,847,661 cycles.
>   Power9 xxlor: 201,877,78 cycles.
>   Power8 fmr: 200,901,043 cycles.
>   Power8 xxlor: 201,020,518 cycles.

According to the above numbers, 'fmr' is better than 'xxlor' on power8.
Should we then be replacing fmr with xxlor for power8?

-Surya


Re: [PATCH v4 3/4] ree: Main functionality to improve ree pass for rs6000 target.

2023-04-24 Thread Surya Kumari Jangala via Gcc-patches



On 21/04/23 8:51 pm, Ajit Agarwal via Gcc-patches wrote:

> +/* Return TRUE if the cfg has following properties.
> + bb1
> + |\
> + | \
> + |  bb2
> + |  /
> + bb3
> +
> +   whereas bb1 has IF_THEN_ELSE  and bb2 has the definition and bb3 has
> +   zero/sign/AND extensions.  */
> +

Any specific reason for requiring CFGs to have only this particular shape? The 
patch should be generic enough to work for all CFGs.

Regards,
Surya

> +static bool
> +feasible_cfg (ext_cand *cand, rtx_insn *def_insn)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (cand->insn);
> +  edge fallthru_edge;
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +{
> +  rtx_insn *insn = BB_END (e->src) ? PREV_INSN (BB_END (e->src)) : NULL;
> +
> +  if (insn == NULL)
> + continue;
> +
> +  if (DEBUG_INSN_P (insn))
> + continue;
> +
> +  rtx set = single_set (insn);
> +
> +  /* Block has IF_THEN_ELSE  */
> +  if (insn && set
> +   && GET_CODE (set) == SET && SET_SRC (set)
> +   && GET_CODE (SET_SRC (set)) == IF_THEN_ELSE)
> + {
> +   if (e->dest == bb)
> + {
> +   basic_block jump_block = e->dest;
> +   if (jump_block != bb)
> + return false;
> +  }
> +  else
> +{
> +  /* def_insn block has single successor and fall through
> + edge target are the block for cand insn.  */
> +  if (single_succ_p (e->dest))
> +{
> +  fallthru_edge = single_succ_edge (e->dest);
> +  if (BB_END (fallthru_edge->dest)
> +  && bb != fallthru_edge->dest)
> +return false;
> +}
> + }
> +   }
> +}
> +
> +  /* def_insn block has single successor and fall through
> + edge target are the block for cand insn.  */
> +  if (single_succ_p (BLOCK_FOR_INSN (def_insn)))
> +{
> +  fallthru_edge = single_succ_edge (BLOCK_FOR_INSN (def_insn));
> +  if (BB_END (fallthru_edge->dest)
> +   && bb != fallthru_edge->dest)
> + return false;
> +}
> +   else
> + return false;
> +
> +  return true;
> +}
> +
> +/* Return TRUE if the candidate extension INSN and def_insn are
> +   feasible for extension elimination.
> +
> +   Things to consider:
> +
> +   cfg properties are feasible for extension elimination.
> +
> +   sign_extend with def insn as PLUS and the reaching definition
> +   of def_insn are not ASHIFT and LSHIFTRT.
> +
> +   zero_extend with def insn as XOR/IOR and the reaching definition
> +   of def_insn are not ASHIFT and LSHIFTRT.
> +
> +   The destination register of the extension insn must not be
> +   used or set between the def_insn and cand->insn exclusive.
> +
> +   AND with zero extension properties has USE and the register
> +   of cand insn are same as register of USE operand.  */
> +
> +static bool
> +eliminate_across_bbs_p (ext_cand *cand, rtx_insn *def_insn)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (cand->insn);
> +
> +  if (!feasible_cfg (cand, def_insn))
> +return false;
> +
> +  rtx cand_set = single_set(cand->insn);
> +  /* The destination register of the extension insn must not be
> +  used or set between the def_insn and cand->insn exclusive.  */
> +  if (INSN_CHAIN_CODE_P (GET_CODE (def_insn))
> +  && INSN_CHAIN_CODE_P (cand->code))
> +if ((cand->code == ZERO_EXTEND)
> +  && REG_P (SET_DEST (cand_set)) && NEXT_INSN (def_insn)
> +  && reg_used_set_between_p(SET_DEST (cand_set), def_insn, cand->insn))
> +  return false;
> +
> +  if (cand->code == ZERO_EXTEND
> +  && (bb != BLOCK_FOR_INSN (def_insn)
> +  || DF_INSN_LUID (def_insn) > DF_INSN_LUID (cand->insn)))
> +return false;
> +
> +  if (rtx_is_zext_p (cand->insn))
> +{
> +  if (GET_CODE (PATTERN (BB_END (bb))) != USE)
> + return false;
> +
> +  if (REGNO (XEXP (PATTERN (BB_END (bb)), 0)) != REGNO (SET_DEST 
> (cand->expr)))
> + return false;
> +}
> +
> +  rtx set = single_set (def_insn);
> +
> +  if (!set)
> +return false;
> +
> +  if (cand->code == SIGN_EXTEND
> +  && GET_CODE (set) == SET)
> +{
> +  rtx orig_src = SET_SRC (set);
> +  machine_mode ext_src_mode;
> +
> +  ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
> +
> +  if (GET_MODE (SET_DEST (set)) != ext_src_mode)
> + return false;
> +
> +  if (GET_CODE (orig_src) != PLUS)
> + return false;
> +
> +  if (!REG_P (XEXP (orig_src, 0)))
> + return false;
> +
> +  if (!REG_P (XEXP (orig_src,1)))
> + return false;
> +
> +  if (GET_CODE (orig_src) == PLUS)
> + {
> +   bool def_src1
> + = def_arith_p (def_insn,
> +XEXP (SET_SRC (set), 0));
> +   bool def_src2
> + = def_arith_p (def_insn,
> +XEXP (SET_SRC (set), 1));
> +
> +   if (def_src1 || def_src2)
> + return false;
> + }
> +  

Re: [PATCH] rs6000: suboptimal code for returning bool value on target ppc

2023-03-16 Thread Surya Kumari Jangala via Gcc-patches
The issue of suboptimal code exists even for integer return values, not just
bool return values. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103784#c9
So the patch would need to take care of integer return values too.

On 16/03/23 10:50 am, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> 
> This patch eliminates unnecessary zero extension instruction from power 
> generated assembly.
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
>   rs6000: suboptimal code for returning bool value on target ppc.
> 
>   New pass to eliminate unnecessary zero extension. This pass
>   is registered after cse rtl pass.
> 
>   2023-03-16  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: Registered zero elimination
>   pass.
>   * config/rs6000/rs6000-zext-elim.cc: Add new pass.
>   * config.gcc: Add new executable.
>   * config/rs6000/rs6000-protos.h: Add new prototype for zero
>   elimination pass.
>   * config/rs6000/rs6000.cc: Add new prototype for zero
>   elimination pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * expr.cc: Modified gcc assert.
>   * explow.cc: Modified gcc assert.
>   * optabs.cc: Modified gcc assert.
> ---
>  gcc/config.gcc|   4 +-
>  gcc/config/rs6000/rs6000-passes.def   |   2 +
>  gcc/config/rs6000/rs6000-protos.h |   1 +
>  gcc/config/rs6000/rs6000-zext-elim.cc | 361 ++
>  gcc/config/rs6000/rs6000.cc   |   2 +
>  gcc/config/rs6000/t-rs6000|   5 +
>  gcc/explow.cc |   3 +-
>  gcc/expr.cc   |   4 +-
>  gcc/optabs.cc |   3 +-
>  9 files changed, 379 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/config/rs6000/rs6000-zext-elim.cc
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index da3a6d3ba1f..e8ac9d882f0 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -503,7 +503,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-zext-elim.o 
> rs6000-logue.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -538,7 +538,7 @@ riscv*)
>   ;;
>  rs6000*-*-*)
>   extra_options="${extra_options} g.opt fused-madd.opt 
> rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-zext-elim.o 
> rs6000-logue.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-logue.cc 
> \$(srcdir)/config/rs6000/rs6000-call.cc"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def 
> b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..d7500feddf1 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -28,6 +28,8 @@ along with GCC; see the file COPYING3.  If not see
>   The power8 does not have instructions that automaticaly do the byte 
> swaps
>   for loads and stores.  */
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_AFTER (pass_cse, 1, pass_analyze_zext);
> +
>  
>/* Pass to do the PCREL_OPT optimization that combines the load of an
>   external symbol's address along with a single load or store using that
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index 1a4fc1df668..f6cf2d673d4 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -340,6 +340,7 @@ namespace gcc { class context; }
>  class rtl_opt_pass;
>  
>  extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
> +extern rtl_opt_pass *make_pass_analyze_zext (gcc::context *);
>  extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
>  extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
>  extern bool rs6000_quadword_masked_address_p (const_rtx exp);
> diff --git a/gcc/config/rs6000/rs6000-zext-elim.cc 
> b/gcc/config/rs6000/rs6000-zext-elim.cc
> new file mode 100644
> index 000..777c7a5a387
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-zext-elim.cc
> @@ -0,0 +1,361 @@
> +/* Subroutine to eliminate redundant zero extend for power architecture.
> +   Copyright (C) 1991-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.

Re: [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-03-03 Thread Surya Kumari Jangala via Gcc-patches



On 27/02/23 9:58 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/106770
>>  * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> Please add an entry?  Or multiple ones, actually.  Describe all changes.
Ok

> 
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
> 
> s/ie/i.e./
> 
>> + register swaps of permuting loads/stores have been removed.  */
> 
> If it really means only exactly this, then the name isn't so good.

There is another bit in this class named "web_not_optimizable", which marks
webs that cannot be optimized. I followed that naming convention for the new
bit I am adding and named it "web_is_optimized".

> 
>> +  unsigned int web_is_optimized : 1;
> 
> And if it is as general as the name suggests, then the comment is no
> good.  Which is it?  :-)
> 
>>/* For each load and store in an optimizable web (which implies
>>   the loads and stores are permuting), find the associated
>>   register swaps and mark them for removal.  Due to various
>> - optimizations we may mark the same swap more than once.  Also
>> - perform special handling for swappable insns that require it.  */
>> + optimizations we may mark the same swap more than once. Fix up
>> + the non-permuting loads and stores by converting them into
>> + permuting ones.  */
> 
> Two spaces after a full stop is correct.  Please put that back.
Ok

> 
> Is it a good idea to convert from/to swapping loads/stores in this pass at
> all?  Doesn't that belong elsewhere?  Like, in combine, where we
> already should do this.  Why does that not work?
>> -if (!root_entry->web_not_optimizable)
>> +if (!root_entry->web_not_optimizable) {
> 
> Blocks start on a new line, indented.
Ok

> 
>>mark_swaps_for_removal (insn_entry, i);
>> +  root_entry->web_is_optimized = true;
> 
> Indent using tabs where possible.
Ok

> 
>> +swap_web_entry* root_entry
>> +  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> 
> Space before *, in all cases. Space before the second (.  There are too
> many brackets here, too.
Ok

> 
>> +  /* Perform special handling for swappable insns that require it. 
> 
> No trailing spaces.
Ok

> 
>> + Note that special handling should be done only for those 
>> + swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>>{
>>  swap_web_entry* root_entry
>>= (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -if (!root_entry->web_not_optimizable)
>> +if (root_entry->web_is_optimized)
>>handle_special_swappables (insn_entry, i);
>>}
> 
> Why this change?

The swap pass analyzes vector computations and removes unnecessary doubleword
swaps (xxswapdi instructions). It first constructs webs and removes swap
instructions where possible. If a web contains operations that are sensitive
to element order (i.e., insns requiring special handling), such as an extract,
then such instructions must be modified; for an extract operation, for
example, the lane is changed.
However, the swap pass was also changing the element order of extract
operations present in unoptimized webs. Unoptimized webs are those for which
register swaps were not removed, one reason being that the web contains no
loads/stores. In such webs, the element order of an extract operation must not
be changed.
Hence, we first mark the webs that were actually optimized, and only for those
webs do we call handle_special_swappables() to modify operations sensitive to
element order.
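
For reference, a minimal example in the spirit of the pr106770 test case
included with the patch (later in this thread); it is illustrative only.  The
web feeding the extract contains no permuting load/store, so it is never
optimized and the extracted lane must be left exactly as written:

#include <altivec.h>

int
lane1_diff (double a, double b)
{
  vector double va = vec_promote (a, 1);
  vector double vb = vec_promote (b, 1);
  vector signed long long vlt = (vector signed long long) vec_cmplt (va, vb);
  vector signed long long vgt = (vector signed long long) vec_cmplt (vb, va);
  /* No permuting load/store feeds this web, so the swap pass must not
     remap the lane of the vec_extract below.  */
  return vec_extract (vec_sub (vlt, vgt), 1);
}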

[PING 3] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-02-27 Thread Surya Kumari Jangala via Gcc-patches
Hello,

Ping https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609374.html

Thanks,

Surya

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
>   PR rtl-optimization/106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> + register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>/* For each load and store in an optimizable web (which implies
>   the loads and stores are permuting), find the associated
>   register swaps and mark them for removal.  Due to various
> - optimizations we may mark the same swap more than once.  Also
> - perform special handling for swappable insns that require it.  */
> + optimizations we may mark the same swap more than once. Fix up
> + the non-permuting loads and stores by converting them into
> + permuting ones.  */
>for (i = 0; i < e; ++i)
>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>   && insn_entry[i].is_swap)
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (!root_entry->web_not_optimizable) {
> mark_swaps_for_removal (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
>}
> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +else if (insn_entry[i].is_swappable
> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> + insn_entry[i].special_handling == SH_NOSWAP_ST))
> +  {
> +swap_web_entry* root_entry
> +  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> +if (!root_entry->web_not_optimizable) {
> +  handle_special_swappables (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
> +  }
> +
> +  /* Perform special handling for swappable insns that require it. 
> + Note that special handling should be done only for those 
> + swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (root_entry->web_is_optimized)
> handle_special_swappables (insn_entry, i);
>}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } 

Re: [PING 2] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-02-16 Thread Surya Kumari Jangala via Gcc-patches
Ping. Please review the patch.

On 12/01/23 10:21 pm, Surya Kumari Jangala via Gcc-patches wrote:
> Ping
> 
> On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/106770
>>  * rs6000-p8swap.cc (rs6000_analyze_swaps): .
>>
>> gcc/testsuite/
>>  PR rtl-optimization/106770
>>  * gcc.target/powerpc/pr106770.c: New test.
>> ---
>>
>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>> b/gcc/config/rs6000/rs6000-p8swap.cc
>> index 19fbbfb67dc..7ed39251df9 100644
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
>> + register swaps of permuting loads/stores have been removed.  */
>> +  unsigned int web_is_optimized : 1;
>>/* Set if this insn should be deleted.  */
>>unsigned int will_delete : 1;
>>  };
>> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>>/* For each load and store in an optimizable web (which implies
>>   the loads and stores are permuting), find the associated
>>   register swaps and mark them for removal.  Due to various
>> - optimizations we may mark the same swap more than once.  Also
>> - perform special handling for swappable insns that require it.  */
>> + optimizations we may mark the same swap more than once. Fix up
>> + the non-permuting loads and stores by converting them into
>> + permuting ones.  */
>>for (i = 0; i < e; ++i)
>>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>>  && insn_entry[i].is_swap)
>>{
>>  swap_web_entry* root_entry
>>= (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -if (!root_entry->web_not_optimizable)
>> +if (!root_entry->web_not_optimizable) {
>>mark_swaps_for_removal (insn_entry, i);
>> +  root_entry->web_is_optimized = true;
>> +}
>>}
>> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
>> +else if (insn_entry[i].is_swappable
>> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> + insn_entry[i].special_handling == SH_NOSWAP_ST))
>> +  {
>> +swap_web_entry* root_entry
>> +  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> +if (!root_entry->web_not_optimizable) {
>> +  handle_special_swappables (insn_entry, i);
>> +  root_entry->web_is_optimized = true;
>> +}
>> +  }
>> +
>> +  /* Perform special handling for swappable insns that require it. 
>> + Note that special handling should be done only for those 
>> + swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>>{
>>  swap_web_entry* root_entry
>>= (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
>> -if (!root_entry->web_not_optimizable)
>> +if (root_entry->web_is_optimized)
>>handle_special_swappables (insn_entry, i);
>>}
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
>> 

[PING] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-01-12 Thread Surya Kumari Jangala via Gcc-patches
Ping

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
>   PR rtl-optimization/106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> + register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>/* For each load and store in an optimizable web (which implies
>   the loads and stores are permuting), find the associated
>   register swaps and mark them for removal.  Due to various
> - optimizations we may mark the same swap more than once.  Also
> - perform special handling for swappable insns that require it.  */
> + optimizations we may mark the same swap more than once. Fix up
> + the non-permuting loads and stores by converting them into
> + permuting ones.  */
>for (i = 0; i < e; ++i)
>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>   && insn_entry[i].is_swap)
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (!root_entry->web_not_optimizable) {
> mark_swaps_for_removal (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
>}
> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +else if (insn_entry[i].is_swappable
> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> + insn_entry[i].special_handling == SH_NOSWAP_ST))
> +  {
> +swap_web_entry* root_entry
> +  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> +if (!root_entry->web_not_optimizable) {
> +  handle_special_swappables (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
> +  }
> +
> +  /* Perform special handling for swappable insns that require it. 
> + Note that special handling should be done only for those 
> + swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (root_entry->web_is_optimized)
> handle_special_swappables (insn_entry, i);
>}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> +
> +/* Test case to resolve PR106770  */
> +
> +#include <altivec.h>
> +
> +int cmp2(double a, double b)
> +{
> +vector double va = vec_promote(a, 1);
> +vector double vb = vec_promote(b, 1);
> +vector long long vlt = (vector long long)vec_cmplt(va, vb);
> +vector long long vgt = (vector long long)vec_cmplt(vb, va);
> +vector signed long long vr = vec_sub(vlt, vgt);
> +
> +return vec_extract(vr, 1);
> +}
> +


[PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-01-04 Thread Surya Kumari Jangala via Gcc-patches
swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable
instructions are not optimized, i.e., the webs do not contain any
permuting load/store instructions along with the associated register
swap instructions. Doing special handling in such webs will result in
the extracted lane being adjusted unnecessarily for vec_extract.

Modifying swappable instructions is also incorrect in webs where
loads/stores on quad word aligned addresses are changed to lvx/stvx.
Similarly, in webs where swap(load(vector constant)) instructions are
replaced with load(swapped vector constant), the swappable
instructions should not be modified.

2023-01-04  Surya Kumari Jangala  

gcc/
PR rtl-optimization/106770
* rs6000-p8swap.cc (rs6000_analyze_swaps): .

gcc/testsuite/
PR rtl-optimization/106770
* gcc.target/powerpc/pr106770.c: New test.
---

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
b/gcc/config/rs6000/rs6000-p8swap.cc
index 19fbbfb67dc..7ed39251df9 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the web represented by this entry has been optimized, ie,
+ register swaps of permuting loads/stores have been removed.  */
+  unsigned int web_is_optimized : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
   /* For each load and store in an optimizable web (which implies
  the loads and stores are permuting), find the associated
  register swaps and mark them for removal.  Due to various
- optimizations we may mark the same swap more than once.  Also
- perform special handling for swappable insns that require it.  */
+ optimizations we may mark the same swap more than once. Fix up
+ the non-permuting loads and stores by converting them into
+ permuting ones.  */
   for (i = 0; i < e; ++i)
 if ((insn_entry[i].is_load || insn_entry[i].is_store)
&& insn_entry[i].is_swap)
   {
swap_web_entry* root_entry
  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
-   if (!root_entry->web_not_optimizable)
+   if (!root_entry->web_not_optimizable) {
  mark_swaps_for_removal (insn_entry, i);
+  root_entry->web_is_optimized = true;
+}
   }
-else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
+else if (insn_entry[i].is_swappable
+ && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
+ insn_entry[i].special_handling == SH_NOSWAP_ST))
+  {
+swap_web_entry* root_entry
+  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
+if (!root_entry->web_not_optimizable) {
+  handle_special_swappables (insn_entry, i);
+  root_entry->web_is_optimized = true;
+}
+  }
+
+  /* Perform special handling for swappable insns that require it. 
+ Note that special handling should be done only for those 
+ swappable insns that are present in webs optimized above.  */
+  for (i = 0; i < e; ++i)
+if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
+!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
+  insn_entry[i].special_handling == SH_NOSWAP_ST))
   {
swap_web_entry* root_entry
  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
-   if (!root_entry->web_not_optimizable)
+   if (root_entry->web_is_optimized)
  handle_special_swappables (insn_entry, i);
   }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
b/gcc/testsuite/gcc.target/powerpc/pr106770.c
new file mode 100644
index 000..84e9aead975
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
+
+/* Test case to resolve PR106770  */
+
+#include <altivec.h>
+
+int cmp2(double a, double b)
+{
+vector double va = vec_promote(a, 1);
+vector double vb = vec_promote(b, 1);
+vector long long vlt = (vector long long)vec_cmplt(va, vb);
+vector long long vgt = (vector long long)vec_cmplt(vb, va);
+vector signed long long vr = vec_sub(vlt, vgt);
+
+return vec_extract(vr, 1);
+}
+


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-11-08 Thread Surya Kumari Jangala via Gcc-patches
Hi Richard,

On 21/09/22 1:03 pm, Richard Biener wrote:
> On Tue, Sep 20, 2022 at 9:18 AM Surya Kumari Jangala via Gcc-patches
>  wrote:
>>
>> Hi Jeff, Richard,
>> Thank you for reviewing the patch!
>> I have committed the patch to the gcc repo.
>> Can I backport this patch to prior versions of gcc, as this is an easy patch 
>> to backport and the issue exists in prior versions too?
> 
> It doesn't seem to be a regression so I'd error on the safe side here.

Can you please clarify whether this patch should be backported? It is not
clear to me what "safe side" means here.

Thanks!
Surya

> 
> Richard.
> 
>> Regards,
>> Surya
>>
>>
>> On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
>>>
>>>
>>> On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
>>>> sched1: Fix -fcompare-debug issue in schedule_region [PR105586]
>>>>
>>>> In schedule_region(), a basic block that does not contain any real insns
>>>> is not scheduled and the dfa state at the entry of the bb is not copied
>>>> to the fallthru basic block. However a DEBUG insn is treated as a real
>>>> insn, and if a bb contains non-real insns and a DEBUG insn, its dfa
>>>> state is copied to the fallthru bb. This was resulting in
>>>> -fcompare-debug failure as the incoming dfa state of the fallthru block
>>>> is different with -g. We should always copy the dfa state of a bb to
>>>> its fallthru bb even if the bb does not contain real insns.
>>>>
>>>> 2022-08-22  Surya Kumari Jangala  
>>>>
>>>> gcc/
>>>> PR rtl-optimization/105586
>>>> * sched-rgn.cc (schedule_region): Always copy dfa state to
>>>> fallthru block.
>>>>
>>>> gcc/testsuite/
>>>> PR rtl-optimization/105586
>>>> * gcc.target/powerpc/pr105586.c: New test.
>>> Interesting.  We may have stumbled over this bug internally a little
>>> while ago -- not from a compare-debug standpoint, but from a "why isn't the 
>>> processor state copied to the fallthru block" point of view.   I had it on 
>>> my to investigate list, but hadn't gotten around to it yet.
>>>
>>> I think there were requests for ChangeLog updates and a function comment 
>>> for save_state_for_fallthru_edge.  OK with those updates.
>>>
>>> jeff
>>>


[PATCH] testsuite: Fix failure in test pr105586.c [PR107171]

2022-10-13 Thread Surya Kumari Jangala via Gcc-patches
testsuite: Fix failure in test pr105586.c [PR107171]

The test pr105586.c fails on a big-endian system when run in 32-bit
mode. The failure occurs because the test case does not guard against
targets where __int128 is unsupported.

2022-10-13  Surya Kumari Jangala  

gcc/testsuite/
PR testsuite/107171
* gcc.target/powerpc/pr105586.c: Guard against unsupported
__int128.


diff --git a/gcc/testsuite/gcc.target/powerpc/pr105586.c 
b/gcc/testsuite/gcc.target/powerpc/pr105586.c
index bd397f5..3f88a09 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr105586.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr105586.c
@@ -1,4 +1,5 @@
 /* { dg-options "-mdejagnu-tune=power4 -O2 -fcompare-debug -fno-if-conversion -fno-guess-branch-probability" } */
+/* { dg-require-effective-target int128 } */
 
 extern int bar(int i);


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-09-20 Thread Surya Kumari Jangala via Gcc-patches
Hi Jeff, Richard,
Thank you for reviewing the patch!
I have committed the patch to the gcc repo.
Can I backport this patch to prior versions of gcc, as this is an easy patch to 
backport and the issue exists in prior versions too?

Regards,
Surya


On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
> 
> 
> On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
>> sched1: Fix -fcompare-debug issue in schedule_region [PR105586]
>>
>> In schedule_region(), a basic block that does not contain any real insns
>> is not scheduled and the dfa state at the entry of the bb is not copied
>> to the fallthru basic block. However a DEBUG insn is treated as a real
>> insn, and if a bb contains non-real insns and a DEBUG insn, its dfa
>> state is copied to the fallthru bb. This was resulting in
>> -fcompare-debug failure as the incoming dfa state of the fallthru block
>> is different with -g. We should always copy the dfa state of a bb to
>> its fallthru bb even if the bb does not contain real insns.
>>
>> 2022-08-22  Surya Kumari Jangala  
>>
>> gcc/
>> PR rtl-optimization/105586
>> * sched-rgn.cc (schedule_region): Always copy dfa state to
>> fallthru block.
>>
>> gcc/testsuite/
>> PR rtl-optimization/105586
>> * gcc.target/powerpc/pr105586.c: New test.
> Interesting.    We may have stumbled over this bug internally a little while 
> ago -- not from a compare-debug standpoint, but from a "why isn't the 
> processor state copied to the fallthru block" point of view.   I had it on my 
> to investigate list, but hadn't gotten around to it yet.
> 
> I think there were requests for ChangeLog updates and a function comment for 
> save_state_for_fallthru_edge.  OK with those updates.
> 
> jeff
> 


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-08-24 Thread Surya Kumari Jangala via Gcc-patches
Hi Peter, Segher,
Thanks for going thru the patch!
I will make the proposed changes to the Changelog.

Regards,
Surya


On 23/08/22 6:58 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 23, 2022 at 07:55:22AM -0500, Peter Bergner wrote:
>> It looks good to me, but I cannot approve it.
> 
> Same here (both statements).
> 
>> That said, you're missing
>> a ChangeLog entry for your new helper function.  The ChangeLog mentions
>> what changed, not why it changed.
> 
> And that is correct!  Changelogs should not say that, that isn't their
> purpose (in GCC), not what they are used for.  Explanations like that go
> in the email and/or the commit message.
> 
> The main remaining usefulness of changelogs is to spot unintended
> commmits.
> 
>> Maybe something like the following?
>>
>> gcc/
>>  PR rtl-optimization/105586
>>  * sched-rgn.cc (save_state_for_fallthru_edge): New function.
>>  (schedule_region): Use it for all blocks.
> 
> That looks perfect, it doesn't say "why" either :-)
> 
> 
> Segher


[PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-08-23 Thread Surya Kumari Jangala via Gcc-patches
sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

In schedule_region(), a basic block that does not contain any real insns
is not scheduled and the dfa state at the entry of the bb is not copied
to the fallthru basic block. However, a DEBUG insn is treated as a real
insn, so if a bb contains only non-real insns plus a DEBUG insn, its dfa
state is copied to the fallthru bb. This was resulting in a
-fcompare-debug failure, as the incoming dfa state of the fallthru block
differed with -g. We should always copy the dfa state of a bb to its
fallthru bb, even if the bb does not contain real insns.

2022-08-22  Surya Kumari Jangala  

gcc/
PR rtl-optimization/105586
* sched-rgn.cc (schedule_region): Always copy dfa state to
fallthru block.

gcc/testsuite/
PR rtl-optimization/105586
* gcc.target/powerpc/pr105586.c: New test.


diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index 0dc2a8f2851..6c9202c0b2b 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3082,6 +3082,24 @@ free_bb_state_array (void)
   bb_state = NULL;
 }
 
+static void
+save_state_for_fallthru_edge (basic_block last_bb, state_t state)
+{
+  edge f = find_fallthru_edge (last_bb->succs);
+  if (f
+  && (!f->probability.initialized_p ()
+ || (f->probability.to_reg_br_prob_base () * 100
+ / REG_BR_PROB_BASE
+ >= param_sched_state_edge_prob_cutoff)))
+  {
+memcpy (bb_state[f->dest->index], state,
+   dfa_state_size);
+if (sched_verbose >= 5)
+  fprintf (sched_dump, "saving state for edge %d->%d\n",
+  f->src->index, f->dest->index);
+  }
+}
+
 /* Schedule a region.  A region is either an inner loop, a loop-free
subroutine, or a single basic block.  Each bb in the region is
scheduled after its flow predecessors.  */
@@ -3155,6 +3173,7 @@ schedule_region (int rgn)
   if (no_real_insns_p (head, tail))
{
  gcc_assert (first_bb == last_bb);
+ save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
  continue;
}
 
@@ -3173,26 +3192,13 @@ schedule_region (int rgn)
   curr_bb = first_bb;
   if (dbg_cnt (sched_block))
 {
- edge f;
  int saved_last_basic_block = last_basic_block_for_fn (cfun);
 
  schedule_block (&curr_bb, bb_state[first_bb->index]);
  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
  sched_rgn_n_insns += sched_n_insns;
  realloc_bb_state_array (saved_last_basic_block);
- f = find_fallthru_edge (last_bb->succs);
- if (f
- && (!f->probability.initialized_p ()
- || (f->probability.to_reg_br_prob_base () * 100
- / REG_BR_PROB_BASE
- >= param_sched_state_edge_prob_cutoff)))
-   {
- memcpy (bb_state[f->dest->index], curr_state,
- dfa_state_size);
- if (sched_verbose >= 5)
-   fprintf (sched_dump, "saving state for edge %d->%d\n",
-f->src->index, f->dest->index);
-   }
+ save_state_for_fallthru_edge (last_bb, curr_state);
 }
   else
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105586.c 
b/gcc/testsuite/gcc.target/powerpc/pr105586.c
new file mode 100644
index 000..bd397f58bc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105586.c
@@ -0,0 +1,19 @@
+/* { dg-options "-mdejagnu-tune=power4 -O2 -fcompare-debug -fno-if-conversion -fno-guess-branch-probability" } */
+
+extern int bar(int i);
+
+typedef unsigned long u64;
+int g;
+
+__int128 h;
+
+void
+foo(int a, int b) {
+  int i;
+  char u8_1 = g, u8_3 = a;
+  u64 u64_1 = bar(0), u64_3 = u8_3 * u64_1;
+  __int128 u128_1 = h ^ __builtin_expect(i, 0);
+  u64 u64_4 = u64_1 << ((__builtin_sub_overflow_p(0, u8_1, (u64)0) ^ u128_1) & 8);
+  u64 u64_r = b + u64_3 + u64_4;
+  bar((short)u64_r + u8_3);
+}


[PATCH] regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]

2022-06-10 Thread Surya Kumari Jangala via Gcc-patches
regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]

In check_new_reg_p, the nregs of a du chain is computed by obtaining the MODE
of the first element in the chain and then calling hard_regno_nregs() with
that MODE. But the first element of the chain can be a DEBUG_INSN whose mode
need not be the same as the rest of the elements in the du chain. This was
resulting in a -fcompare-debug failure, as check_new_reg_p was returning a
different result with -g for the same candidate register. We can instead
obtain nregs from the du chain itself.

2022-06-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/105041
* regrename.cc (check_new_reg_p): Use nregs value from du chain.

gcc/testsuite/
PR rtl-optimization/105041
* gcc.target/powerpc/pr105041.c: New test.


diff --git a/gcc/regrename.cc b/gcc/regrename.cc
index 10271e1..f651351 100644
--- a/gcc/regrename.cc
+++ b/gcc/regrename.cc
@@ -324,8 +324,7 @@ static bool
 check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 class du_head *this_head, HARD_REG_SET this_unavailable)
 {
-  machine_mode mode = GET_MODE (*this_head->first->loc);
-  int nregs = hard_regno_nregs (new_reg, mode);
+  int nregs = this_head->nregs;
   int i;
   struct du_chain *tmp;
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105041.c 
b/gcc/testsuite/gcc.target/powerpc/pr105041.c
new file mode 100644
index 000..89eed1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105041.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target be } */
+/* { dg-options "-m32 -mdejagnu-cpu=power4 -O2 -fcompare-debug 
-fharden-compares -frename-registers" } */
+
+double m;
+int n;
+
+unsigned int
+foo (unsigned int x, int y)
+{
+  long long int a = y, b = !a;
+  int c = 0;
+
+  if (b != x)
+while ((int) m == a)
+  {
+c = a;
+a = 0;
+  }
+
+  n = b = y;
+
+  return x + c;
+}


[COMMITTED] MAINTAINERS: Add myself for write after approval

2022-05-13 Thread Surya Kumari Jangala via Gcc-patches

2022-05-13  Surya Kumari Jangala 

    * MAINTAINERS: Add myself to write after approval.

diff --git a/MAINTAINERS b/MAINTAINERS
index a1b84ac5646..8bca7a636b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -464,6 +464,7 @@ Daniel Jacobowitz 
 Andreas Jaeger 
 Harsha Jagasia 
 Fariborz Jahanian 
+Surya Kumari Jangala 
 Qian Jianhua 
 Janis Johnson 
 Teresa Johnson