[PING 4][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2024-05-06 Thread Surya Kumari Jangala
Ping

On 08/01/24 11:19 am, Surya Kumari Jangala wrote:
> Ping
> 
> On 28/11/23 6:24 pm, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 10/11/23 12:27 pm, Surya Kumari Jangala wrote:
>>> Ping
>>>
>>> On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
>>>> Hi Segher,
>>>> I have incorporated changes in the code as per the review comments 
>>>> provided by you 
>>>> for version 2 of the patch. Please review.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>>
>>>> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>>
>>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>>> instructions is done even if the webs that contain the swappable 
>>>> instructions
>>>> are not optimized, i.e., the webs do not contain any permuting load/store
>>>> instructions along with the associated register swap instructions. Doing 
>>>> special
>>>> handling in such webs will result in the extracted lane being adjusted
>>>> unnecessarily for vec_extract.
>>>>
>>>> Another issue is that existing code treats non-permuting loads/stores as 
>>>> special
>>>> swappables. Non-permuting loads/stores (that have not yet been split into a
>>>> permuting load/store and a swap) are handled by converting them into a 
>>>> permuting
>>>> load/store (which effectively removes the swap). As a result, if special
>>>> swappables are handled only in webs containing permuting loads/stores, then
>>>> non-optimal code is generated for non-permuting loads/stores.
>>>>
>>>> Hence, in this patch, all webs containing either permuting loads/ stores or
>>>> non-permuting loads/stores are marked as requiring special handling of
>>>> swappables. Swaps associated with permuting loads/stores are marked for 
>>>> removal,
>>>> and non-permuting loads/stores are converted to permuting loads/stores. 
>>>> Then the
>>>> special swappables in the webs are fixed up.
>>>>
>>>> This patch also ensures that swappable instructions are not modified in the
>>>> following webs as it is incorrect to do so:
>>>>  - webs containing permuting load/store instructions and associated swap
>>>>instructions that are transformed by converting the permuting memory
>>>>instructions into non-permuting instructions and removing the swap
>>>>instructions.
>>>>  - webs where swap(load(vector constant)) instructions are replaced with
>>>>load(swapped vector constant).
>>>>
>>>> 2023-09-10  Surya Kumari Jangala  
>>>>
>>>> gcc/
>>>>PR rtl-optimization/PR106770
>>>>* config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>>>>(handle_non_permuting_mem_insn): New function.
>>>>(rs6000_analyze_swaps): Handle swappable instructions only in certain
>>>>webs.
>>>>(web_requires_special_handling): New instance variable.
>>>>(handle_special_swappables): Remove handling of non-permuting load/store
>>>>instructions.
>>>>
>>>> gcc/testsuite/
>>>>PR rtl-optimization/PR106770
>>>>* gcc.target/powerpc/pr106770.c: New test.
>>>> ---
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>>>> b/gcc/config/rs6000/rs6000-p8swap.cc
>>>> index 0388b9bd736..02ea299bc3d 100644
>>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>>unsigned int special_handling : 4;
>>>>/* Set if the web represented by this entry cannot be optimized.  */
>>>>unsigned int web_not_optimizable : 1;
>>>> +  /* Set if the swappable insns in the web represented by this entry
>>>> + have to be fixed. Swappable insns have to be fixed in:
>>>> +   - webs containing permuting loads/stores and the swap insns
>>>> +   in such webs have been marked for removal
>>>> +   - webs where non-permuting loads/stores have been converted
>>>> +   to permuting loads/stores  */
>>>> +  unsigned int web_requires_special_handling : 1;
>>>>/* Set if this insn should be deleted.  */
>>>>unsigned i

[PING 4][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2024-02-27 Thread Surya Kumari Jangala
Ping

On 08/01/24 11:19 am, Surya Kumari Jangala wrote:
> Ping
> 
> On 28/11/23 6:24 pm, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 10/11/23 12:27 pm, Surya Kumari Jangala wrote:
>>> Ping
>>>
>>> On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
>>>> Hi Segher,
>>>> I have incorporated changes in the code as per the review comments 
>>>> provided by you 
>>>> for version 2 of the patch. Please review.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>>
>>>> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>>
>>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>>> instructions is done even if the webs that contain the swappable 
>>>> instructions
>>>> are not optimized, i.e., the webs do not contain any permuting load/store
>>>> instructions along with the associated register swap instructions. Doing 
>>>> special
>>>> handling in such webs will result in the extracted lane being adjusted
>>>> unnecessarily for vec_extract.
>>>>
>>>> Another issue is that existing code treats non-permuting loads/stores as 
>>>> special
>>>> swappables. Non-permuting loads/stores (that have not yet been split into a
>>>> permuting load/store and a swap) are handled by converting them into a 
>>>> permuting
>>>> load/store (which effectively removes the swap). As a result, if special
>>>> swappables are handled only in webs containing permuting loads/stores, then
>>>> non-optimal code is generated for non-permuting loads/stores.
>>>>
>>>> Hence, in this patch, all webs containing either permuting loads/ stores or
>>>> non-permuting loads/stores are marked as requiring special handling of
>>>> swappables. Swaps associated with permuting loads/stores are marked for 
>>>> removal,
>>>> and non-permuting loads/stores are converted to permuting loads/stores. 
>>>> Then the
>>>> special swappables in the webs are fixed up.
>>>>
>>>> This patch also ensures that swappable instructions are not modified in the
>>>> following webs as it is incorrect to do so:
>>>>  - webs containing permuting load/store instructions and associated swap
>>>>instructions that are transformed by converting the permuting memory
>>>>instructions into non-permuting instructions and removing the swap
>>>>instructions.
>>>>  - webs where swap(load(vector constant)) instructions are replaced with
>>>>load(swapped vector constant).
>>>>
>>>> 2023-09-10  Surya Kumari Jangala  
>>>>
>>>> gcc/
>>>>PR rtl-optimization/PR106770
>>>>* config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>>>>(handle_non_permuting_mem_insn): New function.
>>>>(rs6000_analyze_swaps): Handle swappable instructions only in certain
>>>>webs.
>>>>(web_requires_special_handling): New instance variable.
>>>>(handle_special_swappables): Remove handling of non-permuting load/store
>>>>instructions.
>>>>
>>>> gcc/testsuite/
>>>>PR rtl-optimization/PR106770
>>>>* gcc.target/powerpc/pr106770.c: New test.
>>>> ---
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>>>> b/gcc/config/rs6000/rs6000-p8swap.cc
>>>> index 0388b9bd736..02ea299bc3d 100644
>>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>>unsigned int special_handling : 4;
>>>>/* Set if the web represented by this entry cannot be optimized.  */
>>>>unsigned int web_not_optimizable : 1;
>>>> +  /* Set if the swappable insns in the web represented by this entry
>>>> + have to be fixed. Swappable insns have to be fixed in:
>>>> +   - webs containing permuting loads/stores and the swap insns
>>>> +   in such webs have been marked for removal
>>>> +   - webs where non-permuting loads/stores have been converted
>>>> +   to permuting loads/stores  */
>>>> +  unsigned int web_requires_special_handling : 1;
>>>>/* Set if this insn should be deleted.  */
>>>>unsigned i

Re: [PATCH] rs6000: New pass for replacement of adjacent lxv with lxvp.

2024-01-12 Thread Surya Kumari Jangala
Hi Ajit,
I have taken a quick look at the patch and my comments are inlined:

On 09/01/24 4:44 pm, Ajit Agarwal wrote:
> Hello All:
> 
> This pass is registered before ira rtl pass.
> Bootstrapped and regtested for powerpc64-linux-gnu.
> 
> No regressions for spec 2017 benchmarks and improvements for some of the
> FP and INT benchmarks.
> 
> Vladimir:
> 
> I did modify IRA and LRA register Allocators. Please review.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: New pass for replacement of adjacent lxv with lxvp.

Please add PR number.

> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> This pass is registered before ira rtl pass.

Please add explanation of what changes have been made in IRA/LRA
and why those changes are required.

> 
> 2024-01-09  Ajit Kumar Agarwal  
> 


> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
>   ;;
>  rs6000*-*-*)
>   extra_options="${extra_options} g.opt fused-madd.opt 
> rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-vecload-opt.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-logue.cc 
> \$(srcdir)/config/rs6000/rs6000-call.cc"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def 
> b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..e6a9810ee24 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>   The power8 does not have instructions that automaticaly do the byte 
> swaps
>   for loads and stores.  */
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_BEFORE (pass_ira, 1, pass_analyze_vecload);

Please add comments, similar to the other INSERT_PASS_BEFORE(...).

>  
>/* Pass to do the PCREL_OPT optimization that combines the load of an
>   external symbol's address along with a single load or store using that
> diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc 
> b/gcc/config/rs6000/rs6000-vecload-opt.cc
> new file mode 100644
> index 000..f02c8337f2e
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-opt.cc
> @@ -0,0 +1,395 @@
> +/* Subroutines used to replace lxv with lxvp
> +   for TARGET_POWER10 and TARGET_VSX,

s/,/.

Comment can be rewritten as follows to specify the fact that we replace
lxv's having adjacent addresses:
Subroutines used to replace lxv having adjacent addresses with lxvp.


> +/* Identify lxv instruction that are candidate of adjacent
> +   memory addresses and replace them with mma instruction lxvp.  */

The comment needs modification for better readability, perhaps as follows:
Identify lxv instructions that have adjacent memory addresses 
and replace them with an lxvp instruction.

> +unsigned int
> +rs6000_analyze_vecload (function *fun)
> +{
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  /* Rebuild ud- and du-chains.  */
> +  df_remove_problem (df_chain);
> +  df_process_deferred_rescans ();
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  basic_block bb;
> +  bool changed = false;
> +  rtx_insn *insn, *curr_insn = 0;
> +  rtx_insn *insn1 = 0, *insn2 = 0;
> +  bool first_vec_insn = false;
> +  unsigned int regno = 0;
> +
> +  FOR_ALL_BB_FN (bb, fun)

I am assuming that the 2 lxv instructions that we are searching for
should belong to the same BB. Otherwise, we risk moving a load insn across
basic blocks. In which case, the variable "first_vec_insn" has to be set to 
false here. It has to be false each time we start processing a new BB.

> +FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
> +{
> +  if (LABEL_P (insn))
> + continue;
> +
> +  if (NONDEBUG_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SET)
> + {

Please correct the indentation.

> +   rtx set = single_set (insn);
> +   rtx src = SET_SRC (set);
> +   machine_mode mode = GET_MODE (SET_DEST (set));
> +
> +   if (TARGET_VSX && TARGET_POWER10 && MEM_P (src))


[PING 3][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2024-01-07 Thread Surya Kumari Jangala
Ping

On 28/11/23 6:24 pm, Surya Kumari Jangala wrote:
> Ping
> 
> On 10/11/23 12:27 pm, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
>>> Hi Segher,
>>> I have incorporated changes in the code as per the review comments provided 
>>> by you 
>>> for version 2 of the patch. Please review.
>>>
>>> Regards,
>>> Surya
>>>
>>>
>>> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>
>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>> instructions is done even if the webs that contain the swappable 
>>> instructions
>>> are not optimized, i.e., the webs do not contain any permuting load/store
>>> instructions along with the associated register swap instructions. Doing 
>>> special
>>> handling in such webs will result in the extracted lane being adjusted
>>> unnecessarily for vec_extract.
>>>
>>> Another issue is that existing code treats non-permuting loads/stores as 
>>> special
>>> swappables. Non-permuting loads/stores (that have not yet been split into a
>>> permuting load/store and a swap) are handled by converting them into a 
>>> permuting
>>> load/store (which effectively removes the swap). As a result, if special
>>> swappables are handled only in webs containing permuting loads/stores, then
>>> non-optimal code is generated for non-permuting loads/stores.
>>>
>>> Hence, in this patch, all webs containing either permuting loads/ stores or
>>> non-permuting loads/stores are marked as requiring special handling of
>>> swappables. Swaps associated with permuting loads/stores are marked for 
>>> removal,
>>> and non-permuting loads/stores are converted to permuting loads/stores. 
>>> Then the
>>> special swappables in the webs are fixed up.
>>>
>>> This patch also ensures that swappable instructions are not modified in the
>>> following webs as it is incorrect to do so:
>>>  - webs containing permuting load/store instructions and associated swap
>>>instructions that are transformed by converting the permuting memory
>>>instructions into non-permuting instructions and removing the swap
>>>instructions.
>>>  - webs where swap(load(vector constant)) instructions are replaced with
>>>load(swapped vector constant).
>>>
>>> 2023-09-10  Surya Kumari Jangala  
>>>
>>> gcc/
>>> PR rtl-optimization/PR106770
>>> * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>>> (handle_non_permuting_mem_insn): New function.
>>> (rs6000_analyze_swaps): Handle swappable instructions only in certain
>>> webs.
>>> (web_requires_special_handling): New instance variable.
>>> (handle_special_swappables): Remove handling of non-permuting load/store
>>> instructions.
>>>
>>> gcc/testsuite/
>>> PR rtl-optimization/PR106770
>>> * gcc.target/powerpc/pr106770.c: New test.
>>> ---
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>>> b/gcc/config/rs6000/rs6000-p8swap.cc
>>> index 0388b9bd736..02ea299bc3d 100644
>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>unsigned int special_handling : 4;
>>>/* Set if the web represented by this entry cannot be optimized.  */
>>>unsigned int web_not_optimizable : 1;
>>> +  /* Set if the swappable insns in the web represented by this entry
>>> + have to be fixed. Swappable insns have to be fixed in:
>>> +   - webs containing permuting loads/stores and the swap insns
>>> +in such webs have been marked for removal
>>> +   - webs where non-permuting loads/stores have been converted
>>> +to permuting loads/stores  */
>>> +  unsigned int web_requires_special_handling : 1;
>>>/* Set if this insn should be deleted.  */
>>>unsigned int will_delete : 1;
>>>  };
>>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry 
>>> *insn_entry, unsigned i)
>>>if (dump_file)
>>> fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>>break;
>>> -case SH_NOSWAP_LD:
>>> -  /* Convert a non-permuting load to a permuting 

[PING 2][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-11-28 Thread Surya Kumari Jangala
Ping

On 10/11/23 12:27 pm, Surya Kumari Jangala wrote:
> Ping
> 
> On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
>> Hi Segher,
>> I have incorporated changes in the code as per the review comments provided 
>> by you 
>> for version 2 of the patch. Please review.
>>
>> Regards,
>> Surya
>>
>>
>> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable instructions
>> are not optimized, i.e., the webs do not contain any permuting load/store
>> instructions along with the associated register swap instructions. Doing 
>> special
>> handling in such webs will result in the extracted lane being adjusted
>> unnecessarily for vec_extract.
>>
>> Another issue is that existing code treats non-permuting loads/stores as 
>> special
>> swappables. Non-permuting loads/stores (that have not yet been split into a
>> permuting load/store and a swap) are handled by converting them into a 
>> permuting
>> load/store (which effectively removes the swap). As a result, if special
>> swappables are handled only in webs containing permuting loads/stores, then
>> non-optimal code is generated for non-permuting loads/stores.
>>
>> Hence, in this patch, all webs containing either permuting loads/ stores or
>> non-permuting loads/stores are marked as requiring special handling of
>> swappables. Swaps associated with permuting loads/stores are marked for 
>> removal,
>> and non-permuting loads/stores are converted to permuting loads/stores. Then 
>> the
>> special swappables in the webs are fixed up.
>>
>> This patch also ensures that swappable instructions are not modified in the
>> following webs as it is incorrect to do so:
>>  - webs containing permuting load/store instructions and associated swap
>>instructions that are transformed by converting the permuting memory
>>instructions into non-permuting instructions and removing the swap
>>instructions.
>>  - webs where swap(load(vector constant)) instructions are replaced with
>>load(swapped vector constant).
>>
>> 2023-09-10  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/PR106770
>>  * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>>  (handle_non_permuting_mem_insn): New function.
>>  (rs6000_analyze_swaps): Handle swappable instructions only in certain
>>  webs.
>>  (web_requires_special_handling): New instance variable.
>>  (handle_special_swappables): Remove handling of non-permuting load/store
>>  instructions.
>>
>> gcc/testsuite/
>>  PR rtl-optimization/PR106770
>>  * gcc.target/powerpc/pr106770.c: New test.
>> ---
>>
>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>> b/gcc/config/rs6000/rs6000-p8swap.cc
>> index 0388b9bd736..02ea299bc3d 100644
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the swappable insns in the web represented by this entry
>> + have to be fixed. Swappable insns have to be fixed in:
>> +   - webs containing permuting loads/stores and the swap insns
>> + in such webs have been marked for removal
>> +   - webs where non-permuting loads/stores have been converted
>> + to permuting loads/stores  */
>> +  unsigned int web_requires_special_handling : 1;
>>/* Set if this insn should be deleted.  */
>>unsigned int will_delete : 1;
>>  };
>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry 
>> *insn_entry, unsigned i)
>>if (dump_file)
>>  fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>break;
>> -case SH_NOSWAP_LD:
>> -  /* Convert a non-permuting load to a permuting one.  */
>> -  permute_load (insn);
>> -  break;
>> -case SH_NOSWAP_ST:
>> -  /* Convert a non-permuting store to a permuting one.  */
>> -  permute_store (insn);
>> -  break;
>>  case SH_EXTRACT:
>>/* Change the lane on an extract operation.  */
>>adjust_extract (insn);
>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
&

[PING][PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-11-09 Thread Surya Kumari Jangala
Ping

On 03/11/23 1:14 pm, Surya Kumari Jangala wrote:
> Hi Segher,
> I have incorporated changes in the code as per the review comments provided 
> by you 
> for version 2 of the patch. Please review.
> 
> Regards,
> Surya
> 
> 
> rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable instructions
> are not optimized, i.e., the webs do not contain any permuting load/store
> instructions along with the associated register swap instructions. Doing 
> special
> handling in such webs will result in the extracted lane being adjusted
> unnecessarily for vec_extract.
> 
> Another issue is that existing code treats non-permuting loads/stores as 
> special
> swappables. Non-permuting loads/stores (that have not yet been split into a
> permuting load/store and a swap) are handled by converting them into a 
> permuting
> load/store (which effectively removes the swap). As a result, if special
> swappables are handled only in webs containing permuting loads/stores, then
> non-optimal code is generated for non-permuting loads/stores.
> 
> Hence, in this patch, all webs containing either permuting loads/ stores or
> non-permuting loads/stores are marked as requiring special handling of
> swappables. Swaps associated with permuting loads/stores are marked for 
> removal,
> and non-permuting loads/stores are converted to permuting loads/stores. Then 
> the
> special swappables in the webs are fixed up.
> 
> This patch also ensures that swappable instructions are not modified in the
> following webs as it is incorrect to do so:
>  - webs containing permuting load/store instructions and associated swap
>instructions that are transformed by converting the permuting memory
>instructions into non-permuting instructions and removing the swap
>instructions.
>  - webs where swap(load(vector constant)) instructions are replaced with
>load(swapped vector constant).
> 
> 2023-09-10  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/PR106770
>   * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
>   (handle_non_permuting_mem_insn): New function.
>   (rs6000_analyze_swaps): Handle swappable instructions only in certain
>   webs.
>   (web_requires_special_handling): New instance variable.
>   (handle_special_swappables): Remove handling of non-permuting load/store
>   instructions.
> 
> gcc/testsuite/
>   PR rtl-optimization/PR106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 0388b9bd736..02ea299bc3d 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the swappable insns in the web represented by this entry
> + have to be fixed. Swappable insns have to be fixed in:
> +   - webs containing permuting loads/stores and the swap insns
> +  in such webs have been marked for removal
> +   - webs where non-permuting loads/stores have been converted
> +  to permuting loads/stores  */
> +  unsigned int web_requires_special_handling : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, 
> unsigned i)
>if (dump_file)
>   fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>break;
> -case SH_NOSWAP_LD:
> -  /* Convert a non-permuting load to a permuting one.  */
> -  permute_load (insn);
> -  break;
> -case SH_NOSWAP_ST:
> -  /* Convert a non-permuting store to a permuting one.  */
> -  permute_store (insn);
> -  break;
>  case SH_EXTRACT:
>/* Change the lane on an extract operation.  */
>adjust_extract (insn);
> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>free (to_delete);
>  }
>  
> +/* Return true if insn is a non-permuting load/store.  */
> +static bool
> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> +{
> +  return insn_entry[i].special_handling == SH_NOSWAP_LD
> +  || insn_entry[i].special_handling == SH_NOSWAP_ST;
> +}
> +
> +/* Convert a non-permuting load/store insn to a permuting one.  */
> +static void
&g

[PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-11-03 Thread Surya Kumari Jangala
Hi Segher,
I have incorporated changes in the code as per the review comments provided by 
you 
for version 2 of the patch. Please review.

Regards,
Surya


rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable instructions
are not optimized, i.e., the webs do not contain any permuting load/store
instructions along with the associated register swap instructions. Doing special
handling in such webs will result in the extracted lane being adjusted
unnecessarily for vec_extract.

Another issue is that existing code treats non-permuting loads/stores as special
swappables. Non-permuting loads/stores (that have not yet been split into a
permuting load/store and a swap) are handled by converting them into a permuting
load/store (which effectively removes the swap). As a result, if special
swappables are handled only in webs containing permuting loads/stores, then
non-optimal code is generated for non-permuting loads/stores.

Hence, in this patch, all webs containing either permuting loads/ stores or
non-permuting loads/stores are marked as requiring special handling of
swappables. Swaps associated with permuting loads/stores are marked for removal,
and non-permuting loads/stores are converted to permuting loads/stores. Then the
special swappables in the webs are fixed up.

This patch also ensures that swappable instructions are not modified in the
following webs as it is incorrect to do so:
 - webs containing permuting load/store instructions and associated swap
   instructions that are transformed by converting the permuting memory
   instructions into non-permuting instructions and removing the swap
   instructions.
 - webs where swap(load(vector constant)) instructions are replaced with
   load(swapped vector constant).

2023-09-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/PR106770
* config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
(handle_non_permuting_mem_insn): New function.
(rs6000_analyze_swaps): Handle swappable instructions only in certain
webs.
(web_requires_special_handling): New instance variable.
(handle_special_swappables): Remove handling of non-permuting load/store
instructions.

gcc/testsuite/
PR rtl-optimization/PR106770
* gcc.target/powerpc/pr106770.c: New test.
---

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
b/gcc/config/rs6000/rs6000-p8swap.cc
index 0388b9bd736..02ea299bc3d 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the swappable insns in the web represented by this entry
+ have to be fixed. Swappable insns have to be fixed in:
+   - webs containing permuting loads/stores and the swap insns
+in such webs have been marked for removal
+   - webs where non-permuting loads/stores have been converted
+to permuting loads/stores  */
+  unsigned int web_requires_special_handling : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, 
unsigned i)
   if (dump_file)
fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
   break;
-case SH_NOSWAP_LD:
-  /* Convert a non-permuting load to a permuting one.  */
-  permute_load (insn);
-  break;
-case SH_NOSWAP_ST:
-  /* Convert a non-permuting store to a permuting one.  */
-  permute_store (insn);
-  break;
 case SH_EXTRACT:
   /* Change the lane on an extract operation.  */
   adjust_extract (insn);
@@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
   free (to_delete);
 }
 
+/* Return true if insn is a non-permuting load/store.  */
+static bool
+non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  return insn_entry[i].special_handling == SH_NOSWAP_LD
+|| insn_entry[i].special_handling == SH_NOSWAP_ST;
+}
+
+/* Convert a non-permuting load/store insn to a permuting one.  */
+static void
+convert_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  rtx_insn *insn = insn_entry[i].insn;
+  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
+permute_load (insn);
+  if (insn_entry[i].special_handling == SH_NOSWAP_ST)
+permute_store (insn);
+}
+
 /* Main entry point for this pass.  */
 unsigned int
 rs6000_analyze_swaps (function *fun)
@@ -2624,25 +2642,55 @@ rs6000_analyze_swaps (function *fun)
   dump_swap_insn_table (insn_entry);
 }
 
-  /* For each load and store in an optimizable web (which implies
- the loads and stores are permut

Re: [PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-31 Thread Surya Kumari Jangala
Hi Segher,
My replies are inlined:

On 29/10/23 10:16 am, Segher Boessenkool wrote:
> Hi!
> 
> Please say "rs6000/p8swap:" in the subject, not "swap:" :-)
> 
> On Sun, Sep 10, 2023 at 10:58:32PM +0530, Surya Kumari Jangala wrote:
>> Another issue with always handling swappable instructions is that it is
>> incorrect to do so in webs where loads/stores on quad word aligned
>> addresses are changed to lvx/stvx.
> 
> Why?  Please say why in the commit message (the message you send with
> your patch should be the exact eventual commit message!)

ok, I will add more explanation.

> 
>> gcc/
>>  PR rtl-optimization/PR106770
>>  * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>>  function.
> 
> Please don't break commit message / changelog lines early unnecessarily.
> Lines are 80 chars, the leading tab counts as 8.

ok.

> 
>> +  /* Set if the swappable insns in the web represented by this entry
>> + have to be fixed. Swappable insns have to be fixed in :
> 
> (no space before colon)

ok.

> 
>> +static bool
>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>> +{
>> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST);
>> +}
> 
> "return" is not a function, you don't need parens here.

ok.

> 
>> +/* Convert a non-permuting load/store insn to a permuting one.  */
>> +static void
>> +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> 
> A better name would be good, "handle" is a weaselword :-)  It is a
> static, so a shorter name is completely acceptable (say, one that
> wouldn't be acceptable with bigger than file scope).

Ok. How does convert_mem_insn() sound?
Note: "handle" is used as a prefix for other functions in rs6000-p8swap.cc 
(such as handle_special_swappables()).
> 
>> +  rtx_insn *insn = insn_entry[i].insn;
>> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
>> +permute_load (insn);
>> +  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
>> +permute_store (insn);
> 
> Lose the "else"?  The compiler can do micro-optimisations a million
> times better than any user could.  Simpler, more readable (better
> understandable!) code is much preferred.
> 
>> +  /* Perform special handling for swappable insns that require it.
> 
> That is a completely contentless sentence :-(
> 

This line was present in the original code. This is not something I added.
Let me try to add some more comments to make the explanation better.

>> + Note that special handling should be done only for those
>> + swappable insns that are present in webs marked as requiring
>> + special handling.  */
> 
> This one isn't much better.> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
> 
> This is the default, you do not need this.

ok.

> 
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O2 " } */
>> +/* The 2 xxpermdi instructions are generated by the two
>> +   calls to vec_promote() */
>> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> 
> Please enclose in {}.  Use double quotes in Tcl only when tou want the
> interpolation they cause.  Default to using {} instead.

ok.

Regards,
Surya

> 
> So please fix those things, and write a better commit message.  Ideally
> the commit messsage will tell everything needed to understand the patch
> (so also to review the patch).  Maybe add examples where needed.  So
> reviewing the code in the patch should be an easy thing to do, after
> reading the commit message :-)
> 
> 
> Segher


[PING^3][PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-16 Thread Surya Kumari Jangala
Ping

On 03/10/23 3:53 pm, Surya Kumari Jangala wrote:
> Ping
> 
> On 20/09/23 7:31 am, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 10/09/23 10:58 pm, Surya Kumari Jangala wrote:
>>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>
>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>> instructions is done even if the webs that contain the swappable
>>> instructions are not optimized, i.e., the webs do not contain any
>>> permuting load/store instructions along with the associated register
>>> swap instructions. Doing special handling in such webs will result in
>>> the extracted lane being adjusted unnecessarily for vec_extract.
>>>
>>> Another issue is that existing code treats non-permuting loads/stores
>>> as special swappables. Non-permuting loads/stores (that have not yet
>>> been split into a permuting load/store and a swap) are handled by
>>> converting them into a permuting load/store (which effectively removes
>>> the swap). As a result, if special swappables are handled only in webs
>>> containing permuting loads/stores, then non-optimal code is generated
>>> for non-permuting loads/stores.
>>>
>>> Hence, in this patch, all webs containing either permuting loads/
>>> stores or non-permuting loads/stores are marked as requiring special
>>> handling of swappables. Swaps associated with permuting loads/stores
>>> are marked for removal, and non-permuting loads/stores are converted to
>>> permuting loads/stores. Then the special swappables in the webs are
>>> fixed up.
>>>
>>> Another issue with always handling swappable instructions is that it is
>>> incorrect to do so in webs where loads/stores on quad word aligned
>>> addresses are changed to lvx/stvx. Similarly, in webs where
>>> swap(load(vector constant)) instructions are replaced with
>>> load(swapped vector constant), the swappable instructions should not be
>>> modified.
>>>
>>> 2023-09-10  Surya Kumari Jangala  
>>>
>>> gcc/
>>> PR rtl-optimization/PR106770
>>> * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>>> function.
>>> (handle_non_permuting_mem_insn): New function.
>>> (rs6000_analyze_swaps): Handle swappable instructions only in
>>> certain webs.
>>> (web_requires_special_handling): New instance variable.
>>> (handle_special_swappables): Remove handling of non-permuting
>>> load/store instructions.
>>>
>>> gcc/testsuite/
>>> PR rtl-optimization/PR106770
>>> * gcc.target/powerpc/pr106770.c: New test.
>>> ---
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>>> b/gcc/config/rs6000/rs6000-p8swap.cc
>>> index 0388b9bd736..3a695aa1318 100644
>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>unsigned int special_handling : 4;
>>>/* Set if the web represented by this entry cannot be optimized.  */
>>>unsigned int web_not_optimizable : 1;
>>> +  /* Set if the swappable insns in the web represented by this entry
>>> + have to be fixed. Swappable insns have to be fixed in :
>>> +   - webs containing permuting loads/stores and the swap insns
>>> +in such webs have been marked for removal
>>> +   - webs where non-permuting loads/stores have been converted
>>> +to permuting loads/stores  */
>>> +  unsigned int web_requires_special_handling : 1;
>>>/* Set if this insn should be deleted.  */
>>>unsigned int will_delete : 1;
>>>  };
>>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry 
>>> *insn_entry, unsigned i)
>>>if (dump_file)
>>> fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>>break;
>>> -case SH_NOSWAP_LD:
>>> -  /* Convert a non-permuting load to a permuting one.  */
>>> -  permute_load (insn);
>>> -  break;
>>> -case SH_NOSWAP_ST:
>>> -  /* Convert a non-permuting store to a permuting one.  */
>>> -  permute_store (insn);
>>> -  break;
>>>  case SH_EXTRACT:
>>>/* Change the lane on an extract operation.  */
>>>adjust_extract (insn);
>>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (fun

[PATCH] ira: Scale save/restore costs of callee save registers with block frequency

2023-10-03 Thread Surya Kumari Jangala
ira: Scale save/restore costs of callee save registers with block frequency

In assign_hard_reg(), when computing the costs of the hard registers, the
cost of saving/restoring a callee-save hard register in prolog/epilog is
taken into consideration. However, this cost is not scaled with the entry
block frequency. Without scaling, the cost of saving/restoring is quite
small and this can result in a callee-save register being chosen by
assign_hard_reg() even though there are free caller-save registers
available. Assigning a callee save register to a pseudo that is live
in the entire function and across a call will cause shrink wrap to fail.

2023-10-03  Surya Kumari Jangala  

gcc/
PR rtl-optimization/111673
* ira-color.cc (assign_hard_reg): Scale save/restore costs of
callee save registers with block frequency.

gcc/testsuite/
PR rtl-optimization/111673
* gcc.target/powerpc/pr111673/c: New test.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..eb20c52310d 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2175,7 +2175,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
add_cost = ((ira_memory_move_cost[mode][rclass][0]
 + ira_memory_move_cost[mode][rclass][1])
* saved_nregs / hard_regno_nregs (hard_regno,
- mode) - 1);
+ mode) - 1)
+   * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
cost += add_cost;
full_cost += add_cost;
  }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c 
b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+return (dz + dz) * dz;
+  else
+return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 
"pro_and_epilogue" } } */


[PING^2][PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-03 Thread Surya Kumari Jangala
Ping

On 20/09/23 7:31 am, Surya Kumari Jangala wrote:
> Ping
> 
> On 10/09/23 10:58 pm, Surya Kumari Jangala wrote:
>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Another issue is that existing code treats non-permuting loads/stores
>> as special swappables. Non-permuting loads/stores (that have not yet
>> been split into a permuting load/store and a swap) are handled by
>> converting them into a permuting load/store (which effectively removes
>> the swap). As a result, if special swappables are handled only in webs
>> containing permuting loads/stores, then non-optimal code is generated
>> for non-permuting loads/stores.
>>
>> Hence, in this patch, all webs containing either permuting loads/
>> stores or non-permuting loads/stores are marked as requiring special
>> handling of swappables. Swaps associated with permuting loads/stores
>> are marked for removal, and non-permuting loads/stores are converted to
>> permuting loads/stores. Then the special swappables in the webs are
>> fixed up.
>>
>> Another issue with always handling swappable instructions is that it is
>> incorrect to do so in webs where loads/stores on quad word aligned
>> addresses are changed to lvx/stvx. Similarly, in webs where
>> swap(load(vector constant)) instructions are replaced with
>> load(swapped vector constant), the swappable instructions should not be
>> modified.
>>
>> 2023-09-10  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/PR106770
>>  * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>>  function.
>>  (handle_non_permuting_mem_insn): New function.
>>  (rs6000_analyze_swaps): Handle swappable instructions only in
>>  certain webs.
>>  (web_requires_special_handling): New instance variable.
>>  (handle_special_swappables): Remove handling of non-permuting
>>  load/store instructions.
>>
>> gcc/testsuite/
>>  PR rtl-optimization/PR106770
>>  * gcc.target/powerpc/pr106770.c: New test.
>> ---
>>
>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>> b/gcc/config/rs6000/rs6000-p8swap.cc
>> index 0388b9bd736..3a695aa1318 100644
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the swappable insns in the web represented by this entry
>> + have to be fixed. Swappable insns have to be fixed in :
>> +   - webs containing permuting loads/stores and the swap insns
>> + in such webs have been marked for removal
>> +   - webs where non-permuting loads/stores have been converted
>> + to permuting loads/stores  */
>> +  unsigned int web_requires_special_handling : 1;
>>/* Set if this insn should be deleted.  */
>>unsigned int will_delete : 1;
>>  };
>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry 
>> *insn_entry, unsigned i)
>>if (dump_file)
>>  fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>break;
>> -case SH_NOSWAP_LD:
>> -  /* Convert a non-permuting load to a permuting one.  */
>> -  permute_load (insn);
>> -  break;
>> -case SH_NOSWAP_ST:
>> -  /* Convert a non-permuting store to a permuting one.  */
>> -  permute_store (insn);
>> -  break;
>>  case SH_EXTRACT:
>>/* Change the lane on an extract operation.  */
>>adjust_extract (insn);
>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>>free (to_delete);
>>  }
>>  
>> +/* Return true if insn is a non-permuting load/store.  */
>> +static bool
>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>> +{
>> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST);
>> +}
>> +
>&g

[PING][PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-09-19 Thread Surya Kumari Jangala
Ping

On 10/09/23 10:58 pm, Surya Kumari Jangala wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Another issue is that existing code treats non-permuting loads/stores
> as special swappables. Non-permuting loads/stores (that have not yet
> been split into a permuting load/store and a swap) are handled by
> converting them into a permuting load/store (which effectively removes
> the swap). As a result, if special swappables are handled only in webs
> containing permuting loads/stores, then non-optimal code is generated
> for non-permuting loads/stores.
> 
> Hence, in this patch, all webs containing either permuting loads/
> stores or non-permuting loads/stores are marked as requiring special
> handling of swappables. Swaps associated with permuting loads/stores
> are marked for removal, and non-permuting loads/stores are converted to
> permuting loads/stores. Then the special swappables in the webs are
> fixed up.
> 
> Another issue with always handling swappable instructions is that it is
> incorrect to do so in webs where loads/stores on quad word aligned
> addresses are changed to lvx/stvx. Similarly, in webs where
> swap(load(vector constant)) instructions are replaced with
> load(swapped vector constant), the swappable instructions should not be
> modified.
> 
> 2023-09-10  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/PR106770
>   * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>   function.
>   (handle_non_permuting_mem_insn): New function.
>   (rs6000_analyze_swaps): Handle swappable instructions only in
>   certain webs.
>   (web_requires_special_handling): New instance variable.
>   (handle_special_swappables): Remove handling of non-permuting
>   load/store instructions.
> 
> gcc/testsuite/
>   PR rtl-optimization/PR106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 0388b9bd736..3a695aa1318 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the swappable insns in the web represented by this entry
> + have to be fixed. Swappable insns have to be fixed in :
> +   - webs containing permuting loads/stores and the swap insns
> +  in such webs have been marked for removal
> +   - webs where non-permuting loads/stores have been converted
> +  to permuting loads/stores  */
> +  unsigned int web_requires_special_handling : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, 
> unsigned i)
>if (dump_file)
>   fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>break;
> -case SH_NOSWAP_LD:
> -  /* Convert a non-permuting load to a permuting one.  */
> -  permute_load (insn);
> -  break;
> -case SH_NOSWAP_ST:
> -  /* Convert a non-permuting store to a permuting one.  */
> -  permute_store (insn);
> -  break;
>  case SH_EXTRACT:
>/* Change the lane on an extract operation.  */
>adjust_extract (insn);
> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>free (to_delete);
>  }
>  
> +/* Return true if insn is a non-permuting load/store.  */
> +static bool
> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> +{
> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> +   insn_entry[i].special_handling == SH_NOSWAP_ST);
> +}
> +
> +/* Convert a non-permuting load/store insn to a permuting one.  */
> +static void
> +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> +{
> +  rtx_insn *insn = insn_entry[i].insn;
> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
> +permute_load (insn);
> +  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
> +permute_store (insn);
> +}
> +
> 

[PATCH] ira: Consider save/restore costs of callee-save registers [PR110071]

2023-09-14 Thread Surya Kumari Jangala via Gcc-patches
ira: Consider save/restore costs of callee-save registers [PR110071]

In improve_allocation() routine, IRA checks for each allocno if spilling
any conflicting allocnos can improve the allocation of this allocno.
This routine computes the cost improvement for usage of each profitable
hard register for a given allocno. The existing code in
improve_allocation() does not consider the save/restore costs of callee
save registers while computing the cost improvement.

This can result in a callee save register being assigned to a pseudo
that is live in the entire function and across a call, overriding a
non-callee save register assigned to the pseudo by graph coloring. So
the entry basic block requires a prolog, thereby causing shrink wrap to
fail.

2023-09-14  Surya Kumari Jangala  

gcc/
PR rtl-optimization/110071
* ira-color.cc (improve_allocation): Consider cost of callee
save registers.

gcc/testsuite/
PR rtl-optimization/110071
* gcc.target/powerpc/pr110071.c: New test.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 5807d6d26f6..f2e8ea34152 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3150,13 +3150,15 @@ improve_allocation (void)
   int j, k, n, hregno, conflict_hregno, base_cost, class_size, word, nwords;
   int check, spill_cost, min_cost, nregs, conflict_nregs, r, best;
   bool try_p;
-  enum reg_class aclass;
+  enum reg_class aclass, rclass;
   machine_mode mode;
   int *allocno_costs;
   int costs[FIRST_PSEUDO_REGISTER];
   HARD_REG_SET conflicting_regs[2], profitable_hard_regs;
   ira_allocno_t a;
   bitmap_iterator bi;
+  int saved_nregs;
+  int add_cost;
 
   /* Don't bother to optimize the code with static chain pointer and
  non-local goto in order not to spill the chain pointer
@@ -3194,6 +3196,7 @@ improve_allocation (void)
  conflicting_regs,
  _hard_regs);
   class_size = ira_class_hard_regs_num[aclass];
+  mode = ALLOCNO_MODE (a);
   /* Set up cost improvement for usage of each profitable hard
 register for allocno A.  */
   for (j = 0; j < class_size; j++)
@@ -3207,6 +3210,22 @@ improve_allocation (void)
  costs[hregno] = (allocno_costs == NULL
   ? ALLOCNO_UPDATED_CLASS_COST (a) : allocno_costs[k]);
  costs[hregno] -= allocno_copy_cost_saving (a, hregno);
+
+ if ((saved_nregs = calculate_saved_nregs (hregno, mode)) != 0)
+ {
+   /* We need to save/restore the hard register in
+  epilogue/prologue.  Therefore we increase the cost.
+  Since the prolog is placed in the entry BB, the frequency
+  of the entry BB is considered while computing the cost.  */
+   rclass = REGNO_REG_CLASS (hregno);
+   add_cost = ((ira_memory_move_cost[mode][rclass][0]
++ ira_memory_move_cost[mode][rclass][1])
+   * saved_nregs / hard_regno_nregs (hregno,
+ mode) - 1)
+  * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+   costs[hregno] += add_cost;
+ }
+
  costs[hregno] -= base_cost;
  if (costs[hregno] < 0)
try_p = true;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110071.c 
b/gcc/testsuite/gcc.target/powerpc/pr110071.c
new file mode 100644
index 000..ec03fecfb15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110071.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+void bar ();
+long
+foo (long i, long cond)
+{
+  if (cond)
+bar ();
+  return i+1;
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 
"pro_and_epilogue" } } */


[PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-09-10 Thread Surya Kumari Jangala via Gcc-patches
swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable
instructions are not optimized, i.e., the webs do not contain any
permuting load/store instructions along with the associated register
swap instructions. Doing special handling in such webs will result in
the extracted lane being adjusted unnecessarily for vec_extract.

Another issue is that existing code treats non-permuting loads/stores
as special swappables. Non-permuting loads/stores (that have not yet
been split into a permuting load/store and a swap) are handled by
converting them into a permuting load/store (which effectively removes
the swap). As a result, if special swappables are handled only in webs
containing permuting loads/stores, then non-optimal code is generated
for non-permuting loads/stores.

Hence, in this patch, all webs containing either permuting loads/
stores or non-permuting loads/stores are marked as requiring special
handling of swappables. Swaps associated with permuting loads/stores
are marked for removal, and non-permuting loads/stores are converted to
permuting loads/stores. Then the special swappables in the webs are
fixed up.

Another issue with always handling swappable instructions is that it is
incorrect to do so in webs where loads/stores on quad word aligned
addresses are changed to lvx/stvx. Similarly, in webs where
swap(load(vector constant)) instructions are replaced with
load(swapped vector constant), the swappable instructions should not be
modified.

2023-09-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/PR106770
* config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
function.
(handle_non_permuting_mem_insn): New function.
(rs6000_analyze_swaps): Handle swappable instructions only in
certain webs.
(web_requires_special_handling): New instance variable.
(handle_special_swappables): Remove handling of non-permuting
load/store instructions.

gcc/testsuite/
PR rtl-optimization/PR106770
* gcc.target/powerpc/pr106770.c: New test.
---

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
b/gcc/config/rs6000/rs6000-p8swap.cc
index 0388b9bd736..3a695aa1318 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the swappable insns in the web represented by this entry
+ have to be fixed. Swappable insns have to be fixed in :
+   - webs containing permuting loads/stores and the swap insns
+in such webs have been marked for removal
+   - webs where non-permuting loads/stores have been converted
+to permuting loads/stores  */
+  unsigned int web_requires_special_handling : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, 
unsigned i)
   if (dump_file)
fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
   break;
-case SH_NOSWAP_LD:
-  /* Convert a non-permuting load to a permuting one.  */
-  permute_load (insn);
-  break;
-case SH_NOSWAP_ST:
-  /* Convert a non-permuting store to a permuting one.  */
-  permute_store (insn);
-  break;
 case SH_EXTRACT:
   /* Change the lane on an extract operation.  */
   adjust_extract (insn);
@@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
   free (to_delete);
 }
 
+/* Return true if insn is a non-permuting load/store.  */
+static bool
+non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
+ insn_entry[i].special_handling == SH_NOSWAP_ST);
+}
+
+/* Convert a non-permuting load/store insn to a permuting one.  */
+static void
+handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  rtx_insn *insn = insn_entry[i].insn;
+  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
+permute_load (insn);
+  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
+permute_store (insn);
+}
+
 /* Main entry point for this pass.  */
 unsigned int
 rs6000_analyze_swaps (function *fun)
@@ -2624,25 +2642,56 @@ rs6000_analyze_swaps (function *fun)
   dump_swap_insn_table (insn_entry);
 }
 
-  /* For each load and store in an optimizable web (which implies
- the loads and stores are permuting), find the associated
- register swaps and mark them for removal.  Due to various
- optimizations we may mark the same swap more than once.  Also
- perform special handling for swappable insns that require it.  */
+  /* There are two kinds of opt

[PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-07-31 Thread Surya Kumari Jangala via Gcc-patches
Ping

On 21/07/23 3:43 pm, Surya Kumari Jangala via Gcc-patches wrote:
> The improve_allocation() routine does not update the
> allocated_hardreg_p[] array after an allocno is assigned a register.
> 
> If the register chosen in improve_allocation() is one that already has
> been assigned to a conflicting allocno, then allocated_hardreg_p[]
> already has the corresponding bit set to TRUE, so nothing needs to be
> done.
> 
> But improve_allocation() can also choose a register that has not been
> assigned to a conflicting allocno, and also has not been assigned to any
> other allocno. In this case, allocated_hardreg_p[] has to be updated.
> 
> 2023-07-21  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/PR110254
>   * ira-color.cc (improve_allocation): Update array
> ---
> 
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 1fb2958bddd..5807d6d26f6 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -3340,6 +3340,10 @@ improve_allocation (void)
>   }
>/* Assign the best chosen hard register to A.  */
>ALLOCNO_HARD_REGNO (a) = best;
> +
> +  for (j = nregs - 1; j >= 0; j--)
> + allocated_hardreg_p[best + j] = true;
> +
>if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
>   fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
>best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));


[PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-07-21 Thread Surya Kumari Jangala via Gcc-patches
The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

2023-07-21  Surya Kumari Jangala  

gcc/
PR rtl-optimization/PR110254
* ira-color.cc (improve_allocation): Update array
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 1fb2958bddd..5807d6d26f6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3340,6 +3340,10 @@ improve_allocation (void)
}
   /* Assign the best chosen hard register to A.  */
   ALLOCNO_HARD_REGNO (a) = best;
+
+  for (j = nregs - 1; j >= 0; j--)
+   allocated_hardreg_p[best + j] = true;
+
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
 best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));


Re: [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-06-14 Thread Surya Kumari Jangala via Gcc-patches



On 25/02/23 3:20 pm, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results shows that fmr is better in power9 and 
> power10 architectures whereas xxlor is better in power7 and
> power 8 architectures. fmr is the only option before p7.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu
> 
> Thanks & Regards
> Ajit
> 
>   rs6000: Use xxlor instead of fmr where possible
> 
>   Replaces fmr with xxlor instruction for power7 and power8
>   architectures whereas for power9 and power10 keep fmr
>   instruction.
> 
>   Perf measurement results:
> 
>   Power9 fmr:  201,847,661 cycles.
>   Power9 xxlor: 201,877,78 cycles.
>   Power8 fmr: 200,901,043 cycles.
>   Power8 xxlor: 201,020,518 cycles.

'fmr' is better than 'xxlor' for power8 according to the above numbers. Should 
we then replace fmr with xxlor?

-Surya


Re: [PATCH v4 3/4] ree: Main functionality to improve ree pass for rs6000 target.

2023-04-24 Thread Surya Kumari Jangala via Gcc-patches



On 21/04/23 8:51 pm, Ajit Agarwal via Gcc-patches wrote:

> +/* Return TRUE if the cfg has following properties.
> + bb1
> + |\
> + | \
> + |  bb2
> + |  /
> + bb3
> +
> +   whereas bb1 has IF_THEN_ELSE  and bb2 has the definition and bb3 has
> +   zero/sign/AND extensions.  */
> +

Any specific reason for requiring CFGs to have only this particular shape? The 
patch should be generic enough to work for all CFGs.

Regards,
Surya

> +static bool
> +feasible_cfg (ext_cand *cand, rtx_insn *def_insn)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (cand->insn);
> +  edge fallthru_edge;
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +{
> +  rtx_insn *insn = BB_END (e->src) ? PREV_INSN (BB_END (e->src)) : NULL;
> +
> +  if (insn == NULL)
> + continue;
> +
> +  if (DEBUG_INSN_P (insn))
> + continue;
> +
> +  rtx set = single_set (insn);
> +
> +  /* Block has IF_THEN_ELSE  */
> +  if (insn && set
> +   && GET_CODE (set) == SET && SET_SRC (set)
> +   && GET_CODE (SET_SRC (set)) == IF_THEN_ELSE)
> + {
> +   if (e->dest == bb)
> + {
> +   basic_block jump_block = e->dest;
> +   if (jump_block != bb)
> + return false;
> +  }
> +  else
> +{
> +  /* def_insn block has single successor and fall through
> + edge target are the block for cand insn.  */
> +  if (single_succ_p (e->dest))
> +{
> +  fallthru_edge = single_succ_edge (e->dest);
> +  if (BB_END (fallthru_edge->dest)
> +  && bb != fallthru_edge->dest)
> +return false;
> +}
> + }
> +   }
> +}
> +
> +  /* def_insn block has single successor and fall through
> + edge target are the block for cand insn.  */
> +  if (single_succ_p (BLOCK_FOR_INSN (def_insn)))
> +{
> +  fallthru_edge = single_succ_edge (BLOCK_FOR_INSN (def_insn));
> +  if (BB_END (fallthru_edge->dest)
> +   && bb != fallthru_edge->dest)
> + return false;
> +}
> +   else
> + return false;
> +
> +  return true;
> +}
> +
> +/* Return TRUE if the candidate extension INSN and def_insn are
> +   feasible for extension elimination.
> +
> +   Things to consider:
> +
> +   cfg properties are feasible for extension elimination.
> +
> +   sign_extend with def insn as PLUS and the reaching definition
> +   of def_insn are not ASHIFT and LSHIFTRT.
> +
> +   zero_extend with def insn as XOR/IOR and the reachin definition
> +   of def_insn are not ASHIFT and LSHIFTRT.
> +
> +   The destination register of the extension insn must not be
> +   used or set between the def_insn and cand->insn exclusive.
> +
> +   AND with zero extension properties has USE and the register
> +   of cand insn are same as register of USE operand.  */
> +
> +static bool
> +eliminate_across_bbs_p (ext_cand *cand, rtx_insn *def_insn)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (cand->insn);
> +
> +  if (!feasible_cfg (cand, def_insn))
> +return false;
> +
> +  rtx cand_set = single_set(cand->insn);
> +  /* The destination register of the extension insn must not be
> +  used or set between the def_insn and cand->insn exclusive.  */
> +  if (INSN_CHAIN_CODE_P (GET_CODE (def_insn))
> +  && INSN_CHAIN_CODE_P (cand->code))
> +if ((cand->code == ZERO_EXTEND)
> +  && REG_P (SET_DEST (cand_set)) && NEXT_INSN (def_insn)
> +  && reg_used_set_between_p(SET_DEST (cand_set), def_insn, cand->insn))
> +  return false;
> +
> +  if (cand->code == ZERO_EXTEND
> +  && (bb != BLOCK_FOR_INSN (def_insn)
> +  || DF_INSN_LUID (def_insn) > DF_INSN_LUID (cand->insn)))
> +return false;
> +
> +  if (rtx_is_zext_p (cand->insn))
> +{
> +  if (GET_CODE (PATTERN (BB_END (bb))) != USE)
> + return false;
> +
> +  if (REGNO (XEXP (PATTERN (BB_END (bb)), 0)) != REGNO (SET_DEST 
> (cand->expr)))
> + return false;
> +}
> +
> +  rtx set = single_set (def_insn);
> +
> +  if (!set)
> +return false;
> +
> +  if (cand->code == SIGN_EXTEND
> +  && GET_CODE (set) == SET)
> +{
> +  rtx orig_src = SET_SRC (set);
> +  machine_mode ext_src_mode;
> +
> +  ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
> +
> +  if (GET_MODE (SET_DEST (set)) != ext_src_mode)
> + return false;
> +
> +  if (GET_CODE (orig_src) != PLUS)
> + return false;
> +
> +  if (!REG_P (XEXP (orig_src, 0)))
> + return false;
> +
> +  if (!REG_P (XEXP (orig_src,1)))
> + return false;
> +
> +  if (GET_CODE (orig_src) == PLUS)
> + {
> +   bool def_src1
> + = def_arith_p (def_insn,
> +XEXP (SET_SRC (set), 0));
> +   bool def_src2
> + = def_arith_p (def_insn,
> +XEXP (SET_SRC (set), 1));
> +
> +   if (def_src1 || def_src2)
> + return false;
> + }
> +  

Re: [PATCH] rs6000: suboptimal code for returning bool value on target ppc

2023-03-16 Thread Surya Kumari Jangala via Gcc-patches
The issue of suboptimal code exists even for integer return value and not just 
bool return value. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103784#c9 
So the patch would need to take care of integer return values too.

On 16/03/23 10:50 am, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> 
> This patch eliminates unnecessary zero extension instruction from power 
> generated assembly.
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
>   rs6000: suboptimal code for returning bool value on target ppc.
> 
>   New pass to eliminate unnecessary zero extension. This pass
>   is registered after cse rtl pass.
> 
>   2023-03-16  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: Registered zero elimination
>   pass.
>   * config/rs6000/rs6000-zext-elim.cc: Add new pass.
>   * config.gcc: Add new executable.
>   * config/rs6000/rs6000-protos.h: Add new prototype for zero
>   elimination pass.
>   * config/rs6000/rs6000.cc: Add new prototype for zero
>   elimination pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * expr.cc: Modified gcc assert.
>   * explow.cc: Modified gcc assert.
>   * optabs.cc: Modified gcc assert.
> ---
>  gcc/config.gcc|   4 +-
>  gcc/config/rs6000/rs6000-passes.def   |   2 +
>  gcc/config/rs6000/rs6000-protos.h |   1 +
>  gcc/config/rs6000/rs6000-zext-elim.cc | 361 ++
>  gcc/config/rs6000/rs6000.cc   |   2 +
>  gcc/config/rs6000/t-rs6000|   5 +
>  gcc/explow.cc |   3 +-
>  gcc/expr.cc   |   4 +-
>  gcc/optabs.cc |   3 +-
>  9 files changed, 379 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/config/rs6000/rs6000-zext-elim.cc
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index da3a6d3ba1f..e8ac9d882f0 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -503,7 +503,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-zext-elim.o 
> rs6000-logue.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -538,7 +538,7 @@ riscv*)
>   ;;
>  rs6000*-*-*)
>   extra_options="${extra_options} g.opt fused-madd.opt 
> rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-zext-elim.o 
> rs6000-logue.o"
>   extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-logue.cc 
> \$(srcdir)/config/rs6000/rs6000-call.cc"
>   target_gtfiles="$target_gtfiles 
> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def 
> b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..d7500feddf1 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -28,6 +28,8 @@ along with GCC; see the file COPYING3.  If not see
>   The power8 does not have instructions that automaticaly do the byte 
> swaps
>   for loads and stores.  */
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_AFTER (pass_cse, 1, pass_analyze_zext);
> +
>  
>/* Pass to do the PCREL_OPT optimization that combines the load of an
>   external symbol's address along with a single load or store using that
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index 1a4fc1df668..f6cf2d673d4 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -340,6 +340,7 @@ namespace gcc { class context; }
>  class rtl_opt_pass;
>  
>  extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
> +extern rtl_opt_pass *make_pass_analyze_zext (gcc::context *);
>  extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
>  extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
>  extern bool rs6000_quadword_masked_address_p (const_rtx exp);
> diff --git a/gcc/config/rs6000/rs6000-zext-elim.cc 
> b/gcc/config/rs6000/rs6000-zext-elim.cc
> new file mode 100644
> index 000..777c7a5a387
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-zext-elim.cc
> @@ -0,0 +1,361 @@
> +/* Subroutine to eliminate redundant zero extend for power architecture.
> +   Copyright (C) 1991-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.

Re: [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-03-03 Thread Surya Kumari Jangala via Gcc-patches



On 27/02/23 9:58 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/106770
>>  * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> Please add an entry?  Or multiple ones, actually.  Describe all changes.
Ok

> 
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
> 
> s/ie/i.e./
> 
>> + register swaps of permuting loads/stores have been removed.  */
> 
> If it really means only exactly this, then the name isn't so good.

There is another bit in this class named "web_not_optimizable". This stands for 
webs that cannot be optimized. I am reusing this name for the new bit I am 
adding, and I have named this bit as "web_is_optimized".

> 
>> +  unsigned int web_is_optimized : 1;
> 
> And if it is as general as the name suggests, then the comment is no
> good.  Which is it?  :-)
> 
>>/* For each load and store in an optimizable web (which implies
>>   the loads and stores are permuting), find the associated
>>   register swaps and mark them for removal.  Due to various
>> - optimizations we may mark the same swap more than once.  Also
>> - perform special handling for swappable insns that require it.  */
>> + optimizations we may mark the same swap more than once. Fix up
>> + the non-permuting loads and stores by converting them into
>> + permuting ones.  */
> 
> Two spaces after a full stop is correct.  Please put that back.
Ok

> 
> Is it a good idea convert from/to swapping load/stores in this pass at
> all?  Doesdn't that belong elsewhere?  Like, in combine, where we
> already should do this.  Why does that not work> 
>> -if (!root_entry->web_not_optimizable)
>> +if (!root_entry->web_not_optimizable) {
> 
> Blocks start on a new line, indented.
Ok

> 
>>mark_swaps_for_removal (insn_entry, i);
>> +  root_entry->web_is_optimized = true;
> 
> Indent using tabs where possible.
Ok

> 
>> +swap_web_entry* root_entry
>> +  = (swap_web_entry*)((_entry[i])->unionfind_root ());
> 
> Space before *, in all cases. Space before the second (.  There are too
> many brackets here, too.
Ok

> 
>> +  /* Perform special handling for swappable insns that require it. 
> 
> No trailing spaces.
Ok

> 
>> + Note that special handling should be done only for those 
>> + swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>>{
>>  swap_web_entry* root_entry
>>= (swap_web_entry*)((_entry[i])->unionfind_root ());
>> -if (!root_entry->web_not_optimizable)
>> +if (root_entry->web_is_optimized)
>>handle_special_swappables (insn_entry, i);
>>}
> 
> Why this change?

The swap pass analyzes vector computations and removes unnecessary doubleword 
swaps (xxswapdi instructions). The swap pass first constructs webs and removes 
swap instructions if possible. If the web contains operations that are 
sensitive to element order (ie, insns requiring special handling), such as an 
extract, then such instructions should be modified. For example, for an extract 
operation the lane is changed.

[PING 3] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-02-27 Thread Surya Kumari Jangala via Gcc-patches
Hello,

Ping https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609374.html

Thanks,

Surya

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
>   PR rtl-optimization/106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> + register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>/* For each load and store in an optimizable web (which implies
>   the loads and stores are permuting), find the associated
>   register swaps and mark them for removal.  Due to various
> - optimizations we may mark the same swap more than once.  Also
> - perform special handling for swappable insns that require it.  */
> + optimizations we may mark the same swap more than once. Fix up
> + the non-permuting loads and stores by converting them into
> + permuting ones.  */
>for (i = 0; i < e; ++i)
>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>   && insn_entry[i].is_swap)
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (!root_entry->web_not_optimizable) {
> mark_swaps_for_removal (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
>}
> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +else if (insn_entry[i].is_swappable
> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> + insn_entry[i].special_handling == SH_NOSWAP_ST))
> +  {
> +swap_web_entry* root_entry
> +  = (swap_web_entry*)((_entry[i])->unionfind_root ());
> +if (!root_entry->web_not_optimizable) {
> +  handle_special_swappables (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
> +  }
> +
> +  /* Perform special handling for swappable insns that require it. 
> + Note that special handling should be done only for those 
> + swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (root_entry->web_is_optimized)
> handle_special_swappables (insn_entry, i);
>}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } 

Re: [PING 2] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-02-16 Thread Surya Kumari Jangala via Gcc-patches
Ping. Please review the patch.

On 12/01/23 10:21 pm, Surya Kumari Jangala via Gcc-patches wrote:
> Ping
> 
> On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>
>> In the routine rs6000_analyze_swaps(), special handling of swappable
>> instructions is done even if the webs that contain the swappable
>> instructions are not optimized, i.e., the webs do not contain any
>> permuting load/store instructions along with the associated register
>> swap instructions. Doing special handling in such webs will result in
>> the extracted lane being adjusted unnecessarily for vec_extract.
>>
>> Modifying swappable instructions is also incorrect in webs where
>> loads/stores on quad word aligned addresses are changed to lvx/stvx.
>> Similarly, in webs where swap(load(vector constant)) instructions are
>> replaced with load(swapped vector constant), the swappable
>> instructions should not be modified.
>>
>> 2023-01-04  Surya Kumari Jangala  
>>
>> gcc/
>>  PR rtl-optimization/106770
>>  * rs6000-p8swap.cc (rs6000_analyze_swaps): .
>>
>> gcc/testsuite/
>>  PR rtl-optimization/106770
>>  * gcc.target/powerpc/pr106770.c: New test.
>> ---
>>
>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
>> b/gcc/config/rs6000/rs6000-p8swap.cc
>> index 19fbbfb67dc..7ed39251df9 100644
>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>>unsigned int special_handling : 4;
>>/* Set if the web represented by this entry cannot be optimized.  */
>>unsigned int web_not_optimizable : 1;
>> +  /* Set if the web represented by this entry has been optimized, ie,
>> + register swaps of permuting loads/stores have been removed.  */
>> +  unsigned int web_is_optimized : 1;
>>/* Set if this insn should be deleted.  */
>>unsigned int will_delete : 1;
>>  };
>> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>>/* For each load and store in an optimizable web (which implies
>>   the loads and stores are permuting), find the associated
>>   register swaps and mark them for removal.  Due to various
>> - optimizations we may mark the same swap more than once.  Also
>> - perform special handling for swappable insns that require it.  */
>> + optimizations we may mark the same swap more than once. Fix up
>> + the non-permuting loads and stores by converting them into
>> + permuting ones.  */
>>for (i = 0; i < e; ++i)
>>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>>  && insn_entry[i].is_swap)
>>{
>>  swap_web_entry* root_entry
>>= (swap_web_entry*)((_entry[i])->unionfind_root ());
>> -if (!root_entry->web_not_optimizable)
>> +if (!root_entry->web_not_optimizable) {
>>mark_swaps_for_removal (insn_entry, i);
>> +  root_entry->web_is_optimized = true;
>> +}
>>}
>> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
>> +else if (insn_entry[i].is_swappable
>> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> + insn_entry[i].special_handling == SH_NOSWAP_ST))
>> +  {
>> +swap_web_entry* root_entry
>> +  = (swap_web_entry*)((_entry[i])->unionfind_root ());
>> +if (!root_entry->web_not_optimizable) {
>> +  handle_special_swappables (insn_entry, i);
>> +  root_entry->web_is_optimized = true;
>> +}
>> +  }
>> +
>> +  /* Perform special handling for swappable insns that require it. 
>> + Note that special handling should be done only for those 
>> + swappable insns that are present in webs optimized above.  */
>> +  for (i = 0; i < e; ++i)
>> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
>> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>>{
>>  swap_web_entry* root_entry
>>= (swap_web_entry*)((_entry[i])->unionfind_root ());
>> -if (!root_entry->web_not_optimizable)
>> +if (root_entry->web_is_optimized)
>>handle_special_swappables (insn_entry, i);
>>}
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
>> 

[PING] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-01-12 Thread Surya Kumari Jangala via Gcc-patches
Ping

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
>   PR rtl-optimization/106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> + register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>/* For each load and store in an optimizable web (which implies
>   the loads and stores are permuting), find the associated
>   register swaps and mark them for removal.  Due to various
> - optimizations we may mark the same swap more than once.  Also
> - perform special handling for swappable insns that require it.  */
> + optimizations we may mark the same swap more than once. Fix up
> + the non-permuting loads and stores by converting them into
> + permuting ones.  */
>for (i = 0; i < e; ++i)
>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>   && insn_entry[i].is_swap)
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (!root_entry->web_not_optimizable) {
> mark_swaps_for_removal (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
>}
> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +else if (insn_entry[i].is_swappable
> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> + insn_entry[i].special_handling == SH_NOSWAP_ST))
> +  {
> +swap_web_entry* root_entry
> +  = (swap_web_entry*)((_entry[i])->unionfind_root ());
> +if (!root_entry->web_not_optimizable) {
> +  handle_special_swappables (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
> +  }
> +
> +  /* Perform special handling for swappable insns that require it. 
> + Note that special handling should be done only for those 
> + swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (root_entry->web_is_optimized)
> handle_special_swappables (insn_entry, i);
>}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> +
> +/* Test case to resolve PR106770  */
> +
> +#include 
> +
> +int cmp2(double a, double b)
> +{
> +vector double va = vec_promote(a, 1);
> +vector double vb = vec_promote(b, 1);
> +vector long long vlt = (vector long long)vec_cmplt(va, vb);
> +vector long long vgt = (vector long long)vec_cmplt(vb, va);
> +vector signed long long vr = vec_sub(vlt, vgt);
> +
> +return vec_extract(vr, 1);
> +}
> +


[PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-01-04 Thread Surya Kumari Jangala via Gcc-patches
swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable
instructions are not optimized, i.e., the webs do not contain any
permuting load/store instructions along with the associated register
swap instructions. Doing special handling in such webs will result in
the extracted lane being adjusted unnecessarily for vec_extract.

Modifying swappable instructions is also incorrect in webs where
loads/stores on quad word aligned addresses are changed to lvx/stvx.
Similarly, in webs where swap(load(vector constant)) instructions are
replaced with load(swapped vector constant), the swappable
instructions should not be modified.

2023-01-04  Surya Kumari Jangala  

gcc/
PR rtl-optimization/106770
* rs6000-p8swap.cc (rs6000_analyze_swaps): .

gcc/testsuite/
PR rtl-optimization/106770
* gcc.target/powerpc/pr106770.c: New test.
---

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
b/gcc/config/rs6000/rs6000-p8swap.cc
index 19fbbfb67dc..7ed39251df9 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the web represented by this entry has been optimized, ie,
+ register swaps of permuting loads/stores have been removed.  */
+  unsigned int web_is_optimized : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
   /* For each load and store in an optimizable web (which implies
  the loads and stores are permuting), find the associated
  register swaps and mark them for removal.  Due to various
- optimizations we may mark the same swap more than once.  Also
- perform special handling for swappable insns that require it.  */
+ optimizations we may mark the same swap more than once. Fix up
+ the non-permuting loads and stores by converting them into
+ permuting ones.  */
   for (i = 0; i < e; ++i)
 if ((insn_entry[i].is_load || insn_entry[i].is_store)
&& insn_entry[i].is_swap)
   {
swap_web_entry* root_entry
  = (swap_web_entry*)((_entry[i])->unionfind_root ());
-   if (!root_entry->web_not_optimizable)
+   if (!root_entry->web_not_optimizable) {
  mark_swaps_for_removal (insn_entry, i);
+  root_entry->web_is_optimized = true;
+}
   }
-else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
+else if (insn_entry[i].is_swappable
+ && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
+ insn_entry[i].special_handling == SH_NOSWAP_ST))
+  {
+swap_web_entry* root_entry
+  = (swap_web_entry*)((_entry[i])->unionfind_root ());
+if (!root_entry->web_not_optimizable) {
+  handle_special_swappables (insn_entry, i);
+  root_entry->web_is_optimized = true;
+}
+  }
+
+  /* Perform special handling for swappable insns that require it. 
+ Note that special handling should be done only for those 
+ swappable insns that are present in webs optimized above.  */
+  for (i = 0; i < e; ++i)
+if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
+!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
+  insn_entry[i].special_handling == SH_NOSWAP_ST))
   {
swap_web_entry* root_entry
  = (swap_web_entry*)((_entry[i])->unionfind_root ());
-   if (!root_entry->web_not_optimizable)
+   if (root_entry->web_is_optimized)
  handle_special_swappables (insn_entry, i);
   }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
b/gcc/testsuite/gcc.target/powerpc/pr106770.c
new file mode 100644
index 000..84e9aead975
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
+
+/* Test case to resolve PR106770  */
+
+#include 
+
+int cmp2(double a, double b)
+{
+vector double va = vec_promote(a, 1);
+vector double vb = vec_promote(b, 1);
+vector long long vlt = (vector long long)vec_cmplt(va, vb);
+vector long long vgt = (vector long long)vec_cmplt(vb, va);
+vector signed long long vr = vec_sub(vlt, vgt);
+
+return vec_extract(vr, 1);
+}
+


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-11-08 Thread Surya Kumari Jangala via Gcc-patches
Hi Richard,

On 21/09/22 1:03 pm, Richard Biener wrote:
> On Tue, Sep 20, 2022 at 9:18 AM Surya Kumari Jangala via Gcc-patches
>  wrote:
>>
>> Hi Jeff, Richard,
>> Thank you for reviewing the patch!
>> I have committed the patch to the gcc repo.
>> Can I backport this patch to prior versions of gcc, as this is an easy patch 
>> to backport and the issue exists in prior versions too?
> 
> It doesn't seem to be a regression so I'd error on the safe side here.

Can you please clarify, should this patch be backported? It is not very clear 
what "safe side" means here.

Thanks!
Surya

> 
> Richard.
> 
>> Regards,
>> Surya
>>
>>
>> On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
>>>
>>>
>>> On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
>>>> sched1: Fix -fcompare-debug issue in schedule_region [PR105586]
>>>>
>>>> In schedule_region(), a basic block that does not contain any real insns
>>>> is not scheduled and the dfa state at the entry of the bb is not copied
>>>> to the fallthru basic block. However a DEBUG insn is treated as a real
>>>> insn, and if a bb contains non-real insns and a DEBUG insn, it's dfa
>>>> state is copied to the fallthru bb. This was resulting in
>>>> -fcompare-debug failure as the incoming dfa state of the fallthru block
>>>> is different with -g. We should always copy the dfa state of a bb to
>>>> it's fallthru bb even if the bb does not contain real insns.
>>>>
>>>> 2022-08-22  Surya Kumari Jangala  
>>>>
>>>> gcc/
>>>> PR rtl-optimization/105586
>>>> * sched-rgn.cc (schedule_region): Always copy dfa state to
>>>> fallthru block.
>>>>
>>>> gcc/testsuite/
>>>> PR rtl-optimization/105586
>>>> * gcc.target/powerpc/pr105586.c: New test.
>>> Interesting.We may have stumbled over this bug internally a little 
>>> while ago -- not from a compare-debug standpoint, but from a "why isn't the 
>>> processor state copied to the fallthru block" point of view.   I had it on 
>>> my to investigate list, but hadn't gotten around to it yet.
>>>
>>> I think there were requests for ChangeLog updates and a function comment 
>>> for save_state_for_fallthru_edge.  OK with those updates.
>>>
>>> jeff
>>>


[PATCH] testsuite: Fix failure in test pr105586.c [PR107171]

2022-10-13 Thread Surya Kumari Jangala via Gcc-patches
testsuite: Fix failure in test pr105586.c [PR107171]

The test pr105586.c fails on a big endian system when run in 32bit
mode. The failure occurs as the test case does not guard against
unsupported __int128.

2022-10-13  Surya Kumari Jangala  

gcc/testsuite/
PR testsuite/107171
* gcc.target/powerpc/pr105586.c: Guard against unsupported
__int128.


diff --git a/gcc/testsuite/gcc.target/powerpc/pr105586.c 
b/gcc/testsuite/gcc.target/powerpc/pr105586.c
index bd397f5..3f88a09 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr105586.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr105586.c
@@ -1,4 +1,5 @@
 /* { dg-options "-mdejagnu-tune=power4 -O2 -fcompare-debug -fno-if-conversion 
-fno-guess-branch-probability" } */
+/* { dg-require-effective-target int128 } */
 
 extern int bar(int i);


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-09-20 Thread Surya Kumari Jangala via Gcc-patches
Hi Jeff, Richard,
Thank you for reviewing the patch!
I have committed the patch to the gcc repo.
Can I backport this patch to prior versions of gcc, as this is an easy patch to 
backport and the issue exists in prior versions too?

Regards,
Surya


On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
> 
> 
> On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
>> sched1: Fix -fcompare-debug issue in schedule_region [PR105586]
>>
>> In schedule_region(), a basic block that does not contain any real insns
>> is not scheduled and the dfa state at the entry of the bb is not copied
>> to the fallthru basic block. However a DEBUG insn is treated as a real
>> insn, and if a bb contains non-real insns and a DEBUG insn, it's dfa
>> state is copied to the fallthru bb. This was resulting in
>> -fcompare-debug failure as the incoming dfa state of the fallthru block
>> is different with -g. We should always copy the dfa state of a bb to
>> it's fallthru bb even if the bb does not contain real insns.
>>
>> 2022-08-22  Surya Kumari Jangala  
>>
>> gcc/
>> PR rtl-optimization/105586
>> * sched-rgn.cc (schedule_region): Always copy dfa state to
>> fallthru block.
>>
>> gcc/testsuite/
>> PR rtl-optimization/105586
>> * gcc.target/powerpc/pr105586.c: New test.
> Interesting.    We may have stumbled over this bug internally a little while 
> ago -- not from a compare-debug standpoint, but from a "why isn't the 
> processor state copied to the fallthru block" point of view.   I had it on my 
> to investigate list, but hadn't gotten around to it yet.
> 
> I think there were requests for ChangeLog updates and a function comment for 
> save_state_for_fallthru_edge.  OK with those updates.
> 
> jeff
> 


Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-08-24 Thread Surya Kumari Jangala via Gcc-patches
Hi Peter, Segher,
Thanks for going thru the patch!
I will make the proposed changes to the Changelog.

Regards,
Surya


On 23/08/22 6:58 pm, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 23, 2022 at 07:55:22AM -0500, Peter Bergner wrote:
>> It looks good to me, but I cannot approve it.
> 
> Same here (both statements).
> 
>> That said, you're missing
>> a ChangeLog entry for your new helper function.  The ChangeLog mentions
>> what changed, not why it changed.
> 
> And that is correct!  Changelogs should not say that, that isn't their
> purpose (in GCC), not what they are used for.  Explanations like that go
> in the email and/or the commit message.
> 
> The main remaining usefulness of changelogs is to spot unintended
> commmits.
> 
>> Maybe something like the following?
>>
>> gcc/
>>  PR rtl-optimization/105586
>>  * sched-rgn.cc (save_state_for_fallthru_edge): New function.
>>  (schedule_region): Use it for all blocks.
> 
> That looks perfect, it doesn't say "why" either :-)
> 
> 
> Segher


[PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-08-23 Thread Surya Kumari Jangala via Gcc-patches
sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

In schedule_region(), a basic block that does not contain any real insns
is not scheduled and the dfa state at the entry of the bb is not copied
to the fallthru basic block. However a DEBUG insn is treated as a real
insn, and if a bb contains non-real insns and a DEBUG insn, it's dfa
state is copied to the fallthru bb. This was resulting in
-fcompare-debug failure as the incoming dfa state of the fallthru block
is different with -g. We should always copy the dfa state of a bb to
it's fallthru bb even if the bb does not contain real insns.

2022-08-22  Surya Kumari Jangala  

gcc/
PR rtl-optimization/105586
* sched-rgn.cc (schedule_region): Always copy dfa state to
fallthru block.

gcc/testsuite/
PR rtl-optimization/105586
* gcc.target/powerpc/pr105586.c: New test.


diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index 0dc2a8f2851..6c9202c0b2b 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3082,6 +3082,24 @@ free_bb_state_array (void)
   bb_state = NULL;
 }
 
+static void
+save_state_for_fallthru_edge (basic_block last_bb, state_t state)
+{
+  edge f = find_fallthru_edge (last_bb->succs);
+  if (f
+  && (!f->probability.initialized_p ()
+ || (f->probability.to_reg_br_prob_base () * 100
+ / REG_BR_PROB_BASE
+ >= param_sched_state_edge_prob_cutoff)))
+  {
+memcpy (bb_state[f->dest->index], state,
+   dfa_state_size);
+if (sched_verbose >= 5)
+  fprintf (sched_dump, "saving state for edge %d->%d\n",
+  f->src->index, f->dest->index);
+  }
+}
+
 /* Schedule a region.  A region is either an inner loop, a loop-free
subroutine, or a single basic block.  Each bb in the region is
scheduled after its flow predecessors.  */
@@ -3155,6 +3173,7 @@ schedule_region (int rgn)
   if (no_real_insns_p (head, tail))
{
  gcc_assert (first_bb == last_bb);
+ save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
  continue;
}
 
@@ -3173,26 +3192,13 @@ schedule_region (int rgn)
   curr_bb = first_bb;
   if (dbg_cnt (sched_block))
 {
- edge f;
  int saved_last_basic_block = last_basic_block_for_fn (cfun);
 
  schedule_block (_bb, bb_state[first_bb->index]);
  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
  sched_rgn_n_insns += sched_n_insns;
  realloc_bb_state_array (saved_last_basic_block);
- f = find_fallthru_edge (last_bb->succs);
- if (f
- && (!f->probability.initialized_p ()
- || (f->probability.to_reg_br_prob_base () * 100
- / REG_BR_PROB_BASE
- >= param_sched_state_edge_prob_cutoff)))
-   {
- memcpy (bb_state[f->dest->index], curr_state,
- dfa_state_size);
- if (sched_verbose >= 5)
-   fprintf (sched_dump, "saving state for edge %d->%d\n",
-f->src->index, f->dest->index);
-   }
+ save_state_for_fallthru_edge (last_bb, curr_state);
 }
   else
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105586.c 
b/gcc/testsuite/gcc.target/powerpc/pr105586.c
new file mode 100644
index 000..bd397f58bc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105586.c
@@ -0,0 +1,19 @@
+/* { dg-options "-mdejagnu-tune=power4 -O2 -fcompare-debug -fno-if-conversion 
-fno-guess-branch-probability" } */
+
+extern int bar(int i);
+
+typedef unsigned long u64;
+int g;
+
+__int128 h;
+
+void
+foo(int a, int b) {
+  int i;
+  char u8_1 = g, u8_3 = a;
+  u64 u64_1 = bar(0), u64_3 = u8_3 * u64_1;
+  __int128 u128_1 = h ^ __builtin_expect(i, 0);
+  u64 u64_4 = u64_1 << ((__builtin_sub_overflow_p(0, u8_1, (u64)0) ^ u128_1) & 
8);
+  u64 u64_r = b + u64_3 + u64_4;
+  bar((short)u64_r + u8_3);
+}


[PATCH] regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]

2022-06-10 Thread Surya Kumari Jangala via Gcc-patches
regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]

In check_new_reg_p, the nregs of a du chain is computed by obtaining the MODE
of the first element in the chain, and then calling hard_regno_nregs() with the
MODE. But the first element of the chain can be a DEBUG_INSN whose mode need
not be the same as the rest of the elements in the du chain. This
was resulting in fcompare-debug failure as check_new_reg_p was returning a
different result with -g for the same candidate register. We can instead obtain
nregs from the du chain itself.

2022-06-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/105041
* regrename.cc (check_new_reg_p): Use nregs value from du chain.

gcc/testsuite/
PR rtl-optimization/105041
* gcc.target/powerpc/pr105041.c: New test.


diff --git a/gcc/regrename.cc b/gcc/regrename.cc
index 10271e1..f651351 100644
--- a/gcc/regrename.cc
+++ b/gcc/regrename.cc
@@ -324,8 +324,7 @@ static bool
 check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 class du_head *this_head, HARD_REG_SET this_unavailable)
 {
-  machine_mode mode = GET_MODE (*this_head->first->loc);
-  int nregs = hard_regno_nregs (new_reg, mode);
+  int nregs = this_head->nregs;
   int i;
   struct du_chain *tmp;
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105041.c 
b/gcc/testsuite/gcc.target/powerpc/pr105041.c
new file mode 100644
index 000..89eed1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105041.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target be } */
+/* { dg-options "-m32 -mdejagnu-cpu=power4 -O2 -fcompare-debug 
-fharden-compares -frename-registers" } */
+
+double m;
+int n;
+
+unsigned int
+foo (unsigned int x, int y)
+{
+  long long int a = y, b = !a;
+  int c = 0;
+
+  if (b != x)
+while ((int) m == a)
+  {
+c = a;
+a = 0;
+  }
+
+  n = b = y;
+
+  return x + c;
+}


[COMMITTED] MAINTAINERS: Add myself for write after approval

2022-05-13 Thread Surya Kumari Jangala via Gcc-patches

2022-05-13  Surya Kumari Jangala 

    * MAINTAINERS: Add myself to write after approval.

diff --git a/MAINTAINERS b/MAINTAINERS
index a1b84ac5646..8bca7a636b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -464,6 +464,7 @@ Daniel Jacobowitz 
 Andreas Jaeger 
 Harsha Jagasia 
 Fariborz Jahanian 
+Surya Kumari Jangala 
 Qian Jianhua 
 Janis Johnson 
 Teresa Johnson