Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-16 Thread luoxhu via Gcc-patches
On 2020/9/15 14:51, Richard Biener wrote: >> I only see VAR_DECL and PARM_DECL, is there any function to check the tree >> variable is global? I added DECL_REGISTER, but the RTL still expands to >> stack: > > is_global_var () or alternatively !auto_var_in_fn_p (), I think doing > IFN_SET

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread luoxhu via Gcc-patches
On 2020/9/14 17:47, Richard Biener wrote: On Mon, Sep 14, 2020 at 10:05 AM luoxhu wrote: Not sure whether this reflects the issues you discussed above. I constructed below test cases and tested with and without this patch, only if "a+c"(which means store only), the performance

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-14 Thread luoxhu via Gcc-patches
On 2020/9/10 18:08, Richard Biener wrote: > On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool > wrote: >> >> On Wed, Sep 09, 2020 at 04:28:19PM +0200, Richard Biener wrote: >>> On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool >>> wrote: Hi! On Tue, Sep 08, 2020 at

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-08 Thread luoxhu via Gcc-patches
On 2020/9/8 16:26, Richard Biener wrote: >> Seems not only pseudo, for example "v = vec_insert (i, v, n);" >> the vector variable will be store to stack first, then [r112:DI] is a >> memory here to be processed. So the patch loads it from stack(insn #10) to >> temp vector register first, and

Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-08 Thread luoxhu via Gcc-patches
Hi Richi, On 2020/9/7 19:57, Richard Biener wrote: > + if (TREE_CODE (to) == ARRAY_REF) > + { > + tree op0 = TREE_OPERAND (to, 0); > + if (TREE_CODE (op0) == VIEW_CONVERT_EXPR > + && expand_view_convert_to_vec_set (to, from, to_rtx)) > + { > +

[PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-06 Thread luoxhu via Gcc-patches
Hi, On 2020/9/4 18:23, Segher Boessenkool wrote: diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c index 03b00738a5e..00c65311f76 100644 --- a/gcc/config/rs6000/rs6000-c.c +++ b/gcc/config/rs6000/rs6000-c.c /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2)

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-04 Thread luoxhu via Gcc-patches
On 2020/9/4 15:23, Richard Biener wrote: > On Fri, Sep 4, 2020 at 9:19 AM Richard Biener > wrote: >> >> On Fri, Sep 4, 2020 at 8:38 AM luoxhu wrote: >>> >>> >>> >>> On 2020/9/4 14:16, luoxhu via Gcc-patches wrote: >>>

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-04 Thread luoxhu via Gcc-patches
On 2020/9/4 14:16, luoxhu via Gcc-patches wrote: Hi, Yes, I checked and found that neither vec_set nor vec_extract supports a variable index on most targets; store_bit_field_1 and extract_bit_field_1 only consider using optabs when the index is an integer value. Anyway, it shouldn't

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-04 Thread luoxhu via Gcc-patches
Hi, On 2020/9/3 18:29, Richard Biener wrote: > On Thu, Sep 3, 2020 at 11:20 AM luoxhu wrote: >> >> >> >> On 2020/9/2 17:30, Richard Biener wrote: >>>> so maybe bypass convert_vector_to_array_for_subscript for special >>>> circumstance

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-03 Thread luoxhu via Gcc-patches
On 2020/9/2 17:30, Richard Biener wrote: >> so maybe bypass convert_vector_to_array_for_subscript for special >> circumstance >> like "i = v[n%4]" or "v[n&3]=i" to generate vec_extract or vec_insert builtin >> call a relative simpler method? > I think you have it backward. You need to work

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-02 Thread luoxhu via Gcc-patches
Hi, On 2020/9/1 21:07, Richard Biener wrote: > On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches > wrote: >> >> Hi, >> >> On 2020/9/1 01:04, Segher Boessenkool wrote: >>> Hi! >>> >>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xion

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-01 Thread luoxhu via Gcc-patches
Hi, On 2020/9/1 00:47, will schmidt wrote: >> + tmode = TYPE_MODE (TREE_TYPE (arg0)); >> + mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0))); >> + mode2 = TYPE_MODE ((TREE_TYPE (arg2))); >> + gcc_assert (VECTOR_MODE_P (tmode)); >> + >> + op0 = expand_expr (arg0, NULL_RTX, tmode,

Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-01 Thread luoxhu via Gcc-patches
Hi, On 2020/9/1 01:04, Segher Boessenkool wrote: > Hi! > > On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote: >> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value >> to be insert, arg2 is the place to insert arg1 to arg0. This patch adds >>
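As a rough illustration of the insert operation this snippet describes, the sketch below models a four-element vector as a plain C struct. The names `MODEL_LANES`, `v4si_model` and `vec_insert_model` are hypothetical and not part of GCC or the patch. The argument order follows the call quoted elsewhere in this thread, `v = vec_insert (i, v, n)`, and wrapping the index to the lane count is an assumption based on the `v[n&3]=i` form quoted there.

```c
#include <assert.h>

#define MODEL_LANES 4u   /* hypothetical lane count for this model */

/* Hypothetical stand-in for a 4 x int vector; not a GCC type.  */
typedef struct { int lane[MODEL_LANES]; } v4si_model;

/* Return a copy of VEC with one lane replaced by VALUE.
   INDEX may be any run-time value; it is wrapped into 0..3.  */
static v4si_model
vec_insert_model (int value, v4si_model vec, unsigned int index)
{
  /* Masking equals "index mod lanes" only for power-of-two lane counts.  */
  assert ((MODEL_LANES & (MODEL_LANES - 1u)) == 0u);
  vec.lane[index & (MODEL_LANES - 1u)] = value;
  return vec;
}
```

Because the vector is passed and returned by value, the caller's copy is left untouched, which mirrors the assignment form `v = vec_insert (i, v, n)`.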

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-14 Thread luoxhu via Gcc-patches
Hi, On 2020/8/13 20:52, Jan Hubicka wrote: >> Since there are no other callers outside of these specialized nodes, the >> guessed profile count should be same equal? Perf tool shows that even >> each specialized node is called only once, none of them take same time for >> each call: >> >>

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-08-13 Thread luoxhu via Gcc-patches
Hi, On 2020/8/13 01:53, Jan Hubicka wrote: > Hello, > with Martin we spent some time looking into exchange2 and my > understanding of the problem is the following: > > There is the self recursive function digits_2 with the property that it > has 10 nested loops and calls itself from the

Re: [PATCH v5] dse: Remove partial load after full store for high part access[PR71309]

2020-08-05 Thread luoxhu via Gcc-patches
Hi Richard, On 2020/8/3 22:01, Richard Sandiford wrote: /* Try a wider mode if truncating the store mode to NEW_MODE requires a real instruction. */ if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode)) @@ -1779,6 +1780,25 @@ find_shift_sequence

Re: [PATCH v5] dse: Remove partial load after full store for high part access[PR71309]

2020-08-03 Thread luoxhu via Gcc-patches
On 2020/8/3 22:01, Richard Sandiford wrote: /* Try a wider mode if truncating the store mode to NEW_MODE requires a real instruction. */ if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode)) @@ -1779,6 +1780,25 @@ find_shift_sequence (poly_int64

[PATCH v5] dse: Remove partial load after full store for high part access[PR71309]

2020-08-03 Thread luoxhu via Gcc-patches
Thanks, the v5 update as comments: 1. Move const_rhs shift out of loop; 2. Iterate from int size for read_mode. This patch could optimize(works for char/short/int/void*): 6: r119:TI=[r118:DI+0x10] 7: [r118:DI]=r119:TI 8: r121:DI=[r118:DI+0x8] => 6: r119:TI=[r118:DI+0x10] 16:

Re: [PATCH v4] dse: Remove partial load after full store for high part access[PR71309]

2020-07-28 Thread luoxhu via Gcc-patches
Gentle ping in case this mail was missed. Thanks :) https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550602.html Xionghu On 2020/7/24 18:47, luoxhu via Gcc-patches wrote: Hi Richard, This is the updated version, which passes all regression tests on Power9-LE. It just needs another "may

[PATCH v4] dse: Remove partial load after full store for high part access[PR71309]

2020-07-24 Thread luoxhu via Gcc-patches
Hi Richard, This is the updated version, which passes all regression tests on Power9-LE. It just needs another "maybe_lt (GET_MODE_SIZE (new_mode), access_size)" check before generating the shift for store_info->const_rhs to ensure the correct constant is generated; take testsuite/gfortran1/equiv_2.x for

Re: [PATCH v3] dse: Remove partial load after full store for high part access[PR71309]

2020-07-23 Thread luoxhu via Gcc-patches
On 2020/7/23 04:30, Richard Sandiford wrote: > > I now realise the reason is that the starting mode is too wide. > I think we should fix that by doing: > >FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT) > { >… > > and then add: > >if (maybe_lt (GET_MODE_SIZE

Re: [PATCH v3] dse: Remove partial load after full store for high part access[PR71309]

2020-07-22 Thread luoxhu via Gcc-patches
Hi, On 2020/7/22 19:05, Richard Sandiford wrote: > This wasn't really what I meant. Using subregs is fine, but I was > thinking of: > >/* Also try a wider mode if the necessary punning is either not >desirable or not possible. */ >if (!CONSTANT_P (store_info->rhs) >

Re: [PATCH v2] dse: Remove partial load after full store for high part access[PR71309]

2020-07-22 Thread luoxhu via Gcc-patches
Hi, On 2020/7/21 23:30, Richard Sandiford wrote: > Xiong Hu Luo writes:>> @@ -1872,9 +1872,27 @@ > get_stored_val (store_info *store_info, machine_mode read_mode, >> { >> poly_int64 shift = gap * BITS_PER_UNIT; >> poly_int64 access_size = GET_MODE_SIZE (read_mode) + gap;

Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-20 Thread luoxhu via Gcc-patches
On 2020/7/20 23:31, Segher Boessenkool wrote: On Mon, Jul 13, 2020 at 02:30:28PM +0800, luoxhu wrote: For extracting high part element from DImode register like: {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;} split it before reload with "and mask" to avoid generating sh

Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-14 Thread luoxhu via Gcc-patches
I re-ran these cases on Power8-LE and confirmed that they pass. What is your platform, please? BTW, TARGET_NO_SF_SUBREG ensures TARGET_POWERPC64 for this define_insn_and_split. Thanks. Xionghu > > Thanks, David > > On Mon, Jul 13, 2020 at 2:30 AM luoxhu wrote: >> >> Hi,

Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-13 Thread luoxhu via Gcc-patches
Hi, On 2020/7/11 08:54, Segher Boessenkool wrote: > Hi! > > On Fri, Jul 10, 2020 at 09:39:40AM +0800, luoxhu wrote: >> OK, seems the md file needs a format tool too... > > Heh. Just make sure it looks good (that is, does what it looks like), > looks like the res

Re: [PATCH 2/2] rs6000: Define define_insn_and_split to split unspec sldi+or to rldimi

2020-07-12 Thread luoxhu via Gcc-patches
On 2020/7/11 08:28, Segher Boessenkool wrote: Hi! On Thu, Jul 09, 2020 at 09:14:45PM -0500, Xiong Hu Luo wrote: * config/rs6000/rs6000.md (rotl_unspec): New define_insn_and_split. +; rldimi with UNSPEC_SI_FROM_SF. +(define_insn_and_split "*rotl_unspec" Please have

Re: [PATCH] rs6000: Split movsf_from_si from high word before reload[PR89310]

2020-07-10 Thread luoxhu via Gcc-patches
On 2020/7/10 03:25, Segher Boessenkool wrote: > >> + "TARGET_NO_SF_SUBREG" >> + "#" >> + "&& vsx_reg_sfsubreg_ok (operands[0], SFmode)" > > Put this in the insn condition? And since this is just a predicate, > you can just use it instead of gpc_reg_operand. > > (The split condition

Re: [PATCH 1/2] rs6000: Init V4SF vector without converting SP to DP

2020-07-09 Thread luoxhu via Gcc-patches
Update patch to keep the logic for non TARGET_P8_VECTOR targets. Please ignore the previous [PATCH 1/2], Sorry! Move V4SF to V4SI, init vector like V4SI and move to V4SF back. Better instruction sequence could be generated on Power9: lfs + xxpermdi + xvcvdpsp + vmrgew => lwz + (sldi + or) +

Re: [PATCH] rs6000: Define movsf_from_si2 to extract high part SF element from DImode[PR89310]

2020-07-09 Thread luoxhu via Gcc-patches
Hi, On 2020/7/10 03:25, Segher Boessenkool wrote: > Hi! > > On Thu, Jul 09, 2020 at 11:09:42AM +0800, luoxhu wrote: >>> Maybe change it back to just SI? It won't match often at all for QI or >>> HI anyway, it seems. Sorry for that detour. Should be good wi

Re: [PATCH] rs6000: Split movsf_from_si from high word before reload[PR89310]

2020-07-08 Thread luoxhu via Gcc-patches
On 2020/7/9 06:43, Segher Boessenkool wrote: > Hi! > > On Wed, Jul 08, 2020 at 11:19:21AM +0800, luoxhu wrote: >> For extracting high part element from DImode register like: >> >> {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;} >> >> sp

Re: [PATCH] rs6000: Split movsf_from_si from high word before reload[PR89310]

2020-07-07 Thread luoxhu via Gcc-patches
On 2020/7/8 05:31, Segher Boessenkool wrote: > Hi! > > On Tue, Jul 07, 2020 at 04:39:58PM +0800, luoxhu wrote: >>> Lots of questions, sorry! >> >> Thanks for the nice suggestions of the initial patch contains many issues:), > > Pretty much all of it sh

Re: [PATCH] rs6000: Split movsf_from_si from high word before reload[PR89310]

2020-07-07 Thread luoxhu via Gcc-patches
On 2020/7/7 08:18, Segher Boessenkool wrote: > Hi! > > On Sun, Jul 05, 2020 at 09:17:57PM -0500, Xionghu Luo wrote: >> For extracting high part element from DImode register like: >> >> {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;} >> >> split it before reload with "and mask" to avoid

Ping^1 : [PATCH] [stage1] ipa-cp: Fix PGO regression caused by r278808

2020-06-15 Thread luoxhu via Gcc-patches
Gentle ping... On 2020/6/1 09:45, Xionghu Luo wrote: resend the patch for stage1: https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538186.html The performance of exchange2 built with PGO will decrease ~28% by r278808 due to profile count set incorrectly. The cloned nodes are updated to

Re: [PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-08 Thread luoxhu via Gcc-patches
Hi, On 2020/6/3 04:32, Segher Boessenkool wrote: > Hi Xiong Hu, > > On Tue, Jun 02, 2020 at 04:41:50AM -0500, Xionghu Luo wrote: >> Double array in structure as function arguments or return value is accessed >> by BLKmode, they are stored to stack and load from stack with redundant >> conversion

Re: [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)

2020-05-13 Thread luoxhu via Gcc-patches
On 2020/5/13 02:24, Richard Sandiford wrote: > luoxhu writes: >> + /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e: >> + >> + 73: r145:SI=r123:DI#0-0x1 >> + 74: r144:DI=zero_extend (r145:SI) >> + 75: r143:DI=r144:DI+0x1 >> +

[PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)

2020-05-12 Thread luoxhu via Gcc-patches
Minor refinement of the iteration non-overflow check, plus a testcase, for stage 1. This "subtract/extend/add" sequence has existed for a long time and still annoys us (PR37451, part of PR61837) when converting from 32 bits to 64 bits, as the ctr register is used as 64 bits on powerpc64. Andrew Pinski had a patch but
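The fold named in the subject line can be checked with a small C model of the three-step sequence. This is only a sketch of the arithmetic on unsigned 32-bit and 64-bit integers, not the RTL simplification in the patch; the function names `sub_ext_add` and `zero_ext_only` are hypothetical.

```c
#include <stdint.h>

/* The three-step sequence: subtract 1 in 32 bits, zero-extend the
   result to 64 bits, then add 1 in 64 bits.  */
static uint64_t
sub_ext_add (uint32_t x)
{
  uint32_t narrowed = x - 1u;            /* wraps to 0xffffffff when x == 0 */
  uint64_t widened = (uint64_t) narrowed;
  return widened + 1u;
}

/* The folded form: a single zero-extension.  */
static uint64_t
zero_ext_only (uint32_t x)
{
  return (uint64_t) x;
}
```

The two functions agree for every nonzero `x`. For `x == 0` the 32-bit subtraction wraps, so the sequence yields 0x100000000 rather than 0, which is why the fold is only valid when that case can be ruled out.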

Re: [PATCH v2] Add handling of MULT_EXPR/PLUS_EXPR for wrapping overflow in affine combination(PR83403)

2020-05-11 Thread luoxhu via Gcc-patches
On 2020-05-06 20:09, Richard Biener wrote: On Thu, 30 Apr 2020, luoxhu wrote: Update the patch with overflow check. Bootstrap and regression tested PASS on Power8-LE. Use determine_value_range to get value range info for fold convert expressions with internal operation PLUS_EXPR/MINUS_EXPR

[PATCH v2] Add handling of MULT_EXPR/PLUS_EXPR for wrapping overflow in affine combination(PR83403)

2020-04-30 Thread luoxhu via Gcc-patches
Update the patch with overflow check. Bootstrap and regression tested PASS on Power8-LE. Use determine_value_range to get value range info for fold convert expressions with internal operation PLUS_EXPR/MINUS_EXPR/MULT_EXPR when not overflow on wrapping overflow inner type. i.e.: (long

Re: [PATCH] Add value range info for affine combination to improve store motion (PR83403)

2020-04-29 Thread luoxhu via Gcc-patches
On 2020/4/28 18:30, Richard Biener wrote: > > OK, I guess instead of get_range_info expr_to_aff_combination could > simply use determine_value_range (op0, , ) == VR_RANGE > (the && TREE_CODE (op0) == SSA_NAME check can then be removed)? > Tried with determine_value_range, it works and is

Re: [PATCH] Add value range info for affine combination to improve store motion (PR83403)

2020-04-28 Thread luoxhu via Gcc-patches
On 2020/4/28 15:01, Richard Biener wrote: > On Tue, 28 Apr 2020, Xionghu Luo wrote: > >> From: Xionghu Luo >> >> Get and propagate value range info to convert expressions with convert >> operation on PLUS_EXPR/MINUS_EXPR/MULT_EXPR when not overflow. i.e.: >> >> (long unsigned int)((unsigned

Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)

2020-04-20 Thread luoxhu via Gcc-patches
Tiny update to accommodate unsigned int compare. On 2020/4/20 16:21, luoxhu via Gcc-patches wrote: Hi, On 2020/4/18 00:32, Segher Boessenkool wrote: On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote: On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote: luoxhu

Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)

2020-04-20 Thread luoxhu via Gcc-patches
Hi, On 2020/4/18 00:32, Segher Boessenkool wrote: > On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote: >> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote: >>> luoxhu--- via Gcc-patches writes: >>>> -count = simplify_gen_binary

Re: [PATCH v2] rs6000: Don't use HARD_FRAME_POINTER_REGNUM if it's not live in pro_and_epilogue (PR91518)

2020-04-16 Thread luoxhu via Gcc-patches
On 2020/4/17 08:52, Segher Boessenkool wrote: > Hi! > > On Mon, Apr 13, 2020 at 10:11:43AM +0800, luoxhu wrote: >> frame_pointer_needed is set to true in reload pass setup_can_eliminate, >> but regs_ever_live[31] is false, pro_and_epilogue uses it without live >>

[PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)

2020-04-15 Thread luoxhu--- via Gcc-patches
From: Xionghu Luo This "subtract/extend/add" sequence has existed for a long time and still annoys us (PR37451, PR61837) when converting from 32 bits to 64 bits, as the ctr register is used as 64 bits on powerpc64. Andrew Pinski had a patch, but it caused some issues and was reverted by Joseph S. Myers (PR37451,

[PATCH v2] rs6000: Don't use HARD_FRAME_POINTER_REGNUM if it's not live in pro_and_epilogue (PR91518)

2020-04-12 Thread luoxhu via Gcc-patches
This bug was exposed by the FRE refactoring in r263875. Comparing the FRE dump files shows no obvious change in the function that segfaults, which indicates a target issue. frame_pointer_needed is set to true in reload pass setup_can_eliminate, but regs_ever_live[31] is false, pro_and_epilogue uses it

Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization

2020-04-03 Thread luoxhu via Gcc-patches
On 2020/4/3 06:16, Segher Boessenkool wrote: > Hi! > > On Mon, Mar 30, 2020 at 11:59:57AM +0800, luoxhu wrote: >>> Do we want something later in the RTL pipeline to make "addi"s etc. again? > > (This would be a good thing to consider -- maybe a define_in

Re: [PATCH] rs6000: Save/restore r31 if frame_pointer_needed is true

2020-03-29 Thread luoxhu via Gcc-patches
On 2020/3/28 00:04, Segher Boessenkool wrote: Hi! On Fri, Mar 27, 2020 at 09:34:00AM +0800, luoxhu wrote: On 2020/3/27 07:59, Segher Boessenkool wrote: On Wed, Mar 25, 2020 at 11:15:22PM -0500, luo...@linux.ibm.com wrote: frame_pointer_needed is set to true in reload pass

Re: [PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization

2020-03-29 Thread luoxhu via Gcc-patches
On 2020/3/27 22:33, Segher Boessenkool wrote: > Hi! > > On Thu, Mar 26, 2020 at 05:06:43AM -0500, luo...@linux.ibm.com wrote: >> Remove split code from add3 to allow a later pass to split. >> This allows later logic to hoist out constant load in add instructions. >> In loop, lis+ori could be

Re: [PATCH] rs6000: Save/restore r31 if frame_pointer_needed is true

2020-03-26 Thread luoxhu via Gcc-patches
On 2020/3/27 07:59, Segher Boessenkool wrote: > Hi! > > On Wed, Mar 25, 2020 at 11:15:22PM -0500, luo...@linux.ibm.com wrote: >> frame_pointer_needed is set to true in reload pass setup_can_eliminate, >> but regs_ever_live[31] is false, so pro_and_epilogue doesn't save/restore >> r31 even it

[PATCH] rs6000: Don't split constant oprator when add, move to temp register for future optimization

2020-03-26 Thread luoxhu--- via Gcc-patches
From: Xionghu Luo Remove split code from add3 to allow a later pass to split. This allows later logic to hoist out constant load in add instructions. In loop, lis+ori could be hoisted out to improve performance compared with previous addis+addi (About 15% on typical case), weak point is one more

[PATCH] rs6000: Save/restore r31 if frame_pointer_needed is true

2020-03-25 Thread luoxhu--- via Gcc-patches
From: Xionghu Luo This P1 bug was exposed by the FRE refactoring in r263875. Comparing the FRE dump files shows no obvious change in the function that segfaults, which indicates a target issue. frame_pointer_needed is set to true in reload pass setup_can_eliminate, but regs_ever_live[31] is false, so

Re: [PATCH] Backport to gcc-9: PR92398: Fix testcase failure of pr72804.c

2020-03-09 Thread luoxhu
On 2020/3/10 05:28, Segher Boessenkool wrote: On Thu, Mar 05, 2020 at 02:21:58AM -0600, luo...@linux.ibm.com wrote: From: Xionghu Luo Backport the patch to fix failures on P9 and P8BE, P7LE for PR94036. No changes were needed? Yes, no conflicts of the patch and instruction counts are

[PATCH] Backport to gcc-9: PR92398: Fix testcase failure of pr72804.c

2020-03-05 Thread luoxhu
From: Xionghu Luo Backport the patch to fix failures on P9 and P8BE, P7LE for PR94036. Tested and passes on P9/P8/P7; OK to commit? (gcc-8 is not needed as the test doesn't exist.) P9LE generated instruction is not worse than P8LE. mtvsrdd;xxlnot;stxv vs. not;not;std;std. It can have longer latency,

*Ping^1* [PATCH v3] ipa-cp: Fix PGO regression caused by r278808

2020-03-03 Thread luoxhu
341 Author: hubicka Date: Thu Nov 28 14:16:29 2019 + * ipa-cp.c (update_profiling_info): Fix scaling. Fix v3 patch and logs are here: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764.html Thanks Xionghu On 2020/1/14 14:45, luoxhu wrote: > Hi, > > On 2020/1/3 00:58

Re: [RFC] Run store-merging pass once more before pass fre/pre

2020-02-26 Thread luoxhu
On 2020/2/18 17:57, Richard Biener wrote: > On Tue, 18 Feb 2020, Xionghu Luo wrote: > >> Store-merging pass should run twice, the reason is pass fre/pre will do >> some kind of optimizations to instructions by: >>1. Converting the load from address to load from function arguments >>

Ping^1: [PATCH v3] ipa-cp: Fix PGO regression caused by r278808

2020-02-09 Thread luoxhu
Ping, attachment of https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764/exchange2.tar.gz shows the profile count difference on cloned nodes digits_2.constprop.[0...8] without/with this patch. Thanks! Xiong Hu On 2020/1/14 14:45, luoxhu wrote: > Hi, > > On 2020/1/3 00:58, Jan Hubi

Re: [PATCH] Add Optimization for various IPA parameters.

2020-01-13 Thread luoxhu
On 2020/1/11 20:20, Tamar Christina wrote: Hi Martin, This change (r280099) is causing a major performance regression on exchange2 in SPEC2017 dropping the benchmark by more than 30%. It seems the parameters no longer do anything. i.e. -flto --param ipa-cp-eval-threshold=1 --param

Re: [PATCH v7] Missed function specialization + partial devirtualization

2020-01-12 Thread luoxhu
On 2020/1/10 19:08, Jan Hubicka wrote: > OK. You will need to do the obvious updates for Martin's patch > which turned some member functions into static functions. > > Honza Thanks a lot! Rebased & updated, will commit below patch shortly when git push is ready. v8: 1. Rebase to master with

Re: [PATCH] Use cgraph_node::dump_{asm_},name where possible.

2020-01-08 Thread luoxhu
On 2020/1/8 22:54, Martin Liška wrote: diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index bd44063a1ac..789564ba335 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -1148,8 +1148,7 @@ symbol_table::materialize_all_clones (void) if (symtab->dump_file)

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread luoxhu
On 2020/1/7 23:40, Jan Hubicka wrote: >> >> >> On 2020/1/7 16:40, Jan Hubicka wrote: On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote: > Inline should return failure either (newsize > param_large_function_insns) > OR (newsize > limit). Sometimes newsize is larger than >

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-07 Thread luoxhu
On 2020/1/7 16:40, Jan Hubicka wrote: >> On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote: >>> Inline should return failure either (newsize > param_large_function_insns) >>> OR (newsize > limit). Sometimes newsize is larger than >>> param_large_function_insns, but smaller than limit,

Re: [PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-06 Thread luoxhu
On 2020/1/7 02:01, Jeff Law wrote: On Mon, 2020-01-06 at 01:03 -0600, Xiong Hu Luo wrote: Inline should return failure either (newsize > param_large_function_insns) OR (newsize > limit). Sometimes newsize is larger than param_large_function_insns, but smaller than limit, inline doesn't return

Re: [PATCH v2] ipa-cp: Fix PGO regression caused by r278808

2019-12-30 Thread luoxhu
_checking_assert (src_val); } } XiongHu Feng From: luoxhu Sent: Monday, December 30, 2019 4:11 PM To: Jan Hubicka; Martin Jambor Cc: Martin Liška; gcc-patches@gcc.gnu.org; seg...@kernel.crashing.org; wschm...@linux.ibm.com; guoji...@l

[PATCH v2] ipa-cp: Fix PGO regression caused by r278808

2019-12-30 Thread luoxhu
v2 Changes: 1. Enable proportion orig_sum to the new nodes for self recursive node: new_sum = (orig_sum + new_sum) \ * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth). 2. Add value range for param_ipa_cp_max_recursive_depth. The performance of exchange2 built with PGO
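The scaling formula in item 1 can be worked through with plain floating-point numbers. This is only an arithmetic sketch under stated assumptions: GCC tracks profile counts with its own types rather than `double`, and the function name `scaled_new_sum` and its parameter names are hypothetical.

```c
#include <assert.h>

/* Evaluate the formula from the v2 change note:
   new_sum = (orig_sum + new_sum) * self_recursive_probability
             * (1 / param_ipa_cp_max_recursive_depth).  */
static double
scaled_new_sum (double orig_sum, double new_sum,
                double self_recursive_probability,
                int max_recursive_depth)
{
  /* Item 2 of the note adds a value range for the depth parameter;
     this model only needs it to be positive.  */
  assert (max_recursive_depth > 0);
  return (orig_sum + new_sum) * self_recursive_probability
         * (1.0 / max_recursive_depth);
}
```

With illustrative inputs of 900 and 100 for the two sums, a probability of 0.5 and a depth of 8, the result is 1000 * 0.5 * 0.125 = 62.5.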

[PATCH v7] Missed function specialization + partial devirtualization

2019-12-25 Thread luoxhu
>>> profile_count indir_cnt = indirect->count; >>> indirect = indirect->clone (id->dst_node, call_stmt, >>> gimple_uid (stmt), >>> num, den, >>>

Re: [PATCH] [RFC] ipa: duplicate ipa_size_summary for cloned nodes

2019-12-18 Thread luoxhu
On 2019/12/18 23:48, Jan Hubicka wrote: >> The size_info of ipa_size_summary are created by r277424. It should be >> duplicated for cloned nodes, otherwise self_size and >> estimated_self_stack_size >> would be 0, causing param large-function-insns and large-function-growth >> working >>

*Ping* Re: [PATCH v6] Missed function specialization + partial devirtualization

2019-12-17 Thread luoxhu
Ping :) Patch is here: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00099.html On 2019/12/3 10:31, luoxhu wrote: Hi Martin and Honza, On 2019/11/18 21:02, Martin Liška wrote: On 11/16/19 10:59 AM, luoxhu wrote: Sorry that I don't quite understand your meaning here. I didn't grep

Re: [RFC] ipa-cp: Fix PGO regression caused by r278808

2019-12-13 Thread luoxhu
Thanks Honza, On 2019/12/10 19:06, Jan Hubicka wrote: >> Hi, >> >> On Tue, Dec 10 2019, Jan Hubicka wrote: >>> Hi, >>> I think the updating should treat self recursive edges as loops: that is >>> calculate SUM of counts incomming edges which are not self recursive, >>> calculate probability of

[PATCH v6] Missed function specialization + partial devirtualization

2019-12-02 Thread luoxhu
Hi Martin and Honza, On 2019/11/18 21:02, Martin Liška wrote: > On 11/16/19 10:59 AM, luoxhu wrote: >> Sorry that I don't quite understand your meaning here. I didn't grep the >> word "cgraph_edge_summary" in source code, do you mean add new structure > >

Ping: [PATCH] Add explicit description for -finline

2019-11-27 Thread luoxhu
On 2019/11/4 11:42, luoxhu wrote: On 2019/11/2 00:23, Joseph Myers wrote: On Thu, 31 Oct 2019, Xiong Hu Luo wrote: +@code{-finline} enables inlining of function declared \"inline\". +@code{-finline} is enabled at levels -O1, -O2, -O3 and -Os, but not -Og. Use @option{} to mark

Re: [PATCH] Fix two potential memory leak

2019-11-26 Thread luoxhu
Thanks, On 2019/11/26 18:15, Jan Hubicka wrote: >> Hi, >> >> On 2019/11/26 16:04, Jan Hubicka wrote: Summary variables should be deleted at the end of write_summary. It's first newed in generate_summary, and second newed in read_summary. Therefore, delete the first in

Re: [PATCH] Fix two potential memory leak

2019-11-26 Thread luoxhu
Hi, On 2019/11/26 16:04, Jan Hubicka wrote: Summary variables should be deleted at the end of write_summary. It's first newed in generate_summary, and second newed in read_summary. Therefore, delete the first in write_summary, delete the second in execute. gcc/ChangeLog: 2019-11-26

[PATCH] Fix two potential memory leak

2019-11-25 Thread luoxhu
Summary variables should be deleted at the end of write_summary. It's first newed in generate_summary, and second newed in read_summary. Therefore, delete the first in write_summary, delete the second in execute. gcc/ChangeLog: 2019-11-26 Luo Xiong Hu * ipa-pure-const.c

Re: [PATCH v3] PR92398: Fix testcase failure of pr72804.c

2019-11-24 Thread luoxhu
Hi, >> +++ b/gcc/testsuite/gcc.target/powerpc/pr72804-1.c > >> +/* store generates difference instructions as below: >> + P9: mtvsrdd;xxlnot;stxv. >> + P8/P7/P6 LE: not;not;std;std. >> + P8 BE: mtvsrd;mtvsrd;xxpermdi;xxlnor;stxvd2x. >> + P7/P6 BE: std;std;addi;lxvd2x;xxlnor;stxvd2x. */

Re: [PATCH v3] PR92398: Fix testcase failure of pr72804.c

2019-11-21 Thread luoxhu
Hi Segher, Update the code as you wish, Thanks: P9LE generated instruction is not worse than P8LE. mtvsrdd;xxlnot;stxv vs. not;not;std;std. Update the test case to fix failures. v4: Define and use check_effective_target_xxx etc. power9+: power9, power10 ... power8: power8 only.

Re: [PATCH v3] PR92398: Fix testcase failure of pr72804.c

2019-11-19 Thread luoxhu
P9LE generated instruction is not worse than P8LE. mtvsrdd;xxlnot;stxv vs. not;not;std;std. Update the test case to fix failures. v3: Define and use check_effective_target_xxx etc. pre_power8: ... power6, power7. power8: power8 only. post_power8: power8, power9 ... post_power9: power9, power10

Re: [PATCH v2] PR92398: Fix testcase failure of pr72804.c

2019-11-18 Thread luoxhu
Hi, On 2019/11/15 18:17, Segher Boessenkool wrote: > Hi! > > On Thu, Nov 14, 2019 at 09:12:32PM -0600, Xiong Hu Luo wrote: >> P9LE generated instruction is not worse than P8LE. >> mtvsrdd;xxlnot;stxv vs. not;not;std;std. >> Update the test case to fix failures. > > So this no longer runs it for

Re: Ping*2: [PATCH v5] Missed function specialization + partial devirtualization

2019-11-16 Thread luoxhu
Hi Thanks, On 2019/11/14 17:04, Jan Hubicka wrote: >> PR ipa/69678 >> * cgraph.c (symbol_table::create_edge): Init speculative_id. >> (cgraph_edge::make_speculative): Add param for setting speculative_id. >> (cgraph_edge::speculative_call_info): Find reference by >>

Re: [PATCH 1/2] Update iterator of next

2019-11-15 Thread luoxhu
On 2019/11/15 17:19, Jan Hubicka wrote: >> On Fri, Nov 15, 2019 at 9:10 AM Jan Hubicka wrote: >>> next is initialized only in the loop before, it is never updated in it's own loop. gcc/ChangeLog 2019-11-15 Xiong Hu Luo * ipa-inline.c

Re: [PATCH] PR92398: Fix testcase failure of pr72804.c

2019-11-14 Thread luoxhu
On 2019/11/15 11:12, Xiong Hu Luo wrote: P9LE generated instruction is not worse than P8LE. mtvsrdd;xxlnot;stxv vs. not;not;std;std. Update the test case to fix failures. gcc/testsuite/ChangeLog: 2019-11-15 Luo Xiong Hu testsuite/pr92398 *

Ping*2: [PATCH v5] Missed function specialization + partial devirtualization

2019-11-13 Thread luoxhu
Rebase to trunk including void gimple_ic_transform. This patch aims to fix PR69678 caused by PGO indirect call profiling performance issues. The bug that profiling data is never working was fixed by Martin's pull back of topN patches, performance got GEOMEAN ~1% improvement(+24% for 511.povray_r

[PATCH] Fix copy-paste typo syntax error by r277872

2019-11-06 Thread luoxhu
Tested pass and committed to r277904. gcc/testsuite/ChangeLog: 2019-11-07 Xiong Hu Luo * gcc.target/powerpc/pr72804.c: Move inline options from dg-require-effective-target to dg-options. --- gcc/testsuite/gcc.target/powerpc/pr72804.c | 4 ++-- 1 file changed, 2

Re: [PATCH v2] PR92090: Fix testcase failures by r276469

2019-11-05 Thread luoxhu
On 2019/11/6 02:20, Joseph Myers wrote: > On Tue, 5 Nov 2019, Kewen.Lin wrote: > >> Very good point! Since gcc doesn't pursue 100% testsuite pass rate, I >> noticed >> there are a few failures exposed/caused by some PRs all the time. Could we >> just leave the test case there without any pre

Ping: [PATCH v5] Missed function specialization + partial devirtualization

2019-11-05 Thread luoxhu
On 2019/10/22 22:07, Martin Liška wrote: On 9/27/19 9:13 AM, luoxhu wrote: Thanks for your time over so many rounds of review. You're welcome. One last request: please make gimple_ic_transform a void function. See attached patch. I'll remind Honza about the patch today. Thanks

Re: [PATCH v3] PR92090: Fix testcase failures by r276469

2019-11-04 Thread luoxhu
Hi, On 2019/11/5 06:57, Joseph Myers wrote: > On Mon, 4 Nov 2019, luoxhu wrote: > >> -finline-functions is enabled by default for O2 since r276469, update the >> test cases with -fno-inline-functions. >> >> v2: disable inlining for the failed cases. Add two more fa

Re: [PATCH] Add explicit description for -finline

2019-11-03 Thread luoxhu
On 2019/11/2 00:23, Joseph Myers wrote: > On Thu, 31 Oct 2019, Xiong Hu Luo wrote: > >> +@code{-finline} enables inlining of function declared \"inline\". >> +@code{-finline} is enabled at levels -O1, -O2, -O3 and -Os, but not -Og. > > Use @option{} to mark up option names (both -finline and all

[PATCH v2] PR92090: Fix testcase failures by r276469

2019-11-03 Thread luoxhu
-finline-functions is enabled by default for O2 since r276469, update the test cases with -fno-inline-functions. v2: disable inlining for the failed cases. Add two more failed cases not listed in BZ. Tested on P8LE, P8BE and P9LE. gcc/testsuite/ChangeLog: 2019-10-30 Xiong Hu Luo

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-23 Thread luoxhu
Hi, On 2019/10/17 16:23, Feng Xue OS wrote: > IPA does not allow constant propagation on parameter that is used to control > function recursion. > > recur_fn (i) > { >if ( !terminate_recursion (i)) > { >... >recur_fn (i + 1); >... > } >... > } > > This

Re: Ping: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-23 Thread luoxhu
Hi Feng, Thanks for the patch. It works for me as expected. I am not a reviewer, just a tiny comment after trying it. This is quite a good case for newbies to go through the ipa-cp pass. Is it necessary to update the test case a bit, as attached, to include more circumstances for callee's aggregate

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-17 Thread luoxhu
Hi Feng, On 2019/10/17 16:23, Feng Xue OS wrote: > IPA does not allow constant propagation on parameter that is used to control > function recursion. > > recur_fn (i) > { >if ( !terminate_recursion (i)) > { >... >recur_fn (i + 1); >... > } >... > } > >

Ping: [PATCH v5] Missed function specialization + partial devirtualization

2019-10-15 Thread luoxhu
Ping: Attachment: v5-0001-Missed-function-specialization-partial-devirtuali.patch: https://gcc.gnu.org/ml/gcc-patches/2019-09/txtuTT17jV7n5.txt Thanks, Xiong Hu On 2019/9/27 15:13, luoxhu wrote: Hi Martin, Thanks for your time over so many rounds of review. It really helped me a lot

Re: [PATCH] Fix dump message issue

2019-10-13 Thread luoxhu
On 2019/10/14 00:32, Jeff Law wrote: > On 10/8/19 4:45 AM, Martin Jambor wrote: >> Hi, >> >> On Tue, Oct 08 2019, luoxhu wrote: >>> '}' is missed at the end. >> >> heh, yeah, I wonder for how long. >> >> If it irritates you, I'd say the

[PATCH] Fix dump message issue

2019-10-08 Thread luoxhu
'}' is missing at the end. gcc/ChangeLog: tree-sra.c (dump_access): Add missing braces. --- gcc/tree-sra.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 48589323a1e..cb59b91f20e 100644 --- a/gcc/tree-sra.c +++

Re: [PATCH] Come up with ipa passes introduction in gccint documentation

2019-10-08 Thread luoxhu
Hi, This is the formal documentation patch for IPA passes. Thanks. None of the IPA passes are documented in passes.texi. This patch adds a section IPA passes just before GIMPLE passes and RTL passes in Chapter 9 "Passes and Files of the Compiler". Also, a short description for each IPA pass

Re: [RFC] Come up with ipa passes introduction in gccint documentation

2019-09-29 Thread luoxhu
Hi Segher, On 2019/9/30 00:17, Segher Boessenkool wrote: > Hi! > > Just some editorial comments... The idea of the patch is fine IMHO. > (I am not maintainer of this, take all my comments for what they are). > > On Sun, Sep 29, 2019 at 02:56:37AM -0500, Xiong Hu Luo wrote: >> To simplify

Re: [PATCH v5] Missed function specialization + partial devirtualization

2019-09-27 Thread luoxhu
Hi Martin, Thanks for your time over so many rounds of review. It really helped me a lot. Updated with your comments and attached for Honza's review and approval. :) Xiong Hu BR On 2019/9/26 16:36, Martin Liška wrote: On 9/26/19 7:23 AM, luoxhu wrote: Thanks Martin, On 2019/9/25 18:57

Re: [PATCH v4] Missed function specialization + partial devirtualization

2019-09-25 Thread luoxhu
Thanks Martin, On 2019/9/25 18:57, Martin Liška wrote: On 9/25/19 5:45 AM, luoxhu wrote: Hi, Sorry for replying so late due to the Cauldron conference and other LTO issues I was working on. Hello. That's fine, we still have plenty of time for patch review. Not fixed issues which I reported

[PATCH v4] Missed function specialization + partial devirtualization

2019-09-24 Thread luoxhu
Hi, Sorry for replying so late due to the Cauldron conference and other LTO issues I was working on. v4 Changes: 1. Rebase to trunk. 2. Remove num_of_ics and use vector's length to avoid redundancy. 3. Update the code in ipa-profile.c to improve review feasibility. 4. Add function

[PATCH] Backport r274411 from trunk to gcc-9-branch

2019-08-25 Thread luoxhu
This is the backport patch to gcc-9-branch, please ignore the previous mail. Backport r274411 of "Enable math functions linking with static library for LTO" from mainline to gcc-9-branch. Bootstrapped/Regression-tested on Linux POWER8 LE. gcc/ChangeLog 2019-08-26 Xiong Hu Luo
