Re: [PATCH 0/7] Support vector load/store with length

2020-05-26 Thread Kewen.Lin via Gcc-patches
Hi Richi, on 2020/5/26 3:12 PM, Richard Biener wrote: > On Tue, 26 May 2020, Kewen.Lin wrote: > >> Hi all, >> >> This patch set adds support for vector load/store with length, Power >> ISA 3.0 brings instructions lxvl/stxvl to perform vector load/store with >>

[PATCH 7/7] rs6000/testsuite: Vector with length test cases

2020-05-26 Thread Kewen.Lin via Gcc-patches
gcc/testsuite/ChangeLog 2020-MM-DD Kewen Lin * gcc.target/powerpc/p9-vec-length-1.h: New test. * gcc.target/powerpc/p9-vec-length-2.h: New test. * gcc.target/powerpc/p9-vec-length-3.h: New test. * gcc.target/powerpc/p9-vec-length-4.h: New test. *

[PATCH 6/7] ivopts: Add handlings for vector with length IFNs

2020-05-25 Thread Kewen.Lin via Gcc-patches
gcc/ChangeLog 2020-MM-DD Kewen Lin * tree-ssa-loop-ivopts.c (get_mem_type_for_internal_fn): Handle IFN_LEN_LOAD and IFN_LEN_STORE. (get_alias_ptr_type_for_ptr_address): Likewise. --- gcc/tree-ssa-loop-ivopts.c | 4 1 file changed, 4 insertions(+) diff --git

[PATCH 5/7] vect: Support vector load/store with length in vectorizer

2020-05-25 Thread Kewen.Lin via Gcc-patches
gcc/ChangeLog 2020-MM-DD Kewen Lin * doc/invoke.texi (vect-with-length-scope): Document new option. * params.opt (vect-with-length-scope): New. * tree-vect-loop-manip.c (vect_set_loop_lens_directly): New function. (vect_set_loop_condition_len): Likewise.

[PATCH 4/7] hook/rs6000: Add vectorize length mode for vector with length

2020-05-25 Thread Kewen.Lin via Gcc-patches
gcc/ChangeLog: 2020-MM-DD Kewen Lin * config/rs6000/rs6000.c (TARGET_VECTORIZE_LENGTH_MODE): New macro. * doc/tm.texi: Regenerate. * doc/tm.texi.in: New hook. * target.def: Likewise. --- gcc/config/rs6000/rs6000.c | 3 +++ gcc/doc/tm.texi| 6

[PATCH 3/7] vect: Factor out codes for niters smaller than vf check

2020-05-25 Thread Kewen.Lin via Gcc-patches
gcc/ChangeLog: 2020-MM-DD Kewen Lin * tree-vect-loop.c (known_niters_smaller_than_vf): New function, factored out from ... (vect_analyze_loop_costing): ... here. --- gcc/tree-vect-loop.c | 31 ++- 1 file changed, 22 insertions(+), 9

[PATCH 2/7] rs6000: lenload/lenstore optab support

2020-05-25 Thread Kewen.Lin via Gcc-patches
gcc/ChangeLog: 2020-MM-DD Kewen Lin * config/rs6000/vsx.md (lenloaddi): New define_expand. (lenstoredi): Likewise. --- gcc/config/rs6000/vsx.md | 30 ++ 1 file changed, 30 insertions(+) diff --git a/gcc/config/rs6000/vsx.md

[PATCH 1/7] ifn/optabs: Support vector load/store with length

2020-05-25 Thread Kewen.Lin via Gcc-patches
gcc/ChangeLog: 2020-MM-DD Kewen Lin * doc/md.texi (lenload@var{m}@var{n}): Document. (lenstore@var{m}@var{n}): Likewise. * internal-fn.c (len_load_direct): New macro. (len_store_direct): Likewise. (expand_len_load_optab_fn): Likewise.

[PATCH 0/7] Support vector load/store with length

2020-05-25 Thread Kewen.Lin via Gcc-patches
Hi all, This patch set adds support for vector load/store with length. Power ISA 3.0 brings the instructions lxvl/stxvl to perform vector load/store with length, which is useful for cases such as epilogues where there are not enough elements to fill a whole vector. This support mainly
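To illustrate the idea in the cover letter above, here is a minimal scalar sketch of a loop that needs no separate epilogue because its last iteration uses a shorter length. Everything in it is an assumption made for illustration: `len_load`, `len_store`, `add_one` and `VEC_BYTES` are hypothetical names, not the internal functions or optabs from the patch set, and the 16-byte buffer only stands in for a vector register. The zero fill of the unused bytes is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16  /* assumed width of one vector register, in bytes */

/* Hypothetical stand-in for a load with length: copy LEN bytes from SRC
   into a VEC_BYTES-wide buffer and zero the remaining bytes.  */
static void
len_load (unsigned char vec[VEC_BYTES], const unsigned char *src, size_t len)
{
  assert (len <= VEC_BYTES);
  memset (vec, 0, VEC_BYTES);
  memcpy (vec, src, len);
}

/* Hypothetical stand-in for a store with length: write only the first
   LEN bytes of the buffer to DST.  */
static void
len_store (unsigned char *dst, const unsigned char vec[VEC_BYTES], size_t len)
{
  assert (len <= VEC_BYTES);
  memcpy (dst, vec, len);
}

/* Add 1 to each of N bytes.  The final iteration handles the leftover
   bytes with a shorter length, so no scalar epilogue loop is needed.  */
void
add_one (unsigned char *dst, const unsigned char *src, size_t n)
{
  unsigned char vec[VEC_BYTES];

  for (size_t i = 0; i < n; i += VEC_BYTES)
    {
      size_t len = n - i < VEC_BYTES ? n - i : VEC_BYTES;

      len_load (vec, src + i, len);
      for (size_t j = 0; j < VEC_BYTES; j++)
        vec[j] = (unsigned char) (vec[j] + 1);
      len_store (dst + i, vec, len);
    }
}
```

With n = 21, the first iteration moves 16 bytes and the second moves the remaining 5, and nothing beyond dst[20] is written.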

Re: [PATCH 2/2] rs6000: tune loop size for cunroll at O2

2020-05-19 Thread Kewen.Lin via Gcc-patches
Hi Jeff, on 2020/5/20 11:58 AM, Jiufu Guo via Gcc-patches wrote: > Hi, > > This patch checks the size of a loop to be unrolled/peeled completely, > and sets the limits to a number (24). This prevents large loop from > being unrolled, then avoid binary size increasing, and this limit keeps >

Re: [PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

2020-05-12 Thread Kewen.Lin via Gcc-patches
, Kewen on 2020/3/3 8:25 PM, Kewen.Lin wrote: > Hi Richard, > > Thanks for your comments! It's a good idea to use param due to the > flexibility. And yes, it sounds good to have more targets to try and > make it better. But I have a bit of concern on turning it on by default. >

[PATCH 3/4 V3 GCC11] IVOPTs Consider cost_step on different forms during unrolling

2020-05-12 Thread Kewen.Lin via Gcc-patches
estimate_unroll_factor, update consider_reg_offset_for_unroll_p. on 2020/2/25 5:48 PM, Kewen.Lin wrote: > Hi, > > As the proposed hook changes, updated this with main changes: > 1) Check with addr_offset_valid_p instead. > 2) Check the 1st and the last use for the whol

Re: [PATCH, vect] Check alignment for no peeling gaps handling

2020-04-28 Thread Kewen.Lin via Gcc-patches
Hi, Gentle ping for this patch. https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543701.html BR, Kewen on 2020/4/10 5:28 PM, Kewen.Lin via Gcc-patches wrote: > Hi, > > This is one fix following Richi's comments here: > https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542232

Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-15 Thread Kewen.Lin via Gcc-patches
on 2020/4/15 2:21 PM, Richard Biener via Gcc-patches wrote: > On Wed, Apr 15, 2020 at 3:56 AM Jiufu Guo via Gcc-patches > wrote: >> >> Hi, >> >> As you may know, we have loop unroll pass in RTL which was introduced a few >> years ago, and works for a long time. Currently, this unroller is using

[PATCH, vect] Check alignment for no peeling gaps handling

2020-04-10 Thread Kewen.Lin via Gcc-patches
Hi, This is one fix following Richi's comments here: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542232.html I noticed the current half vector support for no peeling gaps handled some cases which never check the half size vector support. By further investigation, those cases are safe

[PATCH, testsuite] Fix PR94079 by respecting vect_hw_misalign

2020-04-08 Thread Kewen.Lin via Gcc-patches
Hi, This is another vect case which requires special handling with vect_hw_misalign. The alignment of the second part requires misaligned vector access supports. This patch is to adjust the related guard condition and comments. Verified it on ppc64-redhat-linux (Power7 BE). Is it ok for

[PATCH] Fix PR94443 with gsi_insert_seq_before

2020-04-02 Thread Kewen.Lin via Gcc-patches
on 2020/4/2 6:51 AM, H.J. Lu wrote: > > This caused: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94449 > Thanks for reporting this. The attached patch is to fix the stupid mistake by using gsi_insert_seq_before instead of gsi_insert_before. BTW, the regression testing on one x86_64

Re: [PATCH] Fix PR94401 by considering reverse overrun

2020-04-02 Thread Kewen.Lin via Gcc-patches
on 2020/4/2 5:21 PM, Richard Biener wrote: > On Thu, Apr 2, 2020 at 9:15 AM Kewen.Lin wrote: >> >> Hi, >> >> The commit r10-7415 brings scalar type consideration >> to eliminate epilogue peeling for gaps, but it exposed >> one problem that the current handlin

Re: [PATCH] Fix PR94401 by considering reverse overrun

2020-04-02 Thread Kewen.Lin via Gcc-patches
Hi, on 2020/4/2 4:28 PM, Jakub Jelinek wrote: > Hi! > > On Thu, Apr 02, 2020 at 03:15:42PM +0800, Kewen.Lin via Gcc-patches wrote: > > Just formatting nits, not commenting on what the actual patch does. > >> --- a/gcc/tree-vect-stmts.c >> +++ b/gcc/tree-vect-s

[PATCH] Fix PR94401 by considering reverse overrun

2020-04-02 Thread Kewen.Lin via Gcc-patches
Hi, The commit r10-7415 brings scalar type consideration to eliminate epilogue peeling for gaps, but it exposed one problem that the current handling doesn't consider the memory access type VMAT_CONTIGUOUS_REVERSE, for which the overrun happens on low address side. This patch is to make the

[PATCH] Fix PR94043 by making vect_live_op generate lc-phi

2020-03-30 Thread Kewen.Lin via Gcc-patches
Hi, As PR94043 shows, my commit r10-4524 exposed one issue in vectorizable_live_operation, which inserts one extra BB before the single exit, leading unexpected operand expansion and unexpected loop depth assertion. As Richi suggested, this patch is to teach vectorizable_live_operation to

[PATCH v3] Fix PR90332 by extending half size vector mode

2020-03-26 Thread Kewen.Lin via Gcc-patches
Hi Richi, on 2020/3/25 4:25 PM, Richard Biener wrote: > On Tue, Mar 24, 2020 at 9:30 AM Kewen.Lin wrote: >> >> Hi, >> >> The new version with refactoring has been attached. >> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9. >> >> I

[PATCH v2] Fix PR90332 by extending half size vector mode

2020-03-24 Thread Kewen.Lin via Gcc-patches
Hi, on 2020/3/18 11:10 PM, Richard Biener wrote: > On Wed, Mar 18, 2020 at 2:56 PM Kewen.Lin wrote: >> >> Hi Richi, >> >> Thanks for your comments. >> >> on 2020/3/18 6:39 PM, Richard Biener wrote: >>> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Li

Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Kewen.Lin via Gcc-patches
on 2020/3/18 6:40 PM, Richard Biener wrote: > On Wed, Mar 18, 2020 at 11:39 AM Richard Biener > wrote: >> >> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin wrote: >>> >>> Hi, >>> >>> As PR90332 shows, the current scalar epilogue peeling for

Re: [PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Kewen.Lin via Gcc-patches
Hi Richi, Thanks for your comments. on 2020/3/18 6:39 PM, Richard Biener wrote: > On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin wrote: >> >> Hi, >> >> As PR90332 shows, the current scalar epilogue peeling for gaps >> elimination requires expected vec_init optab

[PATCH] Fix PR90332 by extending half size vector mode

2020-03-18 Thread Kewen.Lin via Gcc-patches
Hi, As PR90332 shows, the current scalar epilogue peeling for gaps elimination requires expected vec_init optab with two half size vector mode. On Power, we don't support vector mode like V8QI, so can't support optab like vec_initv16qiv8qi. But we want to leverage existing scalar mode like DI

Re: [testsuite] Fix PR93935 to guard case under vect_hw_misalign

2020-03-11 Thread Kewen.Lin via Gcc-patches
Hi, Gentle ping this patch, also request to backport to gcc9 after some burn-in time. BR, Kewen on 2020/2/26 2:17 PM, Kewen.Lin wrote: > Hi, > > This patch is to apply the same fix as r267528 to another similar case > bb-slp-over-widen-2.c which requires misaligned vector access. >

Re: [testsuite] Fix PR94019 to allow one vector char when !vect_hw_misalign

2020-03-04 Thread Kewen.Lin
Hi Richard, on 2020/3/5 3:09 AM, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi, >> >> >> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c >> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c >> @@ -41,6 +41,10 @@ main (

Re: [testsuite] Fix PR94019 to allow one vector char when !vect_hw_misalign

2020-03-04 Thread Kewen.Lin
Hi Segher, on 2020/3/5 2:44 AM, Segher Boessenkool wrote: > Hi! > > On Wed, Mar 04, 2020 at 03:13:51PM +0800, Kewen.Lin wrote: >> As PR94019 shows, without misaligned vector access support but with >> realign load, the vectorized loop will end up with realign scheme. >>

Re: [PATCH] [rs6000] Rewrite the declaration of a variable

2020-03-04 Thread Kewen.Lin
on 2020/3/4 3:24 PM, binbin wrote: > Hi > > On 2020/3/4 8:33 AM, Segher Boessenkool wrote: >> Hi! >> >> On Tue, Mar 03, 2020 at 10:13:56AM -0600, Bin Bin Lv wrote: >>> Rewrite the declaration of toc_section from the source file rs6000.c to its >>> header file for standardizing the code. >> >>> diff

[testsuite] Fix PR94019 to allow one vector char when !vect_hw_misalign

2020-03-03 Thread Kewen.Lin
Hi, As PR94019 shows, without misaligned vector access support but with realign load, the vectorized loop will end up with realign scheme. It generates mask (control vector) with return type vector signed char which breaks the not check. The fix is to differentiate powerpc vect_hw_misalign and

[testsuite] Fix PR94023 to guard case under vect_hw_misalign

2020-03-03 Thread Kewen.Lin
Hi, As PR94023 shows, the expected SLP requires misaligned vector access support. This patch is to guard the check under the target condition vect_hw_misalign to ensure that. Verified it on ppc64-redhat-linux (Power7 BE). Is it ok for trunk, and backport to GCC 9 after some burn-in time? BR,

Re: [PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

2020-03-03 Thread Kewen.Lin
>> Hi Segher and Richard S., >> >> Sorry for late response. Thanks for your comments on legitimate_address_p >> hook >> and function addr_offset_valid_p. I updated the IVOPTs part with >> addr_offset_valid_p, although rs6000_legitimate_offset_address_p doesn't >> check >> strictly all the time

[testsuite] Fix PR93935 to guard case under vect_hw_misalign

2020-02-25 Thread Kewen.Lin
Hi, This patch is to apply the same fix as r267528 to another similar case bb-slp-over-widen-2.c which requires misaligned vector access. Verified it on ppc64-redhat-linux (Power7 BE). Is it ok for trunk? BR, Kewen --- gcc/testsuite/ChangeLog 2020-02-26 Kewen Lin PR

[testsuite] Update several scev/IVOPTs cases

2020-02-25 Thread Kewen.Lin
Hi, Several scev/IVOPTs cases aim to check some array references are sceved and later marked as REFERENCE ADDRESS IV groups. With IV group type dumping improving, these check strings can be improved. Otherwise, they become fragile with dumping changes. This patch is to keep check strings

[PATCH 3/4 V2 GCC11] IVOPTs Consider cost_step on different forms during unrolling

2020-02-25 Thread Kewen.Lin
Hi, As the proposed hook changes, updated this with main changes: 1) Check with addr_offset_valid_p instead. 2) Check the 1st and the last use for the whole address group. 3) Scale up group costs accordingly. Bootstrapped/regtested on powerpc64le-linux-gnu (LE). BR, Kewen ---

Re: [PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

2020-02-25 Thread Kewen.Lin
on 2020/1/20 9:14 PM, Segher Boessenkool wrote: > Hi! > > On Mon, Jan 20, 2020 at 10:42:12AM +, Richard Sandiford wrote: >> "Kewen.Lin" writes: >>> gcc/ChangeLog >>> >>> 2020-01-16 Kewen Lin >>> >>>

Re: [PATCH, IRA] Fix PR91052 by skipping multiple_sets insn in combine_and_move_insns

2020-02-11 Thread Kewen.Lin
on 2020/2/12 12:24 AM, Vladimir Makarov wrote: > On 2/11/20 3:01 AM, Kewen.Lin wrote: >> Hi, >> >> As PR91052's comments show, commit r272731 exposed one issue in function >> combine_and_move_insns. Function combine_and_move_insns perform the >> below unexpecte

[PATCH, IRA] Fix PR91052 by skipping multiple_sets insn in combine_and_move_insns

2020-02-11 Thread Kewen.Lin
Hi, As PR91052's comments show, commit r272731 exposed one issue in function combine_and_move_insns. Function combine_and_move_insns perform the below unexpected transformation. ** Before: ** 67: NOTE_INSN_BASIC_BLOCK 8 ... 59: {r184:SF=[sfp:SI-0x190];r121:SI=sfp:SI-0x190;} ==> move

[PATCH 1/4 v3 GCC11] Add middle-end unroll factor estimation

2020-02-10 Thread Kewen.Lin
ion. * tree-ssa-loop.c (tree_average_num_loop_insns): New function. * tree-ssa-loop.h (tree_average_num_loop_insns): New declaration. on 2020/2/11 7:34 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 10, 2020 at 02:20:17PM +0800, Kewen.Lin wrote: >> * tre

Re: [PATCH 1/4 v2 GCC11] Add middle-end unroll factor estimation

2020-02-10 Thread Kewen.Lin
Hi Jeff, on 2020/2/11 10:14 AM, Jiufu Guo wrote: > "Kewen.Lin" writes: > >> Hi Segher, >> >> Thanks for your comments! Updated to v2 as below: >> >> 1) Removed unnecessary hook loop_unroll_adjust_tree. >> 2) Updated estimated_uf to est

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-10 Thread Kewen.Lin
on 2020/2/11 5:29 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 10, 2020 at 02:17:04PM +0800, Kewen.Lin wrote: >> on 2020/1/20 8:33 PM, Segher Boessenkool wrote: >>> On Thu, Jan 16, 2020 at 05:36:52PM +0800, Kewen.Lin wrote: >>>> As we discussed in the th

[PATCH 4/4 v2 GCC11] rs6000: P9 D-form test cases

2020-02-09 Thread Kewen.Lin
-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New test. on 2020/1/20 9:19 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 16, 2020 at 05:42:41PM +0800,

[PATCH 1/4 v2 GCC11] Add middle-end unroll factor estimation

2020-02-09 Thread Kewen.Lin
(tree_average_num_loop_insns): New function. * tree-ssa-loop.h (tree_average_num_loop_insns): New declare. BR, Kewen on 2020/1/20 9:02 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 16, 2020 at 05:39:40PM +0800, Kewen.Lin wrote: >> --- a/gcc/cfgloop.h >> +++ b/gcc/cfgloop.

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-09 Thread Kewen.Lin
Hi Segher, on 2020/1/20 8:33 PM, Segher Boessenkool wrote: > Hi! > > On Thu, Jan 16, 2020 at 05:36:52PM +0800, Kewen.Lin wrote: >> As we discussed in the thread >> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html >> Original: https://gcc.gnu.org/ml/gcc-pa

[PATCH 4/4 GCC11] rs6000: P9 D-form test cases

2020-01-16 Thread Kewen.Lin
gcc/testsuite/ChangeLog 2020-01-16 Kelvin Nilsen Kewen Lin * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test.

[PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling

2020-01-16 Thread Kewen.Lin
gcc/ChangeLog 2020-01-16 Kewen Lin * tree-ssa-loop-ivopts.c (struct iv_group): New field dform_p. (struct iv_cand): New field dform_p. (struct ivopts_data): New field mark_dform_p. (record_group): Initialize dform_p. (mark_dform_groups): New function.

[PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

2020-01-16 Thread Kewen.Lin
gcc/ChangeLog 2020-01-16 Kewen Lin * config/rs6000/rs6000.c (TARGET_STRIDE_DFORM_VALID_P): New macro. (rs6000_stride_dform_valid_p): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_STRIDE_DFORM_VALID_P): New hook. * target.def

[PATCH 1/4 GCC11] Add middle-end unroll factor estimation

2020-01-16 Thread Kewen.Lin
gcc/ChangeLog 2020-01-16 Kewen Lin * cfgloop.h (struct loop): New field estimated_uf. * config/rs6000/rs6000.c (TARGET_LOOP_UNROLL_ADJUST_TREE): New macro. (rs6000_loop_unroll_adjust_tree): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in

[PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-01-16 Thread Kewen.Lin
Hi, As we discussed in the thread https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html, I'm working to teach IVOPTs to consider D-form group access during unrolling. The difference on D-form and other forms during unrolling is

Re: [PATCH] Fix typo and avoid possible memory leak

2020-01-14 Thread Kewen.Lin
on 2020/1/13 6:46 PM, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi, >> >> Function average_num_loop_insns forgets to free loop body in early return. >> Besides, overflow comparison checks 100 (e6) but the return value is >> 1

[PATCH] Fix typo and avoid possible memory leak

2020-01-12 Thread Kewen.Lin
Hi, Function average_num_loop_insns forgets to free loop body in early return. Besides, overflow comparison checks 100 (e6) but the return value is 10 (e5), I guess it's unexpected, a typo? Bootstrapped and regress tested on powerpc64le-linux-gnu. I guess this should go to GCC11? Is

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-08 Thread Kewen.Lin
Hi Bin, > I am a bit worried that would make IVOPTs heavy too, it might be > possible to compute heuristics whether loop should be unrolled as a > post-IVOPTs transformation. Of course the transformation needs to do > more work than simply unrolling in order to take advantage of > aforementioned

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Kewen.Lin
on 2020/1/7 7:25 PM, Richard Biener wrote: > On Tue, 7 Jan 2020, Kewen.Lin wrote: > >> on 2020/1/7 5:14 PM, Richard Biener wrote: >>> On Mon, 6 Jan 2020, Kewen.Lin wrote: >>> >>>> We are thinking whether it can be handled in IVOPTs instead of one RTL >>

Re: [RFC] IVOPTs select cand with preferred D-form access

2020-01-07 Thread Kewen.Lin
on 2020/1/7 5:14 PM, Richard Biener wrote: > On Mon, 6 Jan 2020, Kewen.Lin wrote: > >> We are thinking whether it can be handled in IVOPTs instead of one RTL pass. >> >> During IVOPTs selecting IV cands, it doesn't know the loop will be unrolled >> so >> it

[RFC/PATCH] IVOPTs select cand with preferred D-form access

2020-01-06 Thread Kewen.Lin
Hi all, Recently I've been investigating an issue related to the use of D-form/X-form vector memory access; it's the same issue that the patch https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01879.html was intended to deal with. Power9 introduces DQ-form instructions for vector memory access, and we prefer to use

[PATCH, rs6000] Adjust vectorization cost for scalar COND_EXPR

2019-12-11 Thread Kewen.Lin
Hi, We found that the vectorization cost modeling on scalar COND_EXPR is a bit off on rs6000. One typical case is 548.exchange2_r, -Ofast -mcpu=power9 -mrecip -fvect-cost-model=unlimited is better than -Ofast -mcpu=power9 -mrecip (the default is -fvect-cost-model=dynamic) by 1.94%. Scalar

[PATCH, rs6000] Fix PR92760 by checking VECTOR_MEM_NONE_P instead

2019-12-03 Thread Kewen.Lin
Hi, PR92760 exposed one issue: VECTOR_UNIT_NONE_P (V2DImode) is true on Power7, so we won't return it as preferred_simd_mode, but ISA 2.06 (Power7) does introduce partial (very limited) support for vector doubleword, with more basic support coming from ISA 2.07 (Power8). To make

Re: [PATCH] [rs6000] Fix PR92098

2019-11-26 Thread Kewen.Lin
Hi Lijia, on 2019/11/27 2:31 PM, Li Jia He wrote: > Hi, > > In order to fix PR92098, we need to define vec_cmp_* and vcond_mask_*. In > fact, > PR92132 already fixed the issue on the trunk. We need to backport PR92132 int > part to gcc-9-branch. This patch backport

[PATCH] Fix PR91790 by considering different first_stmt_info for realign

2019-11-26 Thread Kewen.Lin
Hi, As PR91790 exposed, when we have one slp node whose first_stmt_info_for_drptr is different from first_stmt_info, it's possible that the first_stmt DR isn't initialized yet before stmt SLP_TREE_SCALAR_STMTS[0] of slp node. So we shouldn't use first_stmt_info for vect_setup_realignment, instead

[PATCH, rs6000] Fix PR92566 by checking VECTOR_UNIT_NONE_P

2019-11-26 Thread Kewen.Lin
Hi, As Segher pointed out in PR92566, we shouldn't offer some vector modes which aren't supported under current setting. This patch is to make it check by VECTOR_UNIT_NONE_P which is initialized as current architecture masks. Bootstrapped and tested on powerpc64le-linux-gnu. Is it ok for trunk?

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-24 Thread Kewen.Lin
Hi Segher, on 2019/11/23 12:08 AM, Segher Boessenkool wrote: > Hi! >> 2019-11-21 Kewen Lin >> >> * config/rs6000/vector.md (vector_fp_comparison_simple): >> New code iterator. >> (vector_fp_comparison_complex): Likewise. >> (vector_ for VEC_F and >>

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-21 Thread Kewen.Lin
Hi Segher, on 2019/11/20 10:06 PM, Segher Boessenkool wrote: > Hi! > > On Wed, Nov 20, 2019 at 03:31:36PM +0800, Kewen.Lin wrote: > Yeah. Just doing can_create_pseudo in the insn condition (and in the > split condition, via &&) will work -- there just is this window

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-19 Thread Kewen.Lin
Hi Segher, on 2019/11/20 1:29 AM, Segher Boessenkool wrote: > Hi! > > On Tue, Nov 12, 2019 at 06:41:07PM +0800, Kewen.Lin wrote: >> +;; code iterators and attributes for vector FP comparison operators: >> +(define_code_iterator vector_fp_comparis

Re: [PATCH, testsuite] Fix PR92464 by adjust test case loop bound

2019-11-13 Thread Kewen.Lin
Hi Segher, on 2019/11/13 6:42 PM, Segher Boessenkool wrote: > Hi! > > On Wed, Nov 13, 2019 at 03:31:11PM +0800, Kewen.Lin wrote: >> As PR92464 shows, the recent vectorization cost adjustment on load >> insns is responsible for this regression. It leads the profitable >

[PATCH, testsuite] Fix PR92464 by adjust test case loop bound

2019-11-12 Thread Kewen.Lin
Hi, As PR92464 shows, the recent vectorization cost adjustment on load insns is responsible for this regression. It leads the profitable min iteration count to change from 19 to 12. The case happens to hit the threshold. By actual runtime performance evaluation, the vectorized version perform

Re: [PATCH, rs6000] Refactor FP vector comparison operators

2019-11-12 Thread Kewen.Lin
Hi Segher, on 2019/11/11 8:51 PM, Segher Boessenkool wrote: > Hi! > >> pattern 1: >> lt(a,b) = gt(b,a) >> le(a,b) = ge(b,a) > > This is done by swap_condition normally. Nice! Done. > >> pattern 2: >> unge(a,b) = ~gt(b,a) >> unle(a,b) = ~gt(a,b) >> ne(a,b) = ~eq(a,b) >>

[PATCH, rs6000] Refactor FP vector comparison operators

2019-11-10 Thread Kewen.Lin
Hi, This is a subsequent patch to refactor the existing float point vector comparison operator supports. The patch to fix PR92132 supplemented vector float point comparison by exposing the names for unordered/ordered/uneq/ltgt and adding ungt/unge/unlt/unle/ ne. As Segher pointed out, some

Re: [PATCH rs6000]Fix PR92132

2019-11-10 Thread Kewen.Lin
Hi Segher, on 2019/11/9 1:36 AM, Segher Boessenkool wrote: > Hi! > > On Fri, Nov 08, 2019 at 10:38:13AM +0800, Kewen.Lin wrote: >>>> + [(set (match_operand: 0 "vint_operand") >>>> + (match_operator 1 "comparison_operator" >>> >

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin
Hi Segher, on 2019/11/8 8:07 AM, Segher Boessenkool wrote: > Hi! > >>> Half are pretty simple: >>> >>> lt(a,b) = gt(b,a) >>> gt(a,b) = gt(a,b) >>> eq(a,b) = eq(a,b) >>> le(a,b) = ge(b,a) >>> ge(a,b) = ge(a,b) >>> >>> ltgt(a,b) = ge(a,b) ^ ge(b,a) >>> ord(a,b) = ge(a,b) | ge(b,a) >>> >>> The
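The identities quoted above, together with the negated forms (unge, unle, ne) quoted elsewhere in the same thread, can be sanity-checked on scalars. The following is only a sketch under stated assumptions: it uses plain C double comparisons rather than the vector patterns in the patch, and `gt`, `ge`, `eq` and `check_identities` are hypothetical helper names introduced here. It relies on the standard C behaviour that ordered comparisons involving a NaN are false.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Scalar stand-ins for the three base comparisons; each one is false
   when either operand is a NaN.  */
static bool gt (double a, double b) { return a > b; }
static bool ge (double a, double b) { return a >= b; }
static bool eq (double a, double b) { return a == b; }

/* Assert that every derived comparison matches its expression in terms
   of gt/ge/eq for the given operand pair.  */
void
check_identities (double a, double b)
{
  bool unordered = isunordered (a, b);

  assert ((a < b) == gt (b, a));                         /* lt */
  assert ((a <= b) == ge (b, a));                        /* le */
  assert ((unordered || a >= b) == !gt (b, a));          /* unge */
  assert ((unordered || a <= b) == !gt (a, b));          /* unle */
  assert ((a != b) == !eq (a, b));                       /* ne */
  assert ((a < b || a > b) == (ge (a, b) ^ ge (b, a)));  /* ltgt */
  assert (!unordered == (ge (a, b) | ge (b, a)));        /* ord */
}
```

Calling `check_identities` with ordered pairs as well as with a NaN operand (for example `check_identities (NAN, 1.0)`) exercises both the ordered and the unordered cases.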

Re: [PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-07 Thread Kewen.Lin
Hi Segher, on 2019/11/8 6:36 AM, Segher Boessenkool wrote: > On Thu, Nov 07, 2019 at 11:22:12AM +0800, Kewen.Lin wrote: >> One updated patch to enable it everywhere attached. > >> 2019-11-07 Kewen Lin >> >> * config/rs6000/rs6000.c (rs6000_bui

Re: [PATCH rs6000]Fix PR92132

2019-11-07 Thread Kewen.Lin
Hi Segher, on 2019/11/7 7:49 AM, Segher Boessenkool wrote: > > The expander named "one_cmpl3": > > Erm. 2, not 3 :-) > > (define_expand "one_cmpl2" > [(set (match_operand:BOOL_128 0 "vlogical_operand") > (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))] > "" > "") >

[PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-06 Thread Kewen.Lin
Hi Segher, on 2019/11/7 1:38 AM, Segher Boessenkool wrote: > Hi! > > On Tue, Nov 05, 2019 at 10:14:46AM +0800, Kewen.Lin wrote: >>>> + benefits were observed on Power8 and up, we can unify it if similar >>>> + profits are measured on Power6 and Power7.

Re: [PATCH rs6000]Fix PR92132

2019-11-05 Thread Kewen.Lin
Hi Segher, Thanks for the comments! on 2019/11/2 7:17 AM, Segher Boessenkool wrote: > On Tue, Oct 29, 2019 at 01:16:53PM +0800, Kewen.Lin wrote: >> (vcond_mask_): New expand. > > Say for which mode please? Like > (vcond_mask_ for VEC_I and VEC_I): New expand.

Re: [PATCH v2] PR92090: Fix testcase failures by r276469

2019-11-04 Thread Kewen.Lin
on 2019/11/5 6:57 AM, Joseph Myers wrote: > On Mon, 4 Nov 2019, luoxhu wrote: > >> -finline-functions is enabled by default for O2 since r276469, update the >> test cases with -fno-inline-functions. >> >> v2: disable inlining for the failed cases. Add two more failed cases >> not listed in BZ.

Re: [PATCH, rs6000] Make load cost more in vectorization cost for P8/P9

2019-11-04 Thread Kewen.Lin
Hi Segher, Thanks for the comments! on 2019/11/5 4:21 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Nov 04, 2019 at 03:16:06PM +0800, Kewen.Lin wrote: >> To align with rs6000_insn_cost costing more for load type insns, > > (Which itself has history in rs6000_rtx_cos

Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-04 Thread Kewen.Lin
Hi Jeff, Thanks for the patch, I learned a lot from it. Some nits embedded. on 2019/11/4 2:31 PM, Jiufu Guo wrote: > Hi, > > In this patch, loop unroll adjust hook is introduced for powerpc. We can do > target related heuristic adjustment in this hook. In this patch, small loops > are unrolled

[PATCH, rs6000] Make load cost more in vectorization cost for P8/P9

2019-11-03 Thread Kewen.Lin
Hi, To align with rs6000_insn_cost costing more for load type insns, this patch is to make load insns cost more in vectorization cost function. Considering that the result of load usually is used somehow later (true-dep) but store won't, we keep the store as before. The SPEC2017 performance

Re: [PATCH 3/3 V2][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-31 Thread Kewen.Lin
Hi Segher, on 2019/11/1 2:49 AM, Segher Boessenkool wrote: > Hi! > > On Thu, Oct 31, 2019 at 05:35:22PM +0800, Kewen.Lin wrote: >>>> +/* Half VMX/VSX vector (for select) */ >>>> +VECTOR_MODE (FLOAT, SF, 2); /* V2SF */ >>>> +VECTOR_M

[PATCH 3/3 V2][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-31 Thread Kewen.Lin
Hi Segher, Thanks a lot for the comments. on 2019/10/31 2:49 AM, Segher Boessenkool wrote: > Hi! > > On Wed, Oct 23, 2019 at 05:42:45PM +0800, Kewen.Lin wrote: >> Following the previous one 2/3, this patch is to update the >> vector conversions between fixed poi

[PATCH, rs6000] Fix PR92127

2019-10-30 Thread Kewen.Lin
Hi, As PR92127 shows, recent commit r276645 enables more unrollings, two ppc vectorization cost model test cases are fragile and failed after the change. This patch is to disable unrolling for the loops of interest to make test cases more robust. Verified on ppc64-redhat-linux. Should be fine

Re: [PATCH rs6000]Fix PR92132

2019-10-28 Thread Kewen.Lin
Fixed one place without consistent mode. Bootstrapped and regress testing passed on powerpc64le-linux. Thanks! Kewen --- gcc/ChangeLog 2019-10-25 Kewen Lin PR target/92132 * config/rs6000/rs6000.md (one_cmpl3_internal): Expose name. * config/rs6000/vector.md

[PATCH rs6000]Fix PR92132

2019-10-25 Thread Kewen.Lin
Hi, To support full condition reduction vectorization, we have to define vec_cmp_* and vcond_mask_*. This patch is to add related expands. Add vector_{ungt,unge,unlt,unle} for unique vector_* interface support. Regression testing just launched. gcc/ChangeLog 2019-10-25 Kewen Lin

[PATCH 3/3][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-23 Thread Kewen.Lin
Hi, Following the previous one 2/3, this patch is to update the vector conversions between fixed point and floating point with different element unit sizes, such as: SP <-> DI, DP <-> SI. Bootstrap and regression testing just launched. gcc/ChangeLog 2019-10-23 Kewen Lin *

[PATCH 2/3][rs6000] vector conversion RTL pattern update for same unit size

2019-10-23 Thread Kewen.Lin
Hi, For those fixed point <-> floating point vector conversion with same element unit size, such as: SP <-> SI, DP <-> DI, it's fine to use the existing RTL operations like any_fix/any_float for them. This patch is to update them with any_fix/any_float. Bootstrapped and regress tested on

[PATCH 1/3][rs6000] Replace vsx_xvcdpsp by vsx_xvcvdpsp

2019-10-23 Thread Kewen.Lin
Hi, I noticed that vsx_xvcdpsp and vsx_xvcvdpsp are almost the same, and vsx_xvcdpsp looks replaceable with vsx_xvcvdpsp since it's only called by gen_*. Bootstrapped and regress tested on powerpc64le-linux-gnu. gcc/ChangeLog 2019-10-23 Kewen Lin * config/rs6000/vsx.md

[PATCH, rs6000] Lower vec_promote_demote vectorization cost for P8/P9

2019-10-09 Thread Kewen.Lin
Hi, This patch is to lower vec_promote_demote vectorization cost in rs6000_builtin_vectorization_cost. It's similar to what we committed for vec_perm, the current cost for vec_promote_demote is also overpriced for Power8 and Power9 since Power8 and Power9 has supported more units for

Re: [PATCH, rs6000] Lower vec_perm vectorization cost for P8/P9

2019-09-29 Thread Kewen.Lin
Hi Segher, on 2019/9/29 3:28 PM, Segher Boessenkool wrote: > Hi! > > On Sun, Sep 29, 2019 at 01:38:31PM +0800, Kewen.Lin wrote: >> Recently we are revisiting vectorization cost setting in >> rs6000_builtin_vectorization_cost, and found the current cost of >> vec_p

[PATCH, rs6000] Lower vec_perm vectorization cost for P8/P9

2019-09-28 Thread Kewen.Lin
Hi, We have recently been revisiting the vectorization cost settings in rs6000_builtin_vectorization_cost, and found that the current cost of vec_perm on VSX looks overpriced for Power8 and Power9. The high cost was set for the Power7 single VSU pipe, but Power8 and Power9 support more VSX units, the

Re: [PATCH, rs6000] Support float from/to long conversion vectorization

2019-09-28 Thread Kewen.Lin
Hi Segher, on 2019/9/28 12:12 AM, Segher Boessenkool wrote: > On Fri, Sep 27, 2019 at 04:52:30PM +0800, Kewen.Lin wrote: >>> (Maybe one of the gen* tools complains any_fix needs a mode? :QI will do >>> if so, or :P if you like that better). >> >> I didn't enc

Re: [PATCH, rs6000] Support float from/to long conversion vectorization

2019-09-27 Thread Kewen.Lin
Hi Segher, on 2019/9/27 3:27 PM, Segher Boessenkool wrote: > Hi Kewen, > >> +;; Support signed/unsigned long long to float conversion vectorization. >> +(define_expand "vec_pack_float_v2di" >> + [(match_operand:V4SF 0 "vfloat_operand") >> + (any_float:V4SF (parallel [(match_operand:V2DI 1

[PATCH, rs6000] Support float from/to long conversion vectorization

2019-09-26 Thread Kewen.Lin
Hi, This patch adds support for vectorizing conversions between float and long. ISA 2.06 provides vector instructions for conversion between float and long long (both signed and unsigned), but the vectorizer can't exploit them since the optab check fails. So this patch is mainly to

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-09-14 Thread Kewen.Lin
on 2019/9/12 4:14 PM, Richard Biener wrote: > On Wed, 11 Sep 2019, Kewen.Lin wrote: > >> Hi, >> >> Sorry for the late update. I've updated the words of target hooks part. >> >> Could someone help to review it? Thanks in advance! >> >> By the

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-09-11 Thread Kewen.Lin
/pr32044.c: Likewise. on 2019/8/23 6:18 PM, Segher Boessenkool wrote: > Hi! > > On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote: >> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin wrote: >> Not sure if non-ivopts parts are already approved? If so, the patch >>

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-24 Thread Kewen.Lin
Hi Bin, on 2019/8/23 5:43 PM, Bin.Cheng wrote: > On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin wrote: >> >> Hi Bin >> >> on 2019/8/23 10:19 AM, Bin.Cheng wrote: >>> On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin wrote: >>>> >>>> Hi Bin, >>

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-23 Thread Kewen.Lin
Hi Bin on 2019/8/23 10:19 AM, Bin.Cheng wrote: > On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin wrote: >> >> Hi Bin, >> >> on 2019/8/22 1:46 PM, Bin.Cheng wrote: >>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin wrote: >>>> >>>> Hi Bin,

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-22 Thread Kewen.Lin
Hi Bin, on 2019/8/22 1:46 PM, Bin.Cheng wrote: > On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin wrote: >> >> Hi Bin, >> >> Thanks for your time! >> >> on 2019/8/21 8:32 PM, Bin.Cheng wrote: >>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin wrote: >>

Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-21 Thread Kewen.Lin
Hi Bin, Thanks for your time! on 2019/8/21 8:32 PM, Bin.Cheng wrote: > On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin wrote: >> >> Hi! >> >> Comparing to the previous versions of implementation mainly based on the >> existing IV cands but zeroing the related group/us

[PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts

2019-08-14 Thread Kewen.Lin
Hi! Compared to the previous versions of the implementation, which were mainly based on the existing IV cands but zeroed the related group/use cost, this new one follows Richard and Segher's suggestion of introducing one doloop-dedicated IV cand. Some key points are listed below: 1) New field doloop_p

Re: [PATCH V5, rs6000] Support vrotr3 for int vector types

2019-08-05 Thread Kewen.Lin
Hi Segher, on 2019/8/6 5:21 AM, Segher Boessenkool wrote: > On Mon, Aug 05, 2019 at 11:41:41AM +0800, Kewen.Lin wrote: >> on 2019/8/4 4:52 AM, Segher Boessenkool wrote: >>> On Fri, Aug 02, 2019 at 04:59:44PM +0800, Kewen.Lin wrote: > There are two cases: either all elements ar
