[PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel

2021-09-16 Thread Xionghu Luo via Gcc-patches
These two patches are an updated version of: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html Changes: 1. Fix alignment error in md files. 2. Replace rtx_equal_p with match_dup. 3. Use register_operand instead of gpc_reg_operand to align with vperm/xxperm. 4. Regression

[PATCH v2 2/2] rs6000: Fold xxsel to vsel since they have same semantics

2021-09-16 Thread Xionghu Luo via Gcc-patches
Fold xxsel to vsel like xxperm/vperm to avoid duplicate code. gcc/ChangeLog: 2021-09-17 Xionghu Luo * config/rs6000/altivec.md: Add vsx register constraints. * config/rs6000/vsx.md (vsx_xxsel): Delete. (vsx_xxsel2): Likewise. (vsx_xxsel3): Likewise.

Re: Ping ^ 3: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-15 Thread Xionghu Luo via Gcc-patches
Ping^3, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote: Ping^2, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote: Gentle ping, thanks

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-13 Thread Xionghu Luo via Gcc-patches
On 2021/9/13 16:17, Richard Biener wrote: On Mon, 13 Sep 2021, Xionghu Luo wrote: On 2021/9/10 21:54, Xionghu Luo via Gcc-patches wrote: On 2021/9/9 18:55, Richard Biener wrote: diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 5d6845478e7..4b187c2cdaf 100644

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-12 Thread Xionghu Luo via Gcc-patches
On 2021/9/10 21:54, Xionghu Luo via Gcc-patches wrote: On 2021/9/9 18:55, Richard Biener wrote: diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 5d6845478e7..4b187c2cdaf 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -3074,15 +3074,13

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-10 Thread Xionghu Luo via Gcc-patches
On 2021/9/9 18:55, Richard Biener wrote: diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 5d6845478e7..4b187c2cdaf 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -3074,15 +3074,13 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-09 Thread Xionghu Luo via Gcc-patches
On 2021/9/2 18:37, Richard Biener wrote: On Thu, 2 Sep 2021, Xionghu Luo wrote: On 2021/9/2 16:50, Richard Biener wrote: On Thu, 2 Sep 2021, Richard Biener wrote: On Thu, 2 Sep 2021, Xionghu Luo wrote: On 2021/9/1 17:58, Richard Biener wrote: This fixes the CFG walk order of

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-09-08 Thread Xionghu Luo via Gcc-patches
On 2021/8/26 19:33, Richard Biener wrote: On Tue, Aug 10, 2021 at 4:03 AM Xionghu Luo wrote: Hi, On 2021/8/6 20:15, Richard Biener wrote: On Mon, Aug 2, 2021 at 7:05 AM Xiong Hu Luo wrote: There was a patch trying to avoid moving cold blocks out of loops:

Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-09-06 Thread Xionghu Luo via Gcc-patches
On 2021/9/4 05:44, Segher Boessenkool wrote: Hi! On Fri, Sep 03, 2021 at 10:31:24AM +0800, Xionghu Luo wrote: fmod/fmodf and remainder/remainderf could be expanded instead of library call when fast-math build, which is much faster. Thank you very much for this patch. Some trivial

Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-09-05 Thread Xionghu Luo via Gcc-patches
Ping^2, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote: Gentle ping, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote: Hi, On 2021/6

Ping ^ 2: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-09-05 Thread Xionghu Luo via Gcc-patches
Ping^2, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote: Gentle ping, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote: Hi, On 2021/5/13

Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-09-02 Thread Xionghu Luo via Gcc-patches
Resend the patch that addressed Will's comments. fmod/fmodf and remainder/remainderf can be expanded inline instead of calling the library when building with fast-math, which is much faster. fmodf: fdivs f0,f1,f2 friz f0,f0 fnmsubs f1,f2,f0,f1 remainderf: fdivs f0,f1,f2 frin
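The fmodf sequence quoted above can be read as x - y * trunc(x / y). A minimal C sketch of that arithmetic follows; the helper name `fast_fmodf` is hypothetical, and the integer cast only stands in for `friz`, so it is valid only while the quotient fits in a `long`. Like the fast-math expansion it illustrates, it is not an exact replacement for the library `fmodf`.

```c
/* Hypothetical helper mirroring the fdivs / friz / fnmsubs sequence.
   Assumption: the quotient fits in a long, so the cast can stand in
   for friz (round toward zero). */
static float fast_fmodf(float x, float y)
{
    float q = x / y;        /* fdivs: quotient */
    q = (float)(long)q;     /* friz: truncate toward zero */
    return x - y * q;       /* fnmsubs: -(y * q - x) */
}
```

For example, `fast_fmodf(7.0f, 2.0f)` truncates the quotient 3.5 to 3 and returns 7 - 2 * 3.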

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-02 Thread Xionghu Luo via Gcc-patches
On 2021/9/2 16:50, Richard Biener wrote: > On Thu, 2 Sep 2021, Richard Biener wrote: > >> On Thu, 2 Sep 2021, Xionghu Luo wrote: >> >>> >>> >>> On 2021/9/1 17:58, Richard Biener wrote: This fixes the CFG walk order of fill_always_executed_in to use RPO order rather than the dominator

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-01 Thread Xionghu Luo via Gcc-patches
On 2021/9/1 17:58, Richard Biener wrote: This fixes the CFG walk order of fill_always_executed_in to use RPO order rather than the dominator based order computed by get_loop_body_in_dom_order. That fixes correctness issues with unordered dominator children. The RPO order computed by
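For readers unfamiliar with the term, reverse postorder (RPO) is the reverse of the order in which a depth-first search finishes nodes; in an acyclic graph it visits every node before its successors. The sketch below computes it on a toy graph. It is not GCC code: `struct toy_cfg`, `dfs_postorder`, and `compute_rpo` are hypothetical names chosen for illustration.

```c
#define MAX_NODES 8

/* Toy CFG (hypothetical, not a GCC structure): succ[n] lists the
   successors of node n and is terminated by -1. */
struct toy_cfg {
    int num_nodes;
    int succ[MAX_NODES][MAX_NODES + 1];
};

/* Depth-first search that records each node after all of its
   unvisited successors have been processed (postorder). */
static void dfs_postorder(const struct toy_cfg *g, int n,
                          int *visited, int *post, int *count)
{
    visited[n] = 1;
    for (int i = 0; g->succ[n][i] >= 0; i++)
        if (!visited[g->succ[n][i]])
            dfs_postorder(g, g->succ[n][i], visited, post, count);
    post[(*count)++] = n;
}

/* Fill rpo[] with the nodes reachable from entry in reverse
   postorder and return how many nodes were written. */
static int compute_rpo(const struct toy_cfg *g, int entry, int *rpo)
{
    int visited[MAX_NODES] = {0};
    int post[MAX_NODES];
    int count = 0;

    dfs_postorder(g, entry, visited, post, &count);
    for (int i = 0; i < count; i++)
        rpo[i] = post[count - 1 - i];
    return count;
}
```

On a diamond-shaped graph (0 to 1 and 2, both to 3), the entry node comes first and the join node last, whichever branch the search explores first.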

Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

2021-08-31 Thread Xionghu Luo via Gcc-patches
On 2021/8/30 17:19, Richard Biener wrote: bitmap_set_bit (work_set, loop->header->index); + unsigned bb_index; - for (i = 0; i < loop->num_nodes; i++) - { - edge_iterator ei; - bb = bbs[i]; + unsigned array_size = last_basic_block_for_fn (cfun) + 1;

Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

2021-08-30 Thread Xionghu Luo via Gcc-patches
On 2021/8/27 15:45, Richard Biener wrote: On Thu, 26 Aug 2021, Xionghu Luo wrote: On 2021/8/24 16:20, Richard Biener wrote: On Tue, 24 Aug 2021, Xionghu Luo wrote: On 2021/8/19 20:11, Richard Biener wrote: - class loop *inn_loop = loop; if (ALWAYS_EXECUTED_IN

Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

2021-08-25 Thread Xionghu Luo via Gcc-patches
On 2021/8/24 16:20, Richard Biener wrote: > On Tue, 24 Aug 2021, Xionghu Luo wrote: > >> >> >> On 2021/8/19 20:11, Richard Biener wrote: - class loop *inn_loop = loop; if (ALWAYS_EXECUTED_IN (loop->header) == NULL) { @@ -3232,19 +3231,6 @@

Re: [PATCH v2] Fix incomplete computation in fill_always_executed_in_1

2021-08-24 Thread Xionghu Luo via Gcc-patches
On 2021/8/19 20:11, Richard Biener wrote: >> - class loop *inn_loop = loop; >> >> if (ALWAYS_EXECUTED_IN (loop->header) == NULL) >> { >> @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap >> contains_call) >> to disprove this if possible). */

[PATCH v2] Don't move cold code out of loop by checking bb count

2021-08-18 Thread Xionghu Luo via Gcc-patches
On 2021/8/10 12:25, Ulrich Drepper wrote: > On Tue, Aug 10, 2021 at 4:03 AM Xionghu Luo via Gcc-patches > wrote: >> For this case, theoretically I think the master GCC will optimize it to: >> >>invariant; >>for

Re: [PATCH v2] Fix incomplete computation in fill_always_executed_in_1

2021-08-18 Thread Xionghu Luo via Gcc-patches
On 2021/8/17 17:10, Xionghu Luo via Gcc-patches wrote: > > > On 2021/8/17 15:12, Richard Biener wrote: >> On Tue, 17 Aug 2021, Xionghu Luo wrote: >> >>> Hi, >>> >>> On 2021/8/16 19:46, Richard Biener wrote: >>>> On Mon

[PATCH v2] Fix incomplete computation in fill_always_executed_in_1

2021-08-17 Thread Xionghu Luo via Gcc-patches
On 2021/8/17 15:12, Richard Biener wrote: > On Tue, 17 Aug 2021, Xionghu Luo wrote: > >> Hi, >> >> On 2021/8/16 19:46, Richard Biener wrote: >>> On Mon, 16 Aug 2021, Xiong Hu Luo wrote: >>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for nested loops. inn_loop

Re: [PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Xionghu Luo via Gcc-patches
On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote: Hi, On 2021/8/16 19:46, Richard Biener wrote: On Mon, 16 Aug 2021, Xiong Hu Luo wrote: It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for nested loops. inn_loop is updated to the inner loop, so it needs to be restored when

Re: [PATCH] Fix incorrect computation in fill_always_executed_in_1

2021-08-16 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/8/16 19:46, Richard Biener wrote: On Mon, 16 Aug 2021, Xiong Hu Luo wrote: It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for nested loops. inn_loop is updated to the inner loop, so it needs to be restored when exiting from the innermost loop. With this patch, the store

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-11 Thread Xionghu Luo via Gcc-patches
On 2021/8/11 17:16, Richard Biener wrote: On Wed, 11 Aug 2021, Xionghu Luo wrote: On 2021/8/10 22:47, Richard Biener wrote: On Mon, 9 Aug 2021, Xionghu Luo wrote: Thanks, On 2021/8/6 19:46, Richard Biener wrote: On Tue, 3 Aug 2021, Xionghu Luo wrote: loop split condition is moved

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-11 Thread Xionghu Luo via Gcc-patches
On 2021/8/10 22:47, Richard Biener wrote: > On Mon, 9 Aug 2021, Xionghu Luo wrote: > >> Thanks, >> >> On 2021/8/6 19:46, Richard Biener wrote: >>> On Tue, 3 Aug 2021, Xionghu Luo wrote: >>> loop split condition is moved between loop1 and loop2, the split bb's count and probability

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-08-09 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/8/6 20:15, Richard Biener wrote: > On Mon, Aug 2, 2021 at 7:05 AM Xiong Hu Luo wrote: >> >> There was a patch trying to avoid moving cold blocks out of loops: >> >> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html >> >> Richard suggested to "never hoist anything from a bb with

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-08 Thread Xionghu Luo via Gcc-patches
Thanks, On 2021/8/6 19:46, Richard Biener wrote: > On Tue, 3 Aug 2021, Xionghu Luo wrote: > >> loop split condition is moved between loop1 and loop2, the split bb's >> count and probability should also be duplicated instead of (100% vs INV), >> secondly, the original loop1 and loop2 count need

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-03 Thread Xionghu Luo via Gcc-patches
I'd like to split this patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html into two patches: 0001-Fix-loop-split-incorrect-count-and-probability.patch 0002-Don-t-move-cold-code-out-of-loop-by-checking-bb-coun.patch since they solve two different problems, please help to

[PATCH] Fix loop split incorrect count and probability

2021-08-03 Thread Xionghu Luo via Gcc-patches
The loop split condition is moved between loop1 and loop2, so the split bb's count and probability should also be duplicated instead of (100% vs INV); secondly, the original loop1 and loop2 counts need to be proportional to the original loop. Regression tested pass, OK for master? diff

Re: [PATCH] New hook adjust_iv_update_pos

2021-07-12 Thread Xionghu Luo via Gcc-patches
>>>> On 2021/6/25 18:02, Richard Biener wrote: >>>>> On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2021/6/25 16:54, Richard Biener wrote: >>>>>>

Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-07-11 Thread Xionghu Luo via Gcc-patches
On 2021/7/10 02:40, will schmidt wrote: > On Wed, 2021-06-30 at 09:44 +0800, Xionghu Luo via Gcc-patches wrote: >> Gentle ping ^2, thanks. >> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html >> >> >> On 2021/5/14 15:13, Xionghu Luo v

Ping: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-29 Thread Xionghu Luo via Gcc-patches
Gentle ping, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote: Hi, On 2021/6/9 07:25, Segher Boessenkool wrote: On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote: vmrghb only accepts permute index {0, 16

Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-06-29 Thread Xionghu Luo via Gcc-patches
Gentle ping ^2, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote: Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%, 526.blender_r +1.72%, no obvious changes to others. On 2021/5/6 10:36, Xionghu Luo

Ping: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-06-29 Thread Xionghu Luo via Gcc-patches
Gentle ping, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote: Hi, On 2021/5/13 18:49, Segher Boessenkool wrote: Hi! On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote: The vsel instruction is a bit-wise

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-29 Thread Xionghu Luo via Gcc-patches
> >>>> On 2021/6/25 16:54, Richard Biener wrote: >>>>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches >>>>> wrote: >>>>>> >>>>>> From: Xiong Hu Luo >>>>>> >>>>>> ad

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-28 Thread Xionghu Luo via Gcc-patches
On 2021/6/25 18:02, Richard Biener wrote: > On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo wrote: >> >> >> >> On 2021/6/25 16:54, Richard Biener wrote: >>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches >>> wrote: >>>>

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches
On 2021/6/25 18:02, Richard Biener wrote: On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo wrote: On 2021/6/25 16:54, Richard Biener wrote: On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches wrote: From: Xiong Hu Luo adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches
Luo via Gcc-patches wrote: From: Xiong Hu Luo adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance on Power. For example, it generates mismatched address offset after adjust iv update statement position: [local count: 70988443]: _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches
On 2021/6/25 16:54, Richard Biener wrote: On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches wrote: From: Xiong Hu Luo adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance on Power. For example, it generates mismatched address offset after adjust iv update

[PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches
From: Xiong Hu Luo adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance on Power. For example, it generates mismatched address offset after adjust iv update statement position: [local count: 70988443]: _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1]; ivtmp.30_415 =

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-15 Thread Xionghu Luo via Gcc-patches
On 2021/6/12 04:16, Segher Boessenkool wrote: On Thu, Jun 10, 2021 at 03:11:08PM +0800, Xionghu Luo wrote: On 2021/6/10 00:24, Segher Boessenkool wrote: "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR && !altivec_indexed_or_indirect_operand (operands[0],

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-10 Thread Xionghu Luo via Gcc-patches
On 2021/6/10 00:24, Segher Boessenkool wrote: > On Wed, Jun 09, 2021 at 11:20:20AM +0800, Xionghu Luo wrote: >> On 2021/6/9 04:11, Segher Boessenkool wrote: >>> On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote: >> rejecting combination of insns 6 and 7 >> original costs 4 + 4

git gcc-commit-mklog doesn't extract PR number to ChangeLog

2021-06-09 Thread Xionghu Luo via Gcc-patches
Hi, I noticed that the "git gcc-commit-mklog" command doesn't automatically extract the PR number from the title into the ChangeLog, so the committed patch doesn't update the related Bugzilla PR page after check-in? Martin, what's your opinion on this, since you are much more familiar with it?

Re: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-09 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/6/9 07:25, Segher Boessenkool wrote: On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote: vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for vmrghlb. (vmrglb) + if (BYTES_BIG_ENDIAN) +

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-08 Thread Xionghu Luo via Gcc-patches
On 2021/6/9 04:11, Segher Boessenkool wrote: > Hi! > > On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote: Combine still fails to merge the two instructions: Trying 6 -> 7: 6: r120:KF#0=r125:KF#0<-<0x40 REG_DEAD r125:KF 7:

Re: [PATCH v2] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-08 Thread Xionghu Luo via Gcc-patches
On 2021/6/9 05:07, Segher Boessenkool wrote: > Hi! > > On Tue, Jun 08, 2021 at 09:11:33AM +0800, Xionghu Luo wrote: >> On P8LE, extra rot64+rot64 load or store instructions are generated >> in float128 to vector __int128 conversion. >> >> This patch teaches pass swaps to also handle such

[PATCH v2] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-07 Thread Xionghu Luo via Gcc-patches
Update the patch according to the comments. Thanks. On P8LE, extra rot64+rot64 load or store instructions are generated in float128 to vector __int128 conversion. This patch teaches pass swaps to also handle such patterns to remove extra swap instructions. (insn 7 6 8 2 (set (subreg:V1TI

Ping: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-06 Thread Xionghu Luo via Gcc-patches
Ping, thanks. On 2021/5/24 17:02, Xionghu Luo wrote: From: Xiong Hu Luo vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23} regardless of BE or LE in the ISA, and similarly for vmrghlb. Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select +

Ping^2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-06-06 Thread Xionghu Luo via Gcc-patches
Ping, thanks. On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote: Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%, 526.blender_r +1.72%, no obvious changes to others. On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote: Gentle ping, thanks. On 2021/4/16 15:10, Xiong Hu

Ping: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-06-06 Thread Xionghu Luo via Gcc-patches
Gentle ping, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote: Hi, On 2021/5/13 18:49, Segher Boessenkool wrote: Hi! On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote: The vsel instruction is a bit-wise

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches
On 2021/6/4 04:16, Segher Boessenkool wrote: > Hi! > > On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote: >> On 2021/6/3 06:20, Segher Boessenkool wrote: >>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: On P8LE, extra rot64+rot64 load or store instructions are

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/6/3 21:09, Bill Schmidt wrote: > On 6/2/21 7:46 PM, Xionghu Luo wrote: >> Hi, >> >> On 2021/6/3 06:20, Segher Boessenkool wrote: >>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: On P8LE, extra rot64+rot64 load or store instructions are generated in float128

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches
On 2021/6/4 04:31, Segher Boessenkool wrote: > On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote: >> If we remove the rotate in simplify-rtx like below: >> >> +++ b/gcc/simplify-rtx.c >> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 >> (rtx_code code, >>

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches
On 2021/6/3 08:46, Xionghu Luo via Gcc-patches wrote: > Hi, > > On 2021/6/3 06:20, Segher Boessenkool wrote: >> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: >>> On P8LE, extra rot64+rot64 load or store instructions are generated >>> in float

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-02 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/6/3 06:20, Segher Boessenkool wrote: > On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: >> On P8LE, extra rot64+rot64 load or store instructions are generated >> in float128 to vector __int128 conversion. >> >> This patch teaches pass swaps to also handle such patterns to

[PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-02 Thread Xionghu Luo via Gcc-patches
On P8LE, extra rot64+rot64 load or store instructions are generated in float128 to vector __int128 conversion. This patch teaches pass swaps to also handle such patterns to remove extra swap instructions. (insn 7 6 8 2 (set (subreg:V1TI (reg:KF 123) 0) (rotate:V1TI (mem/u/c:V1TI (reg/f:DI

[PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-05-24 Thread Xionghu Luo via Gcc-patches
From: Xiong Hu Luo vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23} regardless of BE or LE in the ISA, and similarly for vmrghlb. Remove the UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT patterns and express them with vec_select + vec_concat as normal RTL. Tested pass on P8LE, P9LE and
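The permute index in the message above can be read as a selection from the 32-byte concatenation of two 16-byte vectors, which is the shape a vec_select of a vec_concat describes. The following C sketch models only that indexing; it is not the RTL pattern from the patch, and `merge_high_index` and `select_from_concat` are hypothetical names.

```c
#include <stdint.h>

/* The index list quoted in the message: elements 0..15 refer to the
   first vector, 16..31 to the second. */
static const int merge_high_index[16] = {
    0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23
};

/* Hypothetical model of selecting 16 bytes out of the concatenation
   of a and b according to index[]. */
static void select_from_concat(const uint8_t a[16], const uint8_t b[16],
                               const int index[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        int k = index[i];
        out[i] = (k < 16) ? a[k] : b[k - 16];
    }
}
```

With this index the result interleaves the first eight bytes of `a` with the first eight bytes of `b`.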

Re: [PATCH] Run pass_sink_code once more before store_mergin

2021-05-18 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/5/18 15:02, Richard Biener wrote: > Can you, for the new gcc.dg/tree-ssa/ssa-sink-18.c testcase, add > a comment explaining what operations we expect to sink? The testcase > is likely somewhat fragile in the exact number of sinkings > (can you check on some other target and maybe

Re: [PATCH] Run pass_sink_code once more before store_mergin

2021-05-17 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/5/17 16:11, Richard Biener wrote: On Fri, 14 May 2021, Xionghu Luo wrote: Hi Richi, On 2021/4/21 19:54, Richard Biener wrote: On Tue, 20 Apr 2021, Xionghu Luo wrote: On 2021/4/15 19:34, Richard Biener wrote: On Thu, 15 Apr 2021, Xionghu Luo wrote: Thanks, On 2021/4/14

Re: Ping: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-05-14 Thread Xionghu Luo via Gcc-patches
Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%, 526.blender_r +1.72%, no obvious changes to others. On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote: Gentle ping, thanks. On 2021/4/16 15:10, Xiong Hu Luo wrote: fmod/fmodf and remainder/remainderf could be expanded instead

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-05-14 Thread Xionghu Luo via Gcc-patches
Hi Richi, On 2021/4/21 19:54, Richard Biener wrote: > On Tue, 20 Apr 2021, Xionghu Luo wrote: > >> >> >> On 2021/4/15 19:34, Richard Biener wrote: >>> On Thu, 15 Apr 2021, Xionghu Luo wrote: >>> Thanks, On 2021/4/14 14:41, Richard Biener wrote: >> "#538,#235,#234,#233" will

Re: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-05-14 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/5/13 18:49, Segher Boessenkool wrote: Hi! On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote: The vsel instruction is a bit-wise select instruction. Using an IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code being generated in the combine pass. Per

*Ping*: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-05-12 Thread Xionghu Luo via Gcc-patches
On 2021/4/30 14:32, Xionghu Luo wrote: The vsel instruction is a bit-wise select instruction. Using an IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code being generated in the combine pass. Per-element selection is a subset of bit-wise selection; with the patch, the

Ping: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-05-05 Thread Xionghu Luo via Gcc-patches
Gentle ping, thanks. On 2021/4/16 15:10, Xiong Hu Luo wrote: fmod/fmodf and remainder/remainderf can be expanded inline instead of calling the library when building with fast-math, which is much faster. fmodf: fdivs f0,f1,f2 friz f0,f0 fnmsubs f1,f2,f0,f1 remainderf: fdivs

[PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-04-30 Thread Xionghu Luo via Gcc-patches
The vsel instruction is a bit-wise select instruction. Using an IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code being generated in the combine pass. Per-element selection is a subset of bit-wise selection; with the patch, the pattern is written using bit operations. But
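The distinction drawn above between bit-wise and per-element selection can be shown on a 32-bit word holding four 8-bit lanes. This is only an illustrative model, not the RTL from the patch: `bitwise_select` and `element_select` are hypothetical helpers, and the per-element variant assumes a lane picks the second operand whenever its mask lane is nonzero.

```c
#include <stdint.h>

/* Bit-wise select: each result bit comes from b where the mask bit
   is 1 and from a where the mask bit is 0. */
static uint32_t bitwise_select(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Per-element select on 8-bit lanes (assumed semantics for this
   sketch): a lane takes all of b if its mask lane is nonzero,
   otherwise all of a. */
static uint32_t element_select(uint32_t a, uint32_t b, uint32_t mask)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t lane = 0xFFu << (8 * i);
        out |= ((mask & lane) ? b : a) & lane;
    }
    return out;
}
```

The two agree when every mask lane is all ones or all zeros, and differ for a partial mask such as 0x0F0F0F0F, which is why modeling a bit-wise instruction with a per-element condition can produce wrong code.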

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-15 Thread Xionghu Luo via Gcc-patches
Thanks, On 2021/4/14 14:41, Richard Biener wrote: >> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink, >> but it moves #538 first, then #235, there is strong dependency here. It >> doesn't seem to be like the LCM framework that could solve all and do the >> delete-insert in one

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-13 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/3/26 15:35, Xionghu Luo via Gcc-patches wrote: >> Also we already have a sinking pass on RTL which even computes >> a proper PRE on the reverse graph - -fgcse-sm aka store-motion.c. >> I'm not sure whether this deals with non-stores but the >> LCM machine

Re: [PATCH] Improve rtx insn vec output

2021-04-07 Thread Xionghu Luo via Gcc-patches
On 2021/4/7 14:57, Richard Biener wrote: On Wed, Apr 7, 2021 at 7:42 AM Xionghu Luo wrote: print_rtl will dump the rtx_insn from current until LAST. But it is only useful to see the particular insn that called by print_rtx_insn_vec, Let's call print_rtl_single to display that insn in the

[PATCH] Improve rtx insn vec output

2021-04-06 Thread Xionghu Luo via Gcc-patches
print_rtl will dump the rtx_insn from current until LAST. But it is only useful to see the particular insn that is passed by print_rtx_insn_vec, so let's call print_rtl_single to display that insn in the gcse and store-motion pass dump. 2021-04-07 Xionghu Luo gcc/ChangeLog: * fold-const.c

[PATCH] rs6000: Enable 32bit variable vec_insert [PR99718]

2021-03-26 Thread Xionghu Luo via Gcc-patches
From: "luo...@cn.ibm.com" 32bit and P7 VSX could also benefit a lot from the variable vec_insert implementation with shift/insert/shift back method. Tested pass on P7BE/P8BE/P8LE{-m32,m64} and P9LE{m64}. gcc/ChangeLog: PR target/99718 * config/rs6000/altivec.md

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-03-26 Thread Xionghu Luo via Gcc-patches
Hi, sorry for late response, On 2021/3/23 16:50, Richard Biener wrote: >>> It definitely should be before uncprop (but context stops there). And yes, >>> re-running passes isn't the very, very best thing to do without explaining >>> it cannot be done in other ways. Not for late stage 3 anyway.

Re: [PATCH] rs6000: Correct Power8 cost of l2 cache size [PR97329]

2021-03-24 Thread Xionghu Luo via Gcc-patches
On 2021/3/24 23:56, David Edelsohn wrote: On Wed, Mar 24, 2021 at 1:44 AM Xionghu Luo wrote: l2 cache size for Power8 is 512kB, correct the copy paste error from Power7. Tested no performance change for SPEC2017. gcc/ChangeLog: 2021-03-24 Xionghu Luo * config/rs6000/rs6000.c

[PATCH] rs6000: Don't generate IFN VEC_SET for m32 [PR99718]

2021-03-23 Thread Xionghu Luo via Gcc-patches
UNSPEC_SI_FROM_SF is not supported for -m32, which caused an ICE on P8BE-32bit. Since P8 Vector and above don't have a fast mechanism to move SFmode to SImode for m32, don't generate IFN VEC_SET for it. Tested pass on P8BE/LE {m32,m64}. gcc/ChangeLog: 2021-03-24 Xionghu Luo *

[PATCH] rs6000: Correct Power8 cost of l2 cache size [PR97329]

2021-03-23 Thread Xionghu Luo via Gcc-patches
The L2 cache size for Power8 is 512kB; correct the copy-paste error from Power7. Tested: no performance change for SPEC2017. gcc/ChangeLog: 2021-03-24 Xionghu Luo * config/rs6000/rs6000.c (struct processor_costs): Change to 512. --- gcc/config/rs6000/rs6000.c | 2 +- 1 file

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-03-22 Thread Xionghu Luo via Gcc-patches
On 2020/12/23 00:53, Richard Biener wrote: On December 21, 2020 10:03:43 AM GMT+01:00, Xiong Hu Luo wrote: Here comes another case that requires running a pass once more; as this is not the commonly suggested direction to solve problems, I'm not quite sure whether it is still a reasonable fix here.

Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-17 Thread Xionghu Luo via Gcc-patches
On 2021/3/17 15:53, Jakub Jelinek wrote: On Wed, Mar 17, 2021 at 11:35:18AM +0800, Xionghu Luo wrote: + machine_mode idx_mode = GET_MODE (idx); + if (idx_mode != DImode) +idx = convert_modes (DImode, idx_mode, idx, 1); Segher mentioned you can remove the if (idx_mode != DImode) too,

Re: Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-16 Thread Xionghu Luo via Gcc-patches
Thanks Jakub & Segher, On 2021/3/17 06:47, Segher Boessenkool wrote: Hi! On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote: On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote: diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c i

Ping^2: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-10 Thread Xionghu Luo via Gcc-patches
Ping^2 for stage 4 P1 issue and attached the patch, Thanks! On 2021/3/3 09:12, Xionghu Luo via Gcc-patches wrote: On 2021/2/25 14:33, Xionghu Luo via Gcc-patches wrote: On 2021/2/25 00:57, Segher Boessenkool wrote: Hi! On Wed, Feb 24, 2021 at 09:06:24AM +0800, Xionghu Luo wrote

Ping: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-03-02 Thread Xionghu Luo via Gcc-patches
On 2021/2/25 14:33, Xionghu Luo via Gcc-patches wrote: > > > On 2021/2/25 00:57, Segher Boessenkool wrote: >> Hi! >> >> On Wed, Feb 24, 2021 at 09:06:24AM +0800, Xionghu Luo wrote: >>> vec_insert defines the element argument type to be signed int by EL

Re: [PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-02-24 Thread Xionghu Luo via Gcc-patches
On 2021/2/25 00:57, Segher Boessenkool wrote: > Hi! > > On Wed, Feb 24, 2021 at 09:06:24AM +0800, Xionghu Luo wrote: >> vec_insert defines the element argument type to be signed int by ELFv2 >> ABI, When expanding a vector with a variable rtx, convert the rtx type >> SImode. > > But that is

[PATCH v2] rs6000: Convert the vector element register to SImode [PR98914]

2021-02-23 Thread Xionghu Luo via Gcc-patches
vec_insert defines the element argument type to be signed int per the ELFv2 ABI. When expanding a vector with a variable rtx, convert the rtx type to SImode. gcc/ChangeLog: 2021-02-24 Xionghu Luo PR target/98914 * config/rs6000/rs6000.c (rs6000_expand_vector_set): Convert

Ping: [PATCH] rs6000: Convert the vector element register to SImode [PR98914]

2021-02-17 Thread Xionghu Luo via Gcc-patches
Gentle ping, thanks. On 2021/2/3 17:01, Xionghu Luo wrote: v[k] will also be expanded to IFN VEC_SET if k is long type when built with -Og. -O0 didn't expose the issue because v is TREE_ADDRESSABLE, and -O1 and above also didn't capture it because v[k] is not optimized to

[PATCH] rs6000: Convert the vector element register to SImode [PR98914]

2021-02-03 Thread Xionghu Luo via Gcc-patches
v[k] will also be expanded to IFN VEC_SET if k is of long type when built with -Og. -O0 didn't expose the issue because v is TREE_ADDRESSABLE; -O1 and above also didn't capture it because v[k] is not optimized to VIEW_CONVERT_EXPR(v)[k_1]. vec_insert defines the element argument type to be signed
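
As a rough illustration of the kind of source this entry refers to, the sketch below uses GCC's generic vector extension to write one element through a run-time index of type long. The type name v4si and the helper set_elem are illustrative assumptions, not code from the patch or the PR testcase, and whether a given compiler expands the store through IFN VEC_SET depends on the target and the options used.

```c
/* Illustrative only: v4si and set_elem are hypothetical names,
   not code from the patch or the PR testcase.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Store x into element k of v, where k is a run-time value of
   type long rather than int.  */
static v4si
set_elem (v4si v, long k, int x)
{
  v[k] = x;
  return v;
}
```

Calling set_elem with k in the range 0..3 returns a copy of the vector with only that element changed; the caller's vector is untouched because it is passed by value.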

[PATCH] testsuite: Update pr79251 ilp32 store regex.

2021-01-31 Thread Xionghu Luo via Gcc-patches
BE ilp32 Linux generates extra stack stwu instructions which shouldn't be counted; \m … \M is needed around each instruction, not just the beginning and end of the entire pattern. Pre-approved, committing. gcc/testsuite/ChangeLog: 2021-02-01 Xionghu Luo *

[PATCH] testsuite: Run vec_insert case on P8 and P9 with option specified

2021-01-28 Thread Xionghu Luo via Gcc-patches
Move common functions to header file for cleanup. gcc/testsuite/ChangeLog: 2021-01-27 Xionghu Luo * gcc.target/powerpc/pr79251.p8.c: Move definition to ... * gcc.target/powerpc/pr79251.h: ...this. * gcc.target/powerpc/pr79251.p9.c: Likewise. *

Re: [PATCH] rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]

2021-01-26 Thread Xionghu Luo via Gcc-patches
Hi, On 2021/1/27 03:00, David Edelsohn wrote: > On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo wrote: >> >> From: "luo...@cn.ibm.com" >> >> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT >> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for >> variable vector

[PATCH] rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]

2021-01-25 Thread Xionghu Luo via Gcc-patches
From: "luo...@cn.ibm.com" UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for variable vector insert. Remove rs6000_expand_vector_set_var helper function, adjust the p8 and p9 definitions position and make them

Re: Ping ^ 4: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8

2021-01-14 Thread Xionghu Luo via Gcc-patches
Ping^4, thanks. On 2020/12/23 10:18, Xionghu Luo via Gcc-patches wrote: Ping^3 for stage 3. And this followed patch: [PATCH 4/4] rs6000: Update testcases' instruction count. Thanks:) On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote: Ping. Thanks. On 2020/11/27 09:04, Xionghu Luo

Ping ^ 3: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8

2020-12-22 Thread Xionghu Luo via Gcc-patches
Ping^3 for stage 3. And this followed patch: [PATCH 4/4] rs6000: Update testcases' instruction count. Thanks:) On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote: Ping. Thanks. On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote: Hi Segher, Thanks for the approval of [PATCH 1/4

Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8

2020-12-09 Thread Xionghu Luo via Gcc-patches
Ping^2. Thanks. On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote: Ping. Thanks. On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote: Hi Segher, Thanks for the approval of [PATCH 1/4] and [PATCH 2/4], what's your opinion of this [PATCH 3/4] for P8, please?  xxinsertw only exists since

Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8

2020-12-03 Thread Xionghu Luo via Gcc-patches
Ping. Thanks. On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote: Hi Segher, Thanks for the approval of [PATCH 1/4] and [PATCH 2/4], what's your opinion of this [PATCH 3/4] for P8, please? xxinsertw only exists since v3.0, so we had to implement by another way. Xionghu On 2020/10/10

Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8

2020-11-26 Thread Xionghu Luo via Gcc-patches
Hi Segher, Thanks for the approval of [PATCH 1/4] and [PATCH 2/4], what's your opinion of this [PATCH 3/4] for P8, please? xxinsertw only exists since v3.0, so we had to implement by another way. Xionghu On 2020/10/10 16:08, Xionghu Luo wrote: > gcc/ChangeLog: > > 2020-10-10 Xionghu Luo

Re: Ping^3: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET

2020-11-23 Thread Xionghu Luo via Gcc-patches
Ping^3, thanks. https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html On 2020/11/13 10:05, Xionghu Luo via Gcc-patches wrote: Ping^2, thanks. On 2020/11/5 09:34, Xionghu Luo via Gcc-patches wrote: Ping. On 2020/10/10 16:08, Xionghu Luo wrote: Originated from https

Re: [PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization

2020-11-13 Thread Xionghu Luo via Gcc-patches
Hi, On 2020/10/27 05:10, Segher Boessenkool wrote: > On Wed, Oct 21, 2020 at 03:25:29AM -0500, Xionghu Luo wrote: >> Don't split code from add3 for SDI to allow a later pass to split. > > This is very problematic. > >> This allows later logic to hoist out constant load in add instructions. > >

Ping^2: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET

2020-11-12 Thread Xionghu Luo via Gcc-patches
Ping^2, thanks. On 2020/11/5 09:34, Xionghu Luo via Gcc-patches wrote: Ping. On 2020/10/10 16:08, Xionghu Luo wrote: Originated from https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html with patch split and some refinement per review comments. Patch of IFN VEC_SET

Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET

2020-11-04 Thread Xionghu Luo via Gcc-patches
Ping. On 2020/10/10 16:08, Xionghu Luo wrote: Originated from https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html with patch split and some refinement per review comments. Patch of IFN VEC_SET for ARRAY_REF(VIEW_CONVERT_EXPR) is committed, this patch set enables expanding IFN

Re: [PATCH] Add debug_bb_details and debug_bb_n_details

2020-10-25 Thread Xionghu Luo via Gcc-patches
On 2020/10/23 18:18, Richard Biener wrote: > On Fri, 23 Oct 2020, Xiong Hu Luo wrote: > >> Sometimes debug_bb_slim_bb_n_slim is not enough, how about adding >> this debug_bb_details_bb_n_details? Or any other similar call >> existed? > There's already debug_bb and debug_bb_n in cfg.c which

[PATCH] rs6000: Don't split constant operator add before reload, move to temp register for future optimization

2020-10-21 Thread Xionghu Luo via Gcc-patches
This is a revised version of the patch posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542718.html, resend this since this is a quite high priority performance issue for Power. Don't split code from add3 for SDI to allow a later pass to split. This allows later logic to hoist out

Re: [PATCH] ipa-inline: Improve growth accumulation for recursive calls

2020-10-16 Thread Xionghu Luo via Gcc-patches
On 2020/9/12 01:36, Tamar Christina wrote: > Hi Martin, > >> >> can you please confirm that the difference between these two is all due to >> the last option -fno-inline-functions-called-once ? Is LTo necessary? >> I.e., can >> you run the benchmark also built with the branch compiler and

[PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251]

2020-10-10 Thread Xionghu Luo via Gcc-patches
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value to be inserted, and arg2 is the position in arg0 at which to insert arg1. The current expander generates stxv+stwx+lxv if arg2 is variable instead of constant, which causes a serious store-hit-load performance issue on Power. This patch tries 1)
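
The insert operation described in this entry can be sketched as a plain scalar model in portable C. This is not the built-in itself: vec4i and model_vec_insert are hypothetical names, the argument order simply follows the description in the entry (vector, value, position) and may differ from the real built-in's prototype, and wrapping the position modulo the element count is an assumption of the model.

```c
/* Scalar model of a variable-index insert; vec4i and
   model_vec_insert are hypothetical names for illustration.  */
#define NELTS 4

typedef struct { int e[NELTS]; } vec4i;

/* Return a copy of v with element (index mod NELTS) replaced by
   value.  The modulo wrap is an assumption of this model.  */
static vec4i
model_vec_insert (vec4i v, int value, unsigned long index)
{
  v.e[index % NELTS] = value;
  return v;
}
```

The model goes through an in-memory array, which is loosely analogous to the store/store/load sequence the entry describes as slow; the patch series is about letting the compiler keep such an insert in registers instead.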
