These two patches are updated versions of:
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html
Changes:
1. Fix alignment error in md files.
2. Replace rtx_equal_p with match_dup.
3. Use register_operand instead of gpc_reg_operand to align with
vperm/xxperm.
4. Regression
Fold xxsel to vsel like xxperm/vperm to avoid duplicate code.
gcc/ChangeLog:
2021-09-17 Xionghu Luo
* config/rs6000/altivec.md: Add vsx register constraints.
* config/rs6000/vsx.md (vsx_xxsel): Delete.
(vsx_xxsel2): Likewise.
(vsx_xxsel3): Likewise.
Ping^3, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
On 2021/9/6 08:52, Xionghu Luo via Gcc-patches wrote:
Ping^2, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote:
Gentle ping, thanks
On 2021/9/13 16:17, Richard Biener wrote:
On Mon, 13 Sep 2021, Xionghu Luo wrote:
On 2021/9/10 21:54, Xionghu Luo via Gcc-patches wrote:
On 2021/9/9 18:55, Richard Biener wrote:
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 5d6845478e7..4b187c2cdaf 100644
On 2021/9/10 21:54, Xionghu Luo via Gcc-patches wrote:
On 2021/9/9 18:55, Richard Biener wrote:
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 5d6845478e7..4b187c2cdaf 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3074,15 +3074,13
On 2021/9/9 18:55, Richard Biener wrote:
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 5d6845478e7..4b187c2cdaf 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3074,15 +3074,13 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
contains_call)
On 2021/9/2 18:37, Richard Biener wrote:
On Thu, 2 Sep 2021, Xionghu Luo wrote:
On 2021/9/2 16:50, Richard Biener wrote:
On Thu, 2 Sep 2021, Richard Biener wrote:
On Thu, 2 Sep 2021, Xionghu Luo wrote:
On 2021/9/1 17:58, Richard Biener wrote:
This fixes the CFG walk order of
On 2021/8/26 19:33, Richard Biener wrote:
On Tue, Aug 10, 2021 at 4:03 AM Xionghu Luo wrote:
Hi,
On 2021/8/6 20:15, Richard Biener wrote:
On Mon, Aug 2, 2021 at 7:05 AM Xiong Hu Luo wrote:
There was a patch trying to avoid moving cold blocks out of loops:
On 2021/9/4 05:44, Segher Boessenkool wrote:
Hi!
On Fri, Sep 03, 2021 at 10:31:24AM +0800, Xionghu Luo wrote:
fmod/fmodf and remainder/remainderf could be expanded inline instead of a
library call for fast-math builds, which is much faster.
Thank you very much for this patch.
Some trivial
Ping^2, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote:
Gentle ping, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/6
Ping^2, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
On 2021/6/30 09:42, Xionghu Luo via Gcc-patches wrote:
Gentle ping, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/5/13
Resend the patch that addressed Will's comments.
fmod/fmodf and remainder/remainderf could be expanded inline instead of a
library call for fast-math builds, which is much faster.
fmodf:
fdivs f0,f1,f2
friz    f0,f0
fnmsubs f1,f2,f0,f1
remainderf:
fdivs f0,f1,f2
frin
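Under fast-math the expansion described above can be modeled in plain C; the sketch below is mine (function names are not from the patch), with an integer cast standing in for friz/frin and NaN/Inf corner cases ignored, as -ffast-math permits:

```c
#include <assert.h>

/* Sketch of the fast-math expansion: fmodf(x, y) = x - trunc(x / y) * y,
   mirroring the fdivs / friz / fnmsubs sequence.  The cast to long stands
   in for friz (round toward zero); no NaN/Inf handling, as with -ffast-math. */
static float fast_fmodf(float x, float y)
{
    return x - (float)(long)(x / y) * y;
}

/* remainderf rounds the quotient to the nearest integer (frin) instead of
   truncating; rounding half away from zero is modeled here with +/-0.5,
   which differs from the library's round-to-even only on exact ties. */
static float fast_remainderf(float x, float y)
{
    float q = x / y;
    float r = (float)(long)(q + (q >= 0.0f ? 0.5f : -0.5f));
    return x - r * y;
}
```

On well-scaled finite inputs these agree with the library calls, e.g. fast_fmodf(5.5f, 2.0f) == 1.5f and fast_remainderf(5.5f, 2.0f) == -0.5f.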
On 2021/9/2 16:50, Richard Biener wrote:
> On Thu, 2 Sep 2021, Richard Biener wrote:
>
>> On Thu, 2 Sep 2021, Xionghu Luo wrote:
>>
>>>
>>>
>>> On 2021/9/1 17:58, Richard Biener wrote:
This fixes the CFG walk order of fill_always_executed_in to use
RPO order rather than the dominator
On 2021/9/1 17:58, Richard Biener wrote:
This fixes the CFG walk order of fill_always_executed_in to use
RPO order rather than the dominator-based order computed by
get_loop_body_in_dom_order. That fixes correctness issues with
unordered dominator children.
The RPO order computed by
On 2021/8/30 17:19, Richard Biener wrote:
bitmap_set_bit (work_set, loop->header->index);
+ unsigned bb_index;
- for (i = 0; i < loop->num_nodes; i++)
- {
- edge_iterator ei;
- bb = bbs[i];
+ unsigned array_size = last_basic_block_for_fn (cfun) + 1;
On 2021/8/27 15:45, Richard Biener wrote:
On Thu, 26 Aug 2021, Xionghu Luo wrote:
On 2021/8/24 16:20, Richard Biener wrote:
On Tue, 24 Aug 2021, Xionghu Luo wrote:
On 2021/8/19 20:11, Richard Biener wrote:
- class loop *inn_loop = loop;
if (ALWAYS_EXECUTED_IN
On 2021/8/24 16:20, Richard Biener wrote:
> On Tue, 24 Aug 2021, Xionghu Luo wrote:
>
>>
>>
>> On 2021/8/19 20:11, Richard Biener wrote:
- class loop *inn_loop = loop;
if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
{
@@ -3232,19 +3231,6 @@
On 2021/8/19 20:11, Richard Biener wrote:
>> - class loop *inn_loop = loop;
>>
>> if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
>> {
>> @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
>> contains_call)
>> to disprove this if possible). */
On 2021/8/10 12:25, Ulrich Drepper wrote:
> On Tue, Aug 10, 2021 at 4:03 AM Xionghu Luo via Gcc-patches
> wrote:
>> For this case, theoretically I think the master GCC will optimize it to:
>>
>>invariant;
>>for
On 2021/8/17 17:10, Xionghu Luo via Gcc-patches wrote:
>
>
> On 2021/8/17 15:12, Richard Biener wrote:
>> On Tue, 17 Aug 2021, Xionghu Luo wrote:
>>
>>> Hi,
>>>
>>> On 2021/8/16 19:46, Richard Biener wrote:
>>>> On Mon
On 2021/8/17 15:12, Richard Biener wrote:
> On Tue, 17 Aug 2021, Xionghu Luo wrote:
>
>> Hi,
>>
>> On 2021/8/16 19:46, Richard Biener wrote:
>>> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
>>>
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. inn_loop
On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/8/16 19:46, Richard Biener wrote:
On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. inn_loop is updated to the inner loop, so it needs to be restored
when
Hi,
On 2021/8/16 19:46, Richard Biener wrote:
On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. inn_loop is updated to the inner loop, so it needs to be restored
when exiting from the innermost loop. With this patch, the store
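The shape under discussion can be illustrated with a small C function (hypothetical, not from the thread): the statement after the inner loop runs on every outer iteration, so it is "always executed in" the outer loop even though the inner loop body may run zero times:

```c
#include <assert.h>

/* Hypothetical example of the ALWAYS_EXECUTED_IN shape: the statement after
   the inner loop executes on every outer-loop iteration, so LIM may hoist
   its loop-invariant part (inv * 2) out of the outer loop, even though the
   inner loop body itself is not always executed (it runs zero times at i=0). */
static int sum_with_invariant(const int *a, int n, int inv)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {       /* outer loop */
        for (int j = 0; j < i; j++)     /* inner loop, may not execute */
            sum += a[j];
        sum += inv * 2;                 /* always executed in outer loop */
    }
    return sum;
}
```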
On 2021/8/11 17:16, Richard Biener wrote:
On Wed, 11 Aug 2021, Xionghu Luo wrote:
On 2021/8/10 22:47, Richard Biener wrote:
On Mon, 9 Aug 2021, Xionghu Luo wrote:
Thanks,
On 2021/8/6 19:46, Richard Biener wrote:
On Tue, 3 Aug 2021, Xionghu Luo wrote:
loop split condition is moved
On 2021/8/10 22:47, Richard Biener wrote:
> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>
>> Thanks,
>>
>> On 2021/8/6 19:46, Richard Biener wrote:
>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>
loop split condition is moved between loop1 and loop2, the split bb's
count and probability
Hi,
On 2021/8/6 20:15, Richard Biener wrote:
> On Mon, Aug 2, 2021 at 7:05 AM Xiong Hu Luo wrote:
>>
>> There was a patch trying to avoid moving cold blocks out of loops:
>>
>> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>>
>> Richard suggested to "never hoist anything from a bb with
Thanks,
On 2021/8/6 19:46, Richard Biener wrote:
> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>
>> loop split condition is moved between loop1 and loop2, the split bb's
>> count and probability should also be duplicated instead of (100% vs INV),
>> secondly, the original loop1 and loop2 count need
I'd like to split this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
to two patches:
0001-Fix-loop-split-incorrect-count-and-probability.patch
0002-Don-t-move-cold-code-out-of-loop-by-checking-bb-coun.patch
since they are solving two different things, please help to
The loop split condition is moved between loop1 and loop2; the split bb's
count and probability should also be duplicated instead of (100% vs INV).
Secondly, the original loop1 and loop2 counts need to be proportional to
the original loop.
Regression tests pass; OK for master?
diff
>>>> On 2021/6/25 18:02, Richard Biener wrote:
>>>>> On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2021/6/25 16:54, Richard Biener wrote:
>>>>>
On 2021/7/10 02:40, will schmidt wrote:
> On Wed, 2021-06-30 at 09:44 +0800, Xionghu Luo via Gcc-patches wrote:
>> Gentle ping ^2, thanks.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html
>>
>>
>> On 2021/5/14 15:13, Xionghu Luo v
Gentle ping, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/6/9 07:25, Segher Boessenkool wrote:
On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
vmrghb only accepts permute index {0, 16
Gentle ping ^2, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html
On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote:
Test SPEC2017 Ofast P8LE for this patch: 511.povray_r +1.14%,
526.blender_r +1.72%, no obvious changes to others.
On 2021/5/6 10:36, Xionghu Luo
Gentle ping, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/5/13 18:49, Segher Boessenkool wrote:
Hi!
On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:
The vsel instruction is a bit-wise
>
>>>> On 2021/6/25 16:54, Richard Biener wrote:
>>>>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
>>>>> wrote:
>>>>>>
>>>>>> From: Xiong Hu Luo
>>>>>>
>>>>>> ad
On 2021/6/25 18:02, Richard Biener wrote:
> On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo wrote:
>>
>>
>>
>> On 2021/6/25 16:54, Richard Biener wrote:
>>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
>>> wrote:
>>>>
On 2021/6/25 18:02, Richard Biener wrote:
On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo wrote:
On 2021/6/25 16:54, Richard Biener wrote:
On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
wrote:
From: Xiong Hu Luo
adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help
Luo via Gcc-patches
wrote:
From: Xiong Hu Luo
adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power. For example, it generates a mismatched address offset after
adjusting the iv update statement position:
[local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1
On 2021/6/25 16:54, Richard Biener wrote:
On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
wrote:
From: Xiong Hu Luo
adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power. For example, it generates a mismatched address offset after
adjusting the iv update
From: Xiong Hu Luo
adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power. For example, it generates a mismatched address offset after
adjusting the iv update statement position:
[local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
ivtmp.30_415 =
On 2021/6/12 04:16, Segher Boessenkool wrote:
On Thu, Jun 10, 2021 at 03:11:08PM +0800, Xionghu Luo wrote:
On 2021/6/10 00:24, Segher Boessenkool wrote:
"!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
&& !altivec_indexed_or_indirect_operand (operands[0],
On 2021/6/10 00:24, Segher Boessenkool wrote:
> On Wed, Jun 09, 2021 at 11:20:20AM +0800, Xionghu Luo wrote:
>> On 2021/6/9 04:11, Segher Boessenkool wrote:
>>> On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote:
>> rejecting combination of insns 6 and 7
>> original costs 4 + 4
Hi,
I noticed that the "git gcc-commit-mklog" command doesn't extract the PR
number from the title into the ChangeLog automatically, so the committed
patch doesn't update the related Bugzilla PR page after the patch is
checked in. Martin, what's your opinion on this, since you are much more
familiar with it?
Hi,
On 2021/6/9 07:25, Segher Boessenkool wrote:
On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
vmrghb only accepts the permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} in the ISA, no matter BE or LE; similarly for vmrglb.
(vmrglb)
+ if (BYTES_BIG_ENDIAN)
+
On 2021/6/9 04:11, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote:
Combine still fails to merge the two instructions:
Trying 6 -> 7:
6: r120:KF#0=r125:KF#0<-<0x40
REG_DEAD r125:KF
7:
On 2021/6/9 05:07, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Jun 08, 2021 at 09:11:33AM +0800, Xionghu Luo wrote:
>> On P8LE, extra rot64+rot64 load or store instructions are generated
>> in float128 to vector __int128 conversion.
>>
>> This patch teaches pass swaps to also handle such
Update the patch according to the comments. Thanks.
On P8LE, extra rot64+rot64 load or store instructions are generated
in float128 to vector __int128 conversion.
This patch teaches pass swaps to also handle such patterns to remove
extra swap instructions.
(insn 7 6 8 2 (set (subreg:V1TI
Ping, thanks.
On 2021/5/24 17:02, Xionghu Luo wrote:
From: Xiong Hu Luo
vmrghb only accepts the permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} in the ISA, no matter BE or LE; similarly for vmrglb.
Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select
+
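The fixed selection vmrghb performs can be sketched as a scalar model (a plain-C stand-in of mine, not the GCC pattern): element indices 0-15 pick from the first input and 16-31 from the second, so {0, 16, 1, 17, ...} interleaves the first eight bytes of each:

```c
#include <assert.h>

/* Scalar stand-in for vmrghb: interleave the first 8 bytes of a and b,
   i.e. apply the fixed permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
   5, 21, 6, 22, 7, 23}, where 0-15 select from a and 16-31 from b.
   The index set is fixed by the ISA regardless of BE/LE. */
static void model_vmrghb(const unsigned char a[16], const unsigned char b[16],
                         unsigned char out[16])
{
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = a[i];   /* permute index i      */
        out[2 * i + 1] = b[i];   /* permute index 16 + i */
    }
}
```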
Ping, thanks.
On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote:
Test SPEC2017 Ofast P8LE for this patch: 511.povray_r +1.14%,
526.blender_r +1.72%, no obvious changes to others.
On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote:
Gentle ping, thanks.
On 2021/4/16 15:10, Xiong Hu
Gentle ping, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html
On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/5/13 18:49, Segher Boessenkool wrote:
Hi!
On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:
The vsel instruction is a bit-wise
On 2021/6/4 04:16, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote:
>> On 2021/6/3 06:20, Segher Boessenkool wrote:
>>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
On P8LE, extra rot64+rot64 load or store instructions are
Hi,
On 2021/6/3 21:09, Bill Schmidt wrote:
> On 6/2/21 7:46 PM, Xionghu Luo wrote:
>> Hi,
>>
>> On 2021/6/3 06:20, Segher Boessenkool wrote:
>>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
On P8LE, extra rot64+rot64 load or store instructions are generated
in float128
On 2021/6/4 04:31, Segher Boessenkool wrote:
> On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote:
>> If we remove the rotate in simplify-rtx like below:
>>
>> +++ b/gcc/simplify-rtx.c
>> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1
>> (rtx_code code,
>>
On 2021/6/3 08:46, Xionghu Luo via Gcc-patches wrote:
> Hi,
>
> On 2021/6/3 06:20, Segher Boessenkool wrote:
>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
>>> On P8LE, extra rot64+rot64 load or store instructions are generated
>>> in float
Hi,
On 2021/6/3 06:20, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
>> On P8LE, extra rot64+rot64 load or store instructions are generated
>> in float128 to vector __int128 conversion.
>>
>> This patch teaches pass swaps to also handle such patterns to
On P8LE, extra rot64+rot64 load or store instructions are generated
in float128 to vector __int128 conversion.
This patch teaches pass swaps to also handle such patterns to remove
extra swap instructions.
(insn 7 6 8 2 (set (subreg:V1TI (reg:KF 123) 0)
(rotate:V1TI (mem/u/c:V1TI (reg/f:DI
From: Xiong Hu Luo
vmrghb only accepts the permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} in the ISA, no matter BE or LE; similarly for vmrglb.
Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select
+ vec_concat as normal RTL.
Tested pass on P8LE, P9LE and
Hi,
On 2021/5/18 15:02, Richard Biener wrote:
> Can you, for the new gcc.dg/tree-ssa/ssa-sink-18.c testcase, add
> a comment explaining what operations we expect to sink? The testcase
> is likely somewhat fragile in the exact number of sinkings
> (can you check on some other target and maybe
Hi,
On 2021/5/17 16:11, Richard Biener wrote:
On Fri, 14 May 2021, Xionghu Luo wrote:
Hi Richi,
On 2021/4/21 19:54, Richard Biener wrote:
On Tue, 20 Apr 2021, Xionghu Luo wrote:
On 2021/4/15 19:34, Richard Biener wrote:
On Thu, 15 Apr 2021, Xionghu Luo wrote:
Thanks,
On 2021/4/14
Test SPEC2017 Ofast P8LE for this patch: 511.povray_r +1.14%,
526.blender_r +1.72%, no obvious changes to others.
On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote:
Gentle ping, thanks.
On 2021/4/16 15:10, Xiong Hu Luo wrote:
fmod/fmodf and remainder/remainderf could be expanded instead
Hi Richi,
On 2021/4/21 19:54, Richard Biener wrote:
> On Tue, 20 Apr 2021, Xionghu Luo wrote:
>
>>
>>
>> On 2021/4/15 19:34, Richard Biener wrote:
>>> On Thu, 15 Apr 2021, Xionghu Luo wrote:
>>>
Thanks,
On 2021/4/14 14:41, Richard Biener wrote:
>> "#538,#235,#234,#233" will
Hi,
On 2021/5/13 18:49, Segher Boessenkool wrote:
Hi!
On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:
The vsel instruction is a bit-wise select instruction. Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass. Per
On 2021/4/30 14:32, Xionghu Luo wrote:
The vsel instruction is a bit-wise select instruction. Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass. Per-element selection is a
subset of bit-wise selection; with the patch the
Gentle ping, thanks.
On 2021/4/16 15:10, Xiong Hu Luo wrote:
fmod/fmodf and remainder/remainderf could be expanded inline instead of a
library call for fast-math builds, which is much faster.
fmodf:
fdivs f0,f1,f2
friz    f0,f0
fnmsubs f1,f2,f0,f1
remainderf:
fdivs
The vsel instruction is a bit-wise select instruction. Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass. Per-element selection is a
subset of bit-wise selection; with the patch the pattern is
written using bit operations. But
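The distinction can be shown in a few lines of C (a scalar model of mine, not the RTL pattern itself): vsel selects each bit independently, and per-element selection is just the special case where every mask element is all-zeros or all-ones:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vsel's bit-wise select: each result bit comes from b
   where the mask bit is 1 and from a where it is 0 -- the bit-operation
   form (a & ~m) | (b & m) rather than an IF_THEN_ELSE per element. */
static uint64_t bitwise_sel(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & ~mask) | (b & mask);
}
```

With an all-ones or all-zeros mask this degenerates to picking whole elements, which is why per-element select is a subset of the bit-wise operation while an IF_THEN_ELSE model is not faithful to it.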
Thanks,
On 2021/4/14 14:41, Richard Biener wrote:
>> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
>> but it moves #538 first, then #235, there is strong dependency here. It
>> seems it doesn't fit the LCM framework that could solve all and do the
>> delete-insert in one
Hi,
On 2021/3/26 15:35, Xionghu Luo via Gcc-patches wrote:
>> Also we already have a sinking pass on RTL which even computes
>> a proper PRE on the reverse graph - -fgcse-sm aka store-motion.c.
>> I'm not sure whether this deals with non-stores but the
>> LCM machine
On 2021/4/7 14:57, Richard Biener wrote:
On Wed, Apr 7, 2021 at 7:42 AM Xionghu Luo wrote:
print_rtl will dump the rtx_insn from current until LAST. But it is only
useful to see the particular insn that is called by print_rtx_insn_vec;
let's call print_rtl_single to display that insn in the
print_rtl will dump the rtx_insn from current until LAST. But it is only
useful to see the particular insn that is called by print_rtx_insn_vec;
let's call print_rtl_single to display that insn in the gcse and store-motion
pass dump.
2021-04-07 Xionghu Luo
gcc/ChangeLog:
* fold-const.c
From: "luo...@cn.ibm.com"
32bit and P7 VSX could also benefit a lot from the variable vec_insert
implementation with the shift/insert/shift-back method.
Tested pass on P7BE/P8BE/P8LE{-m32,m64} and P9LE{m64}.
gcc/ChangeLog:
PR target/99718
* config/rs6000/altivec.md
Hi, sorry for late response,
On 2021/3/23 16:50, Richard Biener wrote:
>>> It definitely should be before uncprop (but context stops there). And yes,
>>> re-running passes isn't the very, very best thing to do without explaining
>>> it cannot be done in other ways. Not for late stage 3 anyway.
On 2021/3/24 23:56, David Edelsohn wrote:
On Wed, Mar 24, 2021 at 1:44 AM Xionghu Luo wrote:
The L2 cache size for Power8 is 512 kB; correct the copy-paste error from Power7.
Tested no performance change for SPEC2017.
gcc/ChangeLog:
2021-03-24 Xionghu Luo
* config/rs6000/rs6000.c
UNSPEC_SI_FROM_SF is not supported for -m32 and caused an ICE on P8BE
32-bit. Since P8 Vector and above doesn't have a fast mechanism to move
SFmode to SImode for -m32, don't generate IFN VEC_SET for it.
Tested pass on P8BE/LE {m32,m64}.
gcc/ChangeLog:
2021-03-24 Xionghu Luo
*
The L2 cache size for Power8 is 512 kB; correct the copy-paste error from Power7.
Tested no performance change for SPEC2017.
gcc/ChangeLog:
2021-03-24 Xionghu Luo
* config/rs6000/rs6000.c (struct processor_costs): Change to
512.
---
gcc/config/rs6000/rs6000.c | 2 +-
1 file
On 2020/12/23 00:53, Richard Biener wrote:
On December 21, 2020 10:03:43 AM GMT+01:00, Xiong Hu Luo
wrote:
Here comes another case that requires running a pass once more. As this is
not the commonly suggested direction to solve problems, I am not quite sure
whether it is still a reasonable fix here.
On 2021/3/17 15:53, Jakub Jelinek wrote:
On Wed, Mar 17, 2021 at 11:35:18AM +0800, Xionghu Luo wrote:
+ machine_mode idx_mode = GET_MODE (idx);
+ if (idx_mode != DImode)
+  idx = convert_modes (DImode, idx_mode, idx, 1);
Segher mentioned you can remove the if (idx_mode != DImode) too,
Thanks Jakub & Segher,
On 2021/3/17 06:47, Segher Boessenkool wrote:
Hi!
On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote:
On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote:
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
i
Ping^2 for stage 4 P1 issue and attached the patch, Thanks!
On 2021/3/3 09:12, Xionghu Luo via Gcc-patches wrote:
On 2021/2/25 14:33, Xionghu Luo via Gcc-patches wrote:
On 2021/2/25 00:57, Segher Boessenkool wrote:
Hi!
On Wed, Feb 24, 2021 at 09:06:24AM +0800, Xionghu Luo wrote
On 2021/2/25 14:33, Xionghu Luo via Gcc-patches wrote:
>
>
> On 2021/2/25 00:57, Segher Boessenkool wrote:
>> Hi!
>>
>> On Wed, Feb 24, 2021 at 09:06:24AM +0800, Xionghu Luo wrote:
>>> vec_insert defines the element argument type to be signed int by EL
On 2021/2/25 00:57, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Feb 24, 2021 at 09:06:24AM +0800, Xionghu Luo wrote:
>> vec_insert defines the element argument type to be signed int by ELFv2
>> ABI; when expanding a vector with a variable rtx, convert the rtx type
>> to SImode.
>
> But that is
vec_insert defines the element argument type to be signed int by the ELFv2
ABI; when expanding a vector with a variable rtx, convert the rtx type to
SImode.
gcc/ChangeLog:
2021-02-24 Xionghu Luo
PR target/98914
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Convert
Gentle ping, thanks.
On 2021/2/3 17:01, Xionghu Luo wrote:
v[k] will also be expanded to IFN VEC_SET if k is of long type when built
with -Og. -O0 didn't expose the issue because v is TREE_ADDRESSABLE;
-O1 and above also didn't capture it because v[k] is not optimized to
v[k] will also be expanded to IFN VEC_SET if k is of long type when built
with -Og. -O0 didn't expose the issue because v is TREE_ADDRESSABLE;
-O1 and above also didn't capture it because v[k] is not optimized to
VIEW_CONVERT_EXPR(v)[k_1].
vec_insert defines the element argument type to be signed
BE ilp32 Linux generates extra stack stwu instructions which shouldn't
be counted; \m … \M is needed around each instruction, not just the
beginning and end of the entire pattern. Pre-approved, committing.
gcc/testsuite/ChangeLog:
2021-02-01 Xionghu Luo
*
Move common functions to a header file for cleanup.
gcc/testsuite/ChangeLog:
2021-01-27 Xionghu Luo
* gcc.target/powerpc/pr79251.p8.c: Move definition to ...
* gcc.target/powerpc/pr79251.h: ...this.
* gcc.target/powerpc/pr79251.p9.c: Likewise.
*
Hi,
On 2021/1/27 03:00, David Edelsohn wrote:
> On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo wrote:
>>
>> From: "luo...@cn.ibm.com"
>>
>> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
>> is false for -m32; don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
>> variable vector
From: "luo...@cn.ibm.com"
UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
is false for -m32; don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
variable vector insert. Remove rs6000_expand_vector_set_var helper
function, adjust the p8 and p9 definitions position and make them
Ping^4, thanks.
On 2020/12/23 10:18, Xionghu Luo via Gcc-patches wrote:
Ping^3 for stage 3.
And this followed patch:
[PATCH 4/4] rs6000: Update testcases' instruction count.
Thanks:)
On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote:
Ping. Thanks.
On 2020/11/27 09:04, Xionghu Luo
Ping^3 for stage 3.
And this followed patch:
[PATCH 4/4] rs6000: Update testcases' instruction count.
Thanks:)
On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote:
Ping. Thanks.
On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
Hi Segher,
Thanks for the approval of [PATCH 1/4
Ping^2. Thanks.
On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote:
Ping. Thanks.
On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
Hi Segher,
Thanks for the approval of [PATCH 1/4] and [PATCH 2/4], what's your
opinion of this [PATCH 3/4] for P8, please? xxinsertw only exists since
Ping. Thanks.
On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
Hi Segher,
Thanks for the approval of [PATCH 1/4] and [PATCH 2/4], what's your
opinion of this [PATCH 3/4] for P8, please? xxinsertw only exists since
v3.0, so we had to implement it another way.
Xionghu
On 2020/10/10
Hi Segher,
Thanks for the approval of [PATCH 1/4] and [PATCH 2/4], what's your
opinion of this [PATCH 3/4] for P8, please? xxinsertw only exists since
v3.0, so we had to implement it another way.
Xionghu
On 2020/10/10 16:08, Xionghu Luo wrote:
> gcc/ChangeLog:
>
> 2020-10-10 Xionghu Luo
Ping^3, thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html
On 2020/11/13 10:05, Xionghu Luo via Gcc-patches wrote:
Ping^2, thanks.
On 2020/11/5 09:34, Xionghu Luo via Gcc-patches wrote:
Ping.
On 2020/10/10 16:08, Xionghu Luo wrote:
Originated from
https
Hi,
On 2020/10/27 05:10, Segher Boessenkool wrote:
> On Wed, Oct 21, 2020 at 03:25:29AM -0500, Xionghu Luo wrote:
>> Don't split code from add3 for SDI to allow a later pass to split.
>
> This is very problematic.
>
>> This allows later logic to hoist constant loads out of add instructions.
>
>
Ping^2, thanks.
On 2020/11/5 09:34, Xionghu Luo via Gcc-patches wrote:
Ping.
On 2020/10/10 16:08, Xionghu Luo wrote:
Originated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
with patch split and some refinement per review comments.
Patch of IFN VEC_SET
Ping.
On 2020/10/10 16:08, Xionghu Luo wrote:
Originated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
with patch split and some refinement per review comments.
Patch of IFN VEC_SET for ARRAY_REF(VIEW_CONVERT_EXPR) is committed,
this patch set enables expanding IFN
On 2020/10/23 18:18, Richard Biener wrote:
> On Fri, 23 Oct 2020, Xiong Hu Luo wrote:
>
>> Sometimes debug_bb_slim_bb_n_slim is not enough, how about adding
>> this debug_bb_details_bb_n_details? Or any other similar call
>> existed?
> There's already debug_bb and debug_bb_n in cfg.c which
This is a revised version of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542718.html; resending
it since this is a quite high-priority performance issue for Power.
Don't split code from add3 for SDI to allow a later pass to split.
This allows later logic to hoist out
On 2020/9/12 01:36, Tamar Christina wrote:
> Hi Martin,
>
>>
>> can you please confirm that the difference between these two is all due to
>> the last option -fno-inline-functions-called-once ? Is LTo necessary?
>> I.e., can
>> you run the benchmark also built with the branch compiler and
vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value
to be inserted, arg2 is the position at which to insert arg1 into arg0. The
current expander generates stxv+stwx+lxv if arg2 is a variable instead of a
constant, which causes a serious store-hit-load performance issue on Power.
This patch tries
1)
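The in-register alternative can be sketched in portable C (a conceptual model with made-up helper names, not the rs6000 expander): rotate the vector so the variable lane lands at a fixed position, insert there, then rotate back, so no memory round-trip is needed:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar model of variable-index byte insertion done entirely
   in registers: rotate the vector so the target byte lands in a fixed lane,
   insert at that fixed lane, rotate back.  This avoids the
   store-vector/store-element/reload sequence that hits the store queue. */
static void rotate_left_bytes(unsigned char v[16], unsigned n)
{
    unsigned char t[16];
    for (unsigned i = 0; i < 16; i++)
        t[i] = v[(i + n) % 16];
    memcpy(v, t, 16);
}

static void vec_insert_var(unsigned char v[16], unsigned char val, unsigned idx)
{
    rotate_left_bytes(v, idx);       /* bring lane idx to lane 0 */
    v[0] = val;                      /* fixed-lane insert        */
    rotate_left_bytes(v, 16 - idx);  /* rotate back              */
}
```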