uses of that
target hook)?
Thanks!
Aaron
Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
SPEC2017 testing on p10 shows that this optimization does not have a
positive impact on performance, so we are no longer going to enable it
by default. The test cases for it needed to be updated so that they
explicitly enable it.
OK for trunk and backport to 11 if bootstrap/regtest passes?
From: Aaron Sawdey
Update the count of matches for the fusion combine patterns after
the recent changes to them. At Segher's request, I used \m and \M
in the match patterns. I have also grouped together all alternatives of
each fusion insn, which should hopefully make this test a little less
The add-logical and add-add fusion patterns all have constraint
alternatives "=0,1,,r" for the output (3). The inputs 0 and 1
are used in the first fusion instruction and then either may be
reused as a temp for the output of the first insn which is
input to the second. However, if input 2 is the
These tests have become unstable: SMS either succeeds or doesn't,
depending on things like changes in instruction latency. This removes
the scan-rtl-dump-times checks for powerpc*-*-*.
If bootstrap/regtest passes, ok for trunk and backport to 11?
Thanks!
Aaron
gcc/testsuite
*
This certainly causes a bootstrap miscompare, and might also be
responsible for PR/100820. The operands to subf were reversed
in the logical-add/sub fusion patterns, and I screwed up my
bootstrap test which is how it ended up getting committed.
If bootstrap and regtest pass, ok for trunk (and
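For readers less familiar with PowerPC: `subf RT,RA,RB` ("subtract from") computes RB - RA, so its operand order is the reverse of the usual a - b. The C model below is only an illustration (`subf` and `sub_via_subf` are plain hypothetical C helpers, not GCC internals). It shows that swapping the operands silently negates the result instead of failing to build, which is the kind of wrong code a bootstrap stage comparison tends to catch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C model of the PowerPC "subf RT,RA,RB" instruction:
   RT = RB - RA, i.e. it subtracts RA *from* RB.  */
static int64_t subf (int64_t ra, int64_t rb)
{
  return rb - ra;
}

/* To compute "a - b" with subf, RA must be b and RB must be a.  */
static int64_t sub_via_subf (int64_t a, int64_t b)
{
  return subf (b, a);
}
```

Calling `subf (10, 3)` where `sub_via_subf (10, 3)` was intended gives -7 instead of 7.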
For some reason this never showed up on gcc-patches, trying again.
> Begin forwarded message:
>
> From: Aaron Sawdey
> Subject: [PATCH,rs6000] Fix p10 fusion test cases for -m32
> Date: May 25, 2021 at
One last addendum to this. I discovered that that needs a "sort"
in front of "keys %logicals_addsub" because otherwise you may get
the operators in different orders from run to run, which leads to fusion.md
having the patterns in different orders; that isn't helpful for
sane debugging. Segher and I
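Perl does not guarantee any particular order for `keys` on a hash, and recent Perl versions deliberately vary it between runs, so a generator that walks hash keys directly can emit the same patterns in a different order each time. The same idea in C, as a minimal sketch: put the operator names into a fixed order before generating anything. `sort_op_names` and the operator names in the usage are hypothetical, not taken from genfusion.pl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings; qsort passes pointers
   to the array elements, which are themselves "const char *".  */
static int cmp_str (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcmp (*sa, *sb);
}

/* Sort the operator names so the generated output does not depend on
   hash-iteration order.  */
static void sort_op_names (const char **names, size_t n)
{
  qsort (names, n, sizeof names[0], cmp_str);
}
```

With `{"or", "and", "xor", "nand"}` as input, the generator would always see "and", "nand", "or", "xor", in that order.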
Ping.
> On Apr 26, 2021, at 2:00 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This adds some test cases to make sure that the combine patterns for p10
> fusion are working.
>
>
Ping.
In answer to Will's question: some of these are not immediately used but will
be used in other pending patches.
> On Apr 26, 2021, at 1:04 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
>
Ping.
> On Apr 26, 2021, at 3:21 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> Two more sets of combine patterns for p10 fusion. These require
> the "Add insn types for fusion pairs&
Ping.
> On Dec 9, 2020, at 11:04 AM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> Ping. I've folded in the changes to comments suggested by Will Schmidt.
>
> This patch implements a RTL p
Ping.
> On Jan 3, 2021, at 2:44 PM, Aaron Sawdey wrote:
>
> Ping.
>
>> On Dec 11, 2020, at 1:53 PM, acsaw...@li
Ping.
> On Jan 3, 2021, at 2:43 PM, Aaron Sawdey wrote:
>
> Ping.
>
>> On Dec 10, 2020, at 8:41 PM, acsaw...@li
Ping.
> On Jan 3, 2021, at 2:42 PM, Aaron Sawdey wrote:
>
> Ping.
>
> I assume we’re going to want a separate patch for the new instruction type.
Ping.
> On Dec 11, 2020, at 1:53 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This adds some test cases to make sure that the combine patterns for p10
> fusion are working.
>
Ping.
> On Dec 10, 2020, at 8:41 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This patch adds a new function to genfusion.pl to generate patterns for
> logical-logical fusion. They are
Ping.
I assume we’re going to want a separate patch for the new instruction type.
> On Dec 4, 2020, at 1:19 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This patch adds the first ba
> On Nov 20, 2020, at 4:57 AM, Aaron Sawdey via Gcc-patches
> wrote:
>
>
>> On Nov 20, 2020, at 3:55 AM, Richard Sandiford
>> wrote:
>>
>> acsawdey--- via Gcc-patches writes:
>>> @@ -16767,7 +16768,7 @@ loc_descriptor (rtx rtl, machine_mode mod
k it deserves a comment at least.
>
> The rest looks good to me FWIW.
>
> Richard
I should look at this again: since I originally put that in, I switched the
target portion of what I've been doing to use an UNSPEC to remove all use of an
opaque mode const_int from the RTL. This may not be needed any more.
Thanks,
Aaron
For some reason this patch never showed up on gcc-patches.
> Begin forwarded message:
>
> From: acsaw...@linux.ibm.com
> Subject: [PATCH,rs6000] Make MMA builtins use opaque modes [v2]
> Date: November 19, 2
Ping.
> On Oct 26, 2020, at 4:44 PM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> This patch adds the first couple patterns to support p10 fusion. These
> will allow combine to create a sing
it. There is no
solution like that for the MMA builtins that use POImode and are (in theory)
exposed to the same problem.
So I ask again, how can we tell extract_low_bits() that POImode is off limits
to its prying fingers?
Thanks,
Aaron
Not exactly a patch ping, but I was hoping we could re-engage the discussion on
this and figure out how we can make POImode work for powerpc.
How does x86 solve this? There was some suggestion that it has some similar
situations?
Thanks,
This is a (hopefully temporary) fix to PR96791. This will make
the default be -mno-block-ops-vector-pair even on power10, so we will
not hit the issue of DSE trying to truncate a POImode register. I am
still concerned it will be possible to hit this because the MMA builtins
will also generate
So, would it be legitimate for extract_low_bits to query if the truncate
pattern it will likely use is actually available?
> On Sep 10, 2020, at 10:10 AM, Segher Boessenkool
> wrote:
>
> Hi!
>
> On
If it feels like a hack, that would be because it is a hack.
What I’d really like to discuss is how to accomplish the real goal: keep
anything from trying to do other operations (zero/sign extend for one) to
POImode.
Is there an existing mechanism for this?
Thanks,
Aaron
Now that the documentation for partial modes says they have a known
number of bits of precision, would it make sense for extract_low_bits to
check this before attempting to extract the bits?
This would solve the problem we have been having with POImode and
extract_low_bits -- DSE tries to use it
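The question here is whether extract_low_bits (or a new target hook) could refuse to look inside a mode it cannot handle. A toy model of such a guard, in C: `struct toy_mode`, its `opaque` flag and `can_extract_low_bits` are all hypothetical and are not GCC's machine-mode API; the mode sizes in the usage are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode descriptor: size in bits, plus a flag saying the
   mode is opaque, i.e. nothing may look inside it (no truncation,
   no zero/sign extension).  */
struct toy_mode
{
  const char *name;
  unsigned bits;
  bool opaque;
};

/* Hypothetical guard: taking the low TO->bits bits of a FROM value is
   allowed only if neither mode is opaque and TO is no wider than FROM.  */
static bool can_extract_low_bits (const struct toy_mode *from,
                                  const struct toy_mode *to)
{
  if (from->opaque || to->opaque)
    return false;
  return to->bits <= from->bits;
}
```

Under this model a DSE-style caller would simply give up on the optimization when the guard returns false, rather than emitting a truncation the target cannot perform.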
Meant to CC a few people, oops.
> On Sep 2, 2020, at 9:22 AM, Aaron Sawdey via Gcc wrote:
>
>
> PR96791 is happening because DSE is trying to truncate a
> POImode reg down to DImode. The POI
: Is there an existing way to do
this? Or, do we need another target hook of some kind to
check this sort of thing?
Thanks,
Aaron
I've modified slightly per Will & Segher's comments, re-regstrapped and
posting what I've actually committed.
Aaron
This patch adds a few new instructions to inline expansion of
memcpy/memmove. Generation of all these is controlled by
the option -mblock-ops-unaligned-vsx which is set on by
This patch adds a few new instructions to inline expansion of
memcpy/memmove. Generation of all these is controlled by
the option -mblock-ops-unaligned-vsx which is set on by default if the
target has TARGET_EFFICIENT_UNALIGNED_VSX.
* unaligned vsx load/store (V2DImode)
* unaligned vsx pair
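As a rough picture of what "inline expansion of memcpy" means here, the sketch below splits a copy of a known length into chunks, largest first, and only uses 16-byte chunks (standing in for unaligned VSX load/store) when an option allows it. This is a simplified, hypothetical model (`plan_block_move` is not a function in rs6000-string.c) and ignores alignment, overlapping accesses and the vector-pair case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy planner: split an inline copy of BYTES bytes into chunk sizes,
   largest first.  16-byte chunks model unaligned VSX load/store and are
   skipped unless USE_VSX is true.  Chunk sizes are written to OUT; the
   return value is the number of chunks.  */
static size_t plan_block_move (size_t bytes, bool use_vsx,
                               size_t *out, size_t max_out)
{
  static const size_t sizes[] = { 16, 8, 4, 2, 1 };
  size_t n = 0;
  for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
    {
      if (sizes[i] == 16 && !use_vsx)
        continue;
      while (bytes >= sizes[i])
        {
          assert (n < max_out);
          out[n++] = sizes[i];
          bytes -= sizes[i];
        }
    }
  return n;
}
```

For a 23-byte copy this plans 16+4+2+1 with VSX enabled and 8+8+4+2+1 without it.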
Because the check for power10_hw is not called
check_effective_target_power10_hw, it needs to be looked
for by is-effective-target-keyword. Also reorder things
in is-effective-target to put power10_hw with the other
ppc stuff.
These little fixes for power10 dejagnu support were pre-approved
for
Add a test for dejagnu to determine if execution of MMA instructions is
supported in the test environment. Add an execution test to make sure
that __builtin_cpu_supports("mma") is true if we can execute MMA
instructions.
OK for trunk and backport to 10?
Thanks!
Aaron
gcc/testsuite/
environment correctly identifies itself,
and that it can execute MMA code and get the right answer.
A future patch will add an effective-target test for powerpc_mma_hw,
which these mma tests will also need to check for.
OK for trunk and backport to 10?
2020-06-30 Rajalakshmi Srinivasaraghavan
This fixed the ICE I was seeing, thanks.
> On Jul 10, 2020, at 10:40 AM, Richard Sandiford
> wrote:
>
> In some cases, expand_expr_real_2 prefers to use the mode of the
> caller-suggested target inst
2020-06-30 Rajalakshmi Srinivasaraghavan
Aaron Sawdey
gcc/testsuite/
* gcc.target/powerpc/p10-identify.c: New file.
* gcc.target/powerpc/mma-single-test.c: New file.
* gcc.target/powerpc/mma-double-test.c: New file.
---
.../gcc.target/powerpc/mma
The code snippet for this test was returning 1 if power10
instructions executed correctly. It should return 0 if the
test passes.
Approved offline by Segher with slight change. Will
push after posting.
* lib/target-supports.exp (check_power10_hw_available):
Return 0 for passing
The code snippet for this test was returning 1 if power10
instructions executed correctly. It should return 0 if the
test passes.
OK for trunk and backport to 10?
Thanks,
Aaron
* lib/target-supports.exp (check_power10_hw_available):
Return 0 for passing test.
---
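The fix follows the usual exit-status convention: the harness treats a program that exits with status 0 as a pass, so the probe must return 0 on success and nonzero on failure. A minimal sketch of that mapping, assuming a hypothetical helper `insns_gave_right_answer` standing in for "execute a few power10 instructions and check the answer" (a real probe would use inline asm or builtins, and would return `probe_status ()` from `main`).

```c
#include <assert.h>

/* Hypothetical placeholder for "run the new instructions and verify the
   result"; returns nonzero when the answer is right.  */
static int insns_gave_right_answer (void)
{
  volatile int x = 6;
  return x * 7 == 42;
}

/* Value the probe's main() should return: 0 means "passed", which is
   what the harness expects when the hardware works.  Returning 1 on
   success (the original bug) makes the check report the opposite.  */
static int probe_status (void)
{
  return insns_gave_right_answer () ? 0 : 1;
}
```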
Updated slightly, removed -Wno-psabi as requested and also fixed the
fact that it wasn't actually checking __builtin_cpu_is or
__builtin_cpu_supports. OK for trunk and backport to 10?
Thanks,
Aaron
2020-06-30 Rajalakshmi Srinivasaraghavan
Aaron Sawdey
gcc/testsuite
. Actually the power10_hw test I think requires
current glibc to pick up the change that lets
__builtin_cpu_is("power10") work. OK for trunk?
Thanks,
Aaron
2020-06-30 Rajalakshmi Srinivasaraghavan
Aaron Sawdey
gcc/testsuite/
* gcc.target/powerpc/mma-single-test.c
Update config.gcc so that we can use --with-cpu=power10.
I've tested that this does do the expected thing
with --with-cpu=power10 and also that it still builds and
bootstraps correctly using --with-cpu=power9 on power9. If there isn't
any other testing I need to do for this, ok for trunk?
Now that this has been in trunk for a bit with no issues, ok to backport to 10?
> On Jun 3, 2020, at 4:10 PM, Aaron Sawdey wrote:
>
> This passed regstrap and was approved offline by Segher, posting
> th
This passed regstrap and was approved offline by Segher, posting
the final form (minus my debug code, oops).
The same problem also arises for plfs where prefixed_load_p()
doesn't recognize it so we get just lfs in the asm output
with an @pcrel address.
PR target/95347
*
The same problem also arises for plfs where prefixed_load_p()
doesn't recognize it so we get just lfs in the asm output
with a @pcrel address.
OK for trunk if regstrap on ppc64le passes?
Thanks,
Aaron
PR target/95347
* config/rs6000/rs6000.c (is_stfs_insn): Rename to
. Then if we use NON_PREFIXED_DEFAULT, address_to_insn_form()
can see that it has the PCREL symbol ref.
OK for trunk if regstrap on ppc64le passes?
Thanks,
Aaron
2020-05-29 Aaron Sawdey
PR target/95347
* config/rs6000/rs6000.c (prefixed_store_p): Add special case
for stfs
, also with the doubleword swap, which was wrong.
While adding comments I realized we have exactly the same problem with
pstq/stq so I have added fixes for that as well. Assuming that regstrap
passes, OK for trunk?
Thanks,
Aaron
2020-04-20 Aaron Sawdey
PR target/94622
with the doubleword swap, which was wrong.
So, of course you can't use set_attr with an if_then_else. The below
code actually builds and passes regstrap on ppc64le power9.
OK for trunk?
Thanks,
Aaron
2020-04-20 Aaron Sawdey
PR target/94622
* config/rs6000/sync.md
with the doubleword swap, which was wrong.
OK for trunk if regstrap passes on ppc64le power9?
Thanks,
Aaron
2020-04-20 Aaron Sawdey
PR target/94622
* config/rs6000/sync.md (load_quadpti): Make this have attr prefixed
if TARGET_PREFIXED.
(atomic_load): Do
This is a fix for PR92379. Passes regstrap on ppc64le. Pre-approved by
Segher, committing after posting.
2020-03-13 Aaron Sawdey
PR target/92379
* config/rs6000/rs6000.c (num_insns_constant_multi): Don't shift a
64-bit value by 64 bits (UB).
diff --git a/gcc/config/rs6000/rs6000
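In C, shifting a 64-bit value by 64 or more bits is undefined behavior (the shift count must be less than the width of the promoted left operand), so code that may need to "shift everything out" has to special-case that count. A minimal sketch of the general technique; `shr64_safe` is a hypothetical helper, not the code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift that is defined for every AMOUNT: a count of 64
   or more shifts out all bits, so return 0 instead of evaluating the
   undefined expression VALUE >> AMOUNT.  */
static uint64_t shr64_safe (uint64_t value, unsigned amount)
{
  if (amount >= 64)
    return 0;
  return value >> amount;
}
```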
On 10/2/19 5:44 PM, Aaron Sawdey wrote:
> On 10/2/19 5:35 PM, Jakub Jelinek wrote:
>> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
>>>>> 2019-09-27 Aaron Sawdey
>>>>>
>>>>> * builtins.c (expand_builtin_memory_copy_args):
On 10/2/19 5:35 PM, Jakub Jelinek wrote:
> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
>>>> 2019-09-27 Aaron Sawdey
>>>>
>>>>* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>>>>
On 10/1/19 4:45 PM, Jeff Law wrote:
> On 9/27/19 12:23 PM, Aaron Sawdey wrote:
>> This is the third piece of my effort to improve inline expansion of memmove.
>> The
>> first two parts I posted back in June fixed the names of the optab entries
>> involved so that opta
(power9), if tests are ok, is this ok for trunk
after the movmem optab patch posted last week is approved?
Thanks!
Aaron
2019-09-30 Aaron Sawdey
* config/rs6000/rs6000-protos.h (expand_block_move): Change prototype.
* config/rs6000/rs6000-string.c (expand_block_move): Add
erlap case.
Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything
passes,
is this ok for trunk?
2019-09-27 Aaron Sawdey
* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
(expand_builtin_memcpy): Use might_ov
bootstrap/regtest on ppc64le and x86_64. Ok for trunk?
2019-07-02 Aaron Sawdey
* optabs.def (movmem_optab): Add movmem back for memmove().
* doc/md.texi: Add description of movmem pattern for overlapping move.
Index: gcc/doc/md.texi
On 6/25/19 4:43 PM, Jeff Law wrote:
> On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote:
>> From: Aaron Sawdey
>>
>> * builtins.c (get_memory_rtx): Fix comment.
>> * optabs.def (movmem_optab): Change to cpymem_optab.
>> * expr.c (emit_block_move_via
On 5/15/19 1:01 PM, Jakub Jelinek wrote:
> On Wed, May 15, 2019 at 12:59:01PM -0500, Aaron Sawdey wrote:
>> 1) rename optab movmem and the underlying patterns to cpymem.
>> 2) add a new optab movmem that is really memmove() and add support for
>> having __builtin_memmove() u
On 5/15/19 11:31 AM, Jakub Jelinek wrote:
> On Wed, May 15, 2019 at 11:23:54AM -0500, Aaron Sawdey wrote:
>> My goals for this are:
>> * memcpy() call becomes __builtin_memcpy and goes to optab[cpymem]
>> * memmove() call becomes __builtin_memmove (or __builtin_memcpy base
On 5/15/19 9:02 AM, Michael Matz wrote:
> On Wed, 15 May 2019, Aaron Sawdey wrote:
>> Next question would be how do we move from the existing movmem pattern
>> (which Michael Matz tells us should be renamed cpymem anyway) to this
>> new thing. Are you proposing that we s
On 5/15/19 7:22 AM, Richard Biener wrote:
> On Tue, May 14, 2019 at 9:21 PM Aaron Sawdey wrote:
>> I'd be interested in any comments about pieces of this machinery that need to
>> work a certain way, or other related issues that should be addressed in
>> between
On 5/15/19 8:10 AM, Michael Matz wrote:
> On Tue, 14 May 2019, Aaron Sawdey wrote:
>
>> memcpy -> expand with movmem pattern
>> memmove (no overlap) -> transform to memcpy -> expand with movmem pattern
>> memmove (overlap) -> remains memmove -> gl
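The split between the two paths comes down to one guarantee: memmove must give the right answer even when source and destination overlap, which a plain forward copy does not. A minimal C sketch of that guarantee (`toy_memmove` is a hypothetical name; this is neither GCC's expansion nor glibc's implementation): when the destination starts inside the source range, copy backwards so no source byte is overwritten before it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte copy with memmove semantics.  */
static void *toy_memmove (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  /* With unsigned wrap-around, d - s < n exactly when d lies inside
     [s, s + n); only then would a forward copy clobber unread bytes.  */
  if ((uintptr_t) d - (uintptr_t) s >= n)
    {
      for (size_t i = 0; i < n; i++)     /* forward (memcpy-style) */
        d[i] = s[i];
    }
  else
    {
      for (size_t i = n; i-- > 0;)       /* backward for overlap */
        d[i] = s[i];
    }
  return dst;
}
```

Moving "abcd" two bytes to the right inside "abcdef" yields "ababcd"; a naive forward copy would instead produce "ababab".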
this machinery that need to
work a certain way, or other related issues that should be addressed in
between expand_builtin_memcpy() and emit_block_move_via_movmem().
Thanks!
Aaron
--
Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technolo
On 2/18/19 10:41 AM, Alexander Monakov wrote:
> On Mon, 18 Feb 2019, Aaron Sawdey wrote:
>
>> The code in emit_case_dispatch_table() will very clearly always emit
>> branch/label/jumptable_data/barrier
>> so this does need to be handled. So, yes tablejump always looks
/ppc32), ok for trunk?
2019-02-18 Aaron Sawdey
PR rtl-optimization/88347
* schedule-ebb.c (begin_move_insn): Apply Segher's patch to handle
a jump table before the barrier.
On 1/24/19 9:43 AM, Alexander Monakov wrote:
> On Wed, 23 Jan 2019, Alexander Monakov wr
?
Thanks!
Aaron
2019-02-13 Aaron Sawdey
* shrink-wrap.c (move_insn_for_shrink_wrap): Fix LABEL_NUSES counts
on copied instruction.
Index: gcc/shrink-wrap.c
===
--- gcc/shrink-wrap.c (revision 268783)
+++ gcc
Missed two more conditional branches created by inline expansion that should
have had branch probability notes.
2019-02-08 Aaron Sawdey
* config/rs6000/rs6000-string.c (expand_compare_loop,
expand_block_compare): Insert REG_BR_PROB notes in inline expansion of
memcmp
, which is what caused the long branches in 89112. With this patch, the test
case for 89112 does not have any long branches within the expansion of memcmp,
and the code for each memcmp is contiguous.
OK for trunk and 8 backport if bootstrap/regtest passes?
Thanks!
Aaron
2019-02-04 Aaron Sawdey
and backport to 8?
Thanks!
2019-02-02 Aaron Sawdey
* config/rs6000/rs6000.md (tf_): generate a local label
for the long branch case.
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 268403
The patch for this was committed to trunk as 267562 (see below). Is this also
ok for backport to 8?
Thanks,
Aaron
On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>>>> Because of POWER9 dd2.1 iss
On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>> inline
267299.
>
> Aaron, does this fix the issue you saw?
>
> Thanks, and sorry again about the breakage.
> Dave
>
Dave,
Thanks for the quick response, the build issue is fixed with r267299.
Aaron
enum tree_code, tree, enum tree_code, tree);
>> -extern void warn_tautological_cmp (location_t, enum tree_code, tree, tree);
>> +extern void warn_tautological_cmp (const op_location_t &, enum tree_code,
>> + tree, tree);
>> extern void warn_logical_not_parentheses (location_t, enum tree_code, tree,
>> tree);
>> extern bool warn_if_unused_value (const_tree, location_t);
>> diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
>> index fc7f87c..fce9d84 100644
>> --- a/gcc/c-family/c-warn.c
>> +++ b/gcc/c-family/c-warn.c
>> @@ -322,7 +322,8 @@ find_array_ref_with_const_idx_r (tree *expr_p, int *, void *)
>> if ((TREE_CODE (expr) == ARRAY_REF
>> || TREE_CODE (expr) == ARRAY_RANGE_REF)
>> - && TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
>> + && (TREE_CODE (tree_strip_any_location_wrapper (TREE_OPERAND (expr, 1)))
>> + == INTEGER_CST))
>> return integer_type_node;
>
> I think we want fold_for_warn here. OK with that change (assuming it passes).
>
> Jason
>
://patchwork.ozlabs.org/patch/814059/
OK for trunk if bootstrap/regtest ok?
Thanks!
Aaron
2018-12-19 Aaron Sawdey
* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.
Index: gcc/config/rs6000
align >= 128)
- || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+ && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))
{
clear_bytes = 16;
mode = V4SImode;
On 11/26/18 4:29 PM, Segher Boessenkool wrote:
> On Mon, Nov 2
on a couple different ppc64 architectures (unless anyone has any objections).
Thanks,
Aaron
2018-11-26 Aaron Sawdey
Backport from mainline
2018-10-25 Aaron Sawdey
* config/rs6000/rs6000-string.c (expand_strncmp_gpr_sequence): Change to
a shorter sequence
for the last 32
bytes of any block being cleared. So this change puts the test up front so it
is not affected by the decrement of bytes.
OK for trunk if regstrap passes?
Thanks!
Aaron
2018-11-26 Aaron Sawdey
* config/rs6000/rs6000-string.c (expand_block_clear): Change how
On 11/15/18 4:02 AM, Richard Biener wrote:
> On Wed, Nov 14, 2018 at 5:43 PM Aaron Sawdey wrote:
>>
>> This patch generalizes some of the functions added earlier to do vsx expansion
>> of strncmp so that they can also generate the code needed for memcmp. I reorgani
the gpr inline code
if the strings are equal and is comparable if the strings have a 10% chance of
being equal (spread across the string).
Currently regtesting, ok for trunk if tests pass?
Thanks!
Aaron
2018-11-14 Aaron Sawdey
* config/rs6000/rs6000-string.c (emit_vsx_zero_reg): New
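The generalization works because strncmp and memcmp differ in essentially one respect: strncmp stops at a zero byte that is equal in both inputs, while memcmp examines all n bytes. A byte-at-a-time C sketch of one routine serving both (`toy_compare` is hypothetical; the actual patch emits VSX instruction sequences, not a byte loop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare N bytes as unsigned char.  With STOP_AT_NUL set, a zero byte
   that matches in both inputs ends the comparison (strncmp semantics);
   otherwise every byte is examined (memcmp semantics).  */
static int toy_compare (const void *a, const void *b, size_t n,
                        bool stop_at_nul)
{
  const unsigned char *pa = a;
  const unsigned char *pb = b;
  for (size_t i = 0; i < n; i++)
    {
      if (pa[i] != pb[i])
        return pa[i] < pb[i] ? -1 : 1;
      if (stop_at_nul && pa[i] == 0)
        return 0;
    }
  return 0;
}
```

For inputs that agree up to an embedded zero byte and differ after it, strncmp mode reports them equal while memcmp mode does not.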
power8/power9, ok for trunk?
Thanks!
Aaron
2018-11-05 Aaron Sawdey
* config/rs6000/rs6000.md (bswap2): Force address into register
if not in indexed or indirect form.
(bswap2_load): Change predicate to indexed_or_indirect_operand.
(bswap2_store): Ditto
, tmp_reg_src2, addr2,
orig_src2);
/* We must always left-align the data we read, and
the incoming rtx which matches what the insns this is used to prepare for
are using as their predicate.
Bootstrap/regtest passes on ppc64le (power7, power9), ok for trunk?
2018-11-01 Aaron Sawdey
* config/rs6000/rs6000-protos.h (rs6000_address_for_fpconvert): Remove
prototype
I had to make one more change to make this actually work. In
rs6000_force_indexed_or_indirect_mem() it was necessary to
return the updated rtx.
Bootstrap/regtest passes on ppc64le (power7, power9), ok for trunk?
Thanks!
Aaron
2018-10-30 Aaron Sawdey
* config/rs6000/rs6000.md
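One general way this kind of bug arises: when a helper builds a replacement value instead of modifying its argument in place, changing the parameter inside the helper is invisible to the caller, so the new value has to be returned and used. A minimal C sketch of the pattern; `struct toy_mem`, `make_indexed` and the two `prepare_*` callers are hypothetical stand-ins, not the rs6000 code.

```c
#include <assert.h>

/* Hypothetical stand-in for a memory operand.  */
struct toy_mem
{
  int base;
  int index;
};

/* Builds an updated operand from a copy of its argument; the caller's
   object is untouched, so the result must be returned.  */
static struct toy_mem make_indexed (struct toy_mem m, int index_reg)
{
  m.index = index_reg;
  return m;
}

/* Buggy caller: discards the rewritten operand.  */
static struct toy_mem prepare_buggy (struct toy_mem m)
{
  make_indexed (m, 9);
  return m;
}

/* Fixed caller: uses the returned, updated operand.  */
static struct toy_mem prepare_fixed (struct toy_mem m)
{
  return make_indexed (m, 9);
}
```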
On 10/27/18 12:52 PM, Segher Boessenkool wrote:
> Hi Aaron,
>
> On Sat, Oct 27, 2018 at 11:20:01AM -0500, Aaron Sawdey wrote:
>> --- gcc/config/rs6000/rs6000.md (revision 265393)
>> +++ gcc/config/rs6000/rs6000.md (working copy)
>> @@ -2512,9 +2512,27 @@
other cases where it will update them if there is more register pressure. In
either case the code is more compact and makes full use of the indexed
addressing of ldbrx.
Bootstrap/regtest passed on ppc64le targeting power7/power8/power9, ok for
trunk?
Thanks!
Aaron
2018-10-27 Aaron Sawdey
is faster for long strings that do not
differ, but that isn't important because if vsx is enabled, the gpr
sequence is only used for 15 bytes or less.
Bootstrap/regtest passes on ppc64le (power8, power9), ppc64 (power8)
and ppc32 (power8). Ok for trunk?
Thanks,
Aaron
2018-10-25 Aaron Sawdey
On 10/17/18 4:03 PM, Florian Weimer wrote:
> * Aaron Sawdey:
>
>> I've previously posted a patch to add vector/vsx inline expansion of
>> strcmp/strncmp for the power8/power9 processors. Here are some of the
>> other items I have in the pipeline that I hope to get into
512 bytes inline before dumping to the library
function.
If anyone has any other input on the inline expansion work I've been
doing for the rs6000 target, please let me know.
Thanks!
Aaron
On 10/2/18 3:38 AM, Segher Boessenkool wrote:
> On Mon, Oct 01, 2018 at 11:09:44PM -0500, Aaron Sawdey wrote:
>> PR/87474 happens because I didn't check that both vector and VSX instructions
>> were enabled, so insns that are disabled get generated with
>> -mno-power8-
PR/87474 happens because I didn't check that both vector and VSX instructions
were enabled, so insns that are disabled get generated with -mno-power8-vector.
Regstrap passes on ppc64le, ok for trunk?
Thanks!
Aaron
2018-10-01 Aaron Sawdey
PR target/87474
* config/rs6000
) and ppc64le (power8 and
power9). Ok for trunk?
Thanks!
Aaron
2018-08-22 Aaron Sawdey
* config/rs6000/altivec.md (altivec_eq): Remove star.
* config/rs6000/rs6000-string.c (do_load_for_compare): Support
vector load modes.
(expand_strncmp_vec_sequence): New function
Just teasing things apart a bit more in this function so I can add
vec/vsx code generation without making it enormous and
incomprehensible.
Bootstrap/regtest passes on powerpc64le, ok for trunk?
Thanks,
Aaron
2018-07-31 Aaron Sawdey
* config/rs6000/rs6000-string.c
runs show
the performance regression is fixed.
Regstrap passes on powerpc64le, ok for trunk and backport to 8?
Thanks,
Aaron
2018-06-25 Aaron Sawdey
* config/rs6000/rs6000-string.c (expand_block_clear): Don't use
unaligned vsx for 16B memset.
and backport to 8?
Thanks,
Aaron
2018-06-19 Aaron Sawdey
* config/rs6000/rs6000-string.c (expand_strn_compare): Handle -m32
correctly.
for trunk?
Thanks!
Aaron
2018-06-14 Aaron Sawdey
* config/rs6000/rs6000-string.c (select_block_compare_mode): Check
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED here instead of in caller.
(do_and3, do_and3_mask, do_compb3, do_rotl3): New functions
This also affects gcc 7 and is fixed by the same patch. I've tested the
backport to 7 on ppc64le and it causes no new fails. OK for backport to
7 (and 6 if it's also needed there)?
Thanks,
Aaron
On Fri, 2018-04-13 at 15:37 -0500, Aaron Sawdey wrote:
> Per the discussion on the 83660, I
in there, it has side effects and this
problem will not occur.
Doing bootstrap/regtest on ppc64le with -mcpu=power7 since that is
where this issue arises. OK for trunk if everything passes?
Thanks,
Aaron
2018-04-13 Aaron Sawdey <acsaw...@linux.ibm.com>
PR target
in invoke.texi. This is the last piece for 85321.
Testing in progress on linux-ppc64le, ok for trunk if tests are ok?
Thanks,
Aaron
2018-04-10 Aaron Sawdey <acsaw...@linux.ibm.com>
PR target/85321
* doc/invoke.texi (RS/6000 and PowerPC Options): Document options