Ping Richard. Any comments?
Hi Felix,
Sorry for the delay responding, I've been out of the office recently
and I'm only just catching up on a backlog of GCC related emails.
I'm in two minds about this; I can potentially see the need for
attributes to enable long calls for specific
These three are logically independent, but all on a common theme, and I've
tested them all together by
bootstrapped + check-gcc on aarch64-none-elf cross-tested check-gcc on
aarch64_be-none-elf
Ok for trunk?
Hi Alan,
It seems that we are duplicating the work for the vld1_dup part.
PING?
BTW: It seems that Alan's way of improving vld1(q?)_dup intrinsic is more
elegant.
So is the improvement of vcls(q?) vcnt(q?) OK for trunk? Thanks.
Hi,
This patch converts vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics to use
builtin functions instead of the previous inline
On 13 November 2014 06:14, Yangfei (Felix) felix.y...@huawei.com wrote:
Hi,
We find that the VALLDI mode iterator used in *aarch64_simd_ld1rmode
pattern is not appropriate.
The reason is that it's impossible to get a new operand of DImode by
vec_duplicating an operand of the same
Ping? I hope this patch can catch up with stage 1 of GCC-5.0. Thanks.
Hi Felix,
Sorry for the delay responding, I've been out of the office recently
and I'm only just catching up on a backlog of GCC related emails.
I'm in two minds about this; I can potentially see the need for
On 11/12/2014 11:01 AM, Yangfei (Felix) wrote:
+(define_expand doloop_end
+ [(use (match_operand 0 )) ; loop pseudo
+ (use (match_operand 1 ))] ; label
+
+
+{
Drop the surrounding the { }.
r~
Hello,
I updated the patch with the redundant removed
Hi,
This patch converts vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics to use
builtin functions instead of the previous inline assembly syntax.
Regtested with aarch64-linux-gnu on QEMU. Also passed the glorious
testsuite of Christophe Lyon.
OK for the trunk?
Index:
Index: gcc/ChangeLog
===
--- gcc/ChangeLog (revision 217506)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2014-11-13 Felix Yang felix.y...@huawei.com
+
+ * ipa-utils.h: Fix typo in comments.
+ *
No, we noticed this issue when improving the vld1(q?)_dup intrinsics. Thanks.
Is there a case or PR to demonstrate the issue? If yes, better to include it
as a test
case.
Thanks,
Joey
On Thu, Nov 13, 2014 at 2:14 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
Hi,
We find
case be rewritten then?
- Joey
On Fri, Nov 14, 2014 at 9:32 AM, Yangfei (Felix) felix.y...@huawei.com
wrote:
No, we noticed this issue when improving the vld1(q?)_dup intrinsics.
Thanks.
Is there a case or PR to demonstrate the issue? If yes, better to
include it as a test case
Hi Felix,
Sorry for the delay responding, I've been out of the office recently and I'm
only
just catching up on a backlog of GCC related emails.
I'm in two minds about this; I can potentially see the need for attributes to
enable long calls for specific calls, and maybe also for pragmas
Hello,
Any comments on this patch? Thanks.
Hi,
This patch adds doloop_end standard pattern for AArch64 port so that
-fmodulo-sched can do its job.
Reg-tested for aarch64-linux-gnu with QEMU. OK for the trunk?
Index: gcc/ChangeLog
Hello,
Any comments on this patch? Thanks.
The idea is simple: Use movw for certain const source operand
instead of
ldrh. And exclude the const values which cannot be handled by
mov/mvn/movw.
I am doing regression test for this patch. Assuming no issue
pops
Hi,
We find that the VALLDI mode iterator used in *aarch64_simd_ld1rmode
pattern is not appropriate.
The reason is that it's impossible to get a new operand of DImode by
vec_duplicating an operand of the same mode.
So this patch just excludes the DImode and uses VALL instead.
Hi,
As commented at https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
this is a simple patch enabling neon memset inlining on
cortex-a53/cortex-a57 in AArch32 mode.
Test on
arm-none-linux-gnueabihf/--with-cpu=cortex-a57/--with-fpu=crypto-neon-fp-armv8/--with-float=hard.
I will
On Thu, Nov 13, 2014 at 2:33 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
Hi,
As commented at
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00684.html,
this is a simple patch enabling neon memset inlining on
cortex-a53/cortex-a57 in AArch32 mode.
Test on
arm-none-linux
Just curious about this. Can we make sure that FPAdvSIMD are
always
enabled?
I am not sure I understand correctly. Do you mean enabling options like the
ones below by default?
--with-fpu=crypto-neon-fp-armv8/--with-float=hard
Thanks,
bin
I mean the NEON hardware. The processor
Hello,
I have written a testsuite for AArch32 Neon intrinsics, available at
https://gitorious.org/arm-neon-tests
I am in the process of converting it into DejaGnu form for integration into
GCC.
My most recent submission was
On Wed, Oct 22, 2014 at 10:49 PM, Michael Collison
michael.colli...@linaro.org
wrote:
Patch that removes extraneous comment attached.
The CLZ_DEFINED_VALUE_AT_ZERO macro is hard coded to return 32. For
the vector intrinsic vclz this is incorrect and should return the
value
Hi,
Does anybody have time to review this? Thanks.
Hello,
Ping for https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02933.html
Thanks
Hi,
This patch adds doloop_end standard pattern for AArch64 port so that
-fmodulo-sched can do its job.
Reg-tested for aarch64-linux-gnu with QEMU. OK for the trunk?
Index: gcc/ChangeLog
===
--- gcc/ChangeLog
The idea is simple: Use movw for certain const source operand instead
of
ldrh. And exclude the const values which cannot be handled by
mov/mvn/movw.
I am doing regression test for this patch. Assuming no issue pops up,
OK for trunk?
So, doesn't that make the bug latent
On 10 October 2014 16:19, Alan Hayward alan.hayw...@arm.com wrote:
This patch is dependent on "[AArch64] [BE] [1/2] Make large opaque
integer modes endianness-safe."
It fixes up movoi/ci/xi for Big Endian, so that we end up with the lsb
of a big-endian integer to be in the low byte of
Hello,
It seems that this patch breaks the compile of some testcases under
big-endian.
One example: testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX.c
Anything I missed? Please confirm. Thanks.
$ aarch64_be-linux-gnu-gcc vldX.c
vldX.c: In function 'exec_vldX':
Hello,
Ping for https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02933.html
Thanks
Hi,
This patch fixes PR63742 by improving the arm *movhi_insn_arch4 pattern to make
it work under big-endian.
The idea is simple: Use movw for certain const source operand instead of
ldrh. And exclude the const values which cannot be handled by mov/mvn/movw.
I am doing regression
So we've been seeing
FAIL: gcc.target/aarch64/vldN_dup_1.c
on aarch64_be-none-elf, since this patch went in. Felix, did you test for
bigendian?
However, this failure is fixed if I apply David Sherwood's patch set:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00942.html
+ c_register_pragma (0, long_calls_off, aarch64_pr_long_calls_off); \
+} while (0)
+
#define FUNCTION_ARG_PADDING(MODE, TYPE) \
(aarch64_pad_arg_upward (MODE, TYPE) ? upward : downward)
Hi,
I added four testcases to ensure that this patch is tested properly.
Reg-tested
On Thu, Oct 23, 2014 at 11:51 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
Thanks for the explanation. I think I am clear about what you are thinking
now.
That's an interesting question. I am not sure about the reason why GCC's reload
cannot handle a doloop_end insn.
I guess maybe
+/* Handle pragmas for compatibility with Intel's compilers. */
+#define REGISTER_TARGET_PRAGMAS() do { \
+ c_register_pragma (0, long_calls, aarch64_pr_long_calls); \
+ c_register_pragma (0, no_long_calls, aarch64_pr_no_long_calls);
Hi,
This patch adds support for -mlong-calls option for aarch64 port. Major
code borrowed from ARM.
I'm doing regression test for it right now. Any comments?
Index: gcc/config/aarch64/aarch64.opt
===
---
On Fri, Oct 24, 2014 at 8:35 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
Hi,
I find that the -mlong-calls option is not there for AARCH64. So
can this port generate long calls?
Any plan on this option? I would like to have a try on this if it's
missing :-)
Thanks
Thanks for the reply. It seems that -mcmodel=large is different from
-mlong-calls.
GCC still emits the BL instruction with -mcmodel=large. I think GCC should
emit the BLR instruction with -mlong-calls, right?
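The BL versus BLR distinction raised above can be pictured with a small C sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the claim is limited to the general rule that a call through a pointer held in a register is an indirect call, which AArch64 encodes as BLR, while a call to a named function is normally a direct BL. It does not show what the proposed -mlong-calls patch generates.

```c
#include <assert.h>

/* Hypothetical callee standing in for a function that may be out of
   direct branch range. */
static int
far_callee_sketch (int x)
{
  return x + 1;
}

/* The volatile pointer keeps the compiler from folding the call back
   into a direct branch: the target address is loaded first and the
   call is made through it. */
static int
call_indirect_sketch (int x)
{
  int (*volatile target) (int) = far_callee_sketch;

  assert (target != 0);
  return target (x);
}
```

Compiling a file like this for AArch64 and reading the assembly is one way to compare the two call forms; the exact output depends on the compiler version and options.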
Oh right. Also it looks like it is not hooked up but the support is partly
On Thu, Oct 23, 2014 at 9:12 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
Here the key point is we need a general purpose register for the loop
instruction.
So the question to ask here is, How does this work today, without loop
instructions? Somehow--even when it has been spilled
1. The original xtensa port never generates loop instruction at all.
2. A port doesn't need to implement hwloop_pattern_reg hook if it has no
zero-cost loop instruction.
Is that clear?
I mean without your patch at all.
On Thu, Oct 23, 2014 at 11:30 PM, Yangfei (Felix) felix.y
.
On Thu, Oct 23, 2014 at 11:40 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
1. The original xtensa port never generates loop instruction at all.
2. A port doesn't need to implement hwloop_pattern_reg hook if it has no
zero-cost loop instruction.
Is that clear?
We are talking in circles. I
Hi,
I find that the -mlong-calls option is not there for AARCH64. So can this
port generate long calls?
Any plan on this option? I would like to have a try on this if it's missing
:-)
Thanks.
On 24 October 2014 03:21, Yangfei (Felix) felix.y...@huawei.com wrote:
Thanks for the comments. I updated the patch with the intrinsic moved to its
place.
Attached please find the new version of the patch.
OK for the trunk?
Index: gcc/ChangeLog
Hi,
I find that the -mlong-calls option is not there for AARCH64. So can this
port
generate long calls?
Any plan on this option? I would like to have a try on this if it's
missing :-)
Thanks.
Any comments?
Hi,
This patch converts the vld[234](q?)_dup intrinsics to use builtin
functions instead of the previous inline assembly syntax.
It can fix the performance issue on PR63173. Reg-tested with
aarch64-linux-gnu-gcc on qemu.
OK for the trunk? Thanks
Index: gcc/ChangeLog
On Tue, Oct 21, 2014 at 7:20 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
If the tripcount spill issue is not handled in the pattern, ICE may happen
then.
Here reload is trying to spill pseudo 173, but a memory operand is not
allowed
in zero_cost_loop_end pattern
+__extension__ static __inline float64x2x4_t __attribute__
+((__always_inline__))
+vld4q_dup_f64 (const float64_t * __a) {
+ float64x2x4_t ret;
+ __builtin_aarch64_simd_xi __o;
+ __o = __builtin_aarch64_ld4rv2df ((const __builtin_aarch64_simd_df
+*) __a);
+ ret.val[0] =
On Tue, Oct 21, 2014 at 7:20 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
If the tripcount spill issue is not handled in the pattern, ICE may
happen then.
Here reload is trying to spill pseudo 173, but a memory operand is
not allowed
in zero_cost_loop_end pattern
Hi Sterling,
Attached please find the testcase for the spill issue. Try it out with the
patch :-)
On Wed, Oct 15, 2014 at 7:10 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
Hi Sterling,
Since the patch is delayed for a long time, I'm kind of pushing it.
Sorry
at 7:10 PM, Yangfei (Felix)
felix.y...@huawei.com
wrote:
Hi Sterling,
Since the patch is delayed for a long time, I'm kind of pushing
it. Sorry for
that.
Yeah, you are right. We have some performance issue here as GCC
may
use one more general register in some cases
Hi,
I am trying to improve the AARCH64 NEON intrinsics. It seems that we don't
have enough testcases for this part in the GCC testsuite.
How do you guys test your patch on this part? Any suggestions? Thanks.
Hi Christophe,
Thank you for the reply. The testsuite is useful for me. Hope to see more
progress in your work :-)
On 20 October 2014 14:01, Yangfei (Felix) felix.y...@huawei.com wrote:
Hi,
I am trying to improve the AARCH64 NEON intrinsics. It seems that we
don't have enough testcases
Hi Sterling,
Since the patch is delayed for a long time, I'm kind of pushing it. Sorry
for that.
Yeah, you are right. We have some performance issue here as GCC may use one
more general register in some cases with this patch.
Take the following arraysum testcase for example. In
#define TARGET_DEFAULT \
((XCHAL_HAVE_L32R? 0 : MASK_CONST16) |\
Cheers,
Felix
On Tue, Jan 14, 2014 at 1:23 AM, Sterling Augustine
augustine.sterl...@gmail.com wrote:
On Thu, Jan 9, 2014 at 7:48 PM, Yangfei (Felix) felix.y...@huawei.com
wrote:
And here
PING?
The enclosed patch for 4.8 4.9 branch is a backport of r211885 from trunk.
The only change is to use:
for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
rather than the new FOR_EACH_INSN_INFO_DEF interface.
Bootstrapped on x86_64-SUSE-Linux for both branches.
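The loop form quoted above, which runs until *def_rec is null, suggests that the definitions are walked as a null-terminated array of pointers. A minimal sketch of that iteration style follows; the struct and function names are hypothetical stand-ins and not GCC's df types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a definition record. */
struct def_sketch
{
  unsigned regno;
};

/* Counts the entries of a NULL-terminated array of pointers, walking
   it with the same loop shape as the backported code. */
static size_t
count_defs_sketch (struct def_sketch **defs)
{
  size_t n = 0;
  struct def_sketch **def_rec;

  assert (defs != NULL);
  for (def_rec = defs; *def_rec; def_rec++)
    n++;
  return n;
}
```

The terminating NULL entry is what ends the loop, so an array without it would be read past its end.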
On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
The enclosed patch for 4.8 4.9 branch is a backport of r211885 from trunk.
The only change is to use:
for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++)
rather than the new FOR_EACH_INSN_INFO_DEF interface.
On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
The enclosed patch for 4.8 4.9 branch is a backport of r211885 from
trunk.
The only change is to use:
for (def_rec = DF_INSN_INFO_DEFS (insn_info
On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote:
On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote:
The enclosed patch for 4.8 4.9 branch is a backport of r211885
from
trunk.
The only change is to use:
for (def_rec
PING ?
Cheers,
Felix
On Wed, Sep 24, 2014 at 8:07 PM, Felix Yang fei.yang0...@gmail.com
wrote:
Hi Jeff,
Thanks for the comments. I updated the patch adding some
enhancements.
Bootstrapped on x86_64-suse-linux. Please apply this patch if OK for
trunk.
Three
On 09/22/14 08:40, Felix Yang wrote:
Hi,
I find that update_equiv_regs in ira.c sets the wrong EQUIV
reg-note for a pseudo with more than one definition in certain situations.
Here is a simplified RTL snippet after this function finishes handling:
(insn 77 37 78 2 (set
And here is the xtensa configuration tested (include/xtensa-config.h):
#define XCHAL_HAVE_BE 0
#define XCHAL_HAVE_LOOPS 1
Hi Sterling,
Please note that version 2 of the patch is for gcc trunk, not for
gcc-4.8 branch.
Since the doloop_end pattern format
Hi Bernd,
The patch is OK to me. But do we need reorder_loops for the c6x backend ?
I mean we can set the do_reorder parameter to FALSE to save compile
time, since the c6x backend only chooses hw-doloops whose body contains only one
basic block.
Cheers,
Felix
On 01/05/2014 05:10
Yes, I see. Thanks for taking this patch.
Cheers,
Fei.
On 12/30/2013, 2:27 AM, Yangfei (Felix) wrote:
Add one entry to ChangeLog:
Index: gcc/ChangeLog
===
--- gcc/ChangeLog (revision 206236)
+++ gcc
Ping.
Attached please find the patch for trunk (gcc-4.9):
Index: gcc/ira-costs.c
===
--- gcc/ira-costs.c (revision 206236)
+++ gcc/ira-costs.c (working copy)
@@ -155,7 +155,7 @@ inline bool
cost_classes_hasher::equal (const
Add one entry to ChangeLog:
Index: gcc/ChangeLog
===
--- gcc/ChangeLog (revision 206236)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2013-12-30 Felix Yang felix.y...@huawei.com
+
+ * gcc/ira-costs.c
Hi Vladimir,
I am trying to fix a potential bug in cost_classes_eq in ira-costs.c. This
patch is for the gcc-4.8 branch and should work for gcc-4.9.
The library function memcmp returns 0 if the two compared buffers do not differ.
Please take a look.
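The memcmp point can be shown with a short sketch. The record type and function name are hypothetical, and the assumption that the reported bug has this shape (an equality predicate built on memcmp) is inferred from the message, not taken from the actual ira-costs.c source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a cost-classes record; not GCC's type. */
struct cost_classes_sketch
{
  int num;        /* number of valid entries in classes[] */
  int classes[4];
};

/* Returns nonzero when both records hold the same classes.  memcmp
   yields 0 on a match, so its result is compared against 0 instead of
   being returned directly as the "equal" answer. */
static int
cost_classes_equal_sketch (const struct cost_classes_sketch *a,
                           const struct cost_classes_sketch *b)
{
  assert (a != NULL && b != NULL);
  return a->num == b->num
         && memcmp (a->classes, b->classes,
                    sizeof (int) * (size_t) a->num) == 0;
}
```

Returning the raw memcmp result from a predicate like this would invert its meaning: identical data would compare as "not equal".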
/* Compares cost classes info V1 and V2.