Re: [Testsuite] Fix Cilk's exp to add -B for libcilkrts (was: Re: [Build, Driver] Add -lcilkrts for -fcilkplus)
Tobias Burnus bur...@net-b.de writes:
> Rainer Orth wrote:
>> Tobias Burnus bur...@net-b.de writes:
>>> H.J. Lu wrote:
>>>> xgcc: error: libcilkrts.spec: No such file or directory
>>> Hmm, I really wonder why it fails for you while it works for me:
>> Do you happen to have the same/a recent version installed at the same
>> prefix your build under test is configured for?
>
> I had - after I removed it, I could reproduce it.  Sorry!
> Fixed by the attached testsuite patch.  HJ: Does it now pass for you?
> For me it now does.  OK for the trunk?

Ok.  In cases of massive reindentation like this, it's often more helpful to post diff -w output to better see the gist of the changes.

Thanks for fixing this.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation
Hi there,

When compiling the case below for the ARM Thumb-2 target:

long long int
test (unsigned long long int a, unsigned int b)
{
  return (a & 0xFFFFFFFF) * b;
}

I find that the GCC function simplify_subreg fails to simplify the rtx

  (subreg:SI (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4)

to zero during the fwprop1 pass, even though the high 32-bit part of (a & 0xFFFFFFFF) is known to be zero.  This leads to some unnecessary multiplications for the high 32-bit part of the result of the AND operation.  The attached patch improves simplify_rtx to handle such cases.  Other targets like x86 don't seem to have this issue because they generate different RTX to handle 64-bit multiplication on a 32-bit machine.

Bootstrapped gcc on an x86 machine, no problems.  Tested with the gcc regression tests for x86 and Thumb-2, no regressions.  Is it OK for stage 1?

BR,
Terry

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 04af01e..0ed88fb 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6099,6 +6099,19 @@ simplify_subreg (enum machine_mode outermode, rtx op,
       return CONST0_RTX (outermode);
     }

+  /* The AND operation may clear the target bits of SUBREG to zero.
+     Then we just need to return a zero.  Here is an example:
+     (subreg:SI (and:DI (reg:DI X) (const_int 0xffffffff)) 4).  */
+  if (GET_CODE (op) == AND && SCALAR_INT_MODE_P (innermode))
+    {
+      unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
+      unsigned HOST_WIDE_INT nzmask = nonzero_bits (op, innermode);
+      unsigned HOST_WIDE_INT smask = GET_MODE_MASK (outermode);
+
+      if (((smask << bitpos) & nzmask) == 0)
+        return CONST0_RTX (outermode);
+    }
+
   if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)

diff --git a/gcc/testsuite/gcc.target/arm/umull.c b/gcc/testsuite/gcc.target/arm/umull.c
new file mode 100644
index 000..2e39baa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/umull.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { arm_thumb1 } } */
+/* { dg-options "-O2" } */
+
+long long int
+test (unsigned long long int a, unsigned int b)
+{
+  return (a & 0xFFFFFFFF) * b;
+}
+
+/* { dg-final { scan-assembler-not "mla" } } */
RE: [Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation
-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Terry Guo
Sent: Friday, March 28, 2014 3:48 PM
To: gcc-patches@gcc.gnu.org
Subject: [Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation

[...]

Sorry for missing the ChangeLog part:

gcc/
2014-03-28  Terry Guo  terry@arm.com

	* simplify-rtx.c (simplify_subreg): Handle case where bits are
	cleared by AND operation.

gcc/testsuite/
2014-03-28  Terry Guo  terry@arm.com

	* gcc.target/arm/umull.c: New testcase.
Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)
Andreas Schwab sch...@suse.de writes:
> Jason Merrill ja...@redhat.com writes:
>> diff --git a/gcc/testsuite/g++.dg/abi/thunk6.C b/gcc/testsuite/g++.dg/abi/thunk6.C
>> new file mode 100644
>> index 000..e3d07f2
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/abi/thunk6.C
>> @@ -0,0 +1,18 @@
>> +// PR c++/60566
>> +// We need to emit the construction vtable thunk for ~C even if we aren't
>> +// going to use it.
>> +
>> +struct A
>> +{
>> +  virtual void f() = 0;
>> +  virtual ~A() {}
>> +};
>> +
>> +struct B: virtual A { int i; };
>> +struct C: virtual A { int i; ~C(); };
>> +
>> +C::~C() {}
>> +
>> +int main() {}
>> +
>> +// { dg-final { scan-assembler _ZTv0_n32_N1CD1Ev } }
>
> FAIL: g++.dg/abi/thunk6.C -std=c++11 scan-assembler _ZTv0_n32_N1CD1Ev
>
> $ grep _ZTv0_ thunk6.s
> 	.globl	_ZTv0_n16_N1CD1Ev
> 	.type	_ZTv0_n16_N1CD1Ev, @function
> _ZTv0_n16_N1CD1Ev:
> 	.size	_ZTv0_n16_N1CD1Ev, .-_ZTv0_n16_N1CD1Ev
> 	.globl	_ZTv0_n16_N1CD0Ev
> 	.type	_ZTv0_n16_N1CD0Ev, @function
> _ZTv0_n16_N1CD0Ev:
> 	.size	_ZTv0_n16_N1CD0Ev, .-_ZTv0_n16_N1CD0Ev
>
> It would help to state which target this is...

Same for the 32-bit multilib on Solaris/SPARC and x86 (i386-pc-solaris2.11, sparc-sun-solaris2.11).

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
[AArch64] Implement ADD in vector registers for 32-bit scalar values.
Hi,

There is no way to perform scalar addition in the vector register file, but with the RTX costs in place we start rewriting (x << 1) to (x + x) on almost all cores.  The code which makes this decision has no idea that we will end up doing this (it happens well before reload), and so we end up with very ugly code generation in the case where addition was selected but we are operating in vector registers.

This patch relies on the same gimmick we are already using to allow shifts on 32-bit scalars in the vector register file - use a vector 32x2 operation instead, knowing that we can safely ignore the top bits.

This restores some normality to scalar_shift_1.c; however, the test that we generate a left shift by one is clearly bogus, so remove that.

This patch is pretty ugly, but it does generate superficially better looking code for this testcase.

Tested on aarch64-none-elf with no issues.  OK for stage 1?

Thanks,
James

---
gcc/

2014-03-27  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
	vector registers.

gcc/testsuite/

2014-03-27  James Greenhalgh  james.greenha...@arm.com

	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c86a29d8e7f8df21f25e14d22df1c3e8c37c907f..9c544a0a473732ebdf9238205db96d0d0c57de9a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1063,16 +1063,17 @@ (define_expand "add<mode>3"

 (define_insn "*addsi3_aarch64"
   [(set
-    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
+    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
     (plus:SI
-     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
-     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+     (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
+     (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
   ""
   "@
   add\\t%w0, %w1, %2
   add\\t%w0, %w1, %w2
+  add\\t%0.2s, %1.2s, %2.2s
   sub\\t%w0, %w1, #%n2"
-  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
+  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
)

;; zero_extend version of above

diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
index 7cb17f8..826bafc 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
@@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
-/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */

 Int32x1
 test_corners_sisd_si (Int32x1 b)
@@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
-/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */
Re: [PING^7][PATCH] Add a couple of dialect and warning options regarding Objective-C instance variable scope
Ping!

On 03/23/2014 03:20 AM, Dimitris Papavasiliou wrote: Ping!

On 03/13/2014 11:54 AM, Dimitris Papavasiliou wrote: Ping!

On 03/06/2014 07:44 PM, Dimitris Papavasiliou wrote: Ping!

On 02/27/2014 11:44 AM, Dimitris Papavasiliou wrote: Ping!

On 02/20/2014 12:11 PM, Dimitris Papavasiliou wrote:

Hello all,

Pinging this patch review request again.  See previous messages quoted below for details.

Regards,
Dimitris

On 02/13/2014 04:22 PM, Dimitris Papavasiliou wrote:

Hello,

Pinging this patch review request.  Can someone involved in the Objective-C language frontend have a quick look at the description of the proposed features and tell me if it'd be OK to have them in the trunk, so I can go ahead and create proper patches?

Thanks,
Dimitris

On 02/06/2014 11:25 AM, Dimitris Papavasiliou wrote:

Hello,

This is a patch regarding a couple of Objective-C related dialect options and warning switches.  I have already submitted it a while ago but gave up after pinging a couple of times.  I am now informed that I should have kept pinging until I got someone's attention, so I'm resending it.

The patch is now against an old revision and, as I stated originally, it's probably not in a state that can be adopted as-is.  I'm sending it as-is so that the implemented features can be assessed in terms of their usefulness; if they're welcome, I'd be happy to make any necessary changes to bring it up-to-date, split it into smaller patches, add test cases and anything else that is deemed necessary.  Here's the relevant text from my initial message:

Two of these switches are related to a feature request I submitted a while ago, Bug 56044 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56044).  I won't reproduce the entire argument here since it is available in the feature request.
The relevant functionality in the patch comes in the form of two switches:

-Wshadow-ivars, which controls the "local declaration of 'somevar' hides instance variable" warning, which curiously is enabled by default instead of being controlled at least by -Wshadow.  The patch changes this so that the warning can be enabled and disabled specifically through -Wshadow-ivars, as well as together with all other shadowing-related warnings through -Wshadow.  The reason for the extra switch is that, while searching the Internet for a solution to this problem, I found that other people are inconvenienced by this particular warning as well, so it might be useful to be able to turn it off while keeping all the other shadowing-related warnings enabled.

-flocal-ivars, which when true, as it is by default, treats instance variables as having local scope.  If false (-fno-local-ivars), instance variables must always be referred to as self->ivarname, and references to ivarname resolve to the local or global scope as usual.

I've also taken the opportunity to add another switch, unrelated to the above but related to instance variables: -fivar-visibility, which can be set to either private, protected (the default), public or package.  This sets the default instance variable visibility, which normally is implicitly protected.  My use-case for it is basically to be able to set it to public and thus effectively disable this visibility mechanism altogether, which I find no use for and therefore have to circumvent.  I'm not sure if anyone else feels the same way about this, but I figured it was worth a try.

I'm attaching a preliminary patch against the current revision in case anyone wants to have a look.  The changes are very small and any blatant mistakes should be immediately obvious.
I have to admit to having virtually no knowledge of the internals of GCC, but I have tried to keep in line with the formatting guidelines and general style, as well as looking up the particulars of the way options are handled in the available documentation to avoid blind copy-pasting.  I have also tried to test the functionality, both in my own (relatively large, or at least not too small) project and with small test programs, and everything works as expected.  Finally, I tried running the tests too, but these fail to complete both in the patched and unpatched versions, possibly due to the way I've configured GCC.

Dimitris
Re: [PATCH] Fix PR c++/60573
On 2014-03-27 21:16, Adam Butcher wrote:
> On 2014-03-27 20:45, Adam Butcher wrote:
>> 	PR c++/60573
>> 	* name-lookup.h (cp_binding_level): New field scope_defines_class_p.
>> 	* semantics.c (begin_class_definition): Set scope_defines_class_p.
>> 	* pt.c (instantiate_class_template_1): Likewise.
>> 	* parser.c (synthesize_implicit_template_parm): Use
>> 	cp_binding_level::scope_defines_class_p rather than TYPE_BEING_DEFINED
>> 	as the predicate for unwinding to class-defining scope to handle the
>> 	erroneous definition of a generic function of an arbitrarily nested
>> 	class within an enclosing class.
>
> Still got issues with this.  It fails on out-of-line defs.  I'll have another look.

Turns out the solution was OK, but I didn't account for the class-defining scope being reused for subsequent out-of-line declarations.  I've made 'scope_defines_class_p' into the now transient 'defining_class_p' predicate, which is reset on leaving scope.  I've ditched the 'scope_' prefix and also ditched the modifications to 'instantiate_class_template_1'.

The patch delta is included below (but will probably be munged by my webmail client).  I'll reply to this with the full patch.  There is also the fix for PR c++/60626 (http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01294.html) that deals with another form of erroneous generic function declarations with nested class scope.

Cheers,
Adam

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 53f14f3..0137c3f 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1630,10 +1630,14 @@ leave_scope (void)
       free_binding_level = scope;
     }

-  /* Find the innermost enclosing class scope, and reset
-     CLASS_BINDING_LEVEL appropriately.  */
   if (scope->kind == sk_class)
     {
+      /* Reset DEFINING_CLASS_P to allow for reuse of a
+	 class-defining scope in a non-defining context.  */
+      scope->defining_class_p = 0;
+
+      /* Find the innermost enclosing class scope, and reset
+	 CLASS_BINDING_LEVEL appropriately.  */
       class_binding_level = NULL;
       for (scope = current_binding_level; scope; scope = scope->level_chain)
	if (scope->kind == sk_class)

diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index 9e5d812..40e0338 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -255,9 +255,12 @@ struct GTY(()) cp_binding_level {
   unsigned more_cleanups_ok : 1;
   unsigned have_cleanups : 1;

-  /* Set if this scope is of sk_class kind and is the defining
-     scope for this_entity.  */
-  unsigned scope_defines_class_p : 1;
+  /* Transient state set if this scope is of sk_class kind
+     and is in the process of defining 'this_entity'.  Reset
+     on leaving the class definition to allow for the scope
+     to be subsequently re-used as a non-defining scope for
+     'this_entity'.  */
+  unsigned defining_class_p : 1;

   /* 23 bits left to fill a 32-bit word.  */
 };

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4919a67..0945bfd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32027,7 +32027,7 @@ synthesize_implicit_template_parm (cp_parser *parser)
	 declarator should be injected into the scope of 'A' as if the
	 ill-formed template was specified explicitly.  */

-      while (scope->kind == sk_class && !scope->scope_defines_class_p)
+      while (scope->kind == sk_class && !scope->defining_class_p)
	{
	  parent_scope = scope;
	  scope = scope->level_chain;

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 90faeec..c791d03 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8905,12 +8905,9 @@ instantiate_class_template_1 (tree type)
     return type;

   /* Now we're really doing the instantiation.  Mark the type as in
-     the process of being defined... */
+     the process of being defined.  */
   TYPE_BEING_DEFINED (type) = 1;

-  /* ... and the scope defining it.  */
-  class_binding_level->scope_defines_class_p = 1;
-
   /* We may be in the middle of deferred access check.  Disable it now.  */
   push_deferring_access_checks (dk_no_deferred);

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index deba2ab..207a42d 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2777,7 +2777,7 @@ begin_class_definition (tree t)
   maybe_process_partial_specialization (t);
   pushclass (t);
   TYPE_BEING_DEFINED (t) = 1;
-  class_binding_level->scope_defines_class_p = 1;
+  class_binding_level->defining_class_p = 1;
   if (flag_pack_struct)
     {
[PATCH] Fix PR c++/60573
	PR c++/60573
	* name-lookup.h (cp_binding_level): New transient field
	defining_class_p to indicate whether a scope is in the process of
	defining a class.
	* semantics.c (begin_class_definition): Set defining_class_p.
	* name-lookup.c (leave_scope): Reset defining_class_p.
	* parser.c (synthesize_implicit_template_parm): Use
	cp_binding_level::defining_class_p rather than TYPE_BEING_DEFINED
	as the predicate for unwinding to class-defining scope to handle the
	erroneous definition of a generic function of an arbitrarily nested
	class within an enclosing class.

	PR c++/60573
	* g++.dg/cpp1y/pr60573.C: New testcase.
---
 gcc/cp/name-lookup.c                 |  8 ++--
 gcc/cp/name-lookup.h                 |  9 -
 gcc/cp/parser.c                      | 23 +-
 gcc/cp/semantics.c                   |  1 +
 gcc/testsuite/g++.dg/cpp1y/pr60573.C | 28
 5 files changed, 60 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/pr60573.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 53f14f3..0137c3f 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1630,10 +1630,14 @@ leave_scope (void)
       free_binding_level = scope;
     }

-  /* Find the innermost enclosing class scope, and reset
-     CLASS_BINDING_LEVEL appropriately.  */
   if (scope->kind == sk_class)
     {
+      /* Reset DEFINING_CLASS_P to allow for reuse of a
+	 class-defining scope in a non-defining context.  */
+      scope->defining_class_p = 0;
+
+      /* Find the innermost enclosing class scope, and reset
+	 CLASS_BINDING_LEVEL appropriately.  */
       class_binding_level = NULL;
       for (scope = current_binding_level; scope; scope = scope->level_chain)
	if (scope->kind == sk_class)

diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index a63442f..40e0338 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -255,7 +255,14 @@ struct GTY(()) cp_binding_level {
   unsigned more_cleanups_ok : 1;
   unsigned have_cleanups : 1;

-  /* 24 bits left to fill a 32-bit word.  */
+  /* Transient state set if this scope is of sk_class kind
+     and is in the process of defining 'this_entity'.  Reset
+     on leaving the class definition to allow for the scope
+     to be subsequently re-used as a non-defining scope for
+     'this_entity'.  */
+  unsigned defining_class_p : 1;
+
+  /* 23 bits left to fill a 32-bit word.  */
 };

 /* The binding level currently in effect.  */

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e729d65..0945bfd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32000,7 +32000,7 @@ synthesize_implicit_template_parm (cp_parser *parser)
     {
       /* If not defining a class, then any class scope is a scope level in
	 an out-of-line member definition.  In this case simply wind back
-	 beyond the first such scope to inject the template argument list.
+	 beyond the first such scope to inject the template parameter list.
	 Otherwise wind back to the class being defined.  The latter can occur
	 in class member friend declarations such as:

@@ -32011,12 +32011,23 @@ synthesize_implicit_template_parm (cp_parser *parser)
	     friend void A::foo (auto);
	   };

-	 The template argument list synthesized for the friend declaration
-	 must be injected in the scope of 'B', just beyond the scope of 'A'
-	 introduced by 'A::'.  */
+	 The template parameter list synthesized for the friend declaration
+	 must be injected in the scope of 'B'.  This can also occur in
+	 erroneous cases such as:

-      while (scope->kind == sk_class
-	     && !TYPE_BEING_DEFINED (scope->this_entity))
+	   struct A {
+	     struct B {
+	       void foo (auto);
+	     };
+	     void B::foo (auto) {}
+	   };
+
+	 Here the attempted definition of 'B::foo' within 'A' is ill-formed
+	 but, nevertheless, the template parameter list synthesized for the
+	 declarator should be injected into the scope of 'A' as if the
+	 ill-formed template was specified explicitly.  */
+
+      while (scope->kind == sk_class && !scope->defining_class_p)
	{
	  parent_scope = scope;
	  scope = scope->level_chain;

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 886fbb8..207a42d 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2777,6 +2777,7 @@ begin_class_definition (tree t)
   maybe_process_partial_specialization (t);
   pushclass (t);
   TYPE_BEING_DEFINED (t) = 1;
+  class_binding_level->defining_class_p = 1;
   if (flag_pack_struct)
     {

diff --git a/gcc/testsuite/g++.dg/cpp1y/pr60573.C b/gcc/testsuite/g++.dg/cpp1y/pr60573.C
new file mode 100644
Skip gcc.dg/tree-ssa/isolate-*.c for AVR Target
Hi all,

The tests added in gcc.dg/tree-ssa/isolate-*.c are failing for the AVR target, because the isolate-erroneous-paths pass needs the -fdelete-null-pointer-checks option to be enabled.  For the AVR target that option is disabled, which causes the tests to fail.  The following patch skips the isolate-* tests if keeps_null_pointer_checks is true.

2014-03-28  Vishnu K S  vishnu@atmel.com

	* gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c: Skip test for AVR.
	* gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c: Ditto.
	* gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c: Ditto.
	* gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c: Ditto.
	* gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c: Ditto.

--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-isolate-paths" } */
+/* { dg-skip-if "" { keeps_null_pointer_checks } } */

 struct demangle_component

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c b/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
index bfcaa2b..912d98e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fisolate-erroneous-paths-attribute -fdump-tree-isolate-paths -fdump-tree-phicprop1" } */
+/* { dg-skip-if "" { keeps_null_pointer_checks } } */

 int z;

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c b/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c
index 780..9c2c5d5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-isolate-paths" } */
+/* { dg-skip-if "" { keeps_null_pointer_checks } } */

 typedef long unsigned int size_t;

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c b/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
index c9c074d..d50a2b2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fisolate-erroneous-paths-attribute -fdump-tree-isolate-paths -fdump-tree-phicprop1" } */
+/* { dg-skip-if "" { keeps_null_pointer_checks } } */

 extern void foo(void *) __attribute__ ((__nonnull__ (1)));

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c b/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c
index 4d01d5c..e6ae37a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-isolate-paths -fdump-tree-optimized" } */
+/* { dg-skip-if "" { keeps_null_pointer_checks } } */

Regards,
Vishnu KS
Re: RFA: Fix PR rtl-optimization/60651
> However, the first call is for blocks with incoming abnormal edges.  If
> these are empty, the change as I wrote it yesterday is fine, but not when
> they are non-empty; in that case, we should indeed insert before the first
> instruction in that block.
>
> OK, so the issue is specific to empty basic blocks and boils down to
> inserting instructions in a FIFO manner into them.  This can be achieved by
> finding an insert-before position using NEXT_INSN on the basic block head;
> this amounts to the very same insertion place as inserting after the basic
> block head.  Also, we will continue to set no location, and use the same bb,
> because both add_insn_before and add_insn_after (in contradiction to its
> block comment) will infer the basic block from the insn given (in the case
> of add_insn_before, I assume that the basic block doesn't start with a
> BARRIER - that would be invalid - and that the insn it starts with has a
> valid BLOCK_FOR_INSN setting, the same way the basic block head has).

This looks reasonable, but I think that we need more commentary because it's not straightforward to understand, so I would:

1. explicitly state that we enforce an order on the entities in addition to the order on priority, both in the code (for example create a 4th paragraph in the comment at the top of the file, before "More details ...") and in the doc as you already did, but ordering the two orders for the sake of clarity: first the order on priority, then, for the same priority, the order on the entities.

2. add a line in the head comment of new_seginfo saying that INSN may not be a NOTE_BASIC_BLOCK, unless BB is empty.

3. add a comment above the trick in optimize_mode_switching saying that it is both required to implement the FIFO insertion and valid because we know that the basic block was initially empty.

It's not clear to me whether this is a regression or not, so you'll also need to run it by the RMs.  In the meantime I have installed the attached patchlet.
2014-03-28  Eric Botcazou  ebotca...@adacore.com

	* mode-switching.c: Make small adjustments to the top comment.

--
Eric Botcazou

Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 208879)
+++ mode-switching.c	(working copy)
@@ -45,20 +45,20 @@ along with GCC; see the file COPYING3.
    and finding all the insns which require a specific mode.  Each insn gets
    a unique struct seginfo element.  These structures are inserted into a list
    for each basic block.  For each entity, there is an array of bb_info over
-   the flow graph basic blocks (local var 'bb_info'), and contains a list
+   the flow graph basic blocks (local var 'bb_info'), which contains a list
    of all insns within that basic block, in the order they are encountered.

    For each entity, any basic block WITHOUT any insns requiring a specific
-   mode are given a single entry, without a mode.  (Each basic block
-   in the flow graph must have at least one entry in the segment table.)
+   mode are given a single entry without a mode (each basic block in the
+   flow graph must have at least one entry in the segment table).

    The LCM algorithm is then run over the flow graph to determine where to
-   place the sets to the highest-priority value in respect of first the first
+   place the sets to the highest-priority mode with respect to the first
    insn in any one block.  Any adjustments required to the transparency
    vectors are made, then the next iteration starts for the next-lower
    priority mode, till for each entity all modes are exhausted.

-   More details are located in the code for optimize_mode_switching().  */
+   More details can be found in the code of optimize_mode_switching.  */

 /* This structure contains the information for each insn which requires
    either single or double mode to be set.
Re: Fix PR ipa/60315 (inliner explosion)
On Thu, Mar 27, 2014 at 12:02:01PM +0100, Andreas Schwab wrote:
> --- testsuite/g++.dg/torture/pr60315.C	(revision 0)
> +++ testsuite/g++.dg/torture/pr60315.C	(revision 0)
> @@ -0,0 +1,32 @@
> +// { dg-do compile }
> +struct Base {
> +  virtual int f() = 0;
> +};
> +
> +struct Derived : public Base {
> +  virtual int f() final override {
> +    return 42;
> +  }
> +};
> +
> +extern Base* b;
> +
> +int main() {
> +  return (static_cast<Derived*>(b)->*(&Derived::f))();
> +}
>
> FAIL: g++.dg/torture/pr60315.C  -O0  (test for excess errors)
> Excess errors:
> /usr/local/gcc/gcc-20140327/gcc/testsuite/g++.dg/torture/pr60315.C:7:19: warning: override controls (override/final) only available with -std=c++11 or -std=gnu++11
> /usr/local/gcc/gcc-20140327/gcc/testsuite/g++.dg/torture/pr60315.C:7:21: warning: override controls (override/final) only available with -std=c++11 or -std=gnu++11

As dg-torture.exp doesn't cycle through c++98/c++11/c++14, I've committed this fix as obvious:

2014-03-28  Jakub Jelinek  ja...@redhat.com

	PR ipa/60315
	* g++.dg/torture/pr60315.C: Add -std=c++11 to dg-options.

--- gcc/testsuite/g++.dg/torture/pr60315.C.jj	2014-03-26 10:13:22.0 +0100
+++ gcc/testsuite/g++.dg/torture/pr60315.C	2014-03-28 11:07:08.671208010 +0100
@@ -1,4 +1,7 @@
+// PR ipa/60315
 // { dg-do compile }
+// { dg-options "-std=c++11" }
+
 struct Base {
   virtual int f() = 0;
 };

	Jakub
Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)
On Fri, Mar 28, 2014 at 10:06:45AM +0100, Rainer Orth wrote:
> > FAIL: g++.dg/abi/thunk6.C -std=c++11 scan-assembler _ZTv0_n32_N1CD1Ev
> >
> > $ grep _ZTv0_ thunk6.s
> > 	.globl	_ZTv0_n16_N1CD1Ev
> > 	.type	_ZTv0_n16_N1CD1Ev, @function
> > _ZTv0_n16_N1CD1Ev:
> > 	.size	_ZTv0_n16_N1CD1Ev, .-_ZTv0_n16_N1CD1Ev
> > 	.globl	_ZTv0_n16_N1CD0Ev
> > 	.type	_ZTv0_n16_N1CD0Ev, @function
> > _ZTv0_n16_N1CD0Ev:
> > 	.size	_ZTv0_n16_N1CD0Ev, .-_ZTv0_n16_N1CD0Ev
> >
> > It would help to state which target this is...
>
> Same for the 32-bit multilib on Solaris/SPARC and x86 (i386-pc-solaris2.11, sparc-sun-solaris2.11).

Seems it fails on all ilp32 targets I've tried and succeeds on all lp64 targets (including ia64), so I think we should do the following.  Ok for trunk?

2014-03-28  Jakub Jelinek  ja...@redhat.com

	PR c++/58678
	* g++.dg/abi/thunk6.C: Scan assembler for _ZTv0_n32_N1CD1Ev only
	for lp64 targets and scan for _ZTv0_n16_N1CD1Ev for ilp32 targets.

--- gcc/testsuite/g++.dg/abi/thunk6.C.jj	2014-03-26 20:31:53.0 +0100
+++ gcc/testsuite/g++.dg/abi/thunk6.C	2014-03-28 11:20:45.051852976 +0100
@@ -15,4 +15,5 @@ C::~C() {}

 int main() {}

-// { dg-final { scan-assembler _ZTv0_n32_N1CD1Ev } }
+// { dg-final { scan-assembler _ZTv0_n32_N1CD1Ev { target lp64 } } }
+// { dg-final { scan-assembler _ZTv0_n16_N1CD1Ev { target ilp32 } } }

	Jakub
Evident fix for copy_loops.
Hi All,

I found out that the field 'safelen' of struct loop is not copied in copy_loops.  Is it OK for trunk?

ChangeLog:
2014-03-28  Yuri Rumyantsev  ysrum...@gmail.com

	* tree-inline.c (copy_loops): Add missed copy of 'safelen'.

copy-loops-fix
Description: Binary data
[C++ PATCH] Fix __atomic_exchange (PR c++/60689)
Hi! __atomic_exchange doesn't work in C++. The problem is that add_atomic_size_parameter, if there is no room in params vector, creates new params vector big enough that the extra argument fixs in, but doesn't add the extra argument in, because it relies on the subsequent build_function_call_vec to call resolve_overloaded_builtin recursively and add the parameter. The C build_function_call_vec does that, but C++ doesn't, there resolve_overloaded_builtin is called from finish_call_expr instead. My first attempt to fix this - changing add_atomic_size_parameter to add that argument - broke C, where the recursive resolve_overloaded_builtin - get_atomic_generic_size would complain about too many arguments. Here is one possible fix for this, let the C++ build_function_call_vec (which is only called from two c-family/c-common.c spots that expect this behavior) behave more like the C call. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (and 4.8)? Another alternative is some static flag in resolve_overloaded_function that would just punt if the function is called recursively, but I think if we can avoid adding global state, we should (otherwise JIT won't like it too much). Yet another possibility would be to rename all calls in C FE to build_function_call_vec to say c_build_function_call_vec and add that function which would call resolve_overloaded_builtin and then tail call to build_function_call_vec which wouldn't do that. Then c-family/ would keep its current two calls to that function, which wouldn't recurse anymore, and we'd need to change add_atomic_size_parameter to push the argument. 2014-03-28 Jakub Jelinek ja...@redhat.com PR c++/60689 * typeck.c (build_function_call_vec): Call resolve_overloaded_builtin. * c-c++-common/pr60689.c: New test. --- gcc/cp/typeck.c.jj 2014-03-10 10:50:14.0 +0100 +++ gcc/cp/typeck.c 2014-03-28 08:07:49.737656541 +0100 @@ -3363,10 +3363,23 @@ build_function_call (location_t /*loc*/, /* Used by the C-common bits. 
*/ tree -build_function_call_vec (location_t /*loc*/, vec<location_t> /*arg_loc*/, +build_function_call_vec (location_t loc, vec<location_t> /*arg_loc*/, tree function, vec<tree, va_gc> *params, vec<tree, va_gc> * /*origtypes*/) { + /* This call is here to match what the C FE does in its + build_function_call_vec. See PR60689. */ + if (TREE_CODE (function) == FUNCTION_DECL) +{ + /* Implement type-directed function overloading for builtins. +resolve_overloaded_builtin and targetm.resolve_overloaded_builtin +handle all the type checking. The result is a complete expression +that implements this function call. */ + tree tem = resolve_overloaded_builtin (loc, function, params); + if (tem) + return tem; +} + vec<tree, va_gc> *orig_params = params; tree ret = cp_build_function_call_vec (function, params, tf_warning_or_error); --- gcc/testsuite/c-c++-common/pr60689.c.jj 2014-03-27 22:06:31.703103613 +0100 +++ gcc/testsuite/c-c++-common/pr60689.c 2014-03-27 22:06:46.542024952 +0100 @@ -0,0 +1,10 @@ +/* PR c++/60689 */ +/* { dg-do compile } */ + +struct S { char x[9]; }; + +void +foo (struct S *x, struct S *y, struct S *z) +{ + __atomic_exchange (x, y, z, __ATOMIC_SEQ_CST); +} Jakub
Re: Evident fix for copy_loops.
On Fri, Mar 28, 2014 at 02:41:26PM +0400, Yuri Rumyantsev wrote: Hi All, I found out that the field 'safelen' of struct loop is not copied in copy_loops. Is it OK for trunk? Ok if it passes bootstrap/regtest. 2014-03-28 Yuri Rumyantsev ysrum...@gmail.com * tree-inline.c (copy_loops): Add missed copy of 'safelen'. Jakub
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, There is no way to perform scalar addition in the vector register file, but with the RTX costs in place we start rewriting (x << 1) to (x + x) on almost all cores. The code which makes this decision has no idea that we will end up doing this (it happens well before reload) and so we end up with very ugly code generation in the case where addition was selected, but we are operating in vector registers. This patch relies on the same gimmick we are already using to allow shifts on 32-bit scalars in the vector register file - use a vector 32x2 operation instead, knowing that we can safely ignore the top bits. This restores some normality to scalar_shift_1.c, however the test that we generate a left shift by one is clearly bogus, so remove that. This patch is pretty ugly, but it does generate superficially better looking code for this testcase. Tested on aarch64-none-elf with no issues. OK for stage 1? It seems we should also discourage the neon alternatives as there might be extra movement between the two register sets which we don't want. Thanks, Andrew Thanks, James --- gcc/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in vector registers. gcc/testsuite/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler. 0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
Re: [Patch, Fortran] PR60576 Fix out-of-bounds problem
Dear Tobias, This is, of course, fine since it is 'obvious' (in my opinion at least). Thanks for the patch Paul On 27 March 2014 21:05, Tobias Burnus bur...@net-b.de wrote: An early *PING* for this wrong-code issue. Tobias Burnus wrote: This patch fixes part of the problems of the PR. The problem is that one assigns an array descriptor to an assumed-rank array descriptor. The latter has for BT_CLASS the size of max_dim (reason: we have first the data array and then the vtab). With true, one takes the TREE_TYPE from the LHS (i.e. the assumed-rank variable) and as the type determines how many bytes the range assignment copies, one reads max_dimension elements from the RHS array - which can be too much. Testcase: Already in the testsuite, even if it only fails under special conditions. Built and regtested on x86-64-gnu-linux. OK for the trunk and 4.8? Tobias PS: I haven't investigated the issues Jakub is seeing. With valgrind, they do not pop up and my attempt to build with all checking enabled, failed with configure or compile errors. -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
[PATCH] S/390: Make S/390 a logical_op_short_circuit target
Hi, S/390 does not define LOGICAL_OP_NON_SHORT_CIRCUIT, but its default value depends on the branch cost. On S/390 we set a branch cost of 1, which makes us a logical_op_short_circuit target. This fixes the following testcases: FAIL: gcc.dg/binop-xor1.c scan-tree-dump-times optimized ^ 1 FAIL: gcc.dg/tree-ssa/forwprop-28.c scan-tree-dump-times forwprop1 Replaced 8 FAIL: gcc.dg/tree-ssa/vrp87.c scan-tree-dump vrp2 Folded into: if.* FAIL: gcc.dg/tree-ssa/vrp87.c scan-tree-dump cddce2 Deleting.*_Bool.*; Bye, -Andreas- 2014-03-28 Andreas Krebbel andreas.kreb...@de.ibm.com * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Remove s390 special option. * lib/target-supports.exp: Return true for s390 in check_effective_target_logical_op_short_circuit. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c index 1e46634..cafdf13 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c @@ -1,6 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-dom1-details" } */ -/* { dg-additional-options "-mbranch-cost=2" { target s390*-*-* } } */ struct bitmap_head_def; typedef struct bitmap_head_def *bitmap; typedef const struct bitmap_head_def *const_bitmap; diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index bee8471..0d2ccd5 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -5712,6 +5712,7 @@ proc check_effective_target_logical_op_short_circuit {} { || [istarget arc*-*-*] || [istarget avr*-*-*] || [istarget crisv32-*-*] || [istarget cris-*-*] +|| [istarget s390*-*-*] || [check_effective_target_arm_cortex_m] } { return 1 }
[PATCH] g++.dg: add ipa.exp file
Hi, I would like to add corresponding ipa.exp file for g++ that let me run: make -k check RUNTESTFLAGS=ipa.exp Changelog: 2014-03-28 Martin Liska mli...@suse.cz * g++.dg/ipa.epx: Anologous file added to g++.dg folder. OK for trunk? Thank you, Martin diff --git a/gcc/testsuite/g++.dg/ipa/ipa.exp b/gcc/testsuite/g++.dg/ipa/ipa.exp new file mode 100644 index 000..af7b8a7 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/ipa.exp @@ -0,0 +1,35 @@ +# Copyright (C) 1997-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. + +# G++ testsuite that uses the `dg.exp' driver. + +# Load support procs. +load_lib g++-dg.exp + +# If a testcase doesn't have special options, use these. +global DEFAULT_CXXFLAGS +if ![info exists DEFAULT_CXXFLAGS] then { +set DEFAULT_CXXFLAGS -pedantic-errors -Wno-long-long +} + +# Initialize `dg'. +dg-init + +# Main loop. +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[C\]]] $DEFAULT_CXXFLAGS + +# All done. +dg-finish
Re: [PATCH] g++.dg: add ipa.exp file
Hi Martin, Hi, I would like to add corresponding ipa.exp file for g++ that let me run: make -k check RUNTESTFLAGS=ipa.exp Changelog: 2014-03-28 Martin Liska mli...@suse.cz * g++.dg/ipa.epx: Anologous file added to g++.dg folder. Two typos. Besides, this should be * g++.dg/ipa.exp: New file. instead. diff --git a/gcc/testsuite/g++.dg/ipa/ipa.exp b/gcc/testsuite/g++.dg/ipa/ipa.exp new file mode 100644 index 000..af7b8a7 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/ipa.exp @@ -0,0 +1,35 @@ +# Copyright (C) 1997-2014 Free Software Foundation, Inc. Only 2014 here. This isn't enough, though: you need to add the ipa/* files to g++.dg/dg.exp to avoid running the ipa tests twice. This isn't stage4 material, anyway. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Handle short reads and EINTR in lto-plugin/simple-object
On Wed, 26 Mar 2014, Richard Biener wrote: On March 26, 2014 4:51:58 PM CET, Ian Lance Taylor i...@google.com wrote: On Wed, Mar 26, 2014 at 8:38 AM, Richard Biener rguent...@suse.de wrote: - got = read (descriptor, buffer, size); - if (got < 0) + do { - *errmsg = "read"; - *err = errno; - return 0; + got = read (descriptor, buffer, size); + if (got < 0 + && errno != EINTR) + { + *errmsg = "read"; + *err = errno; + return 0; + } + else + { + buffer += got; + size -= got; + } This appears to do the wrong thing if got < 0 && errno == EINTR. In that case it should not add got to buffer and size. Uh, indeed. Will fix. - if (offset != lseek (obj->file->fd, offset, SEEK_SET) - || length != read (obj->file->fd, secdata, length)) + if (!simple_object_internal_read (obj->file->fd, offset, + secdata, length, errmsg, err)) Hmmm, internal_read is meant to be, well, internal. It's not declared anywhere as far as I can see. I can duplicate the stuff as well. Are you really seeing EINTR reads here? That seems very odd to me, since we are always just reading a local file. But if you are seeing it, I guess we should handle it. Well, it's a shot in the dark... I definitely know short reads and EINTR happens more in virtual machines though. So handling it is an improvement. I'll see if it fixes my problems and report back. Ok, updated patch as below. The patch _does_ fix the issues I run into (well, previously 1 out of 4 compiles succeeded, now 4 out of 4 succeed, whatever that proves ;)) LTO bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? Thanks, Richard. 2014-03-26 Richard Biener rguent...@suse.de libiberty/ * simple-object.c (simple_object_internal_read): Handle EINTR and short reads. lto-plugin/ * lto-plugin.c (process_symtab): Use simple_object_internal_read. 
Index: libiberty/simple-object.c === --- libiberty/simple-object.c (revision 208812) +++ libiberty/simple-object.c (working copy) @@ -63,8 +63,6 @@ simple_object_internal_read (int descrip unsigned char *buffer, size_t size, const char **errmsg, int *err) { - ssize_t got; - if (lseek (descriptor, offset, SEEK_SET) < 0) { *errmsg = "lseek"; @@ -72,15 +70,26 @@ simple_object_internal_read (int descrip return 0; } - got = read (descriptor, buffer, size); - if (got < 0) + do { - *errmsg = "read"; - *err = errno; - return 0; + ssize_t got = read (descriptor, buffer, size); + if (got == 0) + break; + else if (got > 0) + { + buffer += got; + size -= got; + } + else if (errno != EINTR) + { + *errmsg = "read"; + *err = errno; + return 0; + } } + while (size > 0); - if ((size_t) got < size) + if (size > 0) { *errmsg = "file too short"; *err = 0; Index: lto-plugin/lto-plugin.c === --- lto-plugin/lto-plugin.c (revision 208812) +++ lto-plugin/lto-plugin.c (working copy) @@ -39,6 +39,7 @@ along with this program; see the file CO #include <stdint.h> #endif #include <assert.h> +#include <errno.h> #include <string.h> #include <stdlib.h> #include <stdio.h> @@ -817,7 +818,7 @@ process_symtab (void *data, const char * { struct plugin_objfile *obj = (struct plugin_objfile *)data; char *s; - char *secdata; + char *secdatastart, *secdata; if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0) return 1; @@ -825,23 +826,40 @@ process_symtab (void *data, const char * s = strrchr (name, '.'); if (s) sscanf (s, ".%" PRI_LL "x", &obj->out->id); - secdata = xmalloc (length); + secdata = secdatastart = xmalloc (length); offset += obj->file->offset; - if (offset != lseek (obj->file->fd, offset, SEEK_SET) - || length != read (obj->file->fd, secdata, length)) + if (offset != lseek (obj->file->fd, offset, SEEK_SET)) +goto err; + + do { - if (message) - message (LDPL_FATAL, "%s: corrupt object file", obj->file->name); - /* Force claim_file_handler to abandon this file. 
*/ - obj->found = 0; - free (secdata); - return 0; + ssize_t got = read (obj->file->fd, secdata, length); + if (got == 0) + break; + else if (got > 0) + { + secdata += got; + length -= got; + } + else if (errno != EINTR) + goto err; } + while (length > 0); + if (length > 0) +goto err; - translate (secdata, secdata + length, obj->out); + translate (secdatastart, secdata, obj->out); obj->found++; - free (secdata); + free (secdatastart); return 1; + +err: +
[Patch ARM] Fix A12 rule for arm-none-eabi / t-aprofile.
Hi, This affects only arm-none-eabi targets and those using t-aprofile in their multilib lists. The problem here is that when the A12 support was added, we mistakenly added this to the MULTILIB_MATCHES rule for the A15 rather than putting out a separate line for this. Fixed thusly and verified that the correct multilibs are now chosen. Applied to trunk as nearly obvious. regards, Ramana 2014-03-28 Ramana Radhakrishnan ramana.radhakrish...@arm.com * config/arm/t-aprofile (MULTILIB_MATCHES): Correct A12 rule. Index: gcc/config/arm/t-aprofile === --- gcc/config/arm/t-aprofile (revision 208895) +++ gcc/config/arm/t-aprofile (working copy) @@ -81,7 +81,8 @@ MULTILIB_EXCEPTIONS+= *march=armv7ve MULTILIB_MATCHES += march?armv7-a=mcpu?cortex-a8 MULTILIB_MATCHES += march?armv7-a=mcpu?cortex-a9 MULTILIB_MATCHES += march?armv7-a=mcpu?cortex-a5 -MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a15=mcpu?cortex-a12 +MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a15 +MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a12 MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a15.cortex-a7 MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a53 MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a57 -- Ramana Radhakrishnan Principal Engineer ARM Ltd.
[DOC PATCH] Clarify docs about stmt exprs (PR c/51088)
PR51088 contains some Really Bizarre code. We should tell users not to do any shenanigans like that. Ok for trunk? 2014-03-28 Marek Polacek pola...@redhat.com PR c/51088 * doc/extend.texi (Statement Exprs): Add note about taking addresses of labels inside of statement expressions. diff --git gcc/doc/extend.texi gcc/doc/extend.texi index f9114ab..215d0a2 100644 --- gcc/doc/extend.texi +++ gcc/doc/extend.texi @@ -206,6 +206,9 @@ Jumping into a statement expression with @code{goto} or using a @code{case} or @code{default} label inside the statement expression is not permitted. Jumping into a statement expression with a computed @code{goto} (@pxref{Labels as Values}) has undefined behavior. +Taking the address of a label declared inside of a statement +expression from outside of the statement expression has undefined +behavior. Jumping out of a statement expression is permitted, but if the statement expression is part of a larger expression then it is unspecified which other subexpressions of that expression have been Marek
Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)
Jakub Jelinek ja...@redhat.com writes: Seems it fails on all ilp32 targets I've tried and succeeds on all lp64 targets (including ia64), so I think we should do following. Ok for trunk? Looks right to me, but I'd like to defer to Jason as the subject-matter expert. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring
On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This two-patch series adds scheduling information for the ARMv8-A Crypto instructions on the Cortex-A53. This first patch does some preliminary restructuring to allow the arm and aarch64 backends to share the is_neon_type attribute. It also splits the crypto_aes type into crypto_aese and crypto_aesmc since the aese/aesd and aesmc/aesimc instructions will be treated differently (in patch 2/2). This patch touches both arm and aarch64 backends since there's no clean way to split it into per-backend patches without breaking each one. Tested and bootstrapped on arm-none-linux-gnueabihf and on aarch64-none-linux-gnu. This patch is fairly uncontroversial and doesn't change functionality or code generation by itself. I'll leave it to the maintainers to decide when this should go in... The real question is about patch #2. So this going in just depends on patch #2. Ramana Thanks, Kyrill 2014-03-25 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi): Use crypto_aese type. (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type. * config/arm/arm.md (is_neon_type): Replace crypto_aes with crypto_aese, crypto_aesmc. Move to types.md. * config/arm/types.md (crypto_aes): Split into crypto_aese, crypto_aesmc. * config/arm/iterators.md (crypto_type): Likewise.
[AArch64/ARM 0/3] Patch series for TRN Intrinsics
Much like the ZIP and UZP intrinsics, the vtrn[q]_* intrinsics are implemented with inline __asm__, which blocks compiler analysis. This series replaces those calls with __builtin_shuffle, which produce the same** assembler instructions. ** except for two-element vectors, where UZP, ZIP and TRN are all equivalent and the backend chooses to output ZIP. The first patch adds a bunch of tests, passing for the current asm implementation; the second patch reimplements with __builtin_shuffle; the third patch adds equivalent ARM tests using test bodies shared from the first patch. OK for stage 1? Cheers, Alan
patch to fix PR60675
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60675 LRA assigned hard reg 30 to TImode subreg of DImode pseudo but it was wrong as hard reg 31 is unavailable for the allocation. The patch was bootstrapped and tested on x86-64 and aarch64. Committed as rev. 208900. 2014-03-28 Vladimir Makarov vmaka...@redhat.com PR target/60675 * lra-assigns.c (find_hard_regno_for): Remove unavailable hard regs from checking multi-reg pseudos. 2014-03-28 Vladimir Makarov vmaka...@redhat.com PR target/60675 * gcc.target/aarch64/pr60675.C: New. Index: lra-assigns.c === --- lra-assigns.c (revision 208895) +++ lra-assigns.c (working copy) @@ -473,7 +473,7 @@ find_hard_regno_for (int regno, int *cos enum reg_class rclass; bitmap_iterator bi; bool *rclass_intersect_p; - HARD_REG_SET impossible_start_hard_regs; + HARD_REG_SET impossible_start_hard_regs, available_regs; COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs); rclass = regno_allocno_class_array[regno]; @@ -586,6 +586,8 @@ find_hard_regno_for (int regno, int *cos biggest_nregs = hard_regno_nregs[hard_regno][biggest_mode]; nregs_diff = (biggest_nregs - hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)]); + COPY_HARD_REG_SET (available_regs, reg_class_contents[rclass]); + AND_COMPL_HARD_REG_SET (available_regs, lra_no_alloc_regs); for (i = 0; i rclass_size; i++) { if (try_only_hard_regno = 0) @@ -601,9 +603,9 @@ find_hard_regno_for (int regno, int *cos (nregs_diff == 0 || (WORDS_BIG_ENDIAN ? 
(hard_regno - nregs_diff = 0 - TEST_HARD_REG_BIT (reg_class_contents[rclass], + TEST_HARD_REG_BIT (available_regs, hard_regno - nregs_diff)) - : TEST_HARD_REG_BIT (reg_class_contents[rclass], + : TEST_HARD_REG_BIT (available_regs, hard_regno + nregs_diff { if (hard_regno_costs_check[hard_regno] Index: testsuite/gcc.target/aarch64/pr60675.C === --- testsuite/gcc.target/aarch64/pr60675.C (revision 0) +++ testsuite/gcc.target/aarch64/pr60675.C (working copy) @@ -0,0 +1,277 @@ +/* { dg-do compile } */ +/* { dg-options -std=c++11 -w -O2 -fPIC } */ +namespace CLHEP { + static const double meter = 1000.*10; + static const double meter2 = meter*meter; + static const double megaelectronvolt = 1. ; + static const double gigaelectronvolt = 1.e+3; + static const double GeV = gigaelectronvolt; + static const double megavolt = megaelectronvolt; + static const double volt = 1.e-6*megavolt; + static const double tesla = volt*1.e+9/meter2; +} + using CLHEP::GeV; + using CLHEP::tesla; + namespace std { + typedef long int ptrdiff_t; +} + extern C { +extern double cos (double __x) throw (); +extern double sin (double __x) throw (); +extern double sqrt (double __x) throw (); +} + namespace std __attribute__ ((__visibility__ (default))) { + using ::cos; + using ::sin; + using ::sqrt; + templateclass _CharT struct char_traits; + templatetypename _CharT, typename _Traits = char_traits_CharT struct basic_ostream; + typedef basic_ostreamchar ostream; + templatetypename _Iterator struct iterator_traits { }; + templatetypename _Tp struct iterator_traits_Tp* { +typedef ptrdiff_t difference_type; +typedef _Tp reference; + }; +} + namespace __gnu_cxx __attribute__ ((__visibility__ (default))) { + using std::iterator_traits; + templatetypename _Iterator, typename _Container struct __normal_iterator { +_Iterator _M_current; +typedef iterator_traits_Iterator __traits_type; +typedef typename __traits_type::difference_type difference_type; +typedef typename __traits_type::reference reference; 
+explicit __normal_iterator(const _Iterator __i) : _M_current(__i) { } +reference operator*() const { + return *_M_current; + } +__normal_iterator operator+(difference_type __n) const { + return __normal_iterator(_M_current + __n); + } + }; + templatetypename _Tp struct new_allocator { + }; +} + namespace std __attribute__ ((__visibility__ (default))) { + templatetypename _Tp struct allocator: public __gnu_cxx::new_allocator_Tp { +}; + struct ios_base { }; + templatetypename _CharT, typename _Traits struct basic_ios : public ios_base { }; + templatetypename _CharT, typename _Traits struct basic_ostream : virtual public
Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description
On Fri, 28 Mar 2014, Ramana Radhakrishnan wrote: On Tue, Mar 25, 2014 at 3:52 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC sequences of the form: AESE Vn, _ AESMC Vn, Vn will issue both instructions in a single cycle on super-scalar implementations. It would be nice to model that in our pipeline descriptions. This patch defines a function to detect such pairs and uses it in the pipeline description for these instructions for the Cortex-A53. The patch also adds some missed AdvancedSIMD information to the pipeline description for the Cortex-A53. Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-none-linux-gnu. Cortex-A53 scheduling is the default scheduling description on aarch64 so this patch can change default behaviour. That's an argument for taking this in stage1 or maybe backporting it into 4.9.1 once the release is made. To my mind on ARM / AArch64 this actually helps anyone using the crypto intrinsics on A53 hardware today and it would be good to get this into 4.9. Again I perceive this as low risk on ARM (AArch32) as this is not a default tuning option for any large software vendors, the folks using this are typically the ones that write the more specialized crypto intrinsics rather than just general purpose code. However this will help with scheduling on what is essentially an in-order core, so would be nice to have. This would definitely need approval from the AArch64 maintainers and the RMs to go in at this stage. If not, we should consider this for 4.9.1 I'd rather have it in 4.9.0 than 4.9.1. Richard. regards Ramana What do people think? Thanks, Kyrill 2014-03-25 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function. * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue): Declare extern. 
* config/arm/cortex-a53.md: Add reservations and bypass for crypto instructions as well as AdvancedSIMD loads. -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH] g++.dg: add ipa.exp file
On Fri, Mar 28, 2014 at 2:05 PM, Martin Liška mli...@suse.cz wrote: Hi, I would like to add corresponding ipa.exp file for g++ that let me run: make -k check RUNTESTFLAGS=ipa.exp You can use RUNTESTFLAGS=dg.exp=ipa/*.C Richard. Changelog: 2014-03-28 Martin Liska mli...@suse.cz * g++.dg/ipa.epx: Anologous file added to g++.dg folder. OK for trunk? Thank you, Martin
Re: stray warning from gcc's cpp
on 19/03/2014 12:03 Andriy Gapon said the following: I observe the following minor annoyance on FreeBSD systems where cpp is GCC's cpp. If a DTrace script has the following shebang line: #!/usr/sbin/dtrace -Cs then the following warning is produced when the script is run: cc1: warning: is shorter than expected Some details. dtrace(1) first forks. Then a child seeks on a file descriptor associated with the script file, so that the shebang line is skipped (because otherwise it would confuse cpp). Then the child makes the file descriptor its standard input and then it execs cpp. cpp performs fstat(2) on its standard input descriptor and determines that it points to a regular file. Then it verifies that a number of bytes it reads from the file is the same as a size of the file. The check makes sense if the file is opened by cpp itself, but it does not always make sense for the stdin as described above. The following patch seems to fix the issue, but perhaps there is a better / smarter alternative. A patch that implements a different approach has been committed in FreeBSD: https://github.com/freebsd/freebsd/commit/6ceecddbc Please consider. Thanks! --- a/libcpp/files.c +++ b/libcpp/files.c @@ -601,7 +601,8 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file) return false; } - if (regular && total != size && STAT_SIZE_RELIABLE (file->st)) + if (regular && total != size && file->fd != 0 + && STAT_SIZE_RELIABLE (file->st)) cpp_error (pfile, CPP_DL_WARNING, "%s is shorter than expected", file->path); -- Andriy Gapon
Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)
On 03/28/2014 06:31 AM, Jakub Jelinek wrote: Ok for trunk? Yes, thanks. Jason
Re: [C++ patch] for C++/52369
On 03/27/2014 05:32 PM, Fabien Chêne wrote: + permerror (DECL_SOURCE_LOCATION (current_function_decl), +uninitialized reference member in %q#T, type); + inform (DECL_SOURCE_LOCATION (member), + %q#D should be initialized, member); The inform should only happen if permerror returns true (i.e. without -fpermissive -w). OK with that change. Jason
Re: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data
On 19 March 2014 09:55, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: [gcc/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.md (rev16mode2): New pattern. (rev16mode2_alt): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case. * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New. (aarch_rev16_shleft_mask_imm_p): Likewise. (aarch_rev16_p_1): Likewise. (aarch_rev16_p): Likewise. * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern. (aarch_rev16_shright_mask_imm_p): Likewise. (aarch_rev16_shleft_mask_imm_p): Likewise. [gcc/testsuite/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/rev16_1.c: New test. The aarch64 parts are OK for stage-1. /Marcus
Re: [PATCH] Fix PR c++/60573
OK. Jason
Re: [PATCH][AArch64] Add handling of bswap operations in rtx costs
On 19 March 2014 09:56, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch depends on the series started at http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00933.html but is not really a part of it. It just adds costing of the bswap operation using the new rev field in the rtx cost tables since we have patterns in aarch64.md that handle bswap by generating rev16 instructions. Tested aarch64-none-elf. Ok for stage1 after that series goes in? 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle BSWAP. This is OK in stage 1. /Marcus
Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring
On 28/03/14 14:18, Ramana Radhakrishnan wrote: On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This two-patch series adds scheduling information for the ARMv8-A Crypto instructions on the Cortex-A53. This first patch does some preliminary restructuring to allow the arm and aarch64 backends to share the is_neon_type attribute. It also splits the crypto_aes type into crypto_aese and crypto_aesmc since the aese/aesd and aesmc/aesimc instructions will be treated differently (in patch 2/2). This patch touches both arm and aarch64 backends since there's no clean way to split it into per-backend patches without breaking each one. Tested and bootstrapped on arm-none-linux-gnueabihf and on aarch64-none-linux-gnu. This patch is fairly uncontroversial and doesn't change functionality or code generation by itself. I'll leave it to the maintainers to decide when this should go in... The real question is about patch #2. So this going in just depends on patch #2. #2 has been ok'd. Can I take this as an approval for this patch? Kyrill Ramana Thanks, Kyrill 2014-03-25 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi): Use crypto_aese type. (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type. * config/arm/arm.md (is_neon_type): Replace crypto_aes with crypto_aese, crypto_aesmc. Move to types.md. * config/arm/types.md (crypto_aes): Split into crypto_aese, crypto_aesmc. * config/arm/iterators.md (crypto_type): Likewise.
Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description
On Fri, Mar 28, 2014 at 04:52:39PM +, Marcus Shawcroft wrote: On 28 March 2014 14:52, Ramana Radhakrishnan ramana@googlemail.com wrote: To my mind on ARM / AArch64 this actually helps anyone using the crypto intrinsics on A53 hardware today and it would be good to get this into 4.9. Again I perceive this as low risk on ARM (AArch32) as this is not a default tuning option for any large software vendors, the folks using this are typically the ones that write the more specialized crypto intrinsics rather than just general purpose code. However this will help with scheduling on what is essentially an in-order core, so would be nice to have. This would definitely need approval from the AArch64 maintainers and the RMs to go in at this stage. If not, we should consider this for 4.9.1 regards Ramana What do people think? Its low risk, I think it should go in now. Ok with me. Jakub
Re: [PATCH, PR 60647] Check that actual argument types match those of formal parameters before IPA-SRA
On Fri, Mar 28, 2014 at 05:35:12PM +0100, Martin Jambor wrote: after much confusion on my part, this is the proper fix for PR 60647. IPA-SRA can get confused a lot when a formal parameter is a pointer but the corresponding actual argument is not. So this patch adds a check that their types pass useless_type_conversion_p check. Bootstrapped and tested on x86_64-linux. OK for trunk? Ok, thanks. 2014-03-28 Martin Jambor mjam...@suse.cz PR middle-end/60647 * tree-sra.c (callsite_has_enough_arguments_p): Renamed to callsite_arguments_match_p. Updated all callers. Also check types of corresponding formal parameters and actual arguments. (not_all_callers_have_enough_arguments_p) Renamed to some_callers_have_mismatched_arguments_p. testsuite/ * gcc.dg/pr60647-1.c: New test. * gcc.dg/pr60647-2.c: Likewise. Jakub
Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description
On 28 March 2014 14:52, Ramana Radhakrishnan ramana@googlemail.com wrote: To my mind on ARM / AArch64 this actually helps anyone using the crypto intrinsics on A53 hardware today and it would be good to get this into 4.9. Again I perceive this as low risk on ARM (AArch32) as this is not a default tuning option for any large software vendors, the folks using this are typically the ones that write the more specialized crypto intrinsics rather than just general purpose code. However this will help with scheduling on what is essentially an in-order core, so would be nice to have. This would definitely need approval from the AArch64 maintainers and the RMs to go in at this stage. If not, we should consider this for 4.9.1 regards Ramana What do people think? Its low risk, I think it should go in now. /Marcus
Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring
On Fri, Mar 28, 2014 at 5:18 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 28/03/14 14:18, Ramana Radhakrishnan wrote: On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This two-patch series adds scheduling information for the ARMv8-A Crypto instructions on the Cortex-A53. This first patch does some preliminary restructuring to allow the arm and aarch64 backends to share the is_neon_type attribute. It also splits the crypto_aes type into crypto_aese and crypto_aesmc since the aese/aesd and aesmc/aesimc instructions will be treated differently (in patch 2/2). This patch touches both arm and aarch64 backends since there's no clean way to split it into per-backend patches without breaking each one. Tested and bootstrapped on arm-none-linux-gnueabihf and on aarch64-none-linux-gnu. This patch is fairly uncontroversial and doesn't change functionality or code generation by itself. I'll leave it to the maintainers to decide when this should go in... The real question is about patch #2. So this going in just depends on patch #2. #2 has been ok'd. Can I take this as an approval for this patch? Yes - Ramana Kyrill Ramana Thanks, Kyrill 2014-03-25 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi): Use crypto_aese type. (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type. * config/arm/arm.md (is_neon_type): Replace crypto_aes with crypto_aese, crypto_aesmc. Move to types.md. * config/arm/types.md (crypto_aes): Split into crypto_aese, crypto_aesmc. * config/arm/iterators.md (crypto_type): Likewise.
UBSan fix: avoid undefined behaviour in bitmask
UBSan detected that we were trying to set a non-existent bit in a mask. I don't think it has mattered before now because when this happens the value in the mask is not used. However, better safe than sorry. Andrew. 2014-03-28 Andrew Haley a...@redhat.com * boehm.c (mark_reference_fields): Avoid unsigned integer overflow when calculating an index into a bitmap descriptor. Index: gcc/java/boehm.c === --- gcc/java/boehm.c(revision 208839) +++ gcc/java/boehm.c(working copy) @@ -107,7 +107,11 @@ bits for all words in the record. This is conservative, but the size_words != 1 case is impossible in regular java code. */ for (i = 0; i < size_words; ++i) - *mask = (*mask).set_bit (ubit - count - i - 1); + { + int bitpos = ubit - count - i - 1; + if (bitpos >= 0) + *mask = (*mask).set_bit (bitpos); + } if (count >= ubit - 2) *pointer_after_end = 1;
[PATCH, PR 60647] Check that actual argument types match those of formal parameters before IPA-SRA
Hi, after much confusion on my part, this is the proper fix for PR 60647. IPA-SRA can get confused a lot when a formal parameter is a pointer but the corresponding actual argument is not. So this patch adds a check that their types pass the useless_type_conversion_p check. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2014-03-28 Martin Jambor mjam...@suse.cz PR middle-end/60647 * tree-sra.c (callsite_has_enough_arguments_p): Renamed to callsite_arguments_match_p. Updated all callers. Also check types of corresponding formal parameters and actual arguments. (not_all_callers_have_enough_arguments_p): Renamed to some_callers_have_mismatched_arguments_p. testsuite/ * gcc.dg/pr60647-1.c: New test. * gcc.dg/pr60647-2.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/pr60647-1.c b/gcc/testsuite/gcc.dg/pr60647-1.c new file mode 100644 index 000..73ea856 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr60647-1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +struct _wincore +{ + int y; + int width; +}; +int a; +static fn1 (dpy, winInfo) struct _XDisplay *dpy; +struct _wincore *winInfo; +{ + a = winInfo->width; + fn2 (); +} + +static fn3 (dpy, winInfo, visrgn) struct _XDisplay *dpy; +{ + int b = fn1 (0, winInfo); + fn4 (0, 0, visrgn); +} + +fn5 (event) struct _XEvent *event; +{ + fn3 (0, 0, 0); +} diff --git a/gcc/testsuite/gcc.dg/pr60647-2.c b/gcc/testsuite/gcc.dg/pr60647-2.c new file mode 100644 index 000..ddeb117 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr60647-2.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +struct _wincore +{ + int width, height; +}; + +static void +foo (void *dpy, struct _wincore *winInfo, int offset) +{ + fn1 (winInfo->height); +} + +static void +bar (void *dpy, int winInfo, int *visrgn) +{ + ((void (*) (void *, int, int)) foo) ((void *) 0, winInfo, 0); /* { dg-warning "function called through a non-compatible type" } */ + fn2 (0, 0, visrgn); +} + +void +baz (void *dpy, int win, int prop) +{ + 
bar ((void *) 0, 0, (int *) 0); +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 284d544..ffef13d 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -1234,12 +1234,26 @@ asm_visit_addr (gimple, tree op, tree, void *) } /* Return true iff callsite CALL has at least as many actual arguments as there - are formal parameters of the function currently processed by IPA-SRA. */ + are formal parameters of the function currently processed by IPA-SRA and + that their types match. */ static inline bool -callsite_has_enough_arguments_p (gimple call) +callsite_arguments_match_p (gimple call) { - return gimple_call_num_args (call) >= (unsigned) func_param_count; + if (gimple_call_num_args (call) < (unsigned) func_param_count) +return false; + + tree parm; + int i; + for (parm = DECL_ARGUMENTS (current_function_decl), i = 0; + parm; + parm = DECL_CHAIN (parm), i++) +{ + tree arg = gimple_call_arg (call, i); + if (!useless_type_conversion_p (TREE_TYPE (parm), TREE_TYPE (arg))) + return false; +} + return true; } /* Scan function and look for interesting expressions and create access @@ -1294,7 +1308,7 @@ scan_function (void) if (recursive_call_p (current_function_decl, dest)) { encountered_recursive_call = true; - if (!callsite_has_enough_arguments_p (stmt)) + if (!callsite_arguments_match_p (stmt)) encountered_unchangable_recursive_call = true; } } @@ -4750,16 +4764,17 @@ sra_ipa_reset_debug_stmts (ipa_parm_adjustment_vec adjustments) } } -/* Return false iff all callers have at least as many actual arguments as there - are formal parameters in the current function. */ +/* Return false if all callers have at least as many actual arguments as there + are formal parameters in the current function and that their types + match. 
*/ static bool -not_all_callers_have_enough_arguments_p (struct cgraph_node *node, -void *data ATTRIBUTE_UNUSED) +some_callers_have_mismatched_arguments_p (struct cgraph_node *node, + void *data ATTRIBUTE_UNUSED) { struct cgraph_edge *cs; for (cs = node->callers; cs; cs = cs->next_caller) -if (!callsite_has_enough_arguments_p (cs->call_stmt)) +if (!callsite_arguments_match_p (cs->call_stmt)) return true; return false; @@ -4970,12 +4985,13 @@ ipa_early_sra (void) goto simple_out; } - if (cgraph_for_node_and_aliases (node, not_all_callers_have_enough_arguments_p, + if (cgraph_for_node_and_aliases (node, + some_callers_have_mismatched_arguments_p, NULL, true)) { if
Re: [DOC PATCH] Clarify docs about stmt exprs (PR c/51088)
On Fri, 28 Mar 2014, Marek Polacek wrote: PR51088 contains some Really Bizarre code. We should tell users not to do any shenanigans like that. Ok for trunk? I don't think a doc patch resolves this bug. The compiler should never generate code with an undefined reference to a local label like that; either the code should get a compile-time error (that's what I suggest), or it should generate output that links but has undefined behavior at runtime. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, PR 60640] When creating virtual clones, clone thunks too
Hi, this patch fixes PR 60640 by creating thunks to clones when that is necessary to properly redirect edges to them. It mostly does what cgraph_add_thunk does and what analyze_function does to thunks. It fixes the testcases on trunk (it does not apply to 4.8, I have not looked how easily fixable that is) and passes bootstrap and testing on x86_64-linux. OK for trunk? Thanks, Martin 2014-03-26 Martin Jambor mjam...@suse.cz * cgraph.h (cgraph_clone_node): New parameter added to declaration. Adjust all callers. * cgraphclones.c (build_function_type_skip_args): Moved upwards in the file. (build_function_decl_skip_args): Likewise. (duplicate_thunk_for_node): New function. (redirect_edge_duplicating_thunks): Likewise. (cgraph_clone_node): New parameter args_to_skip, pass it to redirect_edge_duplicating_thunks which is called instead of cgraph_redirect_edge_callee. (cgraph_create_virtual_clone): Pass args_to_skip to cgraph_clone_node. testsuite/ * g++.dg/ipa/pr60640-1.C: New test. * g++.dg/ipa/pr60640-2.C: Likewise. Index: src/gcc/cgraph.h === --- src.orig/gcc/cgraph.h +++ src/gcc/cgraph.h @@ -890,7 +890,7 @@ struct cgraph_edge * cgraph_clone_edge ( unsigned, gcov_type, int, bool); struct cgraph_node * cgraph_clone_node (struct cgraph_node *, tree, gcov_type, int, bool, vec<cgraph_edge_p>, - bool, struct cgraph_node *); + bool, struct cgraph_node *, bitmap); tree clone_function_name (tree decl, const char *); struct cgraph_node * cgraph_create_virtual_clone (struct cgraph_node *old_node, vec<cgraph_edge_p>, Index: src/gcc/cgraphclones.c === --- src.orig/gcc/cgraphclones.c +++ src/gcc/cgraphclones.c @@ -168,6 +168,183 @@ cgraph_clone_edge (struct cgraph_edge *e return new_edge; } +/* Build variant of function type ORIG_TYPE skipping ARGS_TO_SKIP and the + return value if SKIP_RETURN is true. 
*/ + +static tree +build_function_type_skip_args (tree orig_type, bitmap args_to_skip, + bool skip_return) +{ + tree new_type = NULL; + tree args, new_args = NULL, t; + tree new_reversed; + int i = 0; + + for (args = TYPE_ARG_TYPES (orig_type); args && args != void_list_node; + args = TREE_CHAIN (args), i++) +if (!args_to_skip || !bitmap_bit_p (args_to_skip, i)) + new_args = tree_cons (NULL_TREE, TREE_VALUE (args), new_args); + + new_reversed = nreverse (new_args); + if (args) +{ + if (new_reversed) +TREE_CHAIN (new_args) = void_list_node; + else + new_reversed = void_list_node; +} + + /* Use copy_node to preserve as much as possible from original type + (debug info, attribute lists etc.) + Exception is METHOD_TYPEs must have THIS argument. + When we are asked to remove it, we need to build new FUNCTION_TYPE + instead. */ + if (TREE_CODE (orig_type) != METHOD_TYPE + || !args_to_skip + || !bitmap_bit_p (args_to_skip, 0)) +{ + new_type = build_distinct_type_copy (orig_type); + TYPE_ARG_TYPES (new_type) = new_reversed; +} + else +{ + new_type += build_distinct_type_copy (build_function_type (TREE_TYPE (orig_type), +new_reversed)); + TYPE_CONTEXT (new_type) = TYPE_CONTEXT (orig_type); +} + + if (skip_return) +TREE_TYPE (new_type) = void_type_node; + + /* This is a new type, not a copy of an old type. Need to reassociate + variants. We can handle everything except the main variant lazily. */ + t = TYPE_MAIN_VARIANT (orig_type); + if (t != orig_type) +{ + t = build_function_type_skip_args (t, args_to_skip, skip_return); + TYPE_MAIN_VARIANT (new_type) = t; + TYPE_NEXT_VARIANT (new_type) = TYPE_NEXT_VARIANT (t); + TYPE_NEXT_VARIANT (t) = new_type; +} + else +{ + TYPE_MAIN_VARIANT (new_type) = new_type; + TYPE_NEXT_VARIANT (new_type) = NULL; +} + + return new_type; +} + +/* Build variant of function decl ORIG_DECL skipping ARGS_TO_SKIP and the + return value if SKIP_RETURN is true. 
+ + Arguments from DECL_ARGUMENTS list can't be removed now, since they are + linked by TREE_CHAIN directly. The caller is responsible for eliminating + them when they are being duplicated (i.e. copy_arguments_for_versioning). */ + +static tree +build_function_decl_skip_args (tree orig_decl, bitmap args_to_skip, + bool skip_return) +{ + tree new_decl = copy_node (orig_decl); + tree new_type; + + new_type = TREE_TYPE (orig_decl); + if (prototype_p (new_type) + || (skip_return && !VOID_TYPE_P
Changing INT to SI mode
Test pr59940.c is failing for the AVR target because the test assumes the size of int is 32 bits and expects warnings for overflow and conversion when assigning 36-bit and 32-bit values respectively to the variable si. The following patch defines a 32-bit type with SI mode and uses it. 2014-03-28 Vishnu K S vishnu@atmel.com * gcc/testsuite/gcc.dg/pr59940.c: Using 32-bit SI mode instead of int diff --git a/gcc/testsuite/gcc.dg/pr59940.c b/gcc/testsuite/gcc.dg/pr59940.c index b0fd17f..21d93ad 100644 --- a/gcc/testsuite/gcc.dg/pr59940.c +++ b/gcc/testsuite/gcc.dg/pr59940.c @@ -3,11 +3,12 @@ /* { dg-options "-Wconversion -Woverflow" } */ int f (unsigned int); +typedef int sitype __attribute__((mode(SI))); int g (void) { - int si = 12; + sitype si = 12; unsigned int ui = -1; /* { dg-warning "21:negative integer implicitly converted to unsigned type" } */ unsigned char uc; ui = si; /* { dg-warning "8:conversion" } */
Re: [PATCH][ARM][1/3] Add rev field to rtx cost tables
On Wed, Mar 19, 2014 at 9:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, In order to properly cost the rev16 instruction we need a new field in the cost tables. This patch adds that and specifies its value for the existing cost tables. Since rev16 is used to implement the BSWAP operation we add handling of that in the rtx cost function using the new field. Tested on arm-none-eabi and bootstrapped on an arm linux target. Does it look ok for stage1? Ok for stage1 if no regressions. Thanks, Kyrill 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/aarch-common-protos.h (alu_cost_table): Add rev field. * config/arm/aarch-cost-tables.h (generic_extra_costs): Specify rev cost. (cortex_a53_extra_costs): Likewise. (cortex_a57_extra_costs): Likewise. * config/arm/arm.c (cortexa9_extra_costs): Likewise. (cortexa7_extra_costs): Likewise. (cortexa12_extra_costs): Likewise. (cortexa15_extra_costs): Likewise. (v7m_extra_costs): Likewise. (arm_new_rtx_costs): Handle BSWAP.
Re: [PATCH][ARM][3/3] Recognise bitwise operations leading to SImode rev16
On Wed, Mar 19, 2014 at 9:56 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This is the arm equivalent of patch [2/3] in the series that adds combine patterns for the bitwise operations leading to a rev16 instruction. It reuses the functions that were put in aarch-common.c to properly cost these operations. I tried matching a DImode rev16 (with the intent of splitting it into two rev16 ops) like aarch64 but combine wouldn't try to match that bitwise pattern in DImode like aarch64 does. Instead it tries various exotic combinations with subregs. Tested arm-none-eabi, bootstrap on arm-none-linux-gnueabihf. Ok for stage1? This is OK for stage1 . Ramana [gcc/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (arm_rev16si2): New pattern. (arm_rev16si2_alt): Likewise. * config/arm/arm.c (arm_new_rtx_costs): Handle rev16 case. [gcc/testsuite/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/arm/rev16.c: New test.
[AArch64/ARM 1/3] Add execution + assembler tests of AArch64 TRN Intrinsics
This adds DejaGNU tests of the existing AArch64 vtrn_* intrinsics, both checking the assembler output and the runtime results. Test bodies are in separate files ready to reuse for ARM in the third patch. Putting these in a new subdirectory with the ZIP Intrinsics tests, using simd.exp added there (will commit ZIP tests first). All tests passing on aarch64-none-elf and aarch64_be-none-elf. testsuite/ChangeLog: 2012-03-28 Alan Lawrence alan.lawre...@arm.com * gcc.target/aarch64/simd/vtrnf32_1.c: New file. * gcc.target/aarch64/simd/vtrnf32.x: New file. * gcc.target/aarch64/simd/vtrnp16_1.c: New file. * gcc.target/aarch64/simd/vtrnp16.x: New file. * gcc.target/aarch64/simd/vtrnp8_1.c: New file. * gcc.target/aarch64/simd/vtrnp8.x: New file. * gcc.target/aarch64/simd/vtrnqf32_1.c: New file. * gcc.target/aarch64/simd/vtrnqf32.x: New file. * gcc.target/aarch64/simd/vtrnqp16_1.c: New file. * gcc.target/aarch64/simd/vtrnqp16.x: New file. * gcc.target/aarch64/simd/vtrnqp8_1.c: New file. * gcc.target/aarch64/simd/vtrnqp8.x: New file. * gcc.target/aarch64/simd/vtrnqs16_1.c: New file. * gcc.target/aarch64/simd/vtrnqs16.x: New file. * gcc.target/aarch64/simd/vtrnqs32_1.c: New file. * gcc.target/aarch64/simd/vtrnqs32.x: New file. * gcc.target/aarch64/simd/vtrnqs8_1.c: New file. * gcc.target/aarch64/simd/vtrnqs8.x: New file. * gcc.target/aarch64/simd/vtrnqu16_1.c: New file. * gcc.target/aarch64/simd/vtrnqu16.x: New file. * gcc.target/aarch64/simd/vtrnqu32_1.c: New file. * gcc.target/aarch64/simd/vtrnqu32.x: New file. * gcc.target/aarch64/simd/vtrnqu8_1.c: New file. * gcc.target/aarch64/simd/vtrnqu8.x: New file. * gcc.target/aarch64/simd/vtrns16_1.c: New file. * gcc.target/aarch64/simd/vtrns16.x: New file. * gcc.target/aarch64/simd/vtrns32_1.c: New file. * gcc.target/aarch64/simd/vtrns32.x: New file. * gcc.target/aarch64/simd/vtrns8_1.c: New file. * gcc.target/aarch64/simd/vtrns8.x: New file. * gcc.target/aarch64/simd/vtrnu16_1.c: New file. 
* gcc.target/aarch64/simd/vtrnu16.x: New file. * gcc.target/aarch64/simd/vtrnu32_1.c: New file. * gcc.target/aarch64/simd/vtrnu32.x: New file. * gcc.target/aarch64/simd/vtrnu8_1.c: New file. * gcc.target/aarch64/simd/vtrnu8.x: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32.x b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32.x new file mode 100644 index 000..7b03e6b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32.x @@ -0,0 +1,27 @@ +extern void abort (void); + +float32x2x2_t +test_vtrnf32 (float32x2_t _a, float32x2_t _b) +{ + return vtrn_f32 (_a, _b); +} + +int +main (int argc, char **argv) +{ + int i; + float32_t first[] = {1, 2}; + float32_t second[] = {3, 4}; + float32x2x2_t result = test_vtrnf32 (vld1_f32 (first), vld1_f32 (second)); + float32x2_t res1 = result.val[0], res2 = result.val[1]; + float32_t exp1[] = {1, 3}; + float32_t exp2[] = {2, 4}; + float32x2_t expected1 = vld1_f32 (exp1); + float32x2_t expected2 = vld1_f32 (exp2); + + for (i = 0; i < 2; i++) +if ((res1[i] != expected1[i]) || (res2[i] != expected2[i])) + abort (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32_1.c new file mode 100644 index 000..24c30a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32_1.c @@ -0,0 +1,11 @@ +/* Test the `vtrn_f32' AArch64 SIMD intrinsic. 
*/ + +/* { dg-do run } */ +/* { dg-options "-save-temps -fno-inline" } */ + +#include <arm_neon.h> +#include "vtrnf32.x" + +/* { dg-final { scan-assembler-times "trn1\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.2s, ?v\[0-9\]+\.2s!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { scan-assembler-times "trn2\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.2s, ?v\[0-9\]+\.2s!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vtrnp16.x b/gcc/testsuite/gcc.target/aarch64/simd/vtrnp16.x new file mode 100644 index 000..5feabe4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vtrnp16.x @@ -0,0 +1,27 @@ +extern void abort (void); + +poly16x4x2_t +test_vtrnp16 (poly16x4_t _a, poly16x4_t _b) +{ + return vtrn_p16 (_a, _b); +} + +int +main (int argc, char **argv) +{ + int i; + poly16_t first[] = {1, 2, 3, 4}; + poly16_t second[] = {5, 6, 7, 8}; + poly16x4x2_t result = test_vtrnp16 (vld1_p16 (first), vld1_p16 (second)); + poly16x4_t res1 = result.val[0], res2 = result.val[1]; + poly16_t exp1[] = {1, 5, 3, 7}; + poly16_t exp2[] = {2, 6, 4, 8}; + poly16x4_t expected1 = vld1_p16 (exp1); + poly16x4_t expected2 = vld1_p16 (exp2); + + for (i = 0; i < 4; i++) +if ((res1[i] != expected1[i]) || (res2[i] != expected2[i])) +
Re: Fix PR ipa/60315 (inliner explosion)
Bootstrapped/regtested x86_64-linux, committed. Not with Ada apparently, resulting in === acats tests === FAIL: c34007d FAIL: c34007g FAIL: c34007s FAIL: c37213j FAIL: c37213k FAIL: c37213l FAIL: ce2201g FAIL: cxa5a03 FAIL: cxa5a04 FAIL: cxa5a06 FAIL: cxg2013 FAIL: cxg2015 The problem is that by redirecting to a noreturn function, we end up freeing the SSA name of the LHS but later we still process statements that refer to it until they are removed as unreachable. The following patch fixes it. I tested it on x86_64-linux, but changed my mind. I think fixup_noreturn_call should do it instead, I will send updated patch after testing. Honza Actually after some additional investigation I decided to commit this patch. fixup_noreturn_call already cares about the return value but differently than the new Jakub's code. We ought to unify it, but only for next stage1. Honza Index: cgraph.c === --- cgraph.c (revision 208875) +++ cgraph.c (working copy) @@ -1329,6 +1331,7 @@ gimple cgraph_redirect_edge_call_stmt_to_callee (struct cgraph_edge *e) { tree decl = gimple_call_fndecl (e->call_stmt); + tree lhs = gimple_call_lhs (e->call_stmt); gimple new_stmt; gimple_stmt_iterator gsi; #ifdef ENABLE_CHECKING @@ -1471,6 +1474,22 @@ cgraph_redirect_edge_call_stmt_to_callee update_stmt_fn (DECL_STRUCT_FUNCTION (e->caller->decl), new_stmt); } + /* If the call becomes noreturn, remove the lhs. */ + if (lhs && (gimple_call_flags (new_stmt) & ECF_NORETURN)) +{ + if (TREE_CODE (lhs) == SSA_NAME) + { + gsi = gsi_for_stmt (new_stmt); + + tree var = create_tmp_var (TREE_TYPE (lhs), NULL); + tree def = get_or_create_ssa_default_def + (DECL_STRUCT_FUNCTION (e->caller->decl), var); + gimple set_stmt = gimple_build_assign (lhs, def); + gsi_insert_before (gsi, set_stmt, GSI_SAME_STMT); + } + gimple_call_set_lhs (new_stmt, NULL_TREE); +} + cgraph_set_call_stmt_including_clones (e->caller, e->call_stmt, new_stmt, false); if (cgraph_dump_file)
Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)
On 03/28/2014 06:47 AM, Jakub Jelinek wrote: * typeck.c (build_function_call_vec): Call resolve_overloaded_builtin. I expect this will break in templates if arguments are dependent. Jason
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
On Mar 28, 2014, at 7:48 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote: On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, There is no way to perform scalar addition in the vector register file, but with the RTX costs in place we start rewriting (x << 1) to (x + x) on almost all cores. The code which makes this decision has no idea that we will end up doing this (it happens well before reload) and so we end up with very ugly code generation in the case where addition was selected, but we are operating in vector registers. This patch relies on the same gimmick we are already using to allow shifts on 32-bit scalars in the vector register file - Use a vector 32x2 operation instead, knowing that we can safely ignore the top bits. This restores some normality to scalar_shift_1.c, however the test that we generate a left shift by one is clearly bogus, so remove that. This patch is pretty ugly, but it does generate superficially better looking code for this testcase. Tested on aarch64-none-elf with no issues. OK for stage 1? It seems we should also discourage the neon alternatives as there might be extra movement between the two register sets which we don't want. I see your point, but we've tried to avoid doing that elsewhere in the AArch64 backend. Our argument has been that strictly speaking, it isn't that the alternative is expensive, it is the movement between the register sets. We do model that elsewhere, and the register allocator should already be trying to avoid unneccesary moves between register classes. What about on a specific core where that alternative is expensive; that is the vector instructions are worse than the scalar ones. How are we going to handle this case? Thanks, Andrew If those mechanisms are broken, we should fix them - in that case fixing this by discouraging valid alternatives would seem to be gaffer-taping over the real problem. 
Thanks, James Thanks, Andrew Thanks, James --- gcc/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in vector registers. gcc/testsuite/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler. 0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring
On 28/03/14 17:18, Kyrill Tkachov wrote: On 28/03/14 14:18, Ramana Radhakrishnan wrote: On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This two-patch series adds scheduling information for the ARMv8-A Crypto instructions on the Cortex-A53. This first patch does some preliminary restructuring to allow the arm and aarch64 backends to share the is_neon_type attribute. It also splits the crypto_aes type into crypto_aese and crypto_aesmc since the aese/aesd and aesmc/aesimc instructions will be treated differently (in patch 2/2). This patch touches both arm and aarch64 backends since there's no clean way to split it into per-backend patches without breaking each one. Tested and bootstrapped on arm-none-linux-gnueabihf and on aarch64-none-linux-gnu. This patch is fairly uncontroversial and doesn't change functionality or code generation by itself. I'll leave it to the maintainers to decide when this should go in... The real question is about patch #2. So this going in just depends on patch #2. #2 has been ok'd. Can I take this as an approval for this patch? I've committed this as r208908 since quite a few people ok'd the more meaty #2 patch. If anyone objects to this, we can revert it later. Kyrill Kyrill Ramana Thanks, Kyrill 2014-03-25 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi): Use crypto_aese type. (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type. * config/arm/arm.md (is_neon_type): Replace crypto_aes with crypto_aese, crypto_aesmc. Move to types.md. * config/arm/types.md (crypto_aes): Split into crypto_aese, crypto_aesmc. * config/arm/iterators.md (crypto_type): Likewise.
Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)
On Fri, Mar 28, 2014 at 11:47:52AM +0100, Jakub Jelinek wrote: Yet another possibility would be to rename all calls in C FE to build_function_call_vec to say c_build_function_call_vec and add that function which would call resolve_overloaded_builtin and then tail call to build_function_call_vec which wouldn't do that. Then c-family/ would keep its current two calls to that function, which wouldn't recurse anymore, and we'd need to change add_atomic_size_parameter to push the argument. Here is the variant patch, which implements the above. Also bootstrapped/regtested on x86_64-linux and i686-linux. 2014-03-28 Jakub Jelinek ja...@redhat.com PR c++/60689 * c-tree.h (c_build_function_call_vec): New prototype. * c-typeck.c (build_function_call_vec): Don't call resolve_overloaded_builtin here. (c_build_function_call_vec): New wrapper function around build_function_call_vec. Call resolve_overloaded_builtin here. (convert_lvalue_to_rvalue, build_function_call, build_atomic_assign): Call c_build_function_call_vec instead of build_function_call_vec. * c-parser.c (c_parser_postfix_expression_after_primary): Likewise. * c-decl.c (finish_decl): Likewise. * c-common.c (add_atomic_size_parameter): When creating new params vector, push the size argument first. * c-c++-common/pr60689.c: New test. --- gcc/c/c-tree.h.jj 2014-02-08 00:53:15.0 +0100 +++ gcc/c/c-tree.h 2014-03-28 12:30:49.155395381 +0100 @@ -643,6 +643,8 @@ extern tree c_finish_omp_clauses (tree); extern tree c_build_va_arg (location_t, tree, tree); extern tree c_finish_transaction (location_t, tree, int); extern bool c_tree_equal (tree, tree); +extern tree c_build_function_call_vec (location_t, vec<location_t>, tree, + vec<tree, va_gc> *, vec<tree, va_gc> *); /* Set to 0 at beginning of a function definition, set to 1 if a return statement that specifies a return value is seen. 
*/ --- gcc/c/c-typeck.c.jj 2014-03-19 08:14:35.0 +0100 +++ gcc/c/c-typeck.c2014-03-28 12:34:57.803066414 +0100 @@ -2016,7 +2016,7 @@ convert_lvalue_to_rvalue (location_t loc params->quick_push (expr_addr); params->quick_push (tmp_addr); params->quick_push (seq_cst); - func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL); + func_call = c_build_function_call_vec (loc, vNULL, fndecl, params, NULL); /* EXPR is always read. */ mark_exp_read (exp.value); @@ -2801,7 +2801,7 @@ build_function_call (location_t loc, tre vec_alloc (v, list_length (params)); for (; params; params = TREE_CHAIN (params)) v->quick_push (TREE_VALUE (params)); - ret = build_function_call_vec (loc, vNULL, function, v, NULL); + ret = c_build_function_call_vec (loc, vNULL, function, v, NULL); vec_free (v); return ret; } @@ -2840,14 +2840,6 @@ build_function_call_vec (location_t loc, /* Convert anything with function type to a pointer-to-function. */ if (TREE_CODE (function) == FUNCTION_DECL) { - /* Implement type-directed function overloading for builtins. -resolve_overloaded_builtin and targetm.resolve_overloaded_builtin -handle all the type checking. The result is a complete expression -that implements this function call. */ - tem = resolve_overloaded_builtin (loc, function, params); - if (tem) - return tem; - name = DECL_NAME (function); if (flag_tm) @@ -2970,6 +2962,30 @@ build_function_call_vec (location_t loc, } return require_complete_type (result); } + +/* Like build_function_call_vec, but call also resolve_overloaded_builtin. */ + +tree +c_build_function_call_vec (location_t loc, vec<location_t> arg_loc, + tree function, vec<tree, va_gc> *params, + vec<tree, va_gc> *origtypes) +{ + /* Strip NON_LVALUE_EXPRs, etc., since we aren't using as an lvalue. */ + STRIP_TYPE_NOPS (function); + + /* Convert anything with function type to a pointer-to-function. */ + if (TREE_CODE (function) == FUNCTION_DECL) +{ + /* Implement type-directed function overloading for builtins. 
+resolve_overloaded_builtin and targetm.resolve_overloaded_builtin +handle all the type checking. The result is a complete expression +that implements this function call. */ + tree tem = resolve_overloaded_builtin (loc, function, params); + if (tem) + return tem; +} + return build_function_call_vec (loc, arg_loc, function, params, origtypes); +} /* Convert the argument expressions in the vector VALUES to the types in the list TYPELIST. @@ -3634,7 +3650,7 @@ build_atomic_assign (location_t loc, tre params->quick_push (lhs_addr); params->quick_push (rhs); params->quick_push (seq_cst); - func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL); + func_call
Re: Evident fix for copy_loops.
Jakub, I did testing of this fix and bootstrap and regression testing were OK, i.e. no new failures. 2014-03-28 14:49 GMT+04:00 Jakub Jelinek ja...@redhat.com: On Fri, Mar 28, 2014 at 02:41:26PM +0400, Yuri Rumyantsev wrote: Hi All, I found out that the field 'safelen' of struct loop is not copied in copy_loops. Is it OK for trunk? Ok if it passes bootstrap/regtest. 2014-03-28 Yuri Rumyantsev ysrum...@gmail.com * tree-inline.c (copy_loops): Add missed copy of 'safelen'. Jakub
[PATCH] Allow VOIDmode argument to ix86_copy_addr_to_reg (PR target/60693)
Hi! Before ix86_copy_addr_to_reg has been added, we've been using copy_addr_to_reg, which handles VOIDmode values just fine. But this new function just ICEs on those. As the function has been added for adding SUBREGs to TLS addresses, those will never be CONST_INTs, so just using copy_addr_to_reg is IMHO the right thing and restores previous behavior. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-03-28 Jakub Jelinek ja...@redhat.com PR target/60693 * config/i386/i386.c (ix86_copy_addr_to_reg): Call copy_addr_to_reg also if addr has VOIDmode. * gcc.target/i386/pr60693.c: New test. --- gcc/config/i386/i386.c.jj 2014-03-20 17:05:21.0 +0100 +++ gcc/config/i386/i386.c 2014-03-28 12:04:59.695679145 +0100 @@ -22755,7 +22755,7 @@ counter_mode (rtx count_exp) static rtx ix86_copy_addr_to_reg (rtx addr) { - if (GET_MODE (addr) == Pmode) + if (GET_MODE (addr) == Pmode || GET_MODE (addr) == VOIDmode) return copy_addr_to_reg (addr); else { --- gcc/testsuite/gcc.target/i386/pr60693.c.jj 2014-03-28 12:08:00.078711929 +0100 +++ gcc/testsuite/gcc.target/i386/pr60693.c 2014-03-28 12:07:31.0 +0100 @@ -0,0 +1,13 @@ +/* PR target/60693 */ +/* { dg-do compile } */ +/* { dg-options "-O0" } */ + +void bar (char *); + +void +foo (void) +{ + char buf[4096]; + __builtin_memcpy (buf, (void *) 0x8000, 4096); + bar (buf); +} Jakub
Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)
On 03/28/2014 08:24 AM, Jakub Jelinek wrote: Here is the variant patch, which implements the above. Also bootstrapped/regtested on x86_64-linux and i686-linux. 2014-03-28 Jakub Jelinek ja...@redhat.com PR c++/60689 * c-tree.h (c_build_function_call_vec): New prototype. * c-typeck.c (build_function_call_vec): Don't call resolve_overloaded_builtin here. (c_build_function_call_vec): New wrapper function around build_function_call_vec. Call resolve_overloaded_builtin here. (convert_lvalue_to_rvalue, build_function_call, build_atomic_assign): Call c_build_function_call_vec instead of build_function_call_vec. * c-parser.c (c_parser_postfix_expression_after_primary): Likewise. * c-decl.c (finish_decl): Likewise. * c-common.c (add_atomic_size_parameter): When creating new params vector, push the size argument first. * c-c++-common/pr60689.c: New test. I do prefer this variant. r~
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
On Fri, Mar 28, 2014 at 03:09:22PM +, pins...@gmail.com wrote: On Mar 28, 2014, at 7:48 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote: On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, There is no way to perform scalar addition in the vector register file, but with the RTX costs in place we start rewriting (x << 1) to (x + x) on almost all cores. The code which makes this decision has no idea that we will end up doing this (it happens well before reload) and so we end up with very ugly code generation in the case where addition was selected, but we are operating in vector registers. This patch relies on the same gimmick we are already using to allow shifts on 32-bit scalars in the vector register file - use a vector 32x2 operation instead, knowing that we can safely ignore the top bits. This restores some normality to scalar_shift_1.c, however the test that we generate a left shift by one is clearly bogus, so remove that. This patch is pretty ugly, but it does generate superficially better looking code for this testcase. Tested on aarch64-none-elf with no issues. OK for stage 1? It seems we should also discourage the neon alternatives as there might be extra movement between the two register sets which we don't want. I see your point, but we've tried to avoid doing that elsewhere in the AArch64 backend. Our argument has been that strictly speaking, it isn't that the alternative is expensive, it is the movement between the register sets. We do model that elsewhere, and the register allocator should already be trying to avoid unnecessary moves between register classes. What about on a specific core where that alternative is expensive; that is, the vector instructions are worse than the scalar ones. How are we going to handle this case? Certainly not by discouraging the alternative for all cores. 
We would need a more nuanced approach which could be tuned on a per-core basis. Otherwise we are bluntly and inaccurately pessimizing those cases where we can cheaply perform the operation in the vector register file (e.g. we are cleaning up loose ends after a vector loop, we have spilled to the vector register file, etc.). The register preference mechanism feels like the wrong place to catch this as it does not allow for that degree of per-core flexibility, an alternative is simply disparaged slightly (?, * in LRA) or disparaged severely (!). I would think that we don't want to start polluting the machine description trying to hack around this as was done with the ARM backend's neon_for_64_bits/avoid_neon_for_64_bits. How have other targets solved this issue? Thanks, James Thanks, Andrew If those mechanisms are broken, we should fix them - in that case fixing this by discouraging valid alternatives would seem to be gaffer-taping over the real problem. Thanks, James Thanks, Andrew Thanks, James --- gcc/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in vector registers. gcc/testsuite/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler. 0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
[AArch64/ARM 3/3] Add execution tests of ARM TRN Intrinsics
Final patch in series, adds new tests of the ARM TRN Intrinsics, that also check the execution results, reusing the test bodies introduced into AArch64 in the first patch. (These tests subsume the autogenerated ones in testsuite/gcc.target/arm/neon/ that only check assembler output.) Tests use gcc.target/arm/simd/simd.exp from corresponding patch for ZIP Intrinsics, will commit that first. All tests passing on arm-none-eabi. testsuite/ChangeLog: 2012-03-28 Alan Lawrence alan.lawre...@arm.com * gcc.target/arm/simd/vtrnqf32_1.c: New file. * gcc.target/arm/simd/vtrnqp16_1.c: New file. * gcc.target/arm/simd/vtrnqp8_1.c: New file. * gcc.target/arm/simd/vtrnqs16_1.c: New file. * gcc.target/arm/simd/vtrnqs32_1.c: New file. * gcc.target/arm/simd/vtrnqs8_1.c: New file. * gcc.target/arm/simd/vtrnqu16_1.c: New file. * gcc.target/arm/simd/vtrnqu32_1.c: New file. * gcc.target/arm/simd/vtrnqu8_1.c: New file. * gcc.target/arm/simd/vtrnf32_1.c: New file. * gcc.target/arm/simd/vtrnp16_1.c: New file. * gcc.target/arm/simd/vtrnp8_1.c: New file. * gcc.target/arm/simd/vtrns16_1.c: New file. * gcc.target/arm/simd/vtrns32_1.c: New file. * gcc.target/arm/simd/vtrns8_1.c: New file. * gcc.target/arm/simd/vtrnu16_1.c: New file. * gcc.target/arm/simd/vtrnu32_1.c: New file. * gcc.target/arm/simd/vtrnu8_1.c: New file.diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnf32_1.c new file mode 100644 index 000..c9620fb --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vtrnf32_1.c @@ -0,0 +1,12 @@ +/* Test the `vtrnf32' ARM Neon intrinsic. 
*/ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-save-temps -O1 -fno-inline" } */ +/* { dg-add-options arm_neon } */ + +#include <arm_neon.h> +#include "../../aarch64/simd/vtrnf32.x" + +/* { dg-final { scan-assembler-times "vtrn\.32\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnp16_1.c new file mode 100644 index 000..0ff4319 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vtrnp16_1.c @@ -0,0 +1,12 @@ +/* Test the `vtrnp16' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-save-temps -O1 -fno-inline" } */ +/* { dg-add-options arm_neon } */ + +#include <arm_neon.h> +#include "../../aarch64/simd/vtrnp16.x" + +/* { dg-final { scan-assembler-times "vtrn\.16\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnp8_1.c new file mode 100644 index 000..2b047e4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vtrnp8_1.c @@ -0,0 +1,12 @@ +/* Test the `vtrnp8' ARM Neon intrinsic. 
*/ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-save-temps -O1 -fno-inline" } */ +/* { dg-add-options arm_neon } */ + +#include <arm_neon.h> +#include "../../aarch64/simd/vtrnp8.x" + +/* { dg-final { scan-assembler-times "vtrn\.8\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnqf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnqf32_1.c new file mode 100644 index 000..dd4e883 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vtrnqf32_1.c @@ -0,0 +1,12 @@ +/* Test the `vtrnQf32' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-save-temps -O1 -fno-inline" } */ +/* { dg-add-options arm_neon } */ + +#include <arm_neon.h> +#include "../../aarch64/simd/vtrnqf32.x" + +/* { dg-final { scan-assembler-times "vtrn\.32\[ \t\]+\[qQ\]\[0-9\]+, ?\[qQ\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnqp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnqp16_1.c new file mode 100644 index 000..374eee3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vtrnqp16_1.c @@ -0,0 +1,12 @@ +/* Test the `vtrnQp16' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-save-temps -O1 -fno-inline" } */ +/* { dg-add-options arm_neon } */ + +#include <arm_neon.h> +#include "../../aarch64/simd/vtrnqp16.x" + +/* { dg-final { scan-assembler-times "vtrn\.16\[ \t\]+\[qQ\]\[0-9\]+, ?\[qQ\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnqp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnqp8_1.c new file mode 100644 index
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote: On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, There is no way to perform scalar addition in the vector register file, but with the RTX costs in place we start rewriting (x << 1) to (x + x) on almost all cores. The code which makes this decision has no idea that we will end up doing this (it happens well before reload) and so we end up with very ugly code generation in the case where addition was selected, but we are operating in vector registers. This patch relies on the same gimmick we are already using to allow shifts on 32-bit scalars in the vector register file - use a vector 32x2 operation instead, knowing that we can safely ignore the top bits. This restores some normality to scalar_shift_1.c, however the test that we generate a left shift by one is clearly bogus, so remove that. This patch is pretty ugly, but it does generate superficially better looking code for this testcase. Tested on aarch64-none-elf with no issues. OK for stage 1? It seems we should also discourage the neon alternatives as there might be extra movement between the two register sets which we don't want. I see your point, but we've tried to avoid doing that elsewhere in the AArch64 backend. Our argument has been that strictly speaking, it isn't that the alternative is expensive, it is the movement between the register sets. We do model that elsewhere, and the register allocator should already be trying to avoid unnecessary moves between register classes. If those mechanisms are broken, we should fix them - in that case fixing this by discouraging valid alternatives would seem to be gaffer-taping over the real problem. Thanks, James Thanks, Andrew Thanks, James --- gcc/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in vector registers. 
gcc/testsuite/ 2014-03-27 James Greenhalgh james.greenha...@arm.com * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler. 0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
[AArch64/ARM 2/3] Reimplement AArch64 TRN intrinsics with __builtin_shuffle
This patch replaces the temporary inline assembler for vtrn[q]_* in arm_neon.h with equivalent calls to __builtin_shuffle. These are matched by existing patterns in aarch64.c (aarch64_expand_vec_perm_const_1), outputting the same assembler instructions. For two-element vectors, ZIP, UZP and TRN instructions all have the same effect, and the backend chooses to output ZIP, so this patch also updates the 3 affected tests. Regressed, and tests from first patch still passing modulo updates herein, on aarch64-none-elf and aarch64_be-none-elf. gcc/testsuite/ChangeLog: 2014-03-28 Alan Lawrence alan.lawre...@arm.com * gcc.target/aarch64/vtrns32.c: Expect zip[12] insn rather than trn[12]. * gcc.target/aarch64/vtrnu32.c: Likewise. * gcc.target/aarch64/vtrnf32.c: Likewise. gcc/ChangeLog: 2014-03-28 Alan Lawrence alan.lawre...@arm.com * config/aarch64/arm_neon.h (vtrn1_f32, vtrn1_p8, vtrn1_p16, vtrn1_s8, vtrn1_s16, vtrn1_s32, vtrn1_u8, vtrn1_u16, vtrn1_u32, vtrn1q_f32, vtrn1q_f64, vtrn1q_p8, vtrn1q_p16, vtrn1q_s8, vtrn1q_s16, vtrn1q_s32, vtrn1q_s64, vtrn1q_u8, vtrn1q_u16, vtrn1q_u32, vtrn1q_u64, vtrn2_f32, vtrn2_p8, vtrn2_p16, vtrn2_s8, vtrn2_s16, vtrn2_s32, vtrn2_u8, vtrn2_u16, vtrn2_u32, vtrn2q_f32, vtrn2q_f64, vtrn2q_p8, vtrn2q_p16, vtrn2q_s8, vtrn2q_s16, vtrn2q_s32, vtrn2q_s64, vtrn2q_u8, vtrn2q_u16, vtrn2q_u32, vtrn2q_u64): Replace temporary asm with __builtin_shuffle.diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 6af99361..d7962e5 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -12447,468 +12447,6 @@ vsubhn_u64 (uint64x2_t a, uint64x2_t b) return result; } -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vtrn1_f32 (float32x2_t a, float32x2_t b) -{ - float32x2_t result; - __asm__ (trn1 %0.2s,%1.2s,%2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) -vtrn1_p8 (poly8x8_t 
a, poly8x8_t b) -{ - poly8x8_t result; - __asm__ (trn1 %0.8b,%1.8b,%2.8b - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) -vtrn1_p16 (poly16x4_t a, poly16x4_t b) -{ - poly16x4_t result; - __asm__ (trn1 %0.4h,%1.4h,%2.4h - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vtrn1_s8 (int8x8_t a, int8x8_t b) -{ - int8x8_t result; - __asm__ (trn1 %0.8b,%1.8b,%2.8b - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) -vtrn1_s16 (int16x4_t a, int16x4_t b) -{ - int16x4_t result; - __asm__ (trn1 %0.4h,%1.4h,%2.4h - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) -vtrn1_s32 (int32x2_t a, int32x2_t b) -{ - int32x2_t result; - __asm__ (trn1 %0.2s,%1.2s,%2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vtrn1_u8 (uint8x8_t a, uint8x8_t b) -{ - uint8x8_t result; - __asm__ (trn1 %0.8b,%1.8b,%2.8b - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) -vtrn1_u16 (uint16x4_t a, uint16x4_t b) -{ - uint16x4_t result; - __asm__ (trn1 %0.4h,%1.4h,%2.4h - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) -vtrn1_u32 (uint32x2_t a, uint32x2_t b) -{ - uint32x2_t result; - __asm__ (trn1 %0.2s,%1.2s,%2.2s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) -vtrn1q_f32 (float32x4_t a, 
float32x4_t b) -{ - float32x4_t result; - __asm__ (trn1 %0.4s,%1.4s,%2.4s - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vtrn1q_f64 (float64x2_t a, float64x2_t b) -{ - float64x2_t result; - __asm__ (trn1 %0.2d,%1.2d,%2.2d - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) -vtrn1q_p8 (poly8x16_t a, poly8x16_t b) -{
Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression
On 03/26/14 12:28, Jakub Jelinek wrote: On Wed, Mar 26, 2014 at 12:17:43PM -0600, Jeff Law wrote: On 03/26/14 12:12, Jakub Jelinek wrote: On Wed, Mar 26, 2014 at 11:02:48AM -0600, Jeff Law wrote: Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Verified it fixes the original and reduced testcase. Note, the testcase is missing from your patch. But I'd question if this is the right place to canonicalize it. The non-canonical order seems to be created in the generic code, where do_tablejump does: No, at that point it's still canonical because the x86 backend hasn't simplified the (mult ...) subexpression. It's the simplification of that subexpression to a constant that creates the non-canonical RTL. That's why I fixed the x86 bits -- those are the bits that simplify the (mult ...) into a (const_int) and thus create the non-canonical RTL. (mult:SI (const_int 0) (const_int 4)) is IMHO far from being canonical. And, I'd say it is likely other target legitimization hooks would also try to simplify it similarly. simplify_gen_binary is used in several other places during expansion, so I don't see why it couldn't be desirable here. Here's the updated patch. It uses simplify_gen_binary in expr.c to simplify the address expression as we're building it. It also uses copy_addr_to_reg in the x86 backend to avoid the possibility of generating non-canonical RTL there too. By accident I interrupted the regression test cycle, so that is still running. OK for the trunk if that passes? diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 53d58b3..3caae44 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2014-03-27 Jeff Law l...@redhat.com + Jakub Jelinek ja...@redhat.com + + * expr.c (do_tablejump): Use simplify_gen_binary rather than + gen_rtx_{PLUS,MULT} to build up the address expression. + + * i386/i386.c (ix86_legitimize_address): Use copy_addr_to_reg to avoid + creating non-canonical RTL. 
+ 2014-03-26 Richard Biener rguent...@suse.de * tree-pretty-print.c (percent_K_format): Implement special diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 842be68..70b8f02 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13925,13 +13925,13 @@ ix86_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, if (GET_CODE (XEXP (x, 0)) == MULT) { changed = 1; - XEXP (x, 0) = force_operand (XEXP (x, 0), 0); + XEXP (x, 0) = copy_addr_to_reg (XEXP (x, 0)); } if (GET_CODE (XEXP (x, 1)) == MULT) { changed = 1; - XEXP (x, 1) = force_operand (XEXP (x, 1), 0); + XEXP (x, 1) = copy_addr_to_reg (XEXP (x, 1)); } if (changed diff --git a/gcc/expr.c b/gcc/expr.c index cdb4551..ebf136e 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11134,11 +11134,12 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, GET_MODE_SIZE, because this indicates how large insns are. The other uses should all be Pmode, because they are addresses. This code could fail if addresses and insns are not the same size. */ - index = gen_rtx_PLUS -(Pmode, - gen_rtx_MULT (Pmode, index, - gen_int_mode (GET_MODE_SIZE (CASE_VECTOR_MODE), Pmode)), - gen_rtx_LABEL_REF (Pmode, table_label)); + index = simplify_gen_binary (MULT, Pmode, index, + gen_int_mode (GET_MODE_SIZE (CASE_VECTOR_MODE), +Pmode)); + index = simplify_gen_binary (PLUS, Pmode, index, + gen_rtx_LABEL_REF (Pmode, table_label)); + #ifdef PIC_CASE_VECTOR_ADDRESS if (flag_pic) index = PIC_CASE_VECTOR_ADDRESS (index); diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index cdc8e9a..fc3c198 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-03-27 Jeff Law l...@redhat.com + + PR target/60648 + * g++.dg/pr60648.C: New test. 
+ 2014-03-26 Jakub Jelinek ja...@redhat.com PR sanitizer/60636 diff --git a/gcc/testsuite/g++.dg/pr60648.C b/gcc/testsuite/g++.dg/pr60648.C new file mode 100644 index 000..80c0561 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr60648.C @@ -0,0 +1,73 @@ +/* { dg-do compile } */ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options -O3 -fPIC -m32 } */ + +enum component +{ + Ex, + Ez, + Hy, + Permeability +}; +enum derived_component +{}; +enum direction +{ + X, + Y, + Z, + R, + P, + NO_DIRECTION +}; +derived_component a; +component *b; +component c; +direction d; +inline direction fn1 (component p1) +{ + switch (p1) +{ +case 0: + return Y; +case 1: + return Z; +case Permeability: + return NO_DIRECTION; +} + return X; +} + +inline component fn2 (direction p1) +{ + switch (p1) +{ +case 0: +case 1: + return component (); +case Z: +
Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression
On Fri, Mar 28, 2014 at 12:04:00PM -0600, Jeff Law wrote: Here's the updated patch. It uses simplify_gen_binary in expr.c to simplify the address expression as we're building it. It also uses copy_addr_to_reg in the x86 backend to avoid the possibility of generating non-canonical RTL there too. By accident I interrupted the regression test cycle, so that is still running. OK for the trunk if that passes? Ok, thanks. Jakub
Re: Fix PR ipa/60315 (inliner explosion)
Actually after some additional investigation I decided to commit this patch. fixup_noreturn_call already cares about the return value, but differently from Jakub's new code. Thanks for the quick fix, I confirm that the ACATS failures are all gone. So we're left with the GIMPLE checking failure on opt33.adb. -- Eric Botcazou
Re: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data
On 28/03/14 14:21, Ramana Radhakrishnan wrote: On Wed, Mar 19, 2014 at 9:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds a recogniser for the bitmask,shift,orr sequence of instructions that can be used to reverse the bytes in 16-bit halfwords (for the sequence itself look at the testcase included in the patch). This can be implemented with a rev16 instruction. Since the shifts can occur in any order and there are no canonicalisation rules for where they appear in the expression we have to have two patterns to match both cases. The rtx costs function is updated to recognise the pattern and cost it appropriately by using the rev field of the cost tables introduced in patch [1/3]. The rtx costs helper functions that are used to recognise those bitwise operations are placed in config/arm/aarch-common.c so that they can be reused by both arm and aarch64. The ARM bits of this are OK if there are no regressions. I've added an execute testcase but no scan-assembler tests since conceptually in the future the combiner might decide to not use a rev instruction due to rtx costs. We can at least test that the code generated is functionally correct though. Tested aarch64-none-elf. What about arm-none-eabi :) ? Tested arm-none-eabi and bootstrap on arm linux together with patch [3/3] in the series :) Kyrill Ok for stage1? [gcc/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.md (rev16mode2): New pattern. (rev16mode2_alt): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case. * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New. (aarch_rev16_shleft_mask_imm_p): Likewise. (aarch_rev16_p_1): Likewise. (aarch_rev16_p): Likewise. * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern. (aarch_rev16_shright_mask_imm_p): Likewise. (aarch_rev16_shleft_mask_imm_p): Likewise. [gcc/testsuite/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/rev16_1.c: New test.
Re: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data
On Wed, Mar 19, 2014 at 9:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds a recogniser for the bitmask,shift,orr sequence of instructions that can be used to reverse the bytes in 16-bit halfwords (for the sequence itself look at the testcase included in the patch). This can be implemented with a rev16 instruction. Since the shifts can occur in any order and there are no canonicalisation rules for where they appear in the expression we have to have two patterns to match both cases. The rtx costs function is updated to recognise the pattern and cost it appropriately by using the rev field of the cost tables introduced in patch [1/3]. The rtx costs helper functions that are used to recognise those bitwise operations are placed in config/arm/aarch-common.c so that they can be reused by both arm and aarch64. The ARM bits of this are OK if there are no regressions. I've added an execute testcase but no scan-assembler tests since conceptually in the future the combiner might decide to not use a rev instruction due to rtx costs. We can at least test that the code generated is functionally correct though. Tested aarch64-none-elf. What about arm-none-eabi :) ? Ok for stage1? [gcc/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.md (rev16mode2): New pattern. (rev16mode2_alt): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case. * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New. (aarch_rev16_shleft_mask_imm_p): Likewise. (aarch_rev16_p_1): Likewise. (aarch_rev16_p): Likewise. * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern. (aarch_rev16_shright_mask_imm_p): Likewise. (aarch_rev16_shleft_mask_imm_p): Likewise. [gcc/testsuite/] 2014-03-19 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/rev16_1.c: New test.
Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)
On Fri, Mar 28, 2014 at 01:46:09PM -0400, Jason Merrill wrote: On 03/28/2014 06:47 AM, Jakub Jelinek wrote: * typeck.c (build_function_call_vec): Call resolve_overloaded_builtin. I expect this will break in templates if arguments are dependent. The only problem with this patch is potentially ObjC, I've missed that it also calls build_function_call_vec; in c-family and cp/ proper build_function_call_vec is only called from within resolve_overloaded_builtin itself, thus it shouldn't see dependent args. Jakub
Re: [PATCH] Handle short reads and EINTR in lto-plugin/simple-object
On Fri, Mar 28, 2014 at 6:30 AM, Richard Biener rguent...@suse.de wrote: 2014-03-26 Richard Biener rguent...@suse.de libiberty/ * simple-object.c (simple_object_internal_read): Handle EINTR and short reads. lto-plugin/ * lto-plugin.c (process_symtab): Use simple_object_internal_read. This is OK. Thanks. Ian
Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description
On Tue, Mar 25, 2014 at 3:52 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC sequences of the form: AESE Vn, _ AESMC Vn, Vn will issue both instructions in a single cycle on super-scalar implementations. It would be nice to model that in our pipeline descriptions. This patch defines a function to detect such pairs and uses it in the pipeline description for these instructions for the Cortex-A53. The patch also adds some missed AdvancedSIMD information to the pipeline description for the Cortex-A53. Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-none-linux-gnu. Cortex-A53 scheduling is the default scheduling description on aarch64 so this patch can change default behaviour. That's an argument for taking this in stage1 or maybe backporting it into 4.9.1 once the release is made. To my mind on ARM / AArch64 this actually helps anyone using the crypto intrinsics on A53 hardware today and it would be good to get this into 4.9. Again I perceive this as low risk on ARM (AArch32) as this is not a default tuning option for any large software vendors, the folks using this are typically the ones that write the more specialized crypto intrinsics rather than just general purpose code. However this will help with scheduling on what is essentially an in-order core, so would be nice to have. This would definitely need approval from the AArch64 maintainers and the RMs to go in at this stage. If not, we should consider this for 4.9.1 regards Ramana What do people think? Thanks, Kyrill 2014-03-25 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function. * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue): Declare extern. * config/arm/cortex-a53.md: Add reservations and bypass for crypto instructions as well as AdvancedSIMD loads.
Re: [PATCH] RL78 - minor size optimization
On 28/03/14 00:20, DJ Delorie wrote: This is OK after 4.9 branches (i.e. stage1). I suspect we could add AX to the first alternative, although I don't know if it will get used. We could add HL to the second alternative to complete the replacement of the 'r' constraint. Yes, the missing AX in the first alternative came to me later too. HL is already in the second alternative ('T'). Looking at it again, it probably makes sense to change the third alternative to 'shrw %0,8'. It's the same length as mov x,a/clrb a but it's a cycle shorter. It also makes it more like the extendqihi2_real insn, which isn't especially important, but does mean there's a certain symmetry about it. 2014-03-28 Richard Hulme pepe...@yahoo.com * config/rl78/rl78-real.md (zero_extendqihi2_real): Minor optimizations to use clrb instruction where possible, which is 1 byte shorter than 'mov'ing #0, and shrw, which is 1 cycle less than a mov/clrb sequence. --- gcc/config/rl78/rl78-real.md |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/gcc/config/rl78/rl78-real.md b/gcc/config/rl78/rl78-real.md index 27ff60f..5d5c598 100644 --- a/gcc/config/rl78/rl78-real.md +++ b/gcc/config/rl78/rl78-real.md @@ -77,12 +77,14 @@ ;;-- Conversions (define_insn *zero_extendqihi2_real - [(set (match_operand:HI 0 nonimmediate_operand =rv,A) - (zero_extend:HI (match_operand:QI 1 general_operand 0,a)))] + [(set (match_operand:HI 0 nonimmediate_operand =ABv,DT,A,B) + (zero_extend:HI (match_operand:QI 1 general_operand 0,0,a,b)))] rl78_real_insns_ok () @ + clrb\t%Q0 mov\t%Q0, #0 - mov\tx, a \;mov\ta, #0 + shrw\t%0, 8 + shrw\t%0, 8 ) (define_insn *extendqihi2_real -- 1.7.9.5
Lost __mips_o32 predefine on NetBSD
In the mips--netbsdelf target gcc 4.9 lost the pre-definition of __mips_o32, which is heavily used in NetBSD sources. The obvious trivial patch adds it back. Martin --8<--- Define __mips_o32 for -mabi=32 --- gcc/config/mips/netbsd.h.orig 2014-01-02 23:23:26.0 +0100 +++ gcc/config/mips/netbsd.h 2014-03-28 14:19:18.0 +0100 @@ -32,7 +32,9 @@ along with GCC; see the file COPYING3. if (TARGET_ABICALLS) \ builtin_define ("__ABICALLS__");\ \ - if (mips_abi == ABI_EABI)\ + if (mips_abi == ABI_32) \ + builtin_define ("__mips_o32"); \ + else if (mips_abi == ABI_EABI) \ builtin_define ("__mips_eabi"); \ else if (mips_abi == ABI_N32)\ builtin_define ("__mips_n32"); \
Re: [PATCH] Allow VOIDmode argument to ix86_copy_addr_to_reg (PR target/60693)
On Fri, Mar 28, 2014 at 4:19 PM, Jakub Jelinek ja...@redhat.com wrote: Before ix86_copy_addr_to_reg has been added, we've been using copy_addr_to_reg, which handles VOIDmode values just fine. But this new function just ICEs on those. As the function has been added for adding SUBREGs to TLS addresses, those will never retunring CONST_INTs, so just using copy_addr_to_reg is IMHO the right thing and restores previous behavior. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-03-28 Jakub Jelinek ja...@redhat.com PR target/60693 * config/i386/i386.c (ix86_copy_addr_to_reg): Call copy_addr_to_reg also if addr has VOIDmode. * gcc.target/i386/pr60693.c: New test. OK. Thanks, Uros.
Re: Skip gcc.dg/tree-ssa/isolate-*.c for AVR Target
On Mar 28, 2014, at 3:16 AM, K_s, Vishnu vishnu@atmel.com wrote: The tests added in gcc.dg/tree-ssa/isolate-*.c are failing for the AVR target because the isolate erroneous paths pass needs the -fdelete-null-pointer-checks option to be enabled. For the AVR target that option is disabled; this causes the tests to fail. The following patch skips the isolate-* tests if keeps_null_pointer_checks is true. So I didn’t see a checkin, and I don’t see an Ok? Each patch should have one or the other… Without an Ok? I assume it’s been checked in, with an Ok? I take that as a review request. I’ll assume you forgot the Ok?, Ok. Since the AVR people are fairly active, I’ll let them check it in; gives them the opportunity to further consider it. 2014-03-28 Vishnu K S vishnu@atmel.com * gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c: Skip test for AVR * gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c: Ditto * gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c: Ditto * gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c: Ditto * gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c: Ditto
Re: [PATCH] RL78 - minor size optimization
Sweet. Yes please, in stage 1.
[committed, fortran] PR 60766 fix buffer overflow
Hello, I fixed an ICE in pr59599 due to a wrong number of arguments passed to the ichar function, but I forgot to update the size of the buffer containing the argument list. Fixed thusly. I have tested the patch (attached) on x86_64-unknown-linux-gnu and committed it as revision 208913. Thanks to Tobias for identifying the problem. Mikael Index: ChangeLog === --- ChangeLog (révision 208912) +++ ChangeLog (révision 208913) @@ -1,5 +1,11 @@ -2014-04-27 Thomas Koenig tkoe...@gcc.gnu.org +2014-03-28 Mikael Morin mik...@gcc.gnu.org + PR fortran/60677 + * trans-intrinsic.c (gfc_conv_intrinsic_ichar): Enlarge argument + list buffer. + +2014-03-27 Thomas Koenig tkoe...@gcc.gnu.org + PR fortran/60522 * frontend-passes.c (cfe_code): Do not walk subtrees for WHERE. Index: trans-intrinsic.c === --- trans-intrinsic.c (révision 208912) +++ trans-intrinsic.c (révision 208913) @@ -4687,7 +4687,7 @@ gfc_conv_intrinsic_index_scan_verify (gfc_se * se, static void gfc_conv_intrinsic_ichar (gfc_se * se, gfc_expr * expr) { - tree args[2], type, pchartype; + tree args[3], type, pchartype; int nargs; nargs = gfc_intrinsic_argument_list_length (expr);
Re: Changing INT to SI mode
On Mar 28, 2014, at 6:23 AM, K_s, Vishnu vishnu@atmel.com wrote: Test pr59940.c is failing for the AVR target because the test assumes the size of int is 32 bits, and it expects to generate warnings for overflow and conversion while assigning 36-bit and 32-bit values respectively to the variable si. The following patch defines a 32-bit type with SI mode and uses it. 2014-03-28 Vishnu K S vishnu@atmel.com * gcc/testsuite/gcc.dg/pr59940.c: Using 32-bit SI mode instead of int [ see previous note ] Ok. I checked this in for you, and formatted the ChangeLog slightly better. Two spaces after the name, no space before the <, no gcc/testsuite in the log, end the sentence with a period, and add the name of what was changed (si in this case). +2014-03-28 Vishnu K S vishnu@atmel.com + + * gcc.dg/pr59940.c (si): Use 32-bit SI mode instead of int.
Re: various _mm512_set* intrinsics
Hello! Here are more intrinsics that are missing. I know that gcc currently generates horrible code for most of them but I think it's more important to have the API in place, albeit non-optimal. Maybe this entices someone to add the necessary optimizations. I agree that having a non-optimal implementation is better than nothing. The code is self-contained and shouldn't interfere with any correct code. Should this also go into 4.9? 2014-03-27 Ulrich Drepper drep...@gmail.com * config/i386/avx512fintrin.h (__v32hi): Define type. (__v64qi): Likewise. (_mm512_set1_epi8): Define. (_mm512_set1_epi16): Define. (_mm512_set4_epi32): Define. (_mm512_set4_epi64): Define. (_mm512_set4_pd): Define. (_mm512_set4_ps): Define. (_mm512_setr4_epi64): Define. (_mm512_setr4_epi32): Define. (_mm512_setr4_pd): Define. (_mm512_setr4_ps): Define. (_mm512_setzero_epi32): Define. This is OK for mainline, but please wait for Kirill's review of the intrinsics. Thanks, Uros.
Re: Skip gcc.dg/tree-ssa/isolate-*.c for AVR Target
On Mar 28, 2014, at 12:04 PM, Mike Stump mikest...@comcast.net wrote: 2014-03-28 Vishnu K S vishnu@atmel.com * gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c: Skip test for AVR * gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c: Ditto * gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c: Ditto * gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c: Ditto * gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c: Ditto So, no gcc/testsuite/ in the log, no space before the <, two spaces after the name before the <, and end sentences with a period.
Re: [C++ patch] for C++/52369
Just a nit… 2014-03-28 Fabien Chêne fab...@gcc.gnu.org * cp/init.c (perform_member_init): homogeneize uninitialized diagnostics. Sentences begin with an upper case letter, and spelling… Homogenize..
PR ipa/60243 (inliner being slow)
Hi, the inliner heuristic is organized as a greedy algorithm making inline decisions in the order defined by badness until inline limits are hit. The tricky part is that the badness depends both on caller and callee (it is basically a size/time metric that depends on the callee, but the caller provides context via known values and predicates that may simplify the callee body). So after each inlining decision, the badnesses of calls from the function being inlined and calls of the function being inlined into need to be updated. This updating process is basically O(1) for evaluation of predicates + O(n_call_sites) for evaluation of call edges that are independent. This may produce non-linear behaviour in stupid cases where you have a function with very many call sites you inline into that is itself called very many times. The other case where we get non-linear behaviour is the side case of want_inline_small_function_p, which makes a function inlinable if the code of the caller grows but the overall unit shrinks. The growth of the unit after inlining a given function needs to be recomputed every time the function changes or one of its calls is modified. This patch solves those bottlenecks. The first case is handled via computing min_size, a rough estimate of the minimal growth of the function after inlining. This can be used to cut the expensive per-edge computations when the function is obviously large (as it would be in case it has many call sites). The other change is a smarter estimate of the growth of the unit: the unit can shrink only if the function has call sites that shrink the code (and in that case those will be inlined anyway) or if the offline copy is eliminated. Instead of always computing a precise value, I introduced growth_likely_positive, which makes an estimate of how many calls one can have and first just quickly counts the call edges. If there are many of them, there is no need for the expensive calculation.
In addition to getting estimate_growth out of the testcase profile, it also improves Firefox LTO inliner times by about 40%, or 20 seconds. We still do not do well compiling the richards testcase: I get out-of-memory problems with early inlining enabled and many other issues. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza PR ipa/60243 * ipa-inline.c (want_inline_small_function_p): Short circuit large functions; reorganize to make cheap checks first. (inline_small_functions): Do not estimate growth when dumping; it is expensive. * ipa-inline.h (inline_summary): Add min_size. (growth_likely_positive): New function. * ipa-inline-analysis.c (dump_inline_summary): Add min_size. (set_cond_stmt_execution_predicate): Cleanup. (estimate_edge_size_and_time): Compute min_size. (estimate_calls_size_and_time): Likewise. (estimate_node_size_and_time): Likewise. (inline_update_overall_summary): Update min_size. (do_estimate_edge_time): Likewise. (do_estimate_edge_size): Update. (do_estimate_edge_hints): Update. (growth_likely_positive): New function. Index: ipa-inline.c === --- ipa-inline.c (revision 208875) +++ ipa-inline.c (working copy) @@ -573,6 +573,24 @@ want_inline_small_function_p (struct cgr e->inline_failed = CIF_FUNCTION_NOT_INLINE_CANDIDATE; want_inline = false; } + /* Do a fast and conservative check whether the function can be a good + inline candidate. At the moment we allow inline hints to + promote non-inline functions to inline and we increase + MAX_INLINE_INSNS_SINGLE 16-fold for inline functions.
*/ + else if (!DECL_DECLARED_INLINE_P (callee->decl) + && inline_summary (callee)->min_size - inline_edge_summary (e)->call_stmt_size + > MAX (MAX_INLINE_INSNS_SINGLE, MAX_INLINE_INSNS_AUTO)) +{ + e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT; + want_inline = false; +} + else if (DECL_DECLARED_INLINE_P (callee->decl) + && inline_summary (callee)->min_size - inline_edge_summary (e)->call_stmt_size + > 16 * MAX_INLINE_INSNS_SINGLE) +{ + e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT; + want_inline = false; +} else { int growth = estimate_edge_growth (e); @@ -585,56 +603,26 @@ want_inline_small_function_p (struct cgr hints suggests that inlining given function is very profitable. */ else if (DECL_DECLARED_INLINE_P (callee->decl) && growth <= MAX_INLINE_INSNS_SINGLE - && !big_speedup - && !(hints & (INLINE_HINT_indirect_call -| INLINE_HINT_loop_iterations -| INLINE_HINT_array_index -| INLINE_HINT_loop_stride))) + && ((!big_speedup + && !(hints & (INLINE_HINT_indirect_call + | INLINE_HINT_loop_iterations +
Re: [PATCH, PR 60640] When creating virtual clones, clone thunks too
Hi, this patch fixes PR 60640 by creating thunks to clones when that is necessary to properly redirect edges to them. It mostly does what cgraph_add_thunk does and what analyze_function does to thunks. It fixes the testcases on trunk (it does not apply to 4.8; I have not looked at how easily fixable that is) and passes bootstrap and testing on x86_64-linux. OK for trunk? Thanks, Martin 2014-03-26 Martin Jambor mjam...@suse.cz * cgraph.h (cgraph_clone_node): New parameter added to declaration. Adjust all callers. * cgraphclones.c (build_function_type_skip_args): Moved upwards in the file. (build_function_decl_skip_args): Likewise. (duplicate_thunk_for_node): New function. (redirect_edge_duplicating_thunks): Likewise. (cgraph_clone_node): New parameter args_to_skip, pass it to redirect_edge_duplicating_thunks which is called instead of cgraph_redirect_edge_callee. (cgraph_create_virtual_clone): Pass args_to_skip to cgraph_clone_node. +/* Duplicate thunk THUNK but make it refer to NODE. ARGS_TO_SKIP, if + non-NULL, determines which parameters should be omitted.
*/ + +static cgraph_node * +duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node, + bitmap args_to_skip) +{ + cgraph_node *new_thunk, *thunk_of; + thunk_of = cgraph_function_or_thunk_node (thunk->callees->callee); + + if (thunk_of->thunk.thunk_p) +node = duplicate_thunk_for_node (thunk_of, node, args_to_skip); + + tree new_decl; + if (!args_to_skip) +new_decl = copy_node (thunk->decl); + else +new_decl = build_function_decl_skip_args (thunk->decl, args_to_skip, false); + + gcc_checking_assert (!DECL_STRUCT_FUNCTION (new_decl)); + gcc_checking_assert (!DECL_INITIAL (new_decl)); + gcc_checking_assert (!DECL_RESULT (new_decl)); + gcc_checking_assert (!DECL_RTL_SET_P (new_decl)); + + DECL_NAME (new_decl) = clone_function_name (thunk->decl, "artificial_thunk"); + SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl)); + DECL_EXTERNAL (new_decl) = 0; + DECL_SECTION_NAME (new_decl) = NULL; + DECL_COMDAT_GROUP (new_decl) = 0; + TREE_PUBLIC (new_decl) = 0; + DECL_COMDAT (new_decl) = 0; + DECL_WEAK (new_decl) = 0; + DECL_VIRTUAL_P (new_decl) = 0; + DECL_STATIC_CONSTRUCTOR (new_decl) = 0; + DECL_STATIC_DESTRUCTOR (new_decl) = 0; We probably ought to factor this out into a common subfunction. + + new_thunk = cgraph_create_node (new_decl); + new_thunk->definition = true; + new_thunk->thunk = thunk->thunk; + new_thunk->unique_name = in_lto_p; + new_thunk->externally_visible = 0; + new_thunk->local.local = 1; + new_thunk->lowered = true; + new_thunk->former_clone_of = thunk->decl; + + struct cgraph_edge *e = cgraph_create_edge (new_thunk, node, NULL, 0, + CGRAPH_FREQ_BASE); + e->call_stmt_cannot_inline_p = true; + cgraph_call_edge_duplication_hooks (thunk->callees, e); + if (!expand_thunk (new_thunk, false)) +new_thunk->analyzed = true; + cgraph_call_node_duplication_hooks (thunk, new_thunk); + return new_thunk; +} + +/* If E does not lead to a thunk, simply redirect it to N. Otherwise create + one or more equivalent thunks for N and redirect E to the first in the + chain.
*/ + +void +redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n, + bitmap args_to_skip) +{ + cgraph_node *orig_to = cgraph_function_or_thunk_node (e->callee); + if (orig_to->thunk.thunk_p) +n = duplicate_thunk_for_node (orig_to, n, args_to_skip); Is there anything that would prevent us from creating a new thunk for each call? Also I think you need to avoid this logic when the THIS parameter is being optimized out (i.e. it is part of skip_args). Thanks for looking into this! Honza
[Fortran-CAF, patch, committed] Fix an offset calculation - and merge from the trunk
The attached patch fixes an issue with pointer subtraction (wrong type). Committed as Rev. 208919. Additionally I have merged the trunk into the branch, Rev. 208922. Tobias Index: gcc/fortran/ChangeLog.fortran-caf === --- gcc/fortran/ChangeLog.fortran-caf (Revision 208918) +++ gcc/fortran/ChangeLog.fortran-caf (Arbeitskopie) @@ -1,5 +1,9 @@ 2014-03-28 Tobias Burnus bur...@net-b.de + * trans-intrinsic.c (conv_caf_send): Fix offset calculation. + +2014-03-28 Tobias Burnus bur...@net-b.de + * trans-intrinsic.c (caf_get_image_index, conv_caf_send): New. (gfc_conv_intrinsic_subroutine): Call it. * resolve.c (resolve_ordinary_assign): Enable coindex LHS Index: gcc/fortran/trans-intrinsic.c === --- gcc/fortran/trans-intrinsic.c (Revision 208918) +++ gcc/fortran/trans-intrinsic.c (Arbeitskopie) @@ -7942,7 +7942,8 @@ conv_caf_send (gfc_code *code) { } offset = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type, -offset, tmp); + fold_convert (gfc_array_index_type, offset), + fold_convert (gfc_array_index_type, tmp)); /* RHS - a noncoarray. */
Fix various x86 tests for --with-arch=bdver3
If you build an x86_64 toolchain with --with-arch enabling various instruction set extensions by default, this causes some tests to fail that aren't expecting those extensions to be enabled. This patch fixes various tests failing like that for an x86_64-linux-gnu toolchain configured --with-arch=bdver3, generally by using appropriate -mno-* options in the tests, or in the case of gcc.dg/pr45416.c by adjusting the scan-assembler to allow the alternative instruction that gets used in this case. It's quite likely other such failures appear for other --with-arch choices. Tested x86_64-linux-gnu. OK to commit? In addition to the failures fixed by this patch, there are many gcc.dg/vect tests where having additional vector extensions enabled breaks their expectations; I'm not sure of the best way to handle those. And you get FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors) which are assembler errors such as operand type mismatch for `vfmaddpd' - it looks like the compiler isn't really prepared for the -mavx512f -mfma4 combination, but I'm not sure what the best way to handle it is (producing invalid output doesn't seem right, however). 
If you test with -march=bdver3 in the multilib options (runtest --target_board=unix/-march=bdver3) rather than as the configured default, you get extra failures for the usual reason of multilib options going after the options from dg-options (which I propose to address in the usual way using dg-skip-if for -march= options different from the one present in dg-options). 2014-03-28 Joseph Myers jos...@codesourcery.com * gcc.dg/pr45416.c: Allow bextr on x86. * gcc.target/i386/fma4-builtin.c, gcc.target/i386/fma4-fma-2.c, gcc.target/i386/fma4-fma.c, gcc.target/i386/fma4-vector-2.c, gcc.target/i386/fma4-vector.c: Use -mno-fma. * gcc.target/i386/l_fma_double_1.c, gcc.target/i386/l_fma_double_2.c, gcc.target/i386/l_fma_double_3.c, gcc.target/i386/l_fma_double_4.c, gcc.target/i386/l_fma_double_5.c, gcc.target/i386/l_fma_double_6.c, gcc.target/i386/l_fma_float_1.c, gcc.target/i386/l_fma_float_2.c, gcc.target/i386/l_fma_float_3.c, gcc.target/i386/l_fma_float_4.c, gcc.target/i386/l_fma_float_5.c, gcc.target/i386/l_fma_float_6.c: Use -mno-fma4. * gcc.target/i386/pr27971.c: Use -mno-tbm. * gcc.target/i386/pr42542-4a.c: Use -mno-avx. * gcc.target/i386/pr59390.c: Use -mno-fma -mno-fma4. 
Index: gcc/testsuite/gcc.dg/pr45416.c === --- gcc/testsuite/gcc.dg/pr45416.c (revision 208882) +++ gcc/testsuite/gcc.dg/pr45416.c (working copy) @@ -9,7 +9,7 @@ return 0; } -/* { dg-final { scan-assembler andl { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */ +/* { dg-final { scan-assembler andl|bextr { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */ /* { dg-final { scan-assembler-not setne { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } } */ /* { dg-final { scan-assembler and|ubfx { target arm*-*-* } } } */ /* { dg-final { scan-assembler-not moveq { target arm*-*-* } } } */ Index: gcc/testsuite/gcc.target/i386/pr27971.c === --- gcc/testsuite/gcc.target/i386/pr27971.c (revision 208882) +++ gcc/testsuite/gcc.target/i386/pr27971.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 } */ +/* { dg-options -O2 -mno-tbm } */ unsigned array[4]; Index: gcc/testsuite/gcc.target/i386/l_fma_double_5.c === --- gcc/testsuite/gcc.target/i386/l_fma_double_5.c (revision 208882) +++ gcc/testsuite/gcc.target/i386/l_fma_double_5.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic } */ +/* { dg-options -O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic -mno-fma4 } */ /* Test that the compiler properly optimizes floating point multiply and add instructions into FMA3 instructions. */ Index:
[PATCH] [4.8 branch] PR rtl-optimization/60700: Backport revision 201326
Hi, Revision 201326 fixes a shrink-wrap bug which is also a regression on 4.8 branch. This patch backports it to 4.8 branch. OK for 4.8 branch. I also include a testcase for PR rtl-optimization/60700. OK for trunk and 4.8 branch? Thanks. H.J. -- gcc/ PR rtl-optimization/60700 2013-07-30 Zhenqiang Chen zhenqiang.c...@linaro.org PR rtl-optimization/57637 * function.c (move_insn_for_shrink_wrap): Also check the GEN set of the LIVE problem for the liveness analysis if it exists, otherwise give up. gcc/testsuite/ PR rtl-optimization/60700 2013-07-30 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.target/arm/pr57637.c: New testcase. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@201326 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 11 ++ gcc/function.c | 49 +--- gcc/testsuite/ChangeLog| 8 ++ gcc/testsuite/gcc.target/arm/pr57637.c | 206 + 4 files changed, 261 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/pr57637.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 63a6c98..557f922 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,14 @@ +2014-03-28 H.J. Lu hongjiu...@intel.com + + PR rtl-optimization/60700 + Backport from mainline + 2013-07-30 Zhenqiang Chen zhenqiang.c...@linaro.org + + PR rtl-optimization/57637 + * function.c (move_insn_for_shrink_wrap): Also check the + GEN set of the LIVE problem for the liveness analysis + if it exists, otherwise give up. + 2014-03-26 Martin Jambor mjam...@suse.cz PR ipa/60419 diff --git a/gcc/function.c b/gcc/function.c index e673f21..80720cb 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5509,22 +5509,45 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn, except for any part that overlaps SRC (next loop). 
*/ bb_uses = DF_LR_BB_INFO (bb)->use; bb_defs = DF_LR_BB_INFO (bb)->def; - for (i = dregno; i < end_dregno; i++) + if (df_live) { - if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P (bb_defs, i)) - next_block = NULL; - CLEAR_REGNO_REG_SET (live_out, i); - CLEAR_REGNO_REG_SET (live_in, i); - } + for (i = dregno; i < end_dregno; i++) + { + if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P (bb_defs, i) + || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)->gen, i)) + next_block = NULL; + CLEAR_REGNO_REG_SET (live_out, i); + CLEAR_REGNO_REG_SET (live_in, i); + } - /* Check whether BB clobbers SRC. We need to add INSN to BB if so. -Either way, SRC is now live on entry. */ - for (i = sregno; i < end_sregno; i++) + /* Check whether BB clobbers SRC. We need to add INSN to BB if so. +Either way, SRC is now live on entry. */ + for (i = sregno; i < end_sregno; i++) + { + if (REGNO_REG_SET_P (bb_defs, i) + || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)->gen, i)) + next_block = NULL; + SET_REGNO_REG_SET (live_out, i); + SET_REGNO_REG_SET (live_in, i); + } + } + else { - if (REGNO_REG_SET_P (bb_defs, i)) - next_block = NULL; - SET_REGNO_REG_SET (live_out, i); - SET_REGNO_REG_SET (live_in, i); + /* DF_LR_BB_INFO (bb)->def does not comprise the DF_REF_PARTIAL and +DF_REF_CONDITIONAL defs. So if DF_LIVE doesn't exist, i.e. +at -O1, just give up searching NEXT_BLOCK. */ + next_block = NULL; + for (i = dregno; i < end_dregno; i++) + { + CLEAR_REGNO_REG_SET (live_out, i); + CLEAR_REGNO_REG_SET (live_in, i); + } + + for (i = sregno; i < end_sregno; i++) + { + SET_REGNO_REG_SET (live_out, i); + SET_REGNO_REG_SET (live_in, i); + } } /* If we don't need to add the move to BB, look for a single diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index f425228..50a33ee 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,11 @@ +2014-03-28 H.J.
Lu hongjiu...@intel.com + + PR rtl-optimization/60700 + Backport from mainline + 2013-07-30 Zhenqiang Chen zhenqiang.c...@linaro.org + + * gcc.target/arm/pr57637.c: New testcase. + 2014-04-28 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/60522 diff --git a/gcc/testsuite/gcc.target/arm/pr57637.c b/gcc/testsuite/gcc.target/arm/pr57637.c new file mode 100644 index 000..2b9bfdd --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr57637.c @@ -0,0 +1,206 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline } */ + +typedef struct _GtkCssStyleProperty GtkCssStyleProperty; +
Re: Fix various x86 tests for --with-arch=bdver3
On Fri, Mar 28, 2014 at 2:46 PM, Joseph S. Myers jos...@codesourcery.com wrote: If you build an x86_64 toolchain with --with-arch enabling various instruction set extensions by default, this causes some tests to fail that aren't expecting those extensions to be enabled. This patch fixes various tests failing like that for an x86_64-linux-gnu toolchain configured --with-arch=bdver3, generally by using appropriate -mno-* options in the tests, or in the case of gcc.dg/pr45416.c by adjusting the scan-assembler to allow the alternative instruction that gets used in this case. It's quite likely other such failures appear for other --with-arch choices. Tested x86_64-linux-gnu. OK to commit? In addition to the failures fixed by this patch, there are many gcc.dg/vect tests where having additional vector extensions enabled breaks their expectations; I'm not sure of the best way to handle those. And you get FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors) FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors) which are assembler errors such as operand type mismatch for `vfmaddpd' - it looks like the compiler isn't really prepared for the -mavx512f -mfma4 combination, but I'm not sure what the best way to handle it is 
(producing invalid output doesn't seem right, however). If you test with -march=bdver3 in the multilib options (runtest --target_board=unix/-march=bdver3) rather than as the configured default, you get extra failures for the usual reason of multilib options going after the options from dg-options (which I propose to address in the usual way using dg-skip-if for -march= options different from the one present in dg-options). This is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59971 Here is a patch: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01891.html -- H.J.
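The dg-skip-if idiom referred to here looks roughly like this in a test file. This is only a sketch of the usual pattern; the particular options shown are my own example, not a quote from the eventual patch.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mfma" } */
/* Skip the test when the multilib flags inject any -march= option,
   since it would override the options above.  */
/* { dg-skip-if "conflicting -march option" { *-*-* } { "-march=*" } { "" } } */
```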
patch to fix PR60697
The following patch fixes PR60697. The details of the PR can be found at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60697 The patch was successfully bootstrapped and tested on x86-64 and aarch64. Committed as rev. 208926. 2014-03-28 Vladimir Makarov vmaka...@redhat.com PR target/60697 * lra-constraints.c (index_part_to_reg): New. (process_address): Use it. 2014-03-28 Vladimir Makarov vmaka...@redhat.com PR target/60697 * gcc.target/aarch64/pr60697.c: New. Index: lra-constraints.c === --- lra-constraints.c (revision 208895) +++ lra-constraints.c (working copy) @@ -2631,6 +2631,20 @@ base_plus_disp_to_reg (struct address_in return new_reg; } +/* Make a reload of the index part of address AD. Return the new + pseudo. */ +static rtx +index_part_to_reg (struct address_info *ad) +{ + rtx new_reg; + + new_reg = lra_create_new_reg (GET_MODE (*ad->index), NULL_RTX, + INDEX_REG_CLASS, "index term"); + expand_mult (GET_MODE (*ad->index), *ad->index_term, + GEN_INT (get_index_scale (ad)), new_reg, 1); + return new_reg; +} + /* Return true if we can add a displacement to address AD, even if that makes the address invalid. The fix-up code requires any new address to be the sum of the BASE_TERM, INDEX and DISP_TERM fields. */ @@ -2935,7 +2949,7 @@ process_address (int nop, rtx *before, r emit_insn (insns); *ad.inner = new_reg; } - else + else if (ad.disp_term != NULL) { /* base + scale * index + disp = new base + scale * index, case (1) above. */ @@ -2943,6 +2957,18 @@ process_address (int nop, rtx *before, r *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg, *ad.index); } + else +{ + /* base + scale * index = base + new_reg, +case (1) above. + The index part of the address may become invalid. For example, we + changed a pseudo into the equivalent memory and a subreg of the + pseudo into the memory of a different mode for which the scale is + prohibited.
*/ + new_reg = index_part_to_reg (ad); + *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), + *ad.base_term, new_reg); +} *before = get_insns (); end_sequence (); return true; Index: testsuite/gcc.target/aarch64/pr60697.c === --- testsuite/gcc.target/aarch64/pr60697.c (revision 0) +++ testsuite/gcc.target/aarch64/pr60697.c (working copy) @@ -0,0 +1,638 @@ +/* { dg-do compile } */ +/* { dg-options -w -O3 -mcpu=cortex-a53 } */ +typedef struct __sFILE __FILE; +typedef __FILE FILE; +typedef int atom_id; +typedef float real; +typedef real rvec[3]; +typedef real matrix[3][3]; +enum { + ebCGS,ebMOLS,ebSBLOCKS,ebNR +}; +enum { + efepNO, efepYES, efepNR +}; +enum { + esolNO, esolMNO, esolWATER, esolWATERWATER, esolNR +}; +typedef struct { + int nr; + atom_id *index; + atom_id *a; +} t_block; +enum { + F_LJ, + F_LJLR, + F_SR, + F_LR, + F_DVDL, +}; +typedef struct { + t_block excl; +} t_atoms; +typedef struct { + t_atoms atoms; + t_block blocks[ebNR]; +} t_topology; +typedef struct { +} t_nsborder; +extern FILE *debug; +typedef struct { +} t_nrnb; +typedef struct { + int nri,maxnri; + int nrj,maxnrj; + int maxlen; + int solvent; + int *gid; + int *jindex; + atom_id *jjnr; + int *nsatoms; +} t_nblist; +typedef struct { + int nrx,nry,nrz; +} t_grid; +typedef struct { +} t_commrec; +enum { eNL_VDWQQ, eNL_VDW, eNL_QQ, + eNL_VDWQQ_FREE, eNL_VDW_FREE, eNL_QQ_FREE, + eNL_VDWQQ_SOLMNO, eNL_VDW_SOLMNO, eNL_QQ_SOLMNO, + eNL_VDWQQ_WATER, eNL_QQ_WATER, + eNL_VDWQQ_WATERWATER, eNL_QQ_WATERWATER, + eNL_NR }; +typedef struct { + real rlist,rlistlong; + real rcoulomb_switch,rcoulomb; + real rvdw_switch,rvdw; + int efep; + int cg0,hcg; + int *solvent_type; + int *mno_index; + rvec *cg_cm; + t_nblist nlist_sr[eNL_NR]; + t_nblist nlist_lr[eNL_NR]; + int bTwinRange; + rvec *f_twin; + int *eg_excl; +} t_forcerec; +typedef struct { + real *chargeA,*chargeB,*chargeT; + int *bPerturbed; + int *typeA,*typeB; + unsigned short *cTC,*cENER,*cACC,*cFREEZE,*cXTC,*cVCM; +} t_mdatoms; +enum { 
egCOUL, egLJ, egBHAM, egLR, egLJLR, egCOUL14, egLJ14, egNR }; +typedef struct { + real *ee[egNR]; +} t_grp_ener; +typedef struct { + t_grp_ener estat; +} t_groups; +typedef unsigned long t_excl; +static void reset_nblist(t_nblist *nl) +{ + nl->nri = 0; + nl->nrj = 0; + nl->maxlen = 0; + if (nl->maxnri > 0) { +nl->gid[0] = -1; +if (nl->maxnrj > 1) { + nl->jindex[0] = 0; + nl->jindex[1] = 0; +} + } +} +static void reset_neighbor_list(t_forcerec *fr,int bLR,int eNL) +{ +reset_nblist(&(fr->nlist_lr[eNL])); +} +static void close_i_nblist(t_nblist *nlist) +{ + int nri = nlist->nri; +
Re: Fix PR ipa/60315 (inliner explosion)
Actually after some additional investigation I decided to commit this patch. fixup_noreturn_call already cares about the return value, but differently than Jakub's new code. Thanks for the quick fix, I confirm that the ACATS failures are all gone. So we're left with the GIMPLE checking failure on opt33.adb. Hi, this is the patch I just committed. It simply clears the static chain when needed. Honza * cgraph.c (cgraph_redirect_edge_call_stmt_to_callee): Clear static chain if needed. * gnat.dg/opt33.adb: New testcase. Index: cgraph.c === --- cgraph.c(revision 208915) +++ cgraph.c(working copy) @@ -1488,6 +1488,14 @@ cgraph_redirect_edge_call_stmt_to_callee gsi_insert_before (gsi, set_stmt, GSI_SAME_STMT); } gimple_call_set_lhs (new_stmt, NULL_TREE); + update_stmt_fn (DECL_STRUCT_FUNCTION (e->caller->decl), new_stmt); +} + + /* If new callee has no static chain, remove it. */ + if (gimple_call_chain (new_stmt) && !DECL_STATIC_CHAIN (e->callee->decl)) +{ + gimple_call_set_chain (new_stmt, NULL); + update_stmt_fn (DECL_STRUCT_FUNCTION (e->caller->decl), new_stmt); } cgraph_set_call_stmt_including_clones (e->caller, e->call_stmt, new_stmt, false); Index: testsuite/gnat.dg/opt33.adb === --- testsuite/gnat.dg/opt33.adb (revision 0) +++ testsuite/gnat.dg/opt33.adb (revision 0) @@ -0,0 +1,41 @@ +-- { dg-do compile } +-- { dg-options "-O" } + +with Ada.Containers.Ordered_Sets; +with Ada.Strings.Unbounded; + +procedure Opt33 is + + type Rec is record + Name : Ada.Strings.Unbounded.Unbounded_String; + end record; + + function "<" (Left : Rec; Right : Rec) return Boolean; + + package My_Ordered_Sets is new Ada.Containers.Ordered_Sets (Rec); + + protected type Data is + procedure Do_It; + private + Set : My_Ordered_Sets.Set; + end Data; + + function "<" (Left : Rec; Right : Rec) return Boolean is + begin + return False; + end "<"; + + protected body Data is + procedure Do_It is + procedure Dummy (Position : My_Ordered_Sets.Cursor) is + begin +null; + end; + begin + Set.Iterate (Dummy'Access); + end;
+ end Data; + +begin + null; +end;
[RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend
Hi All, This patch enables LRA by default for MIPS. The classic reload is still available and can be enabled via the -mreload switch. All regressions are fixed, with one exception described below. There was a necessary change in the LRA core, as I believe there was a genuine unhandled case in LRA when processing addresses. It is specific to MIPS16, as store/load[unsigned] halfword/byte instructions cannot access the stack pointer directly. Potentially, it can affect other architectures if they have a similar limitation. One of the problems showed an RTL expression that contained $frame as the base register (without any offset, a simple move), but LRA temporarily eliminated it to $sp before calling the target hook to validate the address. The backend rejected it because of the mode and $sp. Then, LRA tried to emit base+disp but ICEd because there never was any displacement. Another testcase revealed the offset not being used, and unnecessary 'add' instructions were inserted, preventing the use of offsets. Marking an insn with STACK_POINTER_REGNUM as valid was not an option, as LRA would generate an insn with $sp and fail during the coherency check. The patch attempts to reload $sp into a register and re-validate the address with the offset (if there is one). If this fails, it sticks to the original plan, inserting base+disp. The generated code optimized for size is fairly acceptable. CSiBE shows a slight advantage of LRA over reload for MIPS16, with some minor regressions for mips32*, mips64*, on average less than 0.5%. The code size improvements are being investigated.
The patch has been tested on the following variations:

- cross-tested mips-mti-elf, mips-mti-linux-gnu (languages=c,c++):
  {-mips32,-mips32r2}{-EL,-EB}{-mhard-float,-msoft-float}{-mno-mips16,-mips16}
  -mips64r2 -mabi=n32 {-mhard-float,-msoft-float}
  -mips64r2 -mabi=64 {-mhard-float,-msoft-float}
- bootstrapped and regtested x86_64-unknown-linux-gnu (all languages)

There are two known DejaGNU failures on mips64 with -mabi=64, namely the m{add,sub}-8 tests, because of subtleties in the LRA costing model, but this is not a correctness issue. The *mul_{add,sub}_si patterns are tuned explicitly for LRA, and all m{add,sub}-* failures have been resolved except the above. By failures I mean the differences between tests run with/without the -mreload switch. A number of failures already existed on ToT at the time of testing. The patch is intended for Stage 1. As for the legal part, the company-wide copyright assignment is in process.

Regards,
Robert

gcc/ChangeLog:

2014-03-26  Robert Suchanek  robert.sucha...@imgtec.com

	* lra-constraints.c (base_to_reg): New function.
	(process_address): Use new function.
	* rtlanal.c (get_base_term): Add CONSTANT_P (*inner) check.
	* config/mips/constraints.md (d): BASE_REG_CLASS replaced by
	ADDR_REG_CLASS.
	* config/mips/mips.c (mips_regno_mode_ok_for_base_p): Remove use of
	!strict_p for MIPS16.
	(mips_register_priority): New function that implements the target
	hook TARGET_REGISTER_PRIORITY.
	(mips_spill_class): Likewise for TARGET_SPILL_CLASS.
	(mips_lra_p): Likewise for TARGET_LRA_P.
	* config/mips/mips.h (reg_class): Add M16F_REGS and SPILL_REGS
	classes.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	(BASE_REG_CLASS): Use M16F_REGS.
	(ADDR_REG_CLASS): Define.
	(IRA_HARD_REGNO_ADD_COST_MULTIPLIER): Define.
	* config/mips/mips.md (*mul_acc_si, *mul_sub_si): Add alternative
	tuned for LRA.  New set attribute to enable alternatives depending
	on the register allocator used.
	(*and<mode>3_mips16): Remove the load alternatives.
	(*lea64): Disable pattern for MIPS16.
	* config/mips/mips.opt (mreload): New option.
---
 gcc/config/mips/constraints.md |   2 +-
 gcc/config/mips/mips.c         |  51 +-
 gcc/config/mips/mips.h         |  17 +-
 gcc/config/mips/mips.md        | 112 +++-
 gcc/config/mips/mips.opt       |   4 ++
 gcc/lra-constraints.c          |  44 +++-
 gcc/rtlanal.c                  |   3 +-
 7 files changed, 181 insertions(+), 52 deletions(-)

diff --git gcc/config/mips/constraints.md gcc/config/mips/constraints.md
index 49e4895..3810ac3 100644
--- gcc/config/mips/constraints.md
+++ gcc/config/mips/constraints.md
@@ -19,7 +19,7 @@

 ;; Register constraints

-(define_register_constraint "d" "BASE_REG_CLASS"
+(define_register_constraint "d" "ADDR_REG_CLASS"
   "An address register.  This is equivalent to @code{r} unless
    generating MIPS16 code.")

diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
index 143169b..f27a801 100644
--- gcc/config/mips/mips.c
+++ gcc/config/mips/mips.c
@@ -2255,7 +2255,7 @@ mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
	 All in all, it seems more consistent to only enforce this restriction
[PATCH] Fixing PR60656
This patch fixes PR60656. Elements in a vector with the vect_used_by_reduction property cannot be reordered if the use chain with this property does not have the same operation. Bootstrapped and tested on a x86-64 machine. OK for trunk?

thanks,
Cong

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e1d8666..d7d5b82 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-03-28  Cong Hou  co...@google.com
+
+	PR tree-optimization/60656
+	* tree-vect-stmts.c (supportable_widening_operation):
+	Fix a bug in which elements in a vector with the
+	vect_used_by_reduction property are incorrectly reordered when the
+	operation on them is not consistent with the one in the reduction
+	operation.
+
 2014-03-10  Jakub Jelinek  ja...@redhat.com

	PR ipa/60457
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 41b6875..414a745 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-03-28  Cong Hou  co...@google.com
+
+	PR tree-optimization/60656
+	* gcc.dg/vect/pr60656.c: New test.
+
 2014-03-10  Jakub Jelinek  ja...@redhat.com

	PR ipa/60457
diff --git a/gcc/testsuite/gcc.dg/vect/pr60656.c b/gcc/testsuite/gcc.dg/vect/pr60656.c
new file mode 100644
index 000..ebaab62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr60656.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+__attribute__ ((noinline)) long
+foo ()
+{
+  int v[] = {5000, 5001, 5002, 5003};
+  long s = 0;
+  int i;
+
+  for (i = 0; i < 4; ++i)
+    {
+      long P = v[i];
+      s += P * P * P;
+    }
+  return s;
+}
+
+long
+bar ()
+{
+  int v[] = {5000, 5001, 5002, 5003};
+  long s = 0;
+  int i;
+
+  for (i = 0; i < 4; ++i)
+    {
+      long P = v[i];
+      s += P * P * P;
+      __asm__ volatile ("");
+    }
+  return s;
+}
+
+int main ()
+{
+  if (foo () != bar ())
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 70fb411..7442d0c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7827,7 +7827,16 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
					     stmt, vectype_out, vectype_in,
					     code1, code2, multi_step_cvt,
					     interm_types))
-	return true;
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  use_operand_p dummy;
+	  gimple use_stmt;
+	  stmt_vec_info use_stmt_info = NULL;
+	  if (single_imm_use (lhs, &dummy, &use_stmt)
+	      && (use_stmt_info = vinfo_for_stmt (use_stmt))
+	      && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
+	    return true;
+	}
       c1 = VEC_WIDEN_MULT_LO_EXPR;
       c2 = VEC_WIDEN_MULT_HI_EXPR;
       break;
Re: [PATCH] Fix PR60505
Ping?

thanks,
Cong

On Wed, Mar 19, 2014 at 11:39 AM, Cong Hou co...@google.com wrote:
On Tue, Mar 18, 2014 at 4:43 AM, Richard Biener rguent...@suse.de wrote:
On Mon, 17 Mar 2014, Cong Hou wrote:
On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener rguent...@suse.de wrote:
On Fri, 14 Mar 2014, Cong Hou wrote:
On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener rguent...@suse.de wrote:
On Fri, 14 Mar 2014, Jakub Jelinek wrote:
On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:

Consider this fact, and if there are alias checks, we can safely remove the epilogue if the maximum trip count of the loop is less than or equal to the calculated threshold.

You have to consider n % vf != 0, so an argument based only on maximum trip count or threshold cannot work.

Well, if you only check whether the maximum trip count is <= vf and you know that for n < vf the vectorized loop + its epilogue path will not be taken, then perhaps you could, but it is a very special case. Now, the question is when we are guaranteed to enter the scalar versioned loop instead for n < vf; is that in the case of versioning for alias or versioning for alignment?

I think neither - I have plans to do the cost model check together with the versioning condition but didn't get around to implementing that. That would allow stronger max bounds for the epilogue loop.

In vect_transform_loop(), check_profitability will be set to true if th >= VF-1 and the number of iterations is unknown (we only consider unknown trip count here), where th is calculated based on the parameter PARAM_MIN_VECT_LOOP_BOUND and the cost model, with the minimum value VF-1. If the loop needs to be versioned, then check_profitability with true value will be passed to vect_loop_versioning(), in which an enhanced loop bound check (considering cost) will be built. So I think if the loop is versioned and n < VF, then we must enter the scalar version, and in this case removing the epilogue should be safe when the maximum trip count <= th+1.
You mean exactly in the case where the profitability check ensures that n % vf == 0? Thus effectively if n == maximum trip count? That's quite a special case, no?

Yes, it is a special case. But it is in this special case that those warnings are thrown out. Also, I think declaring an array with VF*N as its length is not unusual.

Ok, but then for the patch compute the cost model threshold once in vect_analyze_loop_2 and store it in a new LOOP_VINFO_COST_MODEL_THRESHOLD.

Done.

Also you have to check the return value from max_stmt_executions_int, as that may return -1 if the number cannot be computed (or isn't representable in a HOST_WIDE_INT).

It will be converted to an unsigned type so that -1 means infinity.

You also should check for LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT, which should have the same effect on the cost model check.

Done.

The existing condition is already complicated enough - adding new stuff warrants comments before the (sub-)checks.

OK. Comments added. Below is the revised patch. Bootstrapped and tested on a x86-64 machine.

Cong

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e1d8666..eceefb3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2014-03-11  Cong Hou  co...@google.com
+
+	PR tree-optimization/60505
+	* tree-vectorizer.h (struct _loop_vec_info): Add th field as the
+	threshold of number of iterations below which no vectorization
+	will be done.
+	* tree-vect-loop.c (new_loop_vec_info):
+	Initialize LOOP_VINFO_COST_MODEL_THRESHOLD.
+	* tree-vect-loop.c (vect_analyze_loop_operations):
+	Set LOOP_VINFO_COST_MODEL_THRESHOLD.
+	* tree-vect-loop.c (vect_transform_loop):
+	Use LOOP_VINFO_COST_MODEL_THRESHOLD.
+	* tree-vect-loop.c (vect_analyze_loop_2): Check the maximum number
+	of iterations of the loop and see if we should build the epilogue.
+
 2014-03-10  Jakub Jelinek  ja...@redhat.com

	PR ipa/60457
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 41b6875..09ec1c0 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-03-11  Cong Hou  co...@google.com
+
+	PR tree-optimization/60505
+	* gcc.dg/vect/pr60505.c: New test.
+
 2014-03-10  Jakub Jelinek  ja...@redhat.com

	PR ipa/60457
diff --git a/gcc/testsuite/gcc.dg/vect/pr60505.c b/gcc/testsuite/gcc.dg/vect/pr60505.c
new file mode 100644
index 000..6940513
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr60505.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall -Werror" } */
+
+void foo (char *in, char *out, int num)
+{
+  int i;
+  char ovec[16] = {0};
+
+  for (i = 0; i < num; ++i)
+    out[i] = (ovec[i] = in[i]);
+  out[num] = ovec[num/2];
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index df6ab6f..1c78e11 100644
---