Re: [Testsuite] Fix Cilk's exp to add -B for libcilkrts (was: Re: [Build, Driver] Add -lcilkrts for -fcilkplus)

2014-03-28 Thread Rainer Orth
Tobias Burnus bur...@net-b.de writes:

 Rainer Orth wrote:
 Tobias Burnus bur...@net-b.de writes:
 H.J. Lu wrote:
 xgcc: error: libcilkrts.spec: No such file or directory
 Hmm, I really wonder why it fails for you while it works for me:
 Do you happen to have the same/a recent version installed at the same
 prefix your build under test is configured for?

 I had - after I removed it, I could reproduce it. Sorry!

 Fixed by the attached testsuite patch. HJ: Does it now pass for you? For me
 it now does.

 OK for the trunk?

Ok.

In cases of massive reindentation like this, it's often more helpful to
post diff -w output to better see the gist of the changes.

Thanks for fixing this.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation

2014-03-28 Thread Terry Guo
Hi there,

When compile below case for ARM Thumb-2 target:

long long int
test (unsigned long long int a, unsigned int b)
{
  return (a & 0xFFFFFFFF) * b;
}

I find the GCC function simplify_subreg fails to simplify the rtx (subreg:SI
(and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero
during the fwprop1 pass, even though the high 32-bit part of
(a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications for
the high 32-bit part of the result of the AND operation. The attached patch
improves simplify_rtx to handle such cases. Other targets like x86 don't seem
to have this issue because they generate different RTX to handle 64-bit
multiplication on a 32-bit machine.
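[To make the reasoning concrete, here is a small standalone sketch -- my own
illustration, not GCC code; the names and widths are assumptions -- of the
check the patch adds: the subreg reads the outer-mode bits starting at BITPOS
out of the AND, and if none of those bits can be nonzero the whole subreg
folds to zero.]

#include <stdint.h>

/* Returns nonzero when a subreg reading the outer-mode bits at BITPOS out
   of an AND whose possibly-nonzero bits are NZMASK must be zero.
   Example: nzmask = 0xffffffff (from a & 0xFFFFFFFF), bitpos = 32 (the
   high SImode half of DImode), outer_mask = 0xffffffff (SImode).  */
static int
subreg_of_and_is_zero (uint64_t nzmask, unsigned bitpos, uint64_t outer_mask)
{
  return ((outer_mask << bitpos) & nzmask) == 0;
}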

Bootstrapped gcc on x86 machine, no problem. Tested with gcc regression test
for x86 and Thumb2, no regression.

Is it OK to stage-1?

BR,
Terry

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 04af01e..0ed88fb 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6099,6 +6099,19 @@ simplify_subreg (enum machine_mode outermode, rtx op,
return CONST0_RTX (outermode);
 }
 
+  /* The AND operation may clear the target bits of SUBREG to zero.
+ Then we just need to return a zero.  Here is an example:
+ (subreg:SI (and:DI (reg:DI X) (const_int 0xffffffff)) 4).  */
+  if (GET_CODE (op) == AND && SCALAR_INT_MODE_P (innermode))
+{
+  unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
+  unsigned HOST_WIDE_INT nzmask = nonzero_bits (op, innermode);
+  unsigned HOST_WIDE_INT smask = GET_MODE_MASK (outermode);
+
+  if (((smask << bitpos) & nzmask) == 0)
+   return CONST0_RTX (outermode);
+}
+
   if (SCALAR_INT_MODE_P (outermode)
      && SCALAR_INT_MODE_P (innermode)
      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
diff --git a/gcc/testsuite/gcc.target/arm/umull.c 
b/gcc/testsuite/gcc.target/arm/umull.c
new file mode 100644
index 000..2e39baa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/umull.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { arm_thumb1 } } */
+/* { dg-options "-O2" } */
+
+long long int
+test (unsigned long long int a, unsigned int b)
+{
+  return (a & 0xFFFFFFFF) * b;
+}
+
+/* { dg-final { scan-assembler-not "mla" } } */


RE: [Patch]Simplify SUBREG with operand whose target bits are cleared by AND operation

2014-03-28 Thread Terry Guo


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Terry Guo
 Sent: Friday, March 28, 2014 3:48 PM
 To: gcc-patches@gcc.gnu.org
 Subject: [Patch]Simplify SUBREG with operand whose target bits are cleared
 by AND operation
 
 Hi there,
 
 When compile below case for ARM Thumb-2 target:
 
 long long int
 test (unsigned long long int a, unsigned int b)
 {
    return (a & 0xFFFFFFFF) * b;
 }
 
 I find the GCC function simplify_subreg fails to simplify the rtx (subreg:SI
 (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to
 zero during the fwprop1 pass, even though the high 32-bit part of
 (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications
 for the high 32-bit part of the result of the AND operation. The attached
 patch improves simplify_rtx to handle such cases. Other targets like x86
 don't seem to have this issue because they generate different RTX to
 handle 64-bit multiplication on a 32-bit machine.
 
 Bootstrapped gcc on x86 machine, no problem. Tested with gcc regression
 test
 for x86 and Thumb2, no regression.
 
 Is it OK to stage-1?
 
 BR,
 Terry

Sorry for missing the ChangeLog part:

gcc/
2014-03-28  Terry Guo  terry@arm.com

* fwprop.c (simplify_subreg): Handle case that bits are
cleared by AND operation.

gcc/testsuite/
2014-03-28  Terry Guo  terry@arm.com

* gcc.target/arm/umull.c: New testcase.




Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)

2014-03-28 Thread Rainer Orth
Andreas Schwab sch...@suse.de writes:

 Jason Merrill ja...@redhat.com writes:

 diff --git a/gcc/testsuite/g++.dg/abi/thunk6.C
 b/gcc/testsuite/g++.dg/abi/thunk6.C
 new file mode 100644
 index 000..e3d07f2
 --- /dev/null
 +++ b/gcc/testsuite/g++.dg/abi/thunk6.C
 @@ -0,0 +1,18 @@
 +// PR c++/60566
 +// We need to emit the construction vtable thunk for ~C even if we aren't
 +// going to use it.
 +
 +struct A
 +{
 +  virtual void f() = 0;
 +  virtual ~A() {}
 +};
 +
 +struct B: virtual A { int i; };
 +struct C: virtual A { int i; ~C(); };
 +
 +C::~C() {}
 +
 +int main() {}
 +
 +// { dg-final { scan-assembler "_ZTv0_n32_N1CD1Ev" } }

 FAIL: g++.dg/abi/thunk6.C -std=c++11  scan-assembler _ZTv0_n32_N1CD1Ev

 $ grep _ZTv0_ thunk6.s
 .globl  _ZTv0_n16_N1CD1Ev
 .type   _ZTv0_n16_N1CD1Ev, @function
 _ZTv0_n16_N1CD1Ev:
 .size   _ZTv0_n16_N1CD1Ev, .-_ZTv0_n16_N1CD1Ev
 .globl  _ZTv0_n16_N1CD0Ev
 .type   _ZTv0_n16_N1CD0Ev, @function
 _ZTv0_n16_N1CD0Ev:
 .size   _ZTv0_n16_N1CD0Ev, .-_ZTv0_n16_N1CD0Ev

It would help to state which target this is...

Same for the 32-bit multilib on Solaris/SPARC and x86
(i386-pc-solaris2.11, sparc-sun-solaris2.11).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-03-28 Thread James Greenhalgh

Hi,

There is no way to perform scalar addition in the vector register file,
but with the RTX costs in place we start rewriting (x << 1) to (x + x)
on almost all cores. The code which makes this decision has no idea that we
will end up doing this (it happens well before reload) and so we end up with
very ugly code generation in the case where addition was selected, but
we are operating in vector registers.

This patch relies on the same gimmick we are already using to allow
shifts on 32-bit scalars in the vector register file - Use a vector 32x2
operation instead, knowing that we can safely ignore the top bits.
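[As an illustration only -- my own sketch loosely modelled on
scalar_shift_1.c, not code from the patch -- this is the shape of source
where the new alternative matters: a 32-bit value pinned to a SIMD register
whose shift-by-one may be rewritten as an addition.]

int
double_in_simd_reg (int x)
{
  /* Assumption for illustration: force x into a SIMD ("w") register.  */
  __asm__ ("" : "+w" (x));
  /* The rtx-cost logic may turn this into x + x; with the new "w"
     alternative the addition can stay in the vector register file as
     add v0.2s, v0.2s, v0.2s instead of bouncing to a GP register.  */
  return x << 1;
}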

This restores some normality to scalar_shift_1.c, however the test
that we generate a left shift by one is clearly bogus, so remove that.

This patch is pretty ugly, but it does generate superficially better
looking code for this testcase.

Tested on aarch64-none-elf with no issues.

OK for stage 1?

Thanks,
James

---
gcc/

2014-03-27  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
vector registers.

gcc/testsuite/
2014-03-27  James Greenhalgh  james.greenha...@arm.com

* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c86a29d8e7f8df21f25e14d22df1c3e8c37c907f..9c544a0a473732ebdf9238205db96d0d0c57de9a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1063,16 +1063,17 @@ (define_expand "add<mode>3"
 
 (define_insn "*addsi3_aarch64"
   [(set
-(match_operand:SI 0 "register_operand" "=rk,rk,rk")
+(match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
 (plus:SI
- (match_operand:SI 1 "register_operand" "%rk,rk,rk")
- (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+ (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
+ (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
   ""
   "@
   add\\t%w0, %w1, %2
   add\\t%w0, %w1, %w2
+  add\\t%0.2s, %1.2s, %2.2s
   sub\\t%w0, %w1, #%n2"
-  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
+  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
 )
 
 ;; zero_extend version of above
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
index 7cb17f8..826bafc 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
@@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
-/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */
 
 Int32x1
 test_corners_sisd_si (Int32x1 b)
@@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
-/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */
 
 
 

Re: [PING^7][PATCH] Add a couple of dialect and warning options regarding Objective-C instance variable scope

2014-03-28 Thread Dimitris Papavasiliou

Ping!

On 03/23/2014 03:20 AM, Dimitris Papavasiliou wrote:

Ping!

On 03/13/2014 11:54 AM, Dimitris Papavasiliou wrote:

Ping!

On 03/06/2014 07:44 PM, Dimitris Papavasiliou wrote:

Ping!

On 02/27/2014 11:44 AM, Dimitris Papavasiliou wrote:

Ping!

On 02/20/2014 12:11 PM, Dimitris Papavasiliou wrote:

Hello all,

Pinging this patch review request again. See previous messages quoted
below for details.

Regards,
Dimitris

On 02/13/2014 04:22 PM, Dimitris Papavasiliou wrote:

Hello,

Pinging this patch review request. Can someone involved in the
Objective-C language frontend have a quick look at the description of
the proposed features and tell me if it'd be ok to have them in the
trunk so I can go ahead and create proper patches?

Thanks,
Dimitris

On 02/06/2014 11:25 AM, Dimitris Papavasiliou wrote:

Hello,

This is a patch regarding a couple of Objective-C related dialect
options and warning switches. I have already submitted it a while
ago
but gave up after pinging a couple of times. I am now informed that
should have kept pinging until I got someone's attention so I'm
resending it.

The patch is now against an old revision and as I stated originally
it's
probably not in a state that can be adopted as is. I'm sending it
as is
so that the implemented features can be assesed in terms of their
usefulness and if they're welcome I'd be happy to make any necessary
changes to bring it up-to-date, split it into smaller patches, add
test-cases and anything else that is deemed necessary.

Here's the relevant text from my initial message:

Two of these switches are related to a feature request I submitted a
while ago, Bug 56044
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56044). I won't
reproduce
the entire argument here since it is available in the feature
request.
The relevant functionality in the patch comes in the form of two
switches:

-Wshadow-ivars, which controls the "local declaration of 'somevar'
hides instance variable" warning, which curiously is enabled by default
instead of being controlled at least by -Wshadow. The patch changes it
so that this warning can be enabled and disabled specifically through
-Wshadow-ivars as well as with all other shadowing-related warnings
through -Wshadow.

The reason for the extra switch is that, while searching through the
Internet for a solution to this problem I have found out that other
people are inconvenienced by this particular warning as well, so it
might be useful to be able to turn it off while keeping all the other
shadowing-related warnings enabled.

-flocal-ivars which, when true, as it is by default, treats instance
variables as having local scope. If false (-fno-local-ivars) instance
variables must always be referred to as self->ivarname, and references
of ivarname resolve to the local or global scope as usual.

I've also taken the opportunity of adding another switch unrelated to
the above but related to instance variables:

-fivar-visibility which can be set to either private, protected (the
default), public or package. This sets the default instance variable
visibility, which normally is implicitly protected. My use-case for it
is basically to be able to set it to public and thus effectively
disable this visibility mechanism altogether, which I find no use for
and therefore have to circumvent. I'm not sure if anyone else feels the
same way towards this but I figured it was worth a try.

I'm attaching a preliminary patch against the current revision in case
anyone wants to have a look. The changes are very small and any blatant
mistakes should be immediately obvious. I have to admit to having
virtually no knowledge of the internals of GCC but I have tried to keep
in line with formatting guidelines and general style as well as looking
up the particulars of the way options are handled in the available
documentation to avoid blind copy-pasting. I have also tried to test
the functionality both in my own (relatively large, or at least not too
small) project and with small test programs and everything works as
expected. Finally, I tried running the tests too but these fail to
complete both in the patched and unpatched version, possibly due to the
way I've configured GCC.

Dimitris
















Re: [PATCH] Fix PR c++/60573

2014-03-28 Thread Adam Butcher

On 2014-03-27 21:16, Adam Butcher wrote:

On 2014-03-27 20:45, Adam Butcher wrote:

PR c++/60573
* name-lookup.h (cp_binding_level): New field scope_defines_class_p.
* semantics.c (begin_class_definition): Set scope_defines_class_p.
* pt.c (instantiate_class_template_1): Likewise.
* parser.c (synthesize_implicit_template_parm): Use
cp_binding_level::scope_defines_class_p rather than TYPE_BEING_DEFINED
as the predicate for unwinding to class-defining scope to handle the
erroneous definition of a generic function of an arbitrarily nested
class within an enclosing class.


Still got issues with this.  It fails on out-of-line defs.  I'll have
another look.


Turns out the solution was OK but I didn't account for the 
class-defining scope being reused for subsequent out-of-line 
declarations.  I've made 'scope_defines_class_p' into the now transient 
'defining_class_p' predicate which is reset on leaving scope.  I've 
ditched the 'scope_' prefix and also ditched the modifications to 
'instantiate_class_template_1'.


The patch delta is included below (but will probably be munged by my 
webmail client).  I'll reply to this with the full patch.


There is also the fix for PR c++/60626 
(http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01294.html) that deals 
with another form of erroneous generic function declarations with nested 
class scope.


Cheers,
Adam


diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 53f14f3..0137c3f 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1630,10 +1630,14 @@ leave_scope (void)
   free_binding_level = scope;
 }

-  /* Find the innermost enclosing class scope, and reset
- CLASS_BINDING_LEVEL appropriately.  */
   if (scope->kind == sk_class)
 {
+  /* Reset DEFINING_CLASS_P to allow for reuse of a
+class-defining scope in a non-defining context.  */
+  scope->defining_class_p = 0;
+
+  /* Find the innermost enclosing class scope, and reset
+CLASS_BINDING_LEVEL appropriately.  */
   class_binding_level = NULL;
   for (scope = current_binding_level; scope; scope = scope->level_chain)
 if (scope->kind == sk_class)
diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index 9e5d812..40e0338 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -255,9 +255,12 @@ struct GTY(()) cp_binding_level {
   unsigned more_cleanups_ok : 1;
   unsigned have_cleanups : 1;

-  /* Set if this scope is of sk_class kind and is the defining
- scope for this_entity.  */
-  unsigned scope_defines_class_p : 1;
+  /* Transient state set if this scope is of sk_class kind
+ and is in the process of defining 'this_entity'.  Reset
+ on leaving the class definition to allow for the scope
+ to be subsequently re-used as a non-defining scope for
+ 'this_entity'.  */
+  unsigned defining_class_p : 1;

   /* 23 bits left to fill a 32-bit word.  */
 };
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4919a67..0945bfd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32027,7 +32027,7 @@ synthesize_implicit_template_parm  (cp_parser *parser)
declarator should be injected into the scope of 'A' as if the
ill-formed template was specified explicitly.  */

- while (scope->kind == sk_class && !scope->scope_defines_class_p)
+ while (scope->kind == sk_class && !scope->defining_class_p)
{
  parent_scope = scope;
  scope = scope->level_chain;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 90faeec..c791d03 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8905,12 +8905,9 @@ instantiate_class_template_1 (tree type)
 return type;

   /* Now we're really doing the instantiation.  Mark the type as in
- the process of being defined...  */
+ the process of being defined.  */
   TYPE_BEING_DEFINED (type) = 1;

-  /* ... and the scope defining it.  */
-  class_binding_level->scope_defines_class_p = 1;
-
   /* We may be in the middle of deferred access check.  Disable
  it now.  */
   push_deferring_access_checks (dk_no_deferred);
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index deba2ab..207a42d 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2777,7 +2777,7 @@ begin_class_definition (tree t)
   maybe_process_partial_specialization (t);
   pushclass (t);
   TYPE_BEING_DEFINED (t) = 1;
-  class_binding_level->scope_defines_class_p = 1;
+  class_binding_level->defining_class_p = 1;

   if (flag_pack_struct)
 {



[PATCH] Fix PR c++/60573

2014-03-28 Thread Adam Butcher
PR c++/60573
* name-lookup.h (cp_binding_level): New transient field defining_class_p
to indicate whether a scope is in the process of defining a class.
* semantics.c (begin_class_definition): Set defining_class_p.
* name-lookup.c (leave_scope): Reset defining_class_p.
* parser.c (synthesize_implicit_template_parm): Use cp_binding_level::
defining_class_p rather than TYPE_BEING_DEFINED as the predicate for
unwinding to class-defining scope to handle the erroneous definition of
a generic function of an arbitrarily nested class within an enclosing
class.

PR c++/60573
* g++.dg/cpp1y/pr60573.C: New testcase.
---
 gcc/cp/name-lookup.c |  8 ++--
 gcc/cp/name-lookup.h |  9 -
 gcc/cp/parser.c  | 23 +--
 gcc/cp/semantics.c   |  1 +
 gcc/testsuite/g++.dg/cpp1y/pr60573.C | 28 
 5 files changed, 60 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/pr60573.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 53f14f3..0137c3f 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1630,10 +1630,14 @@ leave_scope (void)
   free_binding_level = scope;
 }
 
-  /* Find the innermost enclosing class scope, and reset
- CLASS_BINDING_LEVEL appropriately.  */
   if (scope->kind == sk_class)
 {
+  /* Reset DEFINING_CLASS_P to allow for reuse of a
+class-defining scope in a non-defining context.  */
+  scope->defining_class_p = 0;
+
+  /* Find the innermost enclosing class scope, and reset
+CLASS_BINDING_LEVEL appropriately.  */
   class_binding_level = NULL;
   for (scope = current_binding_level; scope; scope = scope->level_chain)
 if (scope->kind == sk_class)
diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index a63442f..40e0338 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -255,7 +255,14 @@ struct GTY(()) cp_binding_level {
   unsigned more_cleanups_ok : 1;
   unsigned have_cleanups : 1;
 
-  /* 24 bits left to fill a 32-bit word.  */
+  /* Transient state set if this scope is of sk_class kind
+ and is in the process of defining 'this_entity'.  Reset
+ on leaving the class definition to allow for the scope
+ to be subsequently re-used as a non-defining scope for
+ 'this_entity'.  */
+  unsigned defining_class_p : 1;
+
+  /* 23 bits left to fill a 32-bit word.  */
 };
 
 /* The binding level currently in effect.  */
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e729d65..0945bfd 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32000,7 +32000,7 @@ synthesize_implicit_template_parm  (cp_parser *parser)
{
  /* If not defining a class, then any class scope is a scope level in
 an out-of-line member definition.  In this case simply wind back
-beyond the first such scope to inject the template argument list.
+beyond the first such scope to inject the template parameter list.
 Otherwise wind back to the class being defined.  The latter can
 occur in class member friend declarations such as:
 
@@ -32011,12 +32011,23 @@ synthesize_implicit_template_parm  (cp_parser *parser)
 friend void A::foo (auto);
   };
 
-   The template argument list synthesized for the friend declaration
-   must be injected in the scope of 'B', just beyond the scope of 'A'
-   introduced by 'A::'.  */
+   The template parameter list synthesized for the friend declaration
+   must be injected in the scope of 'B'.  This can also occur in
+   erroneous cases such as:
 
- while (scope->kind == sk_class
-&& !TYPE_BEING_DEFINED (scope->this_entity))
+  struct A {
+struct B {
+  void foo (auto);
+};
+void B::foo (auto) {}
+  };
+
+   Here the attempted definition of 'B::foo' within 'A' is ill-formed
+   but, nevertheless, the template parameter list synthesized for the
+   declarator should be injected into the scope of 'A' as if the
+   ill-formed template was specified explicitly.  */
+
+ while (scope->kind == sk_class && !scope->defining_class_p)
{
  parent_scope = scope;
  scope = scope->level_chain;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 886fbb8..207a42d 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2777,6 +2777,7 @@ begin_class_definition (tree t)
   maybe_process_partial_specialization (t);
   pushclass (t);
   TYPE_BEING_DEFINED (t) = 1;
+  class_binding_level->defining_class_p = 1;
 
   if (flag_pack_struct)
 {
diff --git a/gcc/testsuite/g++.dg/cpp1y/pr60573.C 
b/gcc/testsuite/g++.dg/cpp1y/pr60573.C
new file mode 100644

Skip gcc.dg/tree-ssa/isolate-*.c for AVR Target

2014-03-28 Thread K_s, Vishnu
Hi all,

The tests added in gcc.dg/tree-ssa/isolate-*.c are failing for the AVR
target, because the isolate-erroneous-paths pass needs the
-fdelete-null-pointer-checks option to be enabled. For the AVR target
that option is disabled, which causes the tests to fail. The following
patch skips the isolate-* tests if keeps_null_pointer_checks is true.
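[For readers unfamiliar with the pass, a minimal sketch -- my own example,
not one of the isolate-*.c tests -- of the kind of code it rewrites; the
transformation is only valid when the target lets GCC assume that a NULL
dereference faults, i.e. when -fdelete-null-pointer-checks is in effect.]

int
f (int *p)
{
  int v = *p;    /* dereference implies p != NULL ...                 */
  if (p == 0)    /* ... so this path is erroneous and can be isolated */
    return -1;   /* (replaced by a trap on targets that delete NULL   */
  return v;      /* pointer checks; kept as-is on AVR)                */
}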

2014-03-28  Vishnu K S vishnu@atmel.com 

* gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c: Skip test for AVR 
* gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c: Ditto 
* gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c: Ditto
* gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c: Ditto
* gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c: Ditto

--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c
@@ -1,6 +1,7 @@

 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-isolate-paths" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */


 struct demangle_component
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
index bfcaa2b..912d98e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fisolate-erroneous-paths-attribute -fdump-tree-isolate-paths -fdump-tree-phicprop1" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */


 int z;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c
index 780..9c2c5d5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-isolate-paths" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */


 typedef long unsigned int size_t;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
index c9c074d..d50a2b2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fisolate-erroneous-paths-attribute -fdump-tree-isolate-paths -fdump-tree-phicprop1" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */


 extern void foo(void *) __attribute__ ((__nonnull__ (1)));
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c
index 4d01d5c..e6ae37a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-isolate-paths -fdump-tree-optimized" } */
+/* { dg-skip-if "" keeps_null_pointer_checks } */

Regards,
Vishnu KS


Re: RFA: Fix PR rtl-optimization/60651

2014-03-28 Thread Eric Botcazou
 However, the first call is for blocks with incoming abnormal edges.
 If these are empty, the change as I wrote it yesterday is fine, but not
 when they are non-empty; in that case, we should indeed insert before the
 first instruction in that block.

OK, so the issue is specific to empty basic blocks and boils down to inserting 
instructions in a FIFO manner into them.

 This can be achieved by finding an insert-before position using NEXT_INSN
 on the basic block head; this amounts to the very same insertion place
 as inserting after the basic block head.  Also, we will continue to set no
 location, and use the same bb, because both add_insn_before and
 add_insn_after (in contradiction to its block comment) will infer the basic
 block from the insn given (in the case for add_insn_before, I assume
 that the basic block doesn't start with a BARRIER - that would be invalid -
 and that the insn it starts with has a valid BLOCK_FOR_INSN setting the
 same way the basic block head has.

This looks reasonable, but I think that we need more commentary because it's 
not straightforward to understand, so I would:

  1. explicitly state that we enforce an order on the entities in addition to 
the order on priority, both in the code (for example create a 4th paragraph in 
the comment at the top of the file, before More details ...) and in the doc 
as you already did, but ordering the two orders for the sake of clarity: 
first the order on priority then, for the same priority, the order to the 
entities.

  2. add a line in the head comment of new_seginfo saying that INSN may not be 
a NOTE_BASIC_BLOCK, unless BB is empty.

  3. add a comment above the trick in optimize_mode_switching saying that it 
is both required to implement the FIFO insertion and valid because we know 
that the basic block was initially empty.

It's not clear to me whether this is a regression or not, so you'll also need 
to run it by the RMs.  In the meantime I have installed the attached patchlet.


2014-03-28  Eric Botcazou  ebotca...@adacore.com

* mode-switching.c: Make small adjustments to the top comment.


-- 
Eric Botcazou

Index: mode-switching.c
===
--- mode-switching.c	(revision 208879)
+++ mode-switching.c	(working copy)
@@ -45,20 +45,20 @@ along with GCC; see the file COPYING3.
and finding all the insns which require a specific mode.  Each insn gets
a unique struct seginfo element.  These structures are inserted into a list
for each basic block.  For each entity, there is an array of bb_info over
-   the flow graph basic blocks (local var 'bb_info'), and contains a list
+   the flow graph basic blocks (local var 'bb_info'), which contains a list
of all insns within that basic block, in the order they are encountered.
 
For each entity, any basic block WITHOUT any insns requiring a specific
-   mode are given a single entry, without a mode.  (Each basic block
-   in the flow graph must have at least one entry in the segment table.)
+   mode are given a single entry without a mode (each basic block in the
+   flow graph must have at least one entry in the segment table).
 
The LCM algorithm is then run over the flow graph to determine where to
-   place the sets to the highest-priority value in respect of first the first
+   place the sets to the highest-priority mode with respect to the first
insn in any one block.  Any adjustments required to the transparency
vectors are made, then the next iteration starts for the next-lower
priority mode, till for each entity all modes are exhausted.
 
-   More details are located in the code for optimize_mode_switching().  */
+   More details can be found in the code of optimize_mode_switching.  */
 
 /* This structure contains the information for each insn which requires
either single or double mode to be set.

Re: Fix PR ipa/60315 (inliner explosion)

2014-03-28 Thread Jakub Jelinek
On Thu, Mar 27, 2014 at 12:02:01PM +0100, Andreas Schwab wrote:
  --- testsuite/g++.dg/torture/pr60315.C  (revision 0)
  +++ testsuite/g++.dg/torture/pr60315.C  (revision 0)
  @@ -0,0 +1,32 @@
  +// { dg-do compile }
  +struct Base {
  +virtual int f() = 0;
  +};
  +
  +struct Derived : public Base {
  +virtual int f() final override {
  +return 42;
  +}
  +};
  +
  +extern Base* b;
  +
  +int main() {
   +return (static_cast<Derived*>(b)->*(&Derived::f))();
  +}
 
 FAIL: g++.dg/torture/pr60315.C  -O0  (test for excess errors)
 Excess errors:
 /usr/local/gcc/gcc-20140327/gcc/testsuite/g++.dg/torture/pr60315.C:7:19: 
 warning: override controls (override/final) only available with -std=c++11 or 
 -std=gnu++11
 /usr/local/gcc/gcc-20140327/gcc/testsuite/g++.dg/torture/pr60315.C:7:21: 
 warning: override controls (override/final) only available with -std=c++11 or 
 -std=gnu++11

As dg-torture.exp doesn't cycle through c++98/c++11/c++14, I've committed
this fix as obvious:

2014-03-28  Jakub Jelinek  ja...@redhat.com

PR ipa/60315
* g++.dg/torture/pr60315.C: Add -std=c++11 to dg-options.

--- gcc/testsuite/g++.dg/torture/pr60315.C.jj   2014-03-26 10:13:22.0 
+0100
+++ gcc/testsuite/g++.dg/torture/pr60315.C  2014-03-28 11:07:08.671208010 
+0100
@@ -1,4 +1,7 @@
+// PR ipa/60315
 // { dg-do compile }
+// { dg-options "-std=c++11" }
+
 struct Base {
 virtual int f() = 0;
 };


Jakub


Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 10:06:45AM +0100, Rainer Orth wrote:
  FAIL: g++.dg/abi/thunk6.C -std=c++11  scan-assembler _ZTv0_n32_N1CD1Ev
 
  $ grep _ZTv0_ thunk6.s
  .globl  _ZTv0_n16_N1CD1Ev
  .type   _ZTv0_n16_N1CD1Ev, @function
  _ZTv0_n16_N1CD1Ev:
  .size   _ZTv0_n16_N1CD1Ev, .-_ZTv0_n16_N1CD1Ev
  .globl  _ZTv0_n16_N1CD0Ev
  .type   _ZTv0_n16_N1CD0Ev, @function
  _ZTv0_n16_N1CD0Ev:
  .size   _ZTv0_n16_N1CD0Ev, .-_ZTv0_n16_N1CD0Ev
 
 It would help to state which target this is...
 
 Same for the 32-bit multilib on Solaris/SPARC and x86
 (i386-pc-solaris2.11, sparc-sun-solaris2.11).

Seems it fails on all ilp32 targets I've tried and succeeds on all lp64
targets (including ia64), so I think we should do following.
Ok for trunk?

2014-03-28  Jakub Jelinek  ja...@redhat.com

PR c++/58678
* g++.dg/abi/thunk6.C: Scan assembler for _ZTv0_n32_N1CD1Ev
only for lp64 targets and scan for _ZTv0_n16_N1CD1Ev for ilp32
targets.

--- gcc/testsuite/g++.dg/abi/thunk6.C.jj2014-03-26 20:31:53.0 
+0100
+++ gcc/testsuite/g++.dg/abi/thunk6.C   2014-03-28 11:20:45.051852976 +0100
@@ -15,4 +15,5 @@ C::~C() {}
 
 int main() {}
 
-// { dg-final { scan-assembler "_ZTv0_n32_N1CD1Ev" } }
+// { dg-final { scan-assembler "_ZTv0_n32_N1CD1Ev" { target lp64 } } }
+// { dg-final { scan-assembler "_ZTv0_n16_N1CD1Ev" { target ilp32 } } }


Jakub


Evident fix for copy_loops.

2014-03-28 Thread Yuri Rumyantsev
Hi All,

I found out that a field 'safelen' of struct loop is not copied in copy_loops.

Is it OK for trunk?

ChangeLog:

2014-03-28  Yuri Rumyantsev  ysrum...@gmail.com

* tree-inline.c (copy_loops): Add missed copy of 'safelen'.


copy-loops-fix
Description: Binary data


[C++ PATCH] Fix __atomic_exchange (PR c++/60689)

2014-03-28 Thread Jakub Jelinek
Hi!

__atomic_exchange doesn't work in C++.  The problem is that
add_atomic_size_parameter, if there is no room in params vector, creates
new params vector big enough that the extra argument fits in, but doesn't
add the extra argument in, because it relies on the subsequent
build_function_call_vec to call resolve_overloaded_builtin recursively and
add the parameter.  The C build_function_call_vec does that, but C++
doesn't, there resolve_overloaded_builtin is called from finish_call_expr
instead.

My first attempt to fix this - changing add_atomic_size_parameter to add
that argument - broke C, where the recursive resolve_overloaded_builtin -
get_atomic_generic_size would complain about too many arguments.

Here is one possible fix for this, let the C++ build_function_call_vec
(which is only called from two c-family/c-common.c spots that expect this
behavior) behave more like the C call.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (and
4.8)?

Another alternative is some static flag in resolve_overloaded_function that
would just punt if the function is called recursively, but I think if we can
avoid adding global state, we should (otherwise JIT won't like it too much).

Yet another possibility would be to rename all calls in C FE to
build_function_call_vec to say c_build_function_call_vec and add that
function which would call resolve_overloaded_builtin and then tail call to
build_function_call_vec which wouldn't do that.  Then c-family/ would keep
its current two calls to that function, which wouldn't recurse anymore, and
we'd need to change add_atomic_size_parameter to push the argument.

2014-03-28  Jakub Jelinek  ja...@redhat.com

PR c++/60689
* typeck.c (build_function_call_vec): Call resolve_overloaded_builtin.

* c-c++-common/pr60689.c: New test.

--- gcc/cp/typeck.c.jj  2014-03-10 10:50:14.0 +0100
+++ gcc/cp/typeck.c 2014-03-28 08:07:49.737656541 +0100
@@ -3363,10 +3363,23 @@ build_function_call (location_t /*loc*/,
 
 /* Used by the C-common bits.  */
 tree
-build_function_call_vec (location_t /*loc*/, vec<location_t> /*arg_loc*/,
+build_function_call_vec (location_t loc, vec<location_t> /*arg_loc*/,
 tree function, vec<tree, va_gc> *params,
 vec<tree, va_gc> * /*origtypes*/)
 {
+  /* This call is here to match what the C FE does in its
+ build_function_call_vec.  See PR60689.  */
+  if (TREE_CODE (function) == FUNCTION_DECL)
+{
+  /* Implement type-directed function overloading for builtins.
+resolve_overloaded_builtin and targetm.resolve_overloaded_builtin
+handle all the type checking.  The result is a complete expression
+that implements this function call.  */
+  tree tem = resolve_overloaded_builtin (loc, function, params);
+  if (tem)
+   return tem;
+}
+
   vec<tree, va_gc> *orig_params = params;
   tree ret = cp_build_function_call_vec (function, params,
 tf_warning_or_error);
--- gcc/testsuite/c-c++-common/pr60689.c.jj 2014-03-27 22:06:31.703103613 
+0100
+++ gcc/testsuite/c-c++-common/pr60689.c2014-03-27 22:06:46.542024952 
+0100
@@ -0,0 +1,10 @@
+/* PR c++/60689 */
+/* { dg-do compile } */
+
+struct S { char x[9]; };
+
+void
+foo (struct S *x, struct S *y, struct S *z)
+{
+  __atomic_exchange (x, y, z, __ATOMIC_SEQ_CST);
+}

Jakub


Re: Evident fix for copy_loops.

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 02:41:26PM +0400, Yuri Rumyantsev wrote:
 Hi All,
 
 I found out that a field 'safelen' of struct loop is not copied in copy_loops.
 
 Is it OK for trunk?

Ok if it passes bootstrap/regtest.

 2014-03-28  Yuri Rumyantsev  ysrum...@gmail.com
 
 * tree-inline.c (copy_loops): Add missed copy of 'safelen'.

Jakub


Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-03-28 Thread pinskia


 On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com 
 wrote:
 
 
 Hi,
 
 There is no way to perform scalar addition in the vector register file,
 but with the RTX costs in place we start rewriting (x << 1) to (x + x)
 on almost all cores. The code which makes this decision has no idea that we
 will end up doing this (it happens well before reload) and so we end up with
 very ugly code generation in the case where addition was selected, but
 we are operating in vector registers.
 
 This patch relies on the same gimmick we are already using to allow
 shifts on 32-bit scalars in the vector register file - Use a vector 32x2
 operation instead, knowing that we can safely ignore the top bits.
 
 This restores some normality to scalar_shift_1.c, however the test
 that we generate a left shift by one is clearly bogus, so remove that.
 
 This patch is pretty ugly, but it does generate superficially better
 looking code for this testcase.
 
 Tested on aarch64-none-elf with no issues.
 
 OK for stage 1?

It seems we should also discourage the neon alternatives as there might be 
extra movement between the two register sets which we don't want. 

Thanks,
Andrew


 
 Thanks,
 James
 
 ---
 gcc/
 
 2014-03-27  James Greenhalgh  james.greenha...@arm.com
 
* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
vector registers.
 
 gcc/testsuite/
 2014-03-27  James Greenhalgh  james.greenha...@arm.com
 
* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
 0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch


Re: [Patch, Fortran] PR60576 Fix out-of-bounds problem

2014-03-28 Thread Paul Richard Thomas
Dear Tobias,

This is, of course, fine since it is 'obvious' (in my opinion at least).

Thanks for the patch

Paul

On 27 March 2014 21:05, Tobias Burnus bur...@net-b.de wrote:
 An early * PING* for this wrong-code issue.


 Tobias Burnus wrote:

 This patch fixes part of the problems of the PR. The problem is that one
 assigns an array descriptor to an assumed-rank array descriptor. The latter
 has for BT_CLASS the size of max_dim (reason: we have first the data array
 and than vtab). With true, one takes the TREE_TYPE from the LHS (i.e.
 the assumed-rank variable) and as the type determines how many bytes the
 range assignment copies, one reads max_dimension elements from the RHS array
 - which can be too much.

 Testcase: Already in the testsuite, even if it only fails under special
 conditions.

 Build and regtested on x86-64-gnu-linux.
 OK for the trunk and 4.8?

 Tobias

 PS: I haven't investigated the issues Jakub is seeing. With valgrind, they
 do not pop up and my attempt to build with all checking enabled, failed with
 configure or compile errors.





-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy


[PATCH] S/390: Make S/390 a logical_op_short_circuit target

2014-03-28 Thread Andreas Krebbel
Hi,

S/390 does not define LOGICAL_OP_NON_SHORT_CIRCUIT, but its default
value depends on the branch cost.  On S/390 we set a branch cost of 1
which makes us a logical_op_short_circuit target.
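[For reference, the generic fallback ties this macro to the branch cost;
roughly as below, paraphrased from memory rather than quoted, so treat the
exact spelling as an approximation.]

#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
#define LOGICAL_OP_NON_SHORT_CIRCUIT \
  (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2)
#endif

With a branch cost of 1 the ">= 2" test fails, so && and || are expanded as
short-circuit branches, which is what the adjusted tests now expect.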

This fixes the following testcases:

 FAIL: gcc.dg/binop-xor1.c scan-tree-dump-times optimized ^ 1
 FAIL: gcc.dg/tree-ssa/forwprop-28.c scan-tree-dump-times forwprop1 Replaced 
8
 FAIL: gcc.dg/tree-ssa/vrp87.c scan-tree-dump vrp2 Folded into: if.*
 FAIL: gcc.dg/tree-ssa/vrp87.c scan-tree-dump cddce2 Deleting.*_Bool.*;

Bye,

-Andreas-

2014-03-28  Andreas Krebbel  andreas.kreb...@de.ibm.com

* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Remove s390 special
  option.
* lib/target-supports.exp: Return true for s390
  in check_effective_logical_op_short_circuit.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
index 1e46634..cafdf13 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */ 
 /* { dg-options "-O2 -fdump-tree-dom1-details" } */
-/* { dg-additional-options "-mbranch-cost=2" { target s390*-*-* } } */
 struct bitmap_head_def;
 typedef struct bitmap_head_def *bitmap;
 typedef const struct bitmap_head_def *const_bitmap;
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index bee8471..0d2ccd5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5712,6 +5712,7 @@ proc check_effective_target_logical_op_short_circuit {} {
 || [istarget arc*-*-*]
 || [istarget avr*-*-*]
 || [istarget crisv32-*-*] || [istarget cris-*-*]
+|| [istarget s390*-*-*]
 || [check_effective_target_arm_cortex_m] } {
return 1
 }



[PATCH] g++.dg: add ipa.exp file

2014-03-28 Thread Martin Liška

Hi,
   I would like to add corresponding ipa.exp file for g++ that let me 
run: make -k check RUNTESTFLAGS=ipa.exp


Changelog:

2014-03-28  Martin Liska  mli...@suse.cz

* g++.dg/ipa.epx: Anologous file added to g++.dg folder.

OK for trunk?

Thank you,
Martin
diff --git a/gcc/testsuite/g++.dg/ipa/ipa.exp b/gcc/testsuite/g++.dg/ipa/ipa.exp
new file mode 100644
index 000..af7b8a7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/ipa.exp
@@ -0,0 +1,35 @@
+# Copyright (C) 1997-2014 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# http://www.gnu.org/licenses/.
+
+# G++ testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib g++-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CXXFLAGS
+if ![info exists DEFAULT_CXXFLAGS] then {
+set DEFAULT_CXXFLAGS " -pedantic-errors -Wno-long-long"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[C\]]] "" $DEFAULT_CXXFLAGS
+
+# All done.
+dg-finish


Re: [PATCH] g++.dg: add ipa.exp file

2014-03-28 Thread Rainer Orth
Hi Martin,

 Hi,
I would like to add corresponding ipa.exp file for g++ that let me run:
 make -k check RUNTESTFLAGS=ipa.exp

 Changelog:

 2014-03-28  Martin Liska  mli...@suse.cz

 * g++.dg/ipa.epx: Anologous file added to g++.dg folder.

Two typos.  Besides, this should be

* g++.dg/ipa.exp: New file.

instead.

 diff --git a/gcc/testsuite/g++.dg/ipa/ipa.exp 
 b/gcc/testsuite/g++.dg/ipa/ipa.exp
 new file mode 100644
 index 000..af7b8a7
 --- /dev/null
 +++ b/gcc/testsuite/g++.dg/ipa/ipa.exp
 @@ -0,0 +1,35 @@
 +# Copyright (C) 1997-2014 Free Software Foundation, Inc.

Only 2014 here.

This isn't enough, though: you need to add the ipa/* files to
g++.dg/dg.exp to avoid running the ipa tests twice.

This isn't stage4 material, anyway.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Handle short reads and EINTR in lto-plugin/simple-object

2014-03-28 Thread Richard Biener
On Wed, 26 Mar 2014, Richard Biener wrote:

 On March 26, 2014 4:51:58 PM CET, Ian Lance Taylor i...@google.com wrote:
 On Wed, Mar 26, 2014 at 8:38 AM, Richard Biener rguent...@suse.de
 wrote:
 
  -  got = read (descriptor, buffer, size);
  -  if (got < 0)
  +  do
   {
  -  *errmsg = "read";
  -  *err = errno;
  -  return 0;
  +  got = read (descriptor, buffer, size);
  +  if (got < 0
  +  && errno != EINTR)
  +   {
  + *errmsg = "read";
  + *err = errno;
  + return 0;
  +   }
  +  else
  +   {
  + buffer += got;
  + size -= got;
  +   }
 
 This appears to do the wrong thing if got < 0 && errno == EINTR.  In
 that case it should not add got to buffer and size.
 
 Uh, indeed. Will fix.
 
  -  if (offset != lseek (obj->file->fd, offset, SEEK_SET)
  -   || length != read (obj->file->fd, secdata, length))
  +  if (!simple_object_internal_read (obj->file->fd, offset,
  +   secdata, length, errmsg, err))
 
 Hmmm, internal_read is meant to be, well, internal.  It's not declared
 anywhere as far as I can see.
 
 I can duplicate the stuff as well.
 
 Are you really seeing EINTR reads here?  That seems very odd to me,
 since we are always just reading a local file.  But if you are seeing
 it, I guess we should handle it.
 
 Well, it's a shot in the dark... I definitely know short reads and EINTR 
 happens more in virtual machines though. So handling it is an improvement.
 
 I'll see if it fixes my problems and report back.

Ok, updated patch as below.  The patch _does_ fix the issues I run into
(well, previously 1 out of 4 compiles succeeded, now 4 out of 4 succeed,
whatever that proves ;))

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2014-03-26  Richard Biener  rguent...@suse.de

libiberty/
* simple-object.c (simple_object_internal_read): Handle
EINTR and short reads.

lto-plugin/
* lto-plugin.c (process_symtab): Use simple_object_internal_read.

Index: libiberty/simple-object.c
===
--- libiberty/simple-object.c   (revision 208812)
+++ libiberty/simple-object.c   (working copy)
@@ -63,8 +63,6 @@ simple_object_internal_read (int descrip
 unsigned char *buffer, size_t size,
 const char **errmsg, int *err)
 {
-  ssize_t got;
-
   if (lseek (descriptor, offset, SEEK_SET) < 0)
 {
   *errmsg = "lseek";
   *err = errno;
   return 0;
 }
 
-  got = read (descriptor, buffer, size);
-  if (got < 0)
+  do
 {
-  *errmsg = "read";
-  *err = errno;
-  return 0;
+  ssize_t got = read (descriptor, buffer, size);
+  if (got == 0)
+   break;
+  else if (got > 0)
+   {
+ buffer += got;
+ size -= got;
+   }
+  else if (errno != EINTR)
+   {
+ *errmsg = "read";
+ *err = errno;
+ return 0;
+   }
 }
+  while (size > 0);
 
-  if ((size_t) got < size)
+  if (size > 0)
 {
   *errmsg = "file too short";
   *err = 0;
Index: lto-plugin/lto-plugin.c
===
--- lto-plugin/lto-plugin.c (revision 208812)
+++ lto-plugin/lto-plugin.c (working copy)
@@ -39,6 +39,7 @@ along with this program; see the file CO
 #include <stdint.h>
 #endif
 #include <assert.h>
+#include <errno.h>
 #include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -817,7 +818,7 @@ process_symtab (void *data, const char *
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
   char *s;
-  char *secdata;
+  char *secdatastart, *secdata;
 
   if (strncmp (name, LTO_SECTION_PREFIX, LTO_SECTION_PREFIX_LEN) != 0)
 return 1;
@@ -825,23 +826,40 @@ process_symtab (void *data, const char *
   s = strrchr (name, '.');
   if (s)
 sscanf (s, ".%" PRI_LL "x", &obj->out->id);
-  secdata = xmalloc (length);
+  secdata = secdatastart = xmalloc (length);
   offset += obj->file->offset;
-  if (offset != lseek (obj->file->fd, offset, SEEK_SET)
-   || length != read (obj->file->fd, secdata, length))
+  if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+goto err;
+
+  do
 {
-  if (message)
-   message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
-  /* Force claim_file_handler to abandon this file.  */
-  obj->found = 0;
-  free (secdata);
-  return 0;
+  ssize_t got = read (obj->file->fd, secdata, length);
+  if (got == 0)
+   break;
+  else if (got > 0)
+   {
+ secdata += got;
+ length -= got;
+   }
+  else if (errno != EINTR)
+   goto err;
 }
+  while (length > 0);
+  if (length > 0)
+goto err;
 
-  translate (secdata, secdata + length, obj->out);
+  translate (secdatastart, secdata, obj->out);
   obj->found++;
-  free (secdata);
+  free (secdatastart);
   return 1;
+
+err:
+ 

[Patch ARM] Fix A12 rule for arm-none-eabi / t-aprofile.

2014-03-28 Thread Ramana Radhakrishnan

Hi,


	This affects only arm-none-eabi targets and those using t-aprofile in 
their multilib lists. The problem here is that when the A12 support was 
added, we mistakenly added this to the MULTILIB_MATCHES rule for the A15 
rather than putting out a separate line for this.


Fixed thusly and verified that the correct multilibs are now chosen.

Applied to trunk as nearly obvious.

regards,
Ramana

2014-03-28  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

  * config/arm/t-aprofile (MULTILIB_MATCHES): Correct A12 rule.


Index: gcc/config/arm/t-aprofile
===
--- gcc/config/arm/t-aprofile   (revision 208895)
+++ gcc/config/arm/t-aprofile   (working copy)
@@ -81,7 +81,8 @@ MULTILIB_EXCEPTIONS+= *march=armv7ve
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a8
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a9
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a5
-MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a15=mcpu?cortex-a12
+MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a15
+MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a12
 MULTILIB_MATCHES   += march?armv7ve=mcpu?cortex-a15.cortex-a7
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a53
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.



[DOC PATCH] Clarify docs about stmt exprs (PR c/51088)

2014-03-28 Thread Marek Polacek
PR51088 contains some Really Bizarre code.  We should tell users
not to do any shenanigans like that.
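[For concreteness, a sketch of the kind of code the new sentence rules out --
my own example in the spirit of the PR, not taken from it: the address of a
label defined inside a statement expression escapes and is then used from
outside.]

void
bad (void)
{
  void *p = ({ here: &&here; });  /* label address leaks out of ({ ... }) */
  goto *p;                        /* undefined behavior */
}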

Ok for trunk?

2014-03-28  Marek Polacek  pola...@redhat.com

PR c/51088
* doc/extend.texi (Statement Exprs): Add note about taking
addresses of labels inside of statement expressions.

diff --git gcc/doc/extend.texi gcc/doc/extend.texi
index f9114ab..215d0a2 100644
--- gcc/doc/extend.texi
+++ gcc/doc/extend.texi
@@ -206,6 +206,9 @@ Jumping into a statement expression with @code{goto} or 
using a
 @code{case} or @code{default} label inside the statement expression is
 not permitted.  Jumping into a statement expression with a computed
 @code{goto} (@pxref{Labels as Values}) has undefined behavior.
+Taking the address of a label declared inside of a statement
+expression from outside of the statement expression has undefined
+behavior.
 Jumping out of a statement expression is permitted, but if the
 statement expression is part of a larger expression then it is
 unspecified which other subexpressions of that expression have been

Marek


Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)

2014-03-28 Thread Rainer Orth
Jakub Jelinek ja...@redhat.com writes:

 Seems it fails on all ilp32 targets I've tried and succeeds on all lp64
 targets (including ia64), so I think we should do following.
 Ok for trunk?

Looks right to me, but I'd like to defer to Jason as the subject-matter
expert.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring

2014-03-28 Thread Ramana Radhakrishnan
On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This two-patch series adds scheduling information for the ARMv8-A Crypto
 instructions on the Cortex-A53.
 This first patch does some preliminary restructuring to allow the arm and
 aarch64 backends to share the is_neon_type attribute.

 It also splits the crypto_aes type into crypto_aese and crypto_aesmc since
 the aese/aesd and aesmc/aesimc instructions will be treated differently (in
 patch 2/2).

 This patch touches both arm and aarch64 backends since there's no clean way
 to split it into per-backend patches without breaking each one.

 Tested and bootstrapped on arm-none-linux-gnueabihf and on
 aarch64-none-linux-gnu.

 This patch is fairly uncontroversial and doesn't change functionality or
 code generation by itself.

 I'll leave it to the maintainers to decide when this should go in...

The real question is about patch #2. So this going in just depends on patch #2.



Ramana


 Thanks,
 Kyrill

 2014-03-25  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi):
 Use crypto_aese type.
 (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type.
 * config/arm/arm.md (is_neon_type): Replace crypto_aes with
 crypto_aese, crypto_aesmc.  Move to types.md.
 * config/arm/types.md (crypto_aes): Split into crypto_aese,
 crypto_aesmc.
 * config/arm/iterators.md (crypto_type): Likewise.


[AArch64/ARM 0/3] Patch series for TRN Intrinsics

2014-03-28 Thread Alan Lawrence
Much like the ZIP and UZP intrinsics, the vtrn[q]_* intrinsics are implemented 
with inline __asm__, which blocks compiler analysis. This series replaces those 
calls with __builtin_shuffle, which produce the same** assembler instructions.


** except for two-element vectors, where UZP, ZIP and TRN are all equivalent and 
the backend chooses to output ZIP.
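[As a sketch of the approach -- my own reading of the TRN element order, not
code from the series: trn1 interleaves the even-numbered elements of the two
inputs and trn2 the odd-numbered ones, which __builtin_shuffle can express
directly.]

#include <arm_neon.h>

int32x2x2_t
my_vtrn_s32 (int32x2_t a, int32x2_t b)
{
  int32x2x2_t r;
  r.val[0] = __builtin_shuffle (a, b, (uint32x2_t) {0, 2});  /* TRN1: {a0, b0} */
  r.val[1] = __builtin_shuffle (a, b, (uint32x2_t) {1, 3});  /* TRN2: {a1, b1} */
  return r;
}

For two-element vectors these masks coincide with ZIP and UZP, which matches
the note above about the backend choosing to output ZIP.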


The first patch adds a bunch of tests, passing for the current asm 
implementation;
the second patch reimplements with __builtin_shuffle;
the third patch adds equivalent ARM tests using test bodies shared from the 
first patch.


OK for stage 1?

Cheers, Alan



patch to fix PR60675

2014-03-28 Thread Vladimir Makarov

The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60675

LRA assigned hard reg 30 to TImode subreg of DImode pseudo but it was 
wrong as hard reg 31 is unavailable for the allocation.


The patch was bootstrapped and tested on x86-64 and aarch64.

Committed as rev. 208900.

2014-03-28  Vladimir Makarov  vmaka...@redhat.com

PR target/60675
* lra-assigns.c (find_hard_regno_for): Remove unavailable hard
regs from checking multi-reg pseudos.

2014-03-28  Vladimir Makarov  vmaka...@redhat.com

PR target/60675
* gcc.target/aarch64/pr60675.C: New.

Index: lra-assigns.c
===
--- lra-assigns.c   (revision 208895)
+++ lra-assigns.c   (working copy)
@@ -473,7 +473,7 @@ find_hard_regno_for (int regno, int *cos
   enum reg_class rclass;
   bitmap_iterator bi;
   bool *rclass_intersect_p;
-  HARD_REG_SET impossible_start_hard_regs;
+  HARD_REG_SET impossible_start_hard_regs, available_regs;
 
   COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
   rclass = regno_allocno_class_array[regno];
@@ -586,6 +586,8 @@ find_hard_regno_for (int regno, int *cos
   biggest_nregs = hard_regno_nregs[hard_regno][biggest_mode];
   nregs_diff = (biggest_nregs
- hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)]);
+  COPY_HARD_REG_SET (available_regs, reg_class_contents[rclass]);
+  AND_COMPL_HARD_REG_SET (available_regs, lra_no_alloc_regs);
  for (i = 0; i < rclass_size; i++)
     {
       if (try_only_hard_regno >= 0)
@@ -601,9 +603,9 @@ find_hard_regno_for (int regno, int *cos
   && (nregs_diff == 0
  || (WORDS_BIG_ENDIAN
  ? (hard_regno - nregs_diff >= 0
-    && TEST_HARD_REG_BIT (reg_class_contents[rclass],
+    && TEST_HARD_REG_BIT (available_regs,
   hard_regno - nregs_diff))
- : TEST_HARD_REG_BIT (reg_class_contents[rclass],
+ : TEST_HARD_REG_BIT (available_regs,
   hard_regno + nregs_diff))))
{
  if (hard_regno_costs_check[hard_regno]
Index: testsuite/gcc.target/aarch64/pr60675.C
===
--- testsuite/gcc.target/aarch64/pr60675.C  (revision 0)
+++ testsuite/gcc.target/aarch64/pr60675.C  (working copy)
@@ -0,0 +1,277 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++11 -w -O2 -fPIC" } */
+namespace CLHEP {
+  static const double meter = 1000.*10;
+  static const double meter2 = meter*meter;
+  static const double megaelectronvolt = 1. ;
+  static const double gigaelectronvolt = 1.e+3;
+  static const double GeV = gigaelectronvolt;
+  static const double megavolt = megaelectronvolt;
+  static const double volt = 1.e-6*megavolt;
+  static const double tesla = volt*1.e+9/meter2;
+}
+   using CLHEP::GeV;
+   using CLHEP::tesla;
+   namespace std {
+  typedef long int ptrdiff_t;
+}
+   extern "C" {
+extern double cos (double __x) throw ();
+extern double sin (double __x) throw ();
+extern double sqrt (double __x) throw ();
+}
+   namespace std __attribute__ ((__visibility__ ("default"))) {
+  using ::cos;
+  using ::sin;
+  using ::sqrt;
+  template<class _CharT> struct char_traits;
+  template<typename _CharT, typename _Traits = char_traits<_CharT> > 
struct basic_ostream;
+  typedef basic_ostream<char> ostream;
+  template<typename _Iterator> struct iterator_traits {  };
+  template<typename _Tp> struct iterator_traits<_Tp*> {
+typedef ptrdiff_t difference_type;
+typedef _Tp reference;
+  };
+}
+   namespace __gnu_cxx __attribute__ ((__visibility__ ("default"))) {
+  using std::iterator_traits;
+  template<typename _Iterator, typename _Container> struct 
__normal_iterator {
+_Iterator _M_current;
+typedef iterator_traits<_Iterator> __traits_type;
+typedef typename __traits_type::difference_type difference_type;
+typedef typename __traits_type::reference reference;
+explicit   __normal_iterator(const _Iterator __i)   : 
_M_current(__i) {  }
+reference   operator*() const   {
+  return *_M_current;
+  }
+__normal_iterator   operator+(difference_type __n) const   {
+  return __normal_iterator(_M_current + __n);
+  }
+  };
+  template<typename _Tp> struct new_allocator {
+  };
+}
+   namespace std __attribute__ ((__visibility__ ("default"))) {
+  template<typename _Tp> struct allocator: public 
__gnu_cxx::new_allocator<_Tp> {
+};
+  struct ios_base   {  };
+  templatetypename _CharT, typename _Traits struct basic_ios : 
public ios_base {  };
+  templatetypename _CharT, typename _Traits struct basic_ostream : 
virtual public 

Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description

2014-03-28 Thread Richard Biener
On Fri, 28 Mar 2014, Ramana Radhakrishnan wrote:

 On Tue, Mar 25, 2014 at 3:52 PM, Kyrill Tkachov kyrylo.tkac...@arm.com 
 wrote:
  Hi all,
 
  In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC
  sequences of the form:
 
  AESE Vn, _
  AESMC Vn, Vn
 
  will issue both instructions in a single cycle on super-scalar
  implementations. It would be nice to model that in our pipeline
  descriptions. This patch defines a function to detect such pairs and uses it
  in the pipeline description for these instructions for the Cortex-A53.
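 
  For the curious, the pair-detection idea amounts to checking that the
  AESE/AESD result feeds the matching AESMC/AESIMC directly.  A rough sketch
  under that assumption (illustrative only; not the committed
  aarch_crypto_can_dual_issue body):
 
    /* Illustrative only.  Given the SETs of a candidate producer (AESE/AESD)
       and consumer (AESMC/AESIMC), the pair may dual-issue when the consumer's
       sole input is exactly the producer's destination register.  */
    static bool
    aes_pair_candidate_p (rtx producer_set, rtx consumer_set)
    {
      rtx prod_src = SET_SRC (producer_set);
      rtx cons_src = SET_SRC (consumer_set);
 
      if (GET_CODE (prod_src) != UNSPEC || GET_CODE (cons_src) != UNSPEC)
        return false;
 
      /* The AESMC/AESIMC unspec has a single operand: its input vector.  */
      return rtx_equal_p (SET_DEST (producer_set), XVECEXP (cons_src, 0, 0));
    }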
 
  The patch also adds some missed AdvancedSIMD information to the pipeline
  description for the Cortex-A53.
 
  Bootstrapped and tested on arm-none-linux-gnueabihf and
  aarch64-none-linux-gnu.
 
  Cortex-A53 scheduling is the default scheduling description on aarch64 so
  this patch can change default behaviour. That's an argument for taking this
  in stage1 or maybe backporting it into 4.9.1 once the release is made.
 
 
 To my mind on ARM / AArch64 this actually helps anyone using the
 crypto intrinsics on A53 hardware today and it would be good to get
 this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
 this is not a  default tuning option for any large software vendors,
 the folks using this are typically the ones that write the more
 specialized crypto intrinsics rather than just general purpose code.
 However this will help with scheduling on what is essentially an
 in-order core, so would be nice to have.
 
 This would definitely need approval from the AArch64 maintainers and
 the RMs to go in at this stage.
 
 If not, we should consider this for 4.9.1

I'd rather have it in 4.9.0 than 4.9.1.

Richard.

 regards
 Ramana
 
 
  What do people think?
 
  Thanks,
  Kyrill
 
 
  2014-03-25  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function.
  * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue):
  Declare
  extern.
  * config/arm/cortex-a53.md: Add reservations and bypass for crypto
  instructions as well as AdvancedSIMD loads.
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [PATCH] g++.dg: add ipa.exp file

2014-03-28 Thread Richard Biener
On Fri, Mar 28, 2014 at 2:05 PM, Martin Liška mli...@suse.cz wrote:
 Hi,
I would like to add corresponding ipa.exp file for g++ that let me run:
 make -k check RUNTESTFLAGS=ipa.exp

You can use

RUNTESTFLAGS=dg.exp=ipa/*.C

Richard.

 Changelog:

 2014-03-28  Martin Liska  mli...@suse.cz

 * g++.dg/ipa.epx: Anologous file added to g++.dg folder.

 OK for trunk?

 Thank you,
 Martin


Re: stray warning from gcc's cpp

2014-03-28 Thread Andriy Gapon
on 19/03/2014 12:03 Andriy Gapon said the following:
 
 I observe the following minor annoyance on FreeBSD systems where cpp is GCC's
 cpp.  If a DTrace script has the following shebang line:
 #!/usr/sbin/dtrace -Cs
 then the following warning is produced when the script is run:
 cc1: warning: <stdin> is shorter than expected
 
 Some details.  dtrace(1) first forks. Then a child seeks on a file descriptor
 associated with the script file, so that the shebang line is skipped (because
 otherwise it would confuse cpp).  Then the child makes the file descriptor its
 standard input and then it execs cpp.  cpp performs fstat(2) on its standard
 input descriptor and determines that it points to a regular file.  Then it
 verifies that a number of bytes it reads from the file is the same as a size 
 of
 the file.  The check makes sense if the file is opened by cpp itself, but it
 does not always make sense for the stdin as described above.
 
 The following patch seems to fix the issue, but perhaps there is a better /
 smarter alternative.

A patch that implements a different approach has been committed in FreeBSD:
https://github.com/freebsd/freebsd/commit/6ceecddbc
Please consider.  Thanks!

 --- a/libcpp/files.c
 +++ b/libcpp/files.c
 @@ -601,7 +601,8 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file)
return false;
  }
 
 -  if (regular && total != size && STAT_SIZE_RELIABLE (file->st))
 +  if (regular && total != size && file->fd != 0
 +       && STAT_SIZE_RELIABLE (file->st))
      cpp_error (pfile, CPP_DL_WARNING,
      "%s is shorter than expected", file->path);
 
 


-- 
Andriy Gapon


Re: C++ PATCH for c++/60566 (dtor devirtualization and missing thunks)

2014-03-28 Thread Jason Merrill

On 03/28/2014 06:31 AM, Jakub Jelinek wrote:

Ok for trunk?


Yes, thanks.

Jason




Re: [C++ patch] for C++/52369

2014-03-28 Thread Jason Merrill

On 03/27/2014 05:32 PM, Fabien Chêne wrote:

+ permerror (DECL_SOURCE_LOCATION (current_function_decl),
+    "uninitialized reference member in %q#T", type);
+ inform (DECL_SOURCE_LOCATION (member),
+ "%q#D should be initialized", member);


The inform should only happen if permerror returns true (i.e. without 
-fpermissive -w).  OK with that change.
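
Roughly, adapting the hunk quoted above (a sketch of the requested shape, not
the final committed hunk):

  if (permerror (DECL_SOURCE_LOCATION (current_function_decl),
                 "uninitialized reference member in %q#T", type))
    inform (DECL_SOURCE_LOCATION (member),
            "%q#D should be initialized", member);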


Jason



Re: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data

2014-03-28 Thread Marcus Shawcroft
On 19 March 2014 09:55, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 [gcc/]
 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.md (rev16mode2): New pattern.
 (rev16mode2_alt): Likewise.
 * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case.
 * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New.
 (aarch_rev16_shleft_mask_imm_p): Likewise.
 (aarch_rev16_p_1): Likewise.
 (aarch_rev16_p): Likewise.
 * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern.
 (aarch_rev16_shright_mask_imm_p): Likewise.
 (aarch_rev16_shleft_mask_imm_p): Likewise.

 [gcc/testsuite/]
 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/rev16_1.c: New test.

The aarch64 parts are OK for stage-1.
/Marcus


Re: [PATCH] Fix PR c++/60573

2014-03-28 Thread Jason Merrill

OK.

Jason


Re: [PATCH][AArch64] Add handling of bswap operations in rtx costs

2014-03-28 Thread Marcus Shawcroft
On 19 March 2014 09:56, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This patch depends on the series started at
 http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00933.html but is not really a
 part of it. It just adds costing of the bswap operation using the new rev
 field in the rtx cost tables since we have patterns in aarch64.md that
 handle bswap by generating rev16 instructions.
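 
  For reference, the kind of case being added is small — a rough sketch of the
  rtx-cost handling (field name per the rev-cost patch earlier in this series;
  not necessarily the exact committed hunk):
 
    case BSWAP:
      *cost = COSTS_N_INSNS (1);
      if (speed)
        /* Add the per-core rev/rev16 cost from the extra cost table.  */
        *cost += extra_cost->alu.rev;
      return false;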

 Tested aarch64-none-elf.

 Ok for stage1 after that series goes in?

 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle BSWAP.

This is OK in stage 1.

/Marcus


Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring

2014-03-28 Thread Kyrill Tkachov

On 28/03/14 14:18, Ramana Radhakrishnan wrote:

On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This two-patch series adds scheduling information for the ARMv8-A Crypto
instructions on the Cortex-A53.
This first patch does some preliminary restructuring to allow the arm and
aarch64 backends to share the is_neon_type attribute.

It also splits the crypto_aes type into crypto_aese and crypto_aesmc since
the aese/aesd and aesmc/aesimc instructions will be treated differently (in
patch 2/2).

This patch touches both arm and aarch64 backends since there's no clean way
to split it into per-backend patches without breaking each one.

Tested and bootstrapped on arm-none-linux-gnueabihf and on
aarch64-none-linux-gnu.

This patch is fairly uncontroversial and doesn't change functionality or
code generation by itself.

I'll leave it to the maintainers to decide when this should go in...

The real question is about patch #2. So this going in just depends on patch #2.


#2 has been ok'd. Can I take this as an approval for this patch?

Kyrill




Ramana


Thanks,
Kyrill

2014-03-25  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi):
 Use crypto_aese type.
 (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type.
 * config/arm/arm.md (is_neon_type): Replace crypto_aes with
 crypto_aese, crypto_aesmc.  Move to types.md.
 * config/arm/types.md (crypto_aes): Split into crypto_aese,
 crypto_aesmc.
 * config/arm/iterators.md (crypto_type): Likewise.





Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 04:52:39PM +, Marcus Shawcroft wrote:
 On 28 March 2014 14:52, Ramana Radhakrishnan ramana@googlemail.com 
 wrote:
 
  To my mind on ARM / AArch64 this actually helps anyone using the
  crypto intrinsics on A53 hardware today and it would be good to get
  this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
  this is not a  default tuning option for any large software vendors,
  the folks using this are typically the ones that write the more
  specialized crypto intrinsics rather than just general purpose code.
  However this will help with scheduling on what is essentially an
  in-order core, so would be nice to have.
 
  This would definitely need approval from the AArch64 maintainers and
  the RMs to go in at this stage.
 
  If not, we should consider this for 4.9.1
 
  regards
  Ramana
 
 
  What do people think?
 
 Its low risk, I think it should go in now.

Ok with me.

Jakub


Re: [PATCH, PR 60647] Check that actual argument types match those of formal parameters before IPA-SRA

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 05:35:12PM +0100, Martin Jambor wrote:
 after much confusion on my part, this is the proper fix for PR 60647.
 IPA-SRA can get confused a lot when a formal parameter is a pointer
 but the corresponding actual argument is not.  So this patch adds a
 check that their types pass useless_type_conversion_p check.
 
 Bootstrapped and tested on x86_64-linux.  OK for trunk?

Ok, thanks.

 2014-03-28  Martin Jambor  mjam...@suse.cz
 
   PR middle-end/60647
   * tree-sra.c (callsite_has_enough_arguments_p): Renamed to
   callsite_arguments_match_p.  Updated all callers.  Also check types of
   corresponding formal parameters and actual arguments.
   (not_all_callers_have_enough_arguments_p) Renamed to
   some_callers_have_mismatched_arguments_p.
 
 testsuite/
   * gcc.dg/pr60647-1.c: New test.
   * gcc.dg/pr60647-2.c: Likewise.

Jakub


Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description

2014-03-28 Thread Marcus Shawcroft
On 28 March 2014 14:52, Ramana Radhakrishnan ramana@googlemail.com wrote:

 To my mind on ARM / AArch64 this actually helps anyone using the
 crypto intrinsics on A53 hardware today and it would be good to get
 this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
 this is not a  default tuning option for any large software vendors,
 the folks using this are typically the ones that write the more
 specialized crypto intrinsics rather than just general purpose code.
 However this will help with scheduling on what is essentially an
 in-order core, so would be nice to have.

 This would definitely need approval from the AArch64 maintainers and
 the RMs to go in at this stage.

 If not, we should consider this for 4.9.1

 regards
 Ramana


 What do people think?

Its low risk, I think it should go in now.

/Marcus


Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring

2014-03-28 Thread Ramana Radhakrishnan
On Fri, Mar 28, 2014 at 5:18 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 On 28/03/14 14:18, Ramana Radhakrishnan wrote:

 On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com
 wrote:

 Hi all,

 This two-patch series adds scheduling information for the ARMv8-A Crypto
 instructions on the Cortex-A53.
 This first patch does some preliminary restructuring to allow the arm and
 aarch64 backends to share the is_neon_type attribute.

 It also splits the crypto_aes type into crypto_aese and crypto_aesmc
 since
 the aese/aesd and aesmc/aesimc instructions will be treated differently
 (in
 patch 2/2).

 This patch touches both arm and aarch64 backends since there's no clean
 way
 to split it into per-backend patches without breaking each one.

 Tested and bootstrapped on arm-none-linux-gnueabihf and on
 aarch64-none-linux-gnu.

 This patch is fairly uncontroversial and doesn't change functionality or
 code generation by itself.

 I'll leave it to the maintainers to decide when this should go in...

 The real question is about patch #2. So this going in just depends on
 patch #2.


 #2 has been ok'd. Can I take this as an approval for this patch?

Yes -

Ramana



 Kyrill




 Ramana

 Thanks,
 Kyrill

 2014-03-25  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi):
  Use crypto_aese type.
  (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type.
  * config/arm/arm.md (is_neon_type): Replace crypto_aes with
  crypto_aese, crypto_aesmc.  Move to types.md.
  * config/arm/types.md (crypto_aes): Split into crypto_aese,
  crypto_aesmc.
  * config/arm/iterators.md (crypto_type): Likewise.





UBSan fix: avoid undefined behaviour in bitmask

2014-03-28 Thread Andrew Haley
UBSan detected that we were trying to set a non-existent bit in a mask.

I don't think it has mattered before now because when this happens the
value in the mask is not used.  However, better safe than sorry.

Andrew.


2014-03-28  Andrew Haley  a...@redhat.com

* boehm.c (mark_reference_fields): Avoid unsigned integer overflow
when calculating an index into a bitmap descriptor.

Index: gcc/java/boehm.c
===
--- gcc/java/boehm.c(revision 208839)
+++ gcc/java/boehm.c(working copy)
@@ -107,7 +107,11 @@
 bits for all words in the record. This is conservative, but the
 size_words != 1 case is impossible in regular java code. */
  for (i = 0; i < size_words; ++i)
-   *mask = (*mask).set_bit (ubit - count - i - 1);
+   {
+ int bitpos = ubit - count - i - 1;
+ if (bitpos >= 0)
+   *mask = (*mask).set_bit (bitpos);
+   }

  if (count >= ubit - 2)
*pointer_after_end = 1;


[PATCH, PR 60647] Check that actual argument types match those of formal parameters before IPA-SRA

2014-03-28 Thread Martin Jambor
Hi,

after much confusion on my part, this is the proper fix for PR 60647.
IPA-SRA can get confused a lot when a formal parameter is a pointer
but the corresponding actual argument is not.  So this patch adds a
check that their types pass useless_type_conversion_p check.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin


2014-03-28  Martin Jambor  mjam...@suse.cz

PR middle-end/60647
* tree-sra.c (callsite_has_enough_arguments_p): Renamed to
callsite_arguments_match_p.  Updated all callers.  Also check types of
corresponding formal parameters and actual arguments.
(not_all_callers_have_enough_arguments_p) Renamed to
some_callers_have_mismatched_arguments_p.

testsuite/
* gcc.dg/pr60647-1.c: New test.
* gcc.dg/pr60647-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/pr60647-1.c b/gcc/testsuite/gcc.dg/pr60647-1.c
new file mode 100644
index 000..73ea856
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr60647-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct _wincore
+{
+  int y;
+  int width;
+};
+int a;
+static fn1 (dpy, winInfo) struct _XDisplay *dpy;
+struct _wincore *winInfo;
+{
+  a = winInfo->width;
+  fn2 ();
+}
+
+static fn3 (dpy, winInfo, visrgn) struct _XDisplay *dpy;
+{
+  int b = fn1 (0, winInfo);
+  fn4 (0, 0, visrgn);
+}
+
+fn5 (event) struct _XEvent *event;
+{
+  fn3 (0, 0, 0);
+}
diff --git a/gcc/testsuite/gcc.dg/pr60647-2.c b/gcc/testsuite/gcc.dg/pr60647-2.c
new file mode 100644
index 000..ddeb117
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr60647-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct _wincore
+{
+  int width, height;
+};
+
+static void
+foo (void *dpy, struct _wincore *winInfo, int offset)
+{
+  fn1 (winInfo->height);
+}
+
+static void
+bar (void *dpy, int winInfo, int *visrgn)
+{
+  ((void (*) (void *, int, int)) foo) ((void *) 0, winInfo, 0);  /* { dg-warning "function called through a non-compatible type" } */
+  fn2 (0, 0, visrgn);
+}
+
+void
+baz (void *dpy, int win, int prop)
+{
+  bar ((void *) 0, 0, (int *) 0);
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 284d544..ffef13d 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1234,12 +1234,26 @@ asm_visit_addr (gimple, tree op, tree, void *)
 }
 
 /* Return true iff callsite CALL has at least as many actual arguments as there
-   are formal parameters of the function currently processed by IPA-SRA.  */
+   are formal parameters of the function currently processed by IPA-SRA and
+   that their types match.  */
 
 static inline bool
-callsite_has_enough_arguments_p (gimple call)
+callsite_arguments_match_p (gimple call)
 {
-  return gimple_call_num_args (call) >= (unsigned) func_param_count;
+  if (gimple_call_num_args (call) < (unsigned) func_param_count)
+return false;
+
+  tree parm;
+  int i;
+  for (parm = DECL_ARGUMENTS (current_function_decl), i = 0;
+   parm;
+   parm = DECL_CHAIN (parm), i++)
+{
+  tree arg = gimple_call_arg (call, i);
+  if (!useless_type_conversion_p (TREE_TYPE (parm), TREE_TYPE (arg)))
+   return false;
+}
+  return true;
 }
 
 /* Scan function and look for interesting expressions and create access
@@ -1294,7 +1308,7 @@ scan_function (void)
  if (recursive_call_p (current_function_decl, dest))
{
  encountered_recursive_call = true;
- if (!callsite_has_enough_arguments_p (stmt))
+ if (!callsite_arguments_match_p (stmt))
encountered_unchangable_recursive_call = true;
}
}
@@ -4750,16 +4764,17 @@ sra_ipa_reset_debug_stmts (ipa_parm_adjustment_vec 
adjustments)
 }
 }
 
-/* Return false iff all callers have at least as many actual arguments as there
-   are formal parameters in the current function.  */
+/* Return false if all callers have at least as many actual arguments as there
+   are formal parameters in the current function and that their types
+   match.  */
 
 static bool
-not_all_callers_have_enough_arguments_p (struct cgraph_node *node,
-void *data ATTRIBUTE_UNUSED)
+some_callers_have_mismatched_arguments_p (struct cgraph_node *node,
+ void *data ATTRIBUTE_UNUSED)
 {
   struct cgraph_edge *cs;
  for (cs = node->callers; cs; cs = cs->next_caller)
-if (!callsite_has_enough_arguments_p (cs->call_stmt))
+if (!callsite_arguments_match_p (cs->call_stmt))
   return true;
 
   return false;
@@ -4970,12 +4985,13 @@ ipa_early_sra (void)
   goto simple_out;
 }
 
-  if (cgraph_for_node_and_aliases (node, 
not_all_callers_have_enough_arguments_p,
+  if (cgraph_for_node_and_aliases (node,
+  some_callers_have_mismatched_arguments_p,
   NULL, true))
 {
   if 

Re: [DOC PATCH] Clarify docs about stmt exprs (PR c/51088)

2014-03-28 Thread Joseph S. Myers
On Fri, 28 Mar 2014, Marek Polacek wrote:

 PR51088 contains some Really Bizarre code.  We should tell users
 not to do any shenanigans like that.
 
 Ok for trunk?

I don't think a doc patch resolves this bug.  The compiler should never 
generate code with an undefined reference to a local label like that; 
either the code should get a compile-time error (that's what I suggest), 
or it should generate output that links but has undefined behavior at 
runtime.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, PR 60640] When creating virtual clones, clone thunks too

2014-03-28 Thread Martin Jambor
Hi,

this patch fixes PR 60640 by creating thunks to clones when that is
necessary to properly redirect edges to them.  It mostly does what
cgraph_add_thunk does and what analyze_function does to thunks.  It
fixes the testcases on trunk (it does not apply to 4.8; I have not
looked at how easily fixable that is) and passes bootstrap and testing on
x86_64-linux.

OK for trunk?

Thanks,

Martin


2014-03-26  Martin Jambor  mjam...@suse.cz

* cgraph.h (cgraph_clone_node): New parameter added to declaration.
Adjust all callers.
* cgraphclones.c (build_function_type_skip_args): Moved upwards in the
file.
(build_function_decl_skip_args): Likewise.
(duplicate_thunk_for_node): New function.
(redirect_edge_duplicating_thunks): Likewise.
(cgraph_clone_node): New parameter args_to_skip, pass it to
redirect_edge_duplicating_thunks which is called instead of
cgraph_redirect_edge_callee.
(cgraph_create_virtual_clone): Pass args_to_skip to cgraph_clone_node.

testsuite/
* g++.dg/ipa/pr60640-1.C: New test.
* g++.dg/ipa/pr60640-2.C: Likewise.

Index: src/gcc/cgraph.h
===
--- src.orig/gcc/cgraph.h
+++ src/gcc/cgraph.h
@@ -890,7 +890,7 @@ struct cgraph_edge * cgraph_clone_edge (
unsigned, gcov_type, int, bool);
 struct cgraph_node * cgraph_clone_node (struct cgraph_node *, tree, gcov_type,
int, bool, vec<cgraph_edge_p>,
-   bool, struct cgraph_node *);
+   bool, struct cgraph_node *, bitmap);
 tree clone_function_name (tree decl, const char *);
 struct cgraph_node * cgraph_create_virtual_clone (struct cgraph_node *old_node,
  vec<cgraph_edge_p>,
Index: src/gcc/cgraphclones.c
===
--- src.orig/gcc/cgraphclones.c
+++ src/gcc/cgraphclones.c
@@ -168,6 +168,183 @@ cgraph_clone_edge (struct cgraph_edge *e
   return new_edge;
 }
 
+/* Build variant of function type ORIG_TYPE skipping ARGS_TO_SKIP and the
+   return value if SKIP_RETURN is true.  */
+
+static tree
+build_function_type_skip_args (tree orig_type, bitmap args_to_skip,
+  bool skip_return)
+{
+  tree new_type = NULL;
+  tree args, new_args = NULL, t;
+  tree new_reversed;
+  int i = 0;
+
+  for (args = TYPE_ARG_TYPES (orig_type); args && args != void_list_node;
+   args = TREE_CHAIN (args), i++)
+if (!args_to_skip || !bitmap_bit_p (args_to_skip, i))
+  new_args = tree_cons (NULL_TREE, TREE_VALUE (args), new_args);
+
+  new_reversed = nreverse (new_args);
+  if (args)
+{
+  if (new_reversed)
+TREE_CHAIN (new_args) = void_list_node;
+  else
+   new_reversed = void_list_node;
+}
+
+  /* Use copy_node to preserve as much as possible from original type
+ (debug info, attribute lists etc.)
+ Exception is METHOD_TYPEs must have THIS argument.
+ When we are asked to remove it, we need to build new FUNCTION_TYPE
+ instead.  */
+  if (TREE_CODE (orig_type) != METHOD_TYPE
+  || !args_to_skip
+  || !bitmap_bit_p (args_to_skip, 0))
+{
+  new_type = build_distinct_type_copy (orig_type);
+  TYPE_ARG_TYPES (new_type) = new_reversed;
+}
+  else
+{
+  new_type
+= build_distinct_type_copy (build_function_type (TREE_TYPE (orig_type),
+new_reversed));
+  TYPE_CONTEXT (new_type) = TYPE_CONTEXT (orig_type);
+}
+
+  if (skip_return)
+TREE_TYPE (new_type) = void_type_node;
+
+  /* This is a new type, not a copy of an old type.  Need to reassociate
+ variants.  We can handle everything except the main variant lazily.  */
+  t = TYPE_MAIN_VARIANT (orig_type);
+  if (t != orig_type)
+{
+  t = build_function_type_skip_args (t, args_to_skip, skip_return);
+  TYPE_MAIN_VARIANT (new_type) = t;
+  TYPE_NEXT_VARIANT (new_type) = TYPE_NEXT_VARIANT (t);
+  TYPE_NEXT_VARIANT (t) = new_type;
+}
+  else
+{
+  TYPE_MAIN_VARIANT (new_type) = new_type;
+  TYPE_NEXT_VARIANT (new_type) = NULL;
+}
+
+  return new_type;
+}
+
+/* Build variant of function decl ORIG_DECL skipping ARGS_TO_SKIP and the
+   return value if SKIP_RETURN is true.
+
+   Arguments from DECL_ARGUMENTS list can't be removed now, since they are
+   linked by TREE_CHAIN directly.  The caller is responsible for eliminating
+   them when they are being duplicated (i.e. copy_arguments_for_versioning).  
*/
+
+static tree
+build_function_decl_skip_args (tree orig_decl, bitmap args_to_skip,
+  bool skip_return)
+{
+  tree new_decl = copy_node (orig_decl);
+  tree new_type;
+
+  new_type = TREE_TYPE (orig_decl);
+  if (prototype_p (new_type)
+  || (skip_return && !VOID_TYPE_P 

Changing INT to SI mode

2014-03-28 Thread K_s, Vishnu
Test pr59940.c is failing for the AVR target because the test assumes the size
of int is 32 bits: it expects warnings for overflow and conversion when
assigning a 36-bit and a 32-bit value, respectively, to the variable si.
The following patch defines a 32-bit type with SI mode and uses it.

2014-03-28  Vishnu K S vishnu@atmel.com 

* gcc/testsuite/gcc.dg/pr59940.c: Using 32-bit SI mode instead of int


diff --git a/gcc/testsuite/gcc.dg/pr59940.c b/gcc/testsuite/gcc.dg/pr59940.c
index b0fd17f..21d93ad 100644
--- a/gcc/testsuite/gcc.dg/pr59940.c
+++ b/gcc/testsuite/gcc.dg/pr59940.c
@@ -3,11 +3,12 @@
 /* { dg-options "-Wconversion -Woverflow" } */

 int f (unsigned int);
+typedef int sitype __attribute__((mode(SI)));

 int
 g (void)
 {
-  int si = 12;
+  sitype si = 12;
   unsigned int ui = -1; /* { dg-warning "21:negative integer implicitly converted to unsigned type" } */
   unsigned char uc;
   ui = si; /* { dg-warning "8:conversion" } */



Re: [PATCH][ARM][1/3] Add rev field to rtx cost tables

2014-03-28 Thread Ramana Radhakrishnan
On Wed, Mar 19, 2014 at 9:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 In order to properly cost the rev16 instruction we need a new field in the
 cost tables.
 This patch adds that and specifies its value for the existing cost tables.
 Since rev16 is used to implement the BSWAP operation we add handling of that
 in the rtx cost function using the new field.

 Tested on arm-none-eabi and bootstrapped on an arm linux target.

 Does it look ok for stage1?

Ok for stage1 if no regressions.


 Thanks,
 Kyrill

 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/arm/aarch-common-protos.h (alu_cost_table): Add rev field.
 * config/arm/aarch-cost-tables.h (generic_extra_costs): Specify
 rev cost.
 (cortex_a53_extra_costs): Likewise.
 (cortex_a57_extra_costs): Likewise.
 * config/arm/arm.c (cortexa9_extra_costs): Likewise.
 (cortexa7_extra_costs): Likewise.
 (cortexa12_extra_costs): Likewise.
 (cortexa15_extra_costs): Likewise.
 (v7m_extra_costs): Likewise.
 (arm_new_rtx_costs): Handle BSWAP.


Re: [PATCH][ARM][3/3] Recognise bitwise operations leading to SImode rev16

2014-03-28 Thread Ramana Radhakrishnan
On Wed, Mar 19, 2014 at 9:56 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This is the arm equivalent of patch [2/3] in the series that adds combine
 patterns for the bitwise operations leading to a rev16 instruction.
 It reuses the functions that were put in aarch-common.c to properly cost
 these operations.  (For a concrete example of the C idiom targeted, see the
 sketch below.)
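 
  A rough illustration of the SImode bitwise idiom these combine patterns
  target (an illustrative example, not taken from the patch):
 
    unsigned int
    swap_bytes_in_halfwords (unsigned int x)
    {
      /* Swap the two bytes of each 16-bit halfword; the point of the patch
         is that this combination is recognised as a single rev16.  */
      return ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);
    }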

 I tried matching a DImode rev16 (with the intent of splitting it into two
 rev16 ops) like aarch64 but combine wouldn't try to match that bitwise
 pattern in DImode like aarch64 does. Instead it tries various exotic
 combinations with subregs.

 Tested arm-none-eabi, bootstrap on arm-none-linux-gnueabihf.

 Ok for stage1?

This is OK for stage1 .

Ramana


 [gcc/]
 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/arm/arm.md (arm_rev16si2): New pattern.
 (arm_rev16si2_alt): Likewise.
 * config/arm/arm.c (arm_new_rtx_costs): Handle rev16 case.


 [gcc/testsuite/]
 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/arm/rev16.c: New test.


[AArch64/ARM 1/3] Add execution + assembler tests of AArch64 TRN Intrinsics

2014-03-28 Thread Alan Lawrence

This adds DejaGNU tests of the existing AArch64 vtrn_* intrinsics, both checking
the assembler output and the runtime results. Test bodies are in separate files
ready to reuse for ARM in the third patch.

Putting these in a new subdirectory with the ZIP Intrinsics tests, using 
simd.exp added there (will commit ZIP tests first).


All tests passing on aarch64-none-elf and aarch64_be-none-elf.

testsuite/ChangeLog:
2014-03-28  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/aarch64/simd/vtrnf32_1.c: New file.
* gcc.target/aarch64/simd/vtrnf32.x: New file.
* gcc.target/aarch64/simd/vtrnp16_1.c: New file.
* gcc.target/aarch64/simd/vtrnp16.x: New file.
* gcc.target/aarch64/simd/vtrnp8_1.c: New file.
* gcc.target/aarch64/simd/vtrnp8.x: New file.
* gcc.target/aarch64/simd/vtrnqf32_1.c: New file.
* gcc.target/aarch64/simd/vtrnqf32.x: New file.
* gcc.target/aarch64/simd/vtrnqp16_1.c: New file.
* gcc.target/aarch64/simd/vtrnqp16.x: New file.
* gcc.target/aarch64/simd/vtrnqp8_1.c: New file.
* gcc.target/aarch64/simd/vtrnqp8.x: New file.
* gcc.target/aarch64/simd/vtrnqs16_1.c: New file.
* gcc.target/aarch64/simd/vtrnqs16.x: New file.
* gcc.target/aarch64/simd/vtrnqs32_1.c: New file.
* gcc.target/aarch64/simd/vtrnqs32.x: New file.
* gcc.target/aarch64/simd/vtrnqs8_1.c: New file.
* gcc.target/aarch64/simd/vtrnqs8.x: New file.
* gcc.target/aarch64/simd/vtrnqu16_1.c: New file.
* gcc.target/aarch64/simd/vtrnqu16.x: New file.
* gcc.target/aarch64/simd/vtrnqu32_1.c: New file.
* gcc.target/aarch64/simd/vtrnqu32.x: New file.
* gcc.target/aarch64/simd/vtrnqu8_1.c: New file.
* gcc.target/aarch64/simd/vtrnqu8.x: New file.
* gcc.target/aarch64/simd/vtrns16_1.c: New file.
* gcc.target/aarch64/simd/vtrns16.x: New file.
* gcc.target/aarch64/simd/vtrns32_1.c: New file.
* gcc.target/aarch64/simd/vtrns32.x: New file.
* gcc.target/aarch64/simd/vtrns8_1.c: New file.
* gcc.target/aarch64/simd/vtrns8.x: New file.
* gcc.target/aarch64/simd/vtrnu16_1.c: New file.
* gcc.target/aarch64/simd/vtrnu16.x: New file.
* gcc.target/aarch64/simd/vtrnu32_1.c: New file.
* gcc.target/aarch64/simd/vtrnu32.x: New file.
* gcc.target/aarch64/simd/vtrnu8_1.c: New file.
* gcc.target/aarch64/simd/vtrnu8.x: New file.diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32.x b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32.x
new file mode 100644
index 000..7b03e6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32.x
@@ -0,0 +1,27 @@
+extern void abort (void);
+
+float32x2x2_t
+test_vtrnf32 (float32x2_t _a, float32x2_t _b)
+{
+  return vtrn_f32 (_a, _b);
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  float32_t first[] = {1, 2};
+  float32_t second[] = {3, 4};
+  float32x2x2_t result = test_vtrnf32 (vld1_f32 (first), vld1_f32 (second));
+  float32x2_t res1 = result.val[0], res2 = result.val[1];
+  float32_t exp1[] = {1, 3};
+  float32_t exp2[] = {2, 4};
+  float32x2_t expected1 = vld1_f32 (exp1);
+  float32x2_t expected2 = vld1_f32 (exp2);
+
+  for (i = 0; i < 2; i++)
+if ((res1[i] != expected1[i]) || (res2[i] != expected2[i]))
+  abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32_1.c
new file mode 100644
index 000..24c30a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vtrnf32_1.c
@@ -0,0 +1,11 @@
+/* Test the `vtrn_f32' AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -fno-inline" } */
+
+#include <arm_neon.h>
+#include "vtrnf32.x"
+
+/* { dg-final { scan-assembler-times "trn1\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.2s, ?v\[0-9\]+\.2s!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { scan-assembler-times "trn2\[ \t\]+v\[0-9\]+\.2s, ?v\[0-9\]+\.2s, ?v\[0-9\]+\.2s!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vtrnp16.x b/gcc/testsuite/gcc.target/aarch64/simd/vtrnp16.x
new file mode 100644
index 000..5feabe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vtrnp16.x
@@ -0,0 +1,27 @@
+extern void abort (void);
+
+poly16x4x2_t
+test_vtrnp16 (poly16x4_t _a, poly16x4_t _b)
+{
+  return vtrn_p16 (_a, _b);
+}
+
+int
+main (int argc, char **argv)
+{
+  int i;
+  poly16_t first[] = {1, 2, 3, 4};
+  poly16_t second[] = {5, 6, 7, 8};
+  poly16x4x2_t result = test_vtrnp16 (vld1_p16 (first), vld1_p16 (second));
+  poly16x4_t res1 = result.val[0], res2 = result.val[1];
+  poly16_t exp1[] = {1, 5, 3, 7};
+  poly16_t exp2[] = {2, 6, 4, 8};
+  poly16x4_t expected1 = vld1_p16 (exp1);
+  poly16x4_t expected2 = vld1_p16 (exp2);
+
+  for (i = 0; i < 4; i++)
+if ((res1[i] != expected1[i]) || (res2[i] != expected2[i]))
+  

Re: Fix PR ipa/60315 (inliner explosion)

2014-03-28 Thread Jan Hubicka
   Bootstrapped/regtested x86_64-linux, comitted.
  
  Not with Ada apparently, resulting in 
  
  === acats tests ===
  FAIL:   c34007d
  FAIL:   c34007g
  FAIL:   c34007s
  FAIL:   c37213j
  FAIL:   c37213k
  FAIL:   c37213l
  FAIL:   ce2201g
  FAIL:   cxa5a03
  FAIL:   cxa5a04
  FAIL:   cxa5a06
  FAIL:   cxg2013
  FAIL:   cxg2015
  
 The problem is that by redirection to noreturn, we end up freeing the SSA name
 of the LHS, but later we still process statements that refer to it until they
 are removed as unreachable.
 The following patch fixes it.  I tested it on x86_64-linux, but changed my
 mind: I think fixup_noreturn_call should do it instead, and I will send an
 updated patch after testing.
 
 Honza

Actually, after some additional investigation I decided to commit this patch.
fixup_noreturn_call already cares about the return value, but differently than
Jakub's new code.  We ought to unify them, but only for the next stage1.

Honza
 
 Index: cgraph.c
 ===
 --- cgraph.c  (revision 208875)
 +++ cgraph.c  (working copy)
 @@ -1329,6 +1331,7 @@ gimple
  cgraph_redirect_edge_call_stmt_to_callee (struct cgraph_edge *e)
  {
    tree decl = gimple_call_fndecl (e->call_stmt);
 +  tree lhs = gimple_call_lhs (e->call_stmt);
gimple new_stmt;
gimple_stmt_iterator gsi;
  #ifdef ENABLE_CHECKING
 @@ -1471,6 +1474,22 @@ cgraph_redirect_edge_call_stmt_to_callee
update_stmt_fn (DECL_STRUCT_FUNCTION (e->caller->decl), new_stmt);
  }
  
 +  /* If the call becomes noreturn, remove the lhs.  */
 +  if (lhs && (gimple_call_flags (new_stmt) & ECF_NORETURN))
 +{
 +  if (TREE_CODE (lhs) == SSA_NAME)
 + {
 +  gsi = gsi_for_stmt (new_stmt);
 +
 +   tree var = create_tmp_var (TREE_TYPE (lhs), NULL);
 +   tree def = get_or_create_ssa_default_def
 +   (DECL_STRUCT_FUNCTION (e->caller->decl), var);
 +   gimple set_stmt = gimple_build_assign (lhs, def);
 +   gsi_insert_before (gsi, set_stmt, GSI_SAME_STMT);
 + }
 +  gimple_call_set_lhs (new_stmt, NULL_TREE);
 +}
 +
    cgraph_set_call_stmt_including_clones (e->caller, e->call_stmt, new_stmt, 
 false);
  
if (cgraph_dump_file)


Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)

2014-03-28 Thread Jason Merrill

On 03/28/2014 06:47 AM, Jakub Jelinek wrote:

* typeck.c (build_function_call_vec): Call resolve_overloaded_builtin.


I expect this will break in templates if arguments are dependent.

Jason



Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-03-28 Thread pinskia


 On Mar 28, 2014, at 7:48 AM, James Greenhalgh james.greenha...@arm.com 
 wrote:
 
 On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote:
 On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com 
 wrote:
 Hi,
 
 There is no way to perform scalar addition in the vector register file,
  but with the RTX costs in place we start rewriting (x << 1) to (x + x)
 on almost all cores. The code which makes this decision has no idea that we
 will end up doing this (it happens well before reload) and so we end up with
 very ugly code generation in the case where addition was selected, but
 we are operating in vector registers.
 
 This patch relies on the same gimmick we are already using to allow
 shifts on 32-bit scalars in the vector register file - Use a vector 32x2
 operation instead, knowing that we can safely ignore the top bits.
 
 This restores some normality to scalar_shift_1.c, however the test
 that we generate a left shift by one is clearly bogus, so remove that.
 
 This patch is pretty ugly, but it does generate superficially better
 looking code for this testcase.
 
 Tested on aarch64-none-elf with no issues.
 
 OK for stage 1?
 
 It seems we should also discourage the neon alternatives as there might be
 extra movement between the two register sets which we don't want.
 
 I see your point, but we've tried to avoid doing that elsewhere in the
 AArch64 backend. Our argument has been that strictly speaking, it isn't that
 the alternative is expensive, it is the movement between the register sets. We
 do model that elsewhere, and the register allocator should already be trying 
 to
  avoid unnecessary moves between register classes.
 

What about on a specific core where that alternative is expensive; that is the 
vector instructions are worse than the scalar ones. How are we going to handle 
this case?

Thanks,
Andrew


 If those mechanisms are broken, we should fix them - in that case fixing
 this by discouraging valid alternatives would seem to be gaffer-taping over 
 the
 real problem.
 
 Thanks,
 James
 
 
 Thanks,
 Andrew
 
 
 Thanks,
 James
 
 ---
 gcc/
 
 2014-03-27  James Greenhalgh  james.greenha...@arm.com
 
   * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
   vector registers.
 
 gcc/testsuite/
 2014-03-27  James Greenhalgh  james.greenha...@arm.com
 
   * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
 0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
 


Re: [PATCH][ARM/AArch64][1/2] Crypto intrinsics tuning for Cortex-A53 - type Attribute restructuring

2014-03-28 Thread Kyrill Tkachov

On 28/03/14 17:18, Kyrill Tkachov wrote:

On 28/03/14 14:18, Ramana Radhakrishnan wrote:

On Tue, Mar 25, 2014 at 3:51 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This two-patch series adds scheduling information for the ARMv8-A Crypto
instructions on the Cortex-A53.
This first patch does some preliminary restructuring to allow the arm and
aarch64 backends to share the is_neon_type attribute.

It also splits the crypto_aes type into crypto_aese and crypto_aesmc since
the aese/aesd and aesmc/aesimc instructions will be treated differently (in
patch 2/2).

This patch touches both arm and aarch64 backends since there's no clean way
to split it into per-backend patches without breaking each one.

Tested and bootstrapped on arm-none-linux-gnueabihf and on
aarch64-none-linux-gnu.

This patch is fairly uncontroversial and doesn't change functionality or
code generation by itself.

I'll leave it to the maintainers to decide when this should go in...

The real question is about patch #2. So this going in just depends on patch #2.

#2 has been ok'd. Can I take this as an approval for this patch?


I've committed this as r208908 since quite a few people ok'd the more meaty #2 
patch. If anyone objects to this, we can revert it later.


Kyrill


Kyrill



Ramana


Thanks,
Kyrill

2014-03-25  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/aarch64/aarch64-simd.md (aarch64_crypto_aesaes_opv16qi):
  Use crypto_aese type.
  (aarch64_crypto_aesaesmc_opv16qi): Use crypto_aesmc type.
  * config/arm/arm.md (is_neon_type): Replace crypto_aes with
  crypto_aese, crypto_aesmc.  Move to types.md.
  * config/arm/types.md (crypto_aes): Split into crypto_aese,
  crypto_aesmc.
  * config/arm/iterators.md (crypto_type): Likewise.








Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 11:47:52AM +0100, Jakub Jelinek wrote:
 Yet another possibility would be to rename all calls in C FE to
 build_function_call_vec to say c_build_function_call_vec and add that
 function which would call resolve_overloaded_builtin and then tail call to
 build_function_call_vec which wouldn't do that.  Then c-family/ would keep
 its current two calls to that function, which wouldn't recurse anymore, and
 we'd need to change add_atomic_size_parameter to push the argument.

Here is the variant patch, which implements the above.
Also bootstrapped/regtested on x86_64-linux and i686-linux.

2014-03-28  Jakub Jelinek  ja...@redhat.com

PR c++/60689
* c-tree.h (c_build_function_call_vec): New prototype.
* c-typeck.c (build_function_call_vec): Don't call
resolve_overloaded_builtin here.
(c_build_function_call_vec): New wrapper function around
build_function_call_vec.  Call resolve_overloaded_builtin here.
(convert_lvalue_to_rvalue, build_function_call, build_atomic_assign):
Call c_build_function_call_vec instead of build_function_call_vec.
* c-parser.c (c_parser_postfix_expression_after_primary): Likewise.
* c-decl.c (finish_decl): Likewise.

* c-common.c (add_atomic_size_parameter): When creating new
params vector, push the size argument first.

* c-c++-common/pr60689.c: New test.

--- gcc/c/c-tree.h.jj   2014-02-08 00:53:15.0 +0100
+++ gcc/c/c-tree.h  2014-03-28 12:30:49.155395381 +0100
@@ -643,6 +643,8 @@ extern tree c_finish_omp_clauses (tree);
 extern tree c_build_va_arg (location_t, tree, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
+extern tree c_build_function_call_vec (location_t, vec<location_t>, tree,
+				       vec<tree, va_gc> *, vec<tree, va_gc> *);
 
 /* Set to 0 at beginning of a function definition, set to 1 if
a return statement that specifies a return value is seen.  */
--- gcc/c/c-typeck.c.jj 2014-03-19 08:14:35.0 +0100
+++ gcc/c/c-typeck.c2014-03-28 12:34:57.803066414 +0100
@@ -2016,7 +2016,7 @@ convert_lvalue_to_rvalue (location_t loc
   params->quick_push (expr_addr);
   params->quick_push (tmp_addr);
   params->quick_push (seq_cst);
-  func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL);
+  func_call = c_build_function_call_vec (loc, vNULL, fndecl, params, NULL);
 
   /* EXPR is always read.  */
   mark_exp_read (exp.value);
@@ -2801,7 +2801,7 @@ build_function_call (location_t loc, tre
   vec_alloc (v, list_length (params));
   for (; params; params = TREE_CHAIN (params))
 v->quick_push (TREE_VALUE (params));
-  ret = build_function_call_vec (loc, vNULL, function, v, NULL);
+  ret = c_build_function_call_vec (loc, vNULL, function, v, NULL);
   vec_free (v);
   return ret;
 }
@@ -2840,14 +2840,6 @@ build_function_call_vec (location_t loc,
   /* Convert anything with function type to a pointer-to-function.  */
   if (TREE_CODE (function) == FUNCTION_DECL)
 {
-  /* Implement type-directed function overloading for builtins.
-resolve_overloaded_builtin and targetm.resolve_overloaded_builtin
-handle all the type checking.  The result is a complete expression
-that implements this function call.  */
-  tem = resolve_overloaded_builtin (loc, function, params);
-  if (tem)
-   return tem;
-
   name = DECL_NAME (function);
 
   if (flag_tm)
@@ -2970,6 +2962,30 @@ build_function_call_vec (location_t loc,
 }
   return require_complete_type (result);
 }
+
+/* Like build_function_call_vec, but call also resolve_overloaded_builtin.  */
+
+tree
+c_build_function_call_vec (location_t loc, vec<location_t> arg_loc,
+			   tree function, vec<tree, va_gc> *params,
+			   vec<tree, va_gc> *origtypes)
+{
+  /* Strip NON_LVALUE_EXPRs, etc., since we aren't using as an lvalue.  */
+  STRIP_TYPE_NOPS (function);
+
+  /* Convert anything with function type to a pointer-to-function.  */
+  if (TREE_CODE (function) == FUNCTION_DECL)
+{
+  /* Implement type-directed function overloading for builtins.
+resolve_overloaded_builtin and targetm.resolve_overloaded_builtin
+handle all the type checking.  The result is a complete expression
+that implements this function call.  */
+  tree tem = resolve_overloaded_builtin (loc, function, params);
+  if (tem)
+   return tem;
+}
+  return build_function_call_vec (loc, arg_loc, function, params, origtypes);
+}
 
 /* Convert the argument expressions in the vector VALUES
to the types in the list TYPELIST.
@@ -3634,7 +3650,7 @@ build_atomic_assign (location_t loc, tre
   params->quick_push (lhs_addr);
   params->quick_push (rhs);
   params->quick_push (seq_cst);
-  func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL);
+  func_call 

Re: Evident fix for copy_loops.

2014-03-28 Thread Yuri Rumyantsev
Jakub,

I did testing of this fix and bootstrap and regression testing were
OK, i.e. no new failures.

2014-03-28 14:49 GMT+04:00 Jakub Jelinek ja...@redhat.com:
 On Fri, Mar 28, 2014 at 02:41:26PM +0400, Yuri Rumyantsev wrote:
 Hi All,

 I found out that a field 'safelen of struct loop is not copied in 
 copy_loops.

 Is it OK for trunk?

 Ok if it passes bootstrap/regtest.

 2014-03-28  Yuri Rumyantsev  ysrum...@gmail.com

 * tree-inline.c (copy_loops): Add missed copy of 'safelen'.

 Jakub


[PATCH] Allow VOIDmode argument to ix86_copy_addr_to_reg (PR target/60693)

2014-03-28 Thread Jakub Jelinek
Hi!

Before ix86_copy_addr_to_reg was added, we were using
copy_addr_to_reg, which handles VOIDmode values just fine.
But this new function just ICEs on those.  As the function
was added for adding SUBREGs to TLS addresses, those will
never be CONST_INTs, so just using copy_addr_to_reg
is IMHO the right thing and restores the previous behavior.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-03-28  Jakub Jelinek  ja...@redhat.com

PR target/60693
* config/i386/i386.c (ix86_copy_addr_to_reg): Call copy_addr_to_reg
also if addr has VOIDmode.

* gcc.target/i386/pr60693.c: New test.

--- gcc/config/i386/i386.c.jj   2014-03-20 17:05:21.0 +0100
+++ gcc/config/i386/i386.c  2014-03-28 12:04:59.695679145 +0100
@@ -22755,7 +22755,7 @@ counter_mode (rtx count_exp)
 static rtx
 ix86_copy_addr_to_reg (rtx addr)
 {
-  if (GET_MODE (addr) == Pmode)
+  if (GET_MODE (addr) == Pmode || GET_MODE (addr) == VOIDmode)
 return copy_addr_to_reg (addr);
   else
 {
--- gcc/testsuite/gcc.target/i386/pr60693.c.jj  2014-03-28 12:08:00.078711929 
+0100
+++ gcc/testsuite/gcc.target/i386/pr60693.c 2014-03-28 12:07:31.0 
+0100
@@ -0,0 +1,13 @@
+/* PR target/60693 */
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+void bar (char *);
+
+void
+foo (void)
+{
+  char buf[4096];
+  __builtin_memcpy (buf, (void *) 0x8000, 4096);
+  bar (buf);
+}

Jakub


Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)

2014-03-28 Thread Richard Henderson
On 03/28/2014 08:24 AM, Jakub Jelinek wrote:
 Here is the variant patch, which implements the above.
 Also bootstrapped/regtested on x86_64-linux and i686-linux.
 
 2014-03-28  Jakub Jelinek  ja...@redhat.com
 
   PR c++/60689
   * c-tree.h (c_build_function_call_vec): New prototype.
   * c-typeck.c (build_function_call_vec): Don't call
   resolve_overloaded_builtin here.
   (c_build_function_call_vec): New wrapper function around
   build_function_call_vec.  Call resolve_overloaded_builtin here.
   (convert_lvalue_to_rvalue, build_function_call, build_atomic_assign):
   Call c_build_function_call_vec instead of build_function_call_vec.
   * c-parser.c (c_parser_postfix_expression_after_primary): Likewise.
   * c-decl.c (finish_decl): Likewise.
 
   * c-common.c (add_atomic_size_parameter): When creating new
   params vector, push the size argument first.
 
   * c-c++-common/pr60689.c: New test.

I do prefer this variant.


r~


Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-03-28 Thread James Greenhalgh
On Fri, Mar 28, 2014 at 03:09:22PM +, pins...@gmail.com wrote:
 
 
  On Mar 28, 2014, at 7:48 AM, James Greenhalgh james.greenha...@arm.com 
  wrote:
  
  On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote:
  On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com 
  wrote:
  Hi,
  
  There is no way to perform scalar addition in the vector register file,
   but with the RTX costs in place we start rewriting (x << 1) to (x + x)
  on almost all cores. The code which makes this decision has no idea that 
  we
  will end up doing this (it happens well before reload) and so we end up 
  with
  very ugly code generation in the case where addition was selected, but
  we are operating in vector registers.
  
  This patch relies on the same gimmick we are already using to allow
  shifts on 32-bit scalars in the vector register file - Use a vector 32x2
  operation instead, knowing that we can safely ignore the top bits.
  
  This restores some normality to scalar_shift_1.c, however the test
  that we generate a left shift by one is clearly bogus, so remove that.
  
  This patch is pretty ugly, but it does generate superficially better
  looking code for this testcase.
  
  Tested on aarch64-none-elf with no issues.
  
  OK for stage 1?
  
  It seems we should also discourage the neon alternatives as there might be
  extra movement between the two register sets which we don't want.
  
  I see your point, but we've tried to avoid doing that elsewhere in the
  AArch64 backend. Our argument has been that strictly speaking, it isn't that
  the alternative is expensive, it is the movement between the register sets. 
  We
  do model that elsewhere, and the register allocator should already be 
  trying to
   avoid unnecessary moves between register classes.
  
 
 What about on a specific core where that alternative is expensive; that is
 the vector instructions are worse than the scalar ones. How are we going to
 handle this case?

Certainly not by discouraging the alternative for all cores. We would need
a more nuanced approach which could be tuned on a per-core basis. Otherwise
we are bluntly and inaccurately pessimizing those cases where we can cheaply
perform the operation in the vector register file (e.g. we are cleaning up
loose ends after a vector loop, we have spilled to the vector register
file, etc.). The register preference mechanism feels the wrong place to
catch this as it does not allow for that degree of per-core flexibility;
an alternative is simply disparaged slightly (?, * in LRA) or
disparaged severely (!).

I would think that we don't want to start polluting the machine description
trying to hack around this as was done with the ARM backend's
neon_for_64_bits/avoid_neon_for_64_bits.

How have other targets solved this issue?

Thanks,
James

 Thanks,
 Andrew
 
  If those mechanisms are broken, we should fix them - in that case fixing
  this by discouraging valid alternatives would seem to be gaffer-taping over 
  the
  real problem.
  
  Thanks,
  James
  
  
  Thanks,
  Andrew
  
  
  Thanks,
  James
  
  ---
  gcc/
  
  2014-03-27  James Greenhalgh  james.greenha...@arm.com
  
* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
vector registers.
  
  gcc/testsuite/
  2014-03-27  James Greenhalgh  james.greenha...@arm.com
  
* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
  0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
  
 


[AArch64/ARM 3/3] Add execution tests of ARM TRN Intrinsics

2014-03-28 Thread Alan Lawrence

Final patch in series, adds new tests of the ARM TRN Intrinsics, that also check
the execution results, reusing the test bodies introduced into AArch64 in the
first patch. (These tests subsume the autogenerated ones in
testsuite/gcc.target/arm/neon/ that only check assembler output.)

Tests use gcc.target/arm/simd/simd.exp from corresponding patch for ZIP
Intrinsics, will commit that first.

All tests passing on arm-none-eabi.

testsuite/ChangeLog:
2014-03-28  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/arm/simd/vtrnqf32_1.c: New file.
* gcc.target/arm/simd/vtrnqp16_1.c: New file.
* gcc.target/arm/simd/vtrnqp8_1.c: New file.
* gcc.target/arm/simd/vtrnqs16_1.c: New file.
* gcc.target/arm/simd/vtrnqs32_1.c: New file.
* gcc.target/arm/simd/vtrnqs8_1.c: New file.
* gcc.target/arm/simd/vtrnqu16_1.c: New file.
* gcc.target/arm/simd/vtrnqu32_1.c: New file.
* gcc.target/arm/simd/vtrnqu8_1.c: New file.
* gcc.target/arm/simd/vtrnf32_1.c: New file.
* gcc.target/arm/simd/vtrnp16_1.c: New file.
* gcc.target/arm/simd/vtrnp8_1.c: New file.
* gcc.target/arm/simd/vtrns16_1.c: New file.
* gcc.target/arm/simd/vtrns32_1.c: New file.
* gcc.target/arm/simd/vtrns8_1.c: New file.
* gcc.target/arm/simd/vtrnu16_1.c: New file.
* gcc.target/arm/simd/vtrnu32_1.c: New file.
* gcc.target/arm/simd/vtrnu8_1.c: New file.diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnf32_1.c
new file mode 100644
index 000..c9620fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vtrnf32_1.c
@@ -0,0 +1,12 @@
+/* Test the `vtrnf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vtrnf32.x"
+
+/* { dg-final { scan-assembler-times "vtrn\.32\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnp16_1.c
new file mode 100644
index 000..0ff4319
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vtrnp16_1.c
@@ -0,0 +1,12 @@
+/* Test the `vtrnp16' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vtrnp16.x"
+
+/* { dg-final { scan-assembler-times "vtrn\.16\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnp8_1.c
new file mode 100644
index 000..2b047e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vtrnp8_1.c
@@ -0,0 +1,12 @@
+/* Test the `vtrnp8' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vtrnp8.x"
+
+/* { dg-final { scan-assembler-times "vtrn\.8\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnqf32_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnqf32_1.c
new file mode 100644
index 000..dd4e883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vtrnqf32_1.c
@@ -0,0 +1,12 @@
+/* Test the `vtrnQf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vtrnqf32.x"
+
+/* { dg-final { scan-assembler-times "vtrn\.32\[ \t\]+\[qQ\]\[0-9\]+, ?\[qQ\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnqp16_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnqp16_1.c
new file mode 100644
index 000..374eee3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vtrnqp16_1.c
@@ -0,0 +1,12 @@
+/* Test the `vtrnQp16' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O1 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+#include "../../aarch64/simd/vtrnqp16.x"
+
+/* { dg-final { scan-assembler-times "vtrn\.16\[ \t\]+\[qQ\]\[0-9\]+, ?\[qQ\]\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vtrnqp8_1.c b/gcc/testsuite/gcc.target/arm/simd/vtrnqp8_1.c
new file mode 100644
index 

Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-03-28 Thread James Greenhalgh
On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote:
  On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com 
  wrote:
  Hi,
  
  There is no way to perform scalar addition in the vector register file,
   but with the RTX costs in place we start rewriting (x << 1) to (x + x)
  on almost all cores. The code which makes this decision has no idea that we
  will end up doing this (it happens well before reload) and so we end up with
  very ugly code generation in the case where addition was selected, but
  we are operating in vector registers.
  
  This patch relies on the same gimmick we are already using to allow
  shifts on 32-bit scalars in the vector register file - Use a vector 32x2
  operation instead, knowing that we can safely ignore the top bits.
  
  This restores some normality to scalar_shift_1.c, however the test
  that we generate a left shift by one is clearly bogus, so remove that.
  
  This patch is pretty ugly, but it does generate superficially better
  looking code for this testcase.
  
  Tested on aarch64-none-elf with no issues.
  
  OK for stage 1?
 
 It seems we should also discourage the neon alternatives as there might be
 extra movement between the two register sets which we don't want. 

I see your point, but we've tried to avoid doing that elsewhere in the
AArch64 backend. Our argument has been that strictly speaking, it isn't that
the alternative is expensive, it is the movement between the register sets. We
do model that elsewhere, and the register allocator should already be trying to
avoid unnecessary moves between register classes.

If those mechanisms are broken, we should fix them - in that case fixing
this by discouraging valid alternatives would seem to be gaffer-taping over the
real problem.

Thanks,
James

 
 Thanks,
 Andrew
 
  
  Thanks,
  James
  
  ---
  gcc/
  
  2014-03-27  James Greenhalgh  james.greenha...@arm.com
  
 * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
 vector registers.
  
  gcc/testsuite/
  2014-03-27  James Greenhalgh  james.greenha...@arm.com
  
 * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
  0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch
 


[AArch64/ARM 2/3] Reimplement AArch64 TRN intrinsics with __builtin_shuffle

2014-03-28 Thread Alan Lawrence
This patch replaces the temporary inline assembler for vtrn[q]_* in arm_neon.h 
with equivalent calls to __builtin_shuffle.  These are matched by existing 
patterns in aarch64.c (aarch64_expand_vec_perm_const_1), outputting the same 
assembler instructions.  For two-element vectors, ZIP, UZP and TRN instructions 
all have the same effect, and the backend chooses to output ZIP, so this patch 
also updates the 3 affected tests.
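As a rough sketch of what the replacement looks like for the two-element case (the shuffle mask here is an illustrative assumption, not copied from the patch):

#include <arm_neon.h>

/* Sketch only: for two-element vectors, TRN1 selects the first element of
   each input, which __builtin_shuffle can express directly.  */
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vtrn1_f32_sketch (float32x2_t __a, float32x2_t __b)
{
  return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
}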


Regression tested, with the tests from the first patch still passing (modulo the
updates herein) on aarch64-none-elf and aarch64_be-none-elf.

gcc/testsuite/ChangeLog:
2014-03-28  Alan Lawrence  alan.lawre...@arm.com

* gcc.target/aarch64/vtrns32.c: Expect zip[12] insn rather than trn[12].
* gcc.target/aarch64/vtrnu32.c: Likewise.
* gcc.target/aarch64/vtrnf32.c: Likewise.

gcc/ChangeLog:
2014-03-28  Alan Lawrence  alan.lawre...@arm.com

* config/aarch64/arm_neon.h (vtrn1_f32, vtrn1_p8, vtrn1_p16, vtrn1_s8,
vtrn1_s16, vtrn1_s32, vtrn1_u8, vtrn1_u16, vtrn1_u32, vtrn1q_f32,
vtrn1q_f64, vtrn1q_p8, vtrn1q_p16, vtrn1q_s8, vtrn1q_s16, vtrn1q_s32,
vtrn1q_s64, vtrn1q_u8, vtrn1q_u16, vtrn1q_u32, vtrn1q_u64, vtrn2_f32,
vtrn2_p8, vtrn2_p16, vtrn2_s8, vtrn2_s16, vtrn2_s32, vtrn2_u8,
vtrn2_u16, vtrn2_u32, vtrn2q_f32, vtrn2q_f64, vtrn2q_p8, vtrn2q_p16,
vtrn2q_s8, vtrn2q_s16, vtrn2q_s32, vtrn2q_s64, vtrn2q_u8, vtrn2q_u16,
vtrn2q_u32, vtrn2q_u64): Replace temporary asm with __builtin_shuffle.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361..d7962e5 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -12447,468 +12447,6 @@ vsubhn_u64 (uint64x2_t a, uint64x2_t b)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vtrn1_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ (trn1 %0.2s,%1.2s,%2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vtrn1_p8 (poly8x8_t a, poly8x8_t b)
-{
-  poly8x8_t result;
-  __asm__ (trn1 %0.8b,%1.8b,%2.8b
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vtrn1_p16 (poly16x4_t a, poly16x4_t b)
-{
-  poly16x4_t result;
-  __asm__ (trn1 %0.4h,%1.4h,%2.4h
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vtrn1_s8 (int8x8_t a, int8x8_t b)
-{
-  int8x8_t result;
-  __asm__ (trn1 %0.8b,%1.8b,%2.8b
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vtrn1_s16 (int16x4_t a, int16x4_t b)
-{
-  int16x4_t result;
-  __asm__ (trn1 %0.4h,%1.4h,%2.4h
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vtrn1_s32 (int32x2_t a, int32x2_t b)
-{
-  int32x2_t result;
-  __asm__ (trn1 %0.2s,%1.2s,%2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vtrn1_u8 (uint8x8_t a, uint8x8_t b)
-{
-  uint8x8_t result;
-  __asm__ (trn1 %0.8b,%1.8b,%2.8b
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vtrn1_u16 (uint16x4_t a, uint16x4_t b)
-{
-  uint16x4_t result;
-  __asm__ (trn1 %0.4h,%1.4h,%2.4h
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vtrn1_u32 (uint32x2_t a, uint32x2_t b)
-{
-  uint32x2_t result;
-  __asm__ (trn1 %0.2s,%1.2s,%2.2s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vtrn1q_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ (trn1 %0.4s,%1.4s,%2.4s
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vtrn1q_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ (trn1 %0.2d,%1.2d,%2.2d
-   : =w(result)
-   : w(a), w(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vtrn1q_p8 (poly8x16_t a, poly8x16_t b)
-{

Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-28 Thread Jeff Law

On 03/26/14 12:28, Jakub Jelinek wrote:

On Wed, Mar 26, 2014 at 12:17:43PM -0600, Jeff Law wrote:

On 03/26/14 12:12, Jakub Jelinek wrote:

On Wed, Mar 26, 2014 at 11:02:48AM -0600, Jeff Law wrote:

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Verified it fixes the original and reduced testcase.


Note, the testcase is missing from your patch.

But I'd question if this is the right place to canonicalize it.
The non-canonical order seems to be created in the generic code, where
do_tablejump does:

No, at that point it's still canonical because the x86 backend
hasn't simpified the (mult ...) subexpression.  Its the
simplification of that subexpression to a constant that creates the
non-canonical RTL.  That's why I fixed the x86 bits -- those are the
bits that simplify the (mult ...) into a (const_int) and thus
creates the non-canonical RTL.


(mult:SI (const_int 0) (const_int 4)) is IMHO far from being canonical.
And, I'd say it is likely other target legitimization hooks would also try
to simplify it similarly.
simplify_gen_binary is used in several other places during expansion,
so I don't see why it couldn't be desirable here.
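As a small aside illustrating the point (a GCC-internal sketch, not part of either patch):

/* simplify_gen_binary folds constant operands, so when INDEX is const0_rtx
   this returns (const_int 0) instead of building a non-canonical
   (mult (const_int 0) (const_int 4)) subexpression.  */
static rtx
scaled_index_example (rtx index)
{
  return simplify_gen_binary (MULT, Pmode, index, GEN_INT (4));
}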


Here's the updated patch.  It uses simplify_gen_binary in expr.c to 
simplify the address expression as we're building it.  It also uses 
copy_addr_to_reg in the x86 backend to avoid the possibility of 
generating non-canonical RTL there too.


By accident I interrupted the regression test cycle, so that is still 
running.  OK for the trunk if that passes?




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 53d58b3..3caae44 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2014-03-27  Jeff Law  l...@redhat.com
+   Jakub Jalinek ja...@redhat.com
+
+   * expr.c (do_tablejump): Use simplify_gen_binary rather than
+   gen_rtx_{PLUS,MULT} to build up the address expression.
+
+   * i386/i386.c (ix86_legitimize_address): Use copy_addr_to_reg to avoid
+   creating non-canonical RTL.
+
 2014-03-26  Richard Biener  rguent...@suse.de
 
* tree-pretty-print.c (percent_K_format): Implement special
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 842be68..70b8f02 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13925,13 +13925,13 @@ ix86_legitimize_address (rtx x, rtx oldx 
ATTRIBUTE_UNUSED,
   if (GET_CODE (XEXP (x, 0)) == MULT)
{
  changed = 1;
- XEXP (x, 0) = force_operand (XEXP (x, 0), 0);
+ XEXP (x, 0) = copy_addr_to_reg (XEXP (x, 0));
}
 
   if (GET_CODE (XEXP (x, 1)) == MULT)
{
  changed = 1;
- XEXP (x, 1) = force_operand (XEXP (x, 1), 0);
+ XEXP (x, 1) = copy_addr_to_reg (XEXP (x, 1));
}
 
   if (changed
diff --git a/gcc/expr.c b/gcc/expr.c
index cdb4551..ebf136e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11134,11 +11134,12 @@ do_tablejump (rtx index, enum machine_mode mode, rtx 
range, rtx table_label,
  GET_MODE_SIZE, because this indicates how large insns are.  The other
  uses should all be Pmode, because they are addresses.  This code
  could fail if addresses and insns are not the same size.  */
-  index = gen_rtx_PLUS
-(Pmode,
- gen_rtx_MULT (Pmode, index,
-  gen_int_mode (GET_MODE_SIZE (CASE_VECTOR_MODE), Pmode)),
- gen_rtx_LABEL_REF (Pmode, table_label));
+  index = simplify_gen_binary (MULT, Pmode, index,
+  gen_int_mode (GET_MODE_SIZE (CASE_VECTOR_MODE),
+Pmode));
+  index = simplify_gen_binary (PLUS, Pmode, index,
+  gen_rtx_LABEL_REF (Pmode, table_label));
+
 #ifdef PIC_CASE_VECTOR_ADDRESS
   if (flag_pic)
 index = PIC_CASE_VECTOR_ADDRESS (index);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index cdc8e9a..fc3c198 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-03-27  Jeff Law  l...@redhat.com
+
+   PR target/60648
+   * g++.dg/pr60648.C: New test.
+
 2014-03-26  Jakub Jelinek  ja...@redhat.com
 
PR sanitizer/60636
diff --git a/gcc/testsuite/g++.dg/pr60648.C b/gcc/testsuite/g++.dg/pr60648.C
new file mode 100644
index 000..80c0561
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr60648.C
@@ -0,0 +1,73 @@
+/* { dg-do compile } */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options -O3 -fPIC -m32 }  */
+
+enum component
+{
+  Ex,
+  Ez,
+  Hy,
+  Permeability
+};
+enum derived_component
+{};
+enum direction
+{
+  X,
+  Y,
+  Z,
+  R,
+  P,
+  NO_DIRECTION
+};
+derived_component a;
+component *b;
+component c;
+direction d;
+inline direction fn1 (component p1)
+{
+  switch (p1)
+{
+case 0:
+  return Y;
+case 1:
+  return Z;
+case Permeability:
+  return NO_DIRECTION;
+}
+  return X;
+}
+
+inline component fn2 (direction p1)
+{
+  switch (p1)
+{
+case 0:
+case 1:
+  return component ();
+case Z:
+ 

Re: [RFA][PATCH][pr target/60648] Fix non-canonical RTL from x86 backend -- P1 regression

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 12:04:00PM -0600, Jeff Law wrote:
 Here's the updated patch.  It uses simplify_gen_binary in expr.c to
 simplify the address expression as we're building it.  It also uses
 copy_addr_to_reg in the x86 backend to avoid the possibility of
 generating non-canonical RTL there too.
 
 By accident I interrupted the regression test cycle, so that is
 still running.  OK for the trunk if that passes?

Ok, thanks.

Jakub


Re: Fix PR ipa/60315 (inliner explosion)

2014-03-28 Thread Eric Botcazou
 Actually after some additional investigation I decided to commit this
 patch. fixup_noreturn_call already cares about the return value but
 differently than the new Jakub's code. 

Thanks for the quick fix, I confirm that the ACATS failures are all gone.

So we're left with the GIMPLE checking failure on opt33.adb.

-- 
Eric Botcazou


Re: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data

2014-03-28 Thread Kyrill Tkachov

On 28/03/14 14:21, Ramana Radhakrishnan wrote:

On Wed, Mar 19, 2014 at 9:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This patch adds a recogniser for the bitmask,shift,orr sequence of
instructions that can be used to reverse the bytes in 16-bit halfwords (for
the sequence itself look at the testcase included in the patch). This can be
implemented with a rev16 instruction.
Since the shifts can occur in any order and there are no canonicalisation
rules for where they appear in the expression we have to have two patterns
to match both cases.
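For reference, the source-level idiom being matched has this general shape (an illustrative guess, not copied from the new testcase):

/* Reverse the bytes within each 16-bit halfword; the two shifts may appear
   in either order, hence the two patterns.  */
unsigned int
rev16_halfwords (unsigned int x)
{
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}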

The rtx costs function is updated to recognise the pattern and cost it
appropriately by using the rev field of the cost tables introduced in patch
[1/3]. The rtx costs helper functions that are used to recognise those
bitwise operations are placed in config/arm/aarch-common.c so that they can
be reused by both arm and aarch64.

The ARM bits of this are OK if there are no regressions.


I've added an execute testcase but no scan-assembler tests since
conceptually in the future the combiner might decide to not use a rev
instruction due to rtx costs. We can at least test that the code generated
is functionally correct though.

Tested aarch64-none-elf.

What about arm-none-eabi :) ?


Tested arm-none-eabi and bootstrap on arm linux together with patch [3/3] in the 
series :)


Kyrill




Ok for stage1?

[gcc/]
2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.md (rev16mode2): New pattern.
 (rev16mode2_alt): Likewise.
 * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case.
 * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New.
 (aarch_rev16_shleft_mask_imm_p): Likewise.
 (aarch_rev16_p_1): Likewise.
 (aarch_rev16_p): Likewise.
 * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern.
 (aarch_rev16_shright_mask_imm_p): Likewise.
 (aarch_rev16_shleft_mask_imm_p): Likewise.

[gcc/testsuite/]
2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/rev16_1.c: New test.





Re: [PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data

2014-03-28 Thread Ramana Radhakrishnan
On Wed, Mar 19, 2014 at 9:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This patch adds a recogniser for the bitmask,shift,orr sequence of
 instructions that can be used to reverse the bytes in 16-bit halfwords (for
 the sequence itself look at the testcase included in the patch). This can be
 implemented with a rev16 instruction.
 Since the shifts can occur in any order and there are no canonicalisation
 rules for where they appear in the expression we have to have two patterns
 to match both cases.

 The rtx costs function is updated to recognise the pattern and cost it
 appropriately by using the rev field of the cost tables introduced in patch
 [1/3]. The rtx costs helper functions that are used to recognise those
 bitwise operations are placed in config/arm/aarch-common.c so that they can
 be reused by both arm and aarch64.

The ARM bits of this are OK if there are no regressions.


 I've added an execute testcase but no scan-assembler tests since
 conceptually in the future the combiner might decide to not use a rev
 instruction due to rtx costs. We can at least test that the code generated
 is functionally correct though.

 Tested aarch64-none-elf.

What about arm-none-eabi :) ?


 Ok for stage1?

 [gcc/]
 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.md (rev16mode2): New pattern.
 (rev16mode2_alt): Likewise.
 * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case.
 * config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New.
 (aarch_rev16_shleft_mask_imm_p): Likewise.
 (aarch_rev16_p_1): Likewise.
 (aarch_rev16_p): Likewise.
 * config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern.
 (aarch_rev16_shright_mask_imm_p): Likewise.
 (aarch_rev16_shleft_mask_imm_p): Likewise.

 [gcc/testsuite/]
 2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/rev16_1.c: New test.


Re: [C++ PATCH] Fix __atomic_exchange (PR c++/60689)

2014-03-28 Thread Jakub Jelinek
On Fri, Mar 28, 2014 at 01:46:09PM -0400, Jason Merrill wrote:
 On 03/28/2014 06:47 AM, Jakub Jelinek wrote:
  * typeck.c (build_function_call_vec): Call resolve_overloaded_builtin.
 
 I expect this will break in templates if arguments are dependent.

The only problem with this patch is potentially ObjC, I've missed that it
also calls build_function_call_vec; in c-family and cp/ proper
build_function_call_vec is only called from within
resolve_overloaded_builtin itself, thus it shouldn't see dependent args.

Jakub


Re: [PATCH] Handle short reads and EINTR in lto-plugin/simple-object

2014-03-28 Thread Ian Lance Taylor
On Fri, Mar 28, 2014 at 6:30 AM, Richard Biener rguent...@suse.de wrote:

 2014-03-26  Richard Biener  rguent...@suse.de

 libiberty/
 * simple-object.c (simple_object_internal_read): Handle
 EINTR and short reads.

 lto-plugin/
 * lto-plugin.c (process_symtab): Use simple_object_internal_read.

This is OK.

Thanks.

Ian


Re: [PATCH][ARM/AArch64][2/2] Crypto intrinsics tuning for Cortex-A53 - pipeline description

2014-03-28 Thread Ramana Radhakrishnan
On Tue, Mar 25, 2014 at 3:52 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 In ARMv8-A there's a general expectation that AESE/AESMC and AESD/AESIMC
 sequences of the form:

 AESE Vn, _
 AESMC Vn, Vn

 will issue both instructions in a single cycle on super-scalar
 implementations. It would be nice to model that in our pipeline
 descriptions. This patch defines a function to detect such pairs and uses it
 in the pipeline description for these instructions for the Cortex-A53.
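For illustration, the kind of intrinsic sequence that exposes such a pair (a minimal sketch using the ACLE crypto intrinsics; requires the crypto extension to be enabled):

#include <arm_neon.h>

/* One AES round: vaeseq_u8 maps to AESE and vaesmcq_u8 to AESMC on the same
   register, the back-to-back pair the pipeline description wants to
   dual-issue.  */
uint8x16_t
aes_round (uint8x16_t data, uint8x16_t key)
{
  return vaesmcq_u8 (vaeseq_u8 (data, key));
}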

 The patch also adds some missed AdvancedSIMD information to the pipeline
 description for the Cortex-A53.

 Bootstrapped and tested on arm-none-linux-gnueabihf and
 aarch64-none-linux-gnu.

 Cortex-A53 scheduling is the default scheduling description on aarch64 so
 this patch can change default behaviour. That's an argument for taking this
 in stage1 or maybe backporting it into 4.9.1 once the release is made.


To my mind on ARM / AArch64 this actually helps anyone using the
crypto intrinsics on A53 hardware today and it would be good to get
this into 4.9. Again I perceive this as low risk on ARM (AArch32) as
this is not a  default tuning option for any large software vendors,
the folks using this are typically the ones that write the more
specialized crypto intrinsics rather than just general purpose code.
However this will help with scheduling on what is essentially an
in-order core, so would be nice to have.

This would definitely need approval from the AArch64 maintainers and
the RMs to go in at this stage.

If not, we should consider this for 4.9.1

regards
Ramana


 What do people think?

 Thanks,
 Kyrill


 2014-03-25  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/arm/aarch-common.c (aarch_crypto_can_dual_issue): New function.
 * config/arm/aarch-common-protos.h (aarch_crypto_can_dual_issue):
 Declare
 extern.
 * config/arm/cortex-a53.md: Add reservations and bypass for crypto
 instructions as well as AdvancedSIMD loads.


Re: [PATCH] RL78 - minor size optimization

2014-03-28 Thread Richard Hulme

On 28/03/14 00:20, DJ Delorie wrote:

This is OK after 4.9 branches (i.e. stage1).  I suspect we could add
AX to the first alternative, although I don't know if it will get
used.  We could add HL to the second alternative to complete the
replacement of the 'r' constraint.


Yes, the missing AX in the first alternative came to me later too.  HL 
is already in the second alternative ('T').


Looking at it again, it probably makes sense to change the third 
alternative to 'shrw %0,8'.  It's the same length as 'mov x,a/clrb a' but 
it's a cycle shorter.  It also makes it more like the extendqihi2_real 
insn, which isn't especially important, but does mean there's a certain 
symmetry about it.


2014-03-28  Richard Hulme  pepe...@yahoo.com

* config/rl78/rl78-real.md (zero_extendqihi2_real):
Minor optimizations to use clrb instruction where possible,
which is 1 byte shorter than 'mov'ing #0, and shrw, which
is 1 cycle less than a mov/clrb sequence.

---
 gcc/config/rl78/rl78-real.md |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rl78/rl78-real.md b/gcc/config/rl78/rl78-real.md
index 27ff60f..5d5c598 100644
--- a/gcc/config/rl78/rl78-real.md
+++ b/gcc/config/rl78/rl78-real.md
@@ -77,12 +77,14 @@
 ;;-- Conversions 

 (define_insn *zero_extendqihi2_real
-  [(set (match_operand:HI 0 nonimmediate_operand =rv,A)
-   (zero_extend:HI (match_operand:QI 1 general_operand 0,a)))]
+  [(set (match_operand:HI 0 nonimmediate_operand =ABv,DT,A,B)
+   (zero_extend:HI (match_operand:QI 1 general_operand 0,0,a,b)))]
   rl78_real_insns_ok ()
   @
+   clrb\t%Q0
mov\t%Q0, #0
-   mov\tx, a \;mov\ta, #0
+   shrw\t%0, 8
+   shrw\t%0, 8
   )

 (define_insn *extendqihi2_real
--
1.7.9.5




Lost __mips_o32 predefine on NetBSD

2014-03-28 Thread Martin Husemann
In the mips--netbsdelf target gcc 4.9 lost the pre-definition of
__mips_o32, which is heavily used in NetBSD sources.

The obvious trivial patch adds it back.

Martin

--8---
Define __mips_o32 for -mabi=32

--- gcc/config/mips/netbsd.h.orig   2014-01-02 23:23:26.0 +0100
+++ gcc/config/mips/netbsd.h2014-03-28 14:19:18.0 +0100
@@ -32,7 +32,9 @@ along with GCC; see the file COPYING3.  
   if (TARGET_ABICALLS) \
builtin_define (__ABICALLS__);\
\
-  if (mips_abi == ABI_EABI)\
+  if (mips_abi == ABI_32)  \
+   builtin_define (__mips_o32);  \
+  else if (mips_abi == ABI_EABI)   \
builtin_define (__mips_eabi); \
   else if (mips_abi == ABI_N32)\
builtin_define (__mips_n32);  \
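
For context, a hedged illustration of the kind of guard that relies on the predefine (hypothetical, not taken from the NetBSD sources):

#if defined(__mips_o32)
/* o32-specific code: calling-convention details, asm stubs, and so on.  */
#else
/* other ABIs (n32, n64, EABI) */
#endif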



Re: [PATCH] Allow VOIDmode argument to ix86_copy_addr_to_reg (PR target/60693)

2014-03-28 Thread Uros Bizjak
On Fri, Mar 28, 2014 at 4:19 PM, Jakub Jelinek ja...@redhat.com wrote:

 Before ix86_copy_addr_to_reg was added, we used
 copy_addr_to_reg, which handles VOIDmode values just fine.
 But this new function just ICEs on those.  As the function
 was added for adding SUBREGs to TLS addresses, those will
 never return CONST_INTs, so just using copy_addr_to_reg
 is IMHO the right thing and restores previous behavior.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

 2014-03-28  Jakub Jelinek  ja...@redhat.com

 PR target/60693
 * config/i386/i386.c (ix86_copy_addr_to_reg): Call copy_addr_to_reg
 also if addr has VOIDmode.

 * gcc.target/i386/pr60693.c: New test.

OK.

Thanks,
Uros.


Re: Skip gcc.dg/tree-ssa/isolate-*.c for AVR Target

2014-03-28 Thread Mike Stump
On Mar 28, 2014, at 3:16 AM, K_s, Vishnu vishnu@atmel.com wrote:
 The tests added in gcc.dg/tree-ssa/isolate-*.c is failing for AVR target,
 Because the isolate erroneous path pass needs -fdelete-null-pointer-checks
 option to be enabled. For AVR target that option is disabled, this cause 
 the tests to fail. Following Patch skip the isolate-* tests if 
 keeps_null_pointer_checks is true. 

So I didn’t see a checkin, and I don’t see an Ok?  Each patch should have one 
or the other…  Without an Ok? I assume it’s been checked in, with an Ok? I take 
that as a review request.

I’ll assume you forgot the Ok?, Ok.

Since the AVR people are fairly active, I’ll let them check it in; gives them 
the opportunity to further consider it.

 2014-03-28  Vishnu K S vishnu@atmel.com 
 
   * gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c: Skip test for AVR 
   * gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c: Ditto 
   * gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c: Ditto
   * gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c: Ditto
   * gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c: Ditto



Re: [PATCH] RL78 - minor size optimization

2014-03-28 Thread DJ Delorie

Sweet.  Yes please, in stage 1.


[committed, fortran] PR 60766 fix buffer overflow

2014-03-28 Thread Mikael Morin
Hello,

I fixed an ICE in pr59599 due to a wrong number of arguments passed to
the ichar function, but I forgot to update the size of the buffer
containing the argument list. Fixed thusly.
I have tested the patch (attached) on x86_64-unknown-linux-gnu and
committed it as revision 208913.
Thanks to Tobias for identifying the problem.

Mikael

Index: ChangeLog
===
--- ChangeLog	(révision 208912)
+++ ChangeLog	(révision 208913)
@@ -1,5 +1,11 @@
-2014-04-27  Thomas Koenig  tkoe...@gcc.gnu.org
+2014-03-28  Mikael Morin  mik...@gcc.gnu.org
 
+	PR fortran/60677
+	* trans-intrinsic.c (gfc_conv_intrinsic_ichar): Enlarge argument
+	list buffer.
+
+2014-03-27  Thomas Koenig  tkoe...@gcc.gnu.org
+
 	PR fortran/60522
 	* frontend-passes.c (cfe_code):  Do not walk subtrees
 	for WHERE.
Index: trans-intrinsic.c
===
--- trans-intrinsic.c	(révision 208912)
+++ trans-intrinsic.c	(révision 208913)
@@ -4687,7 +4687,7 @@ gfc_conv_intrinsic_index_scan_verify (gfc_se * se,
 static void
 gfc_conv_intrinsic_ichar (gfc_se * se, gfc_expr * expr)
 {
-  tree args[2], type, pchartype;
+  tree args[3], type, pchartype;
   int nargs;
 
   nargs = gfc_intrinsic_argument_list_length (expr);



Re: Changing INT to SI mode

2014-03-28 Thread Mike Stump
On Mar 28, 2014, at 6:23 AM, K_s, Vishnu vishnu@atmel.com wrote:
 Test pr59940.c is failing for the AVR target because the test assumes the size
 of int is 32 bits and expects warnings for overflow and conversion when
 assigning 36-bit and 32-bit values respectively to the variable si.
 The following patch defines a 32-bit type with SI mode and uses it.
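The usual trick is an attribute-mode typedef, roughly as follows (a sketch of the technique, not the committed test change):

/* A type that is 32 bits wide regardless of the target's int width.  */
typedef int int32_si __attribute__ ((mode (SI)));

int32_si si;  /* overflow/conversion warnings now trigger on 16-bit-int
                 targets such as AVR just as they do elsewhere */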
 
 2014-03-28  Vishnu K S vishnu@atmel.com 
 
   * gcc/testsuite/gcc.dg/pr59940.c: Using 32-bit SI mode instead of int

[ see previous note ]

Ok.


I checked this in for you, and formatted the ChangeLog slightly better.  Two 
spaces after the name, no space before the , no gcc/testsuite in the log, end 
sentence with a period, add name of what was changed (si in this case).

+2014-03-28  Vishnu K S  vishnu@atmel.com
+
+   * gcc.dg/pr59940.c (si): Use 32-bit SI mode instead of int.


Re: various _mm512_set* intrinsics

2014-03-28 Thread Uros Bizjak
Hello!

 Here are more intrinsics that are missing.  I know that gcc currently
 generates horrible code for most of them but I think it's more important
 to have the API in place, albeit non-optimal.  Maybe this entices someone
 to add the necessary optimizations.

I agree that having non-optimal implementation is better than nothing.

 The code is self-contained and shouldn't interfere with any correct
 code.  Should this also go into 4.9?

 2014-03-27  Ulrich Drepper  drep...@gmail.com

 * config/i386/avx512fintrin.h (__v32hi): Define type.
 (__v64qi): Likewise.
 (_mm512_set1_epi8): Define.
 (_mm512_set1_epi16): Define.
 (_mm512_set4_epi32): Define.
 (_mm512_set4_epi64): Define.
 (_mm512_set4_pd): Define.
 (_mm512_set4_ps): Define.
 (_mm512_setr4_epi64): Define.
 (_mm512_setr4_epi32): Define.
 (_mm512_setr4_pd): Define.
 (_mm512_setr4_ps): Define.
 (_mm512_setzero_epi32): Define.
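
A trivial usage sketch of a couple of the new entry points (illustration only; requires -mavx512f):

#include <immintrin.h>

/* Broadcast one byte across a 512-bit vector and produce an all-zero
   integer vector.  */
__m512i
broadcast_byte (char c, __m512i *zero_out)
{
  *zero_out = _mm512_setzero_epi32 ();
  return _mm512_set1_epi8 (c);
}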

This is OK for mainline, but please wait for Kirill's review of the intrinsics.

Thanks,
Uros.


Re: Skip gcc.dg/tree-ssa/isolate-*.c for AVR Target

2014-03-28 Thread Mike Stump
On Mar 28, 2014, at 12:04 PM, Mike Stump mikest...@comcast.net wrote:
 
 2014-03-28  Vishnu K S vishnu@atmel.com 
 
  * gcc/testsuite/gcc.dg/tree-ssa/isolate-1.c: Skip test for AVR 
  * gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c: Ditto 
  * gcc/testsuite/gcc.dg/tree-ssa/isolate-3.c: Ditto
  * gcc/testsuite/gcc.dg/tree-ssa/isolate-4.c: Ditto
  * gcc/testsuite/gcc.dg/tree-ssa/isolate-5.c: Ditto

So, no gcc/testsuite/ in the log, no space before the <, and two spaces after the 
name before the <, and end sentences with a ".".

Re: [C++ patch] for C++/52369

2014-03-28 Thread Mike Stump
Just a nit…

 2014-03-28  Fabien Chêne  fab...@gcc.gnu.org
 
* cp/init.c (perform_member_init): homogeneize uninitialized
diagnostics.

Sentences begin with an upper case letter, and spelling…  Homogenize..




PR ipa/60243 (inliner being slow)

2014-03-28 Thread Jan Hubicka
Hi,
the inliner heuristic is organized as a greedy algorithm making inline
decisions in order defined by badness until inline limits are hit.  The tricky
part is that the badness depends on both caller and callee (it is basically a
size/time metric that depends on the callee, but the caller provides context via
known values and predicates that may simplify the callee body).  So after each
inlining decision, the badnesses of calls from the function being inlined and of
calls to the function being inlined into need to be updated.
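In rough pseudocode the scheme looks like this (an illustrative sketch with made-up helper names, not the actual ipa-inline.c loop):

/* Greedy inlining by badness: repeatedly take the most profitable call
   edge, inline it, and recompute the keys of the edges whose badness that
   decision may have changed.  */
while (!heap_empty (edge_heap))
  {
    edge = heap_extract_min (edge_heap);        /* smallest badness first */
    if (!want_inline_small_function_p (edge))
      continue;
    inline_call (edge);
    update_keys_of_edges_into (edge->caller);   /* caller context changed */
    update_keys_of_edges_out_of (edge->callee); /* inlined body's calls */
  }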

This updating process is basically O(1) for evaluation of predicates +
O(n_call_sites) for evaluation of the call edges, which are independent.  This may
produce non-linear behaviour in stupid cases where you inline into a function with
very many call sites that is itself called very many times.  The other case where
we get non-linear behaviour is the side case of want_inline_small_function_p,
which makes a function inlinable if the code of the caller grows but the overall
unit shrinks.  The growth of the unit after inlining a given function needs to be
recomputed every time the function changes or one of its calls is modified.

This patch solves those bottlenecks.  The first case is handled by computing
min_size, which is a rough estimate of the minimal growth of the function after
inlining.  This can be used to cut the expensive per-edge computations when the
function is obviously large (as it would be if it had many call sites).  The other
change is a smarter estimate of the growth of the unit: the unit can shrink
only if the function has call sites that shrink the code (and in that case
those will be inlined anyway) or if the offline copy is eliminated.

Instead of always computing a precise value, I introduced growth_likely_positive,
which estimates how many calls one can have and first just quickly
counts the call edges.  If there are many of them, there is no need
for the expensive calculation.

In addition to getting estimate_growth out of the testcase's profile, it also
improves Firefox LTO inliner times by about 40%, or 20 seconds.

We still do not do well compiling the richards testcase; I get out-of-memory
problems with early inlining enabled, and many other issues.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR ipa/60243
* ipa-inline.c (want_inline_small_function_p): Short circuit large
functions; reorganize to make cheap checks first.
(inline_small_functions): Do not estimate growth when dumping;
it is expensive.
* ipa-inline.h (inline_summary): Add min_size.
(growth_likely_positive): New function.
* ipa-inline-analysis.c (dump_inline_summary): Add min_size.
(set_cond_stmt_execution_predicate): Cleanup.
(estimate_edge_size_and_time): Compute min_size.
(estimate_calls_size_and_time): Likewise.
(estimate_node_size_and_time): Likewise.
(inline_update_overall_summary): Update min_size.
(do_estimate_edge_time): Likewise.
(do_estimate_edge_size): Update.
(do_estimate_edge_hints): Update.
(growth_likely_positive): New function.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 208875)
+++ ipa-inline.c(working copy)
@@ -573,6 +573,24 @@ want_inline_small_function_p (struct cgr
   e-inline_failed = CIF_FUNCTION_NOT_INLINE_CANDIDATE;
   want_inline = false;
 }
+  /* Do fast and conservative check if the function can be good
+ inline cnadidate.  At themoment we allow inline hints to
+ promote non-inline function to inline and we increase
+ MAX_INLINE_INSNS_SINGLE 16fold for inline functions.  */
+  else if (!DECL_DECLARED_INLINE_P (callee-decl)
+   inline_summary (callee)-min_size - inline_edge_summary 
(e)-call_stmt_size
+  MAX (MAX_INLINE_INSNS_SINGLE, MAX_INLINE_INSNS_AUTO))
+{
+  e-inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+  want_inline = false;
+}
+  else if (DECL_DECLARED_INLINE_P (callee-decl)
+   inline_summary (callee)-min_size - inline_edge_summary 
(e)-call_stmt_size
+  16 * MAX_INLINE_INSNS_SINGLE)
+{
+  e-inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
+  want_inline = false;
+}
   else
 {
   int growth = estimate_edge_growth (e);
@@ -585,56 +603,26 @@ want_inline_small_function_p (struct cgr
 hints suggests that inlining given function is very profitable.  */
   else if (DECL_DECLARED_INLINE_P (callee-decl)
growth = MAX_INLINE_INSNS_SINGLE
-   !big_speedup
-   !(hints  (INLINE_HINT_indirect_call
-| INLINE_HINT_loop_iterations
-| INLINE_HINT_array_index
-| INLINE_HINT_loop_stride)))
+   ((!big_speedup
+!(hints  (INLINE_HINT_indirect_call
+ | INLINE_HINT_loop_iterations
+  

Re: [PATCH, PR 60640] When creating virtual clones, clone thunks too

2014-03-28 Thread Jan Hubicka
 Hi,
 
 this patch fixes PR 60640 by creating thunks to clones when that is
 necessary to properly redirect edges to them.  It mostly does what
 cgraph_add_thunk does and what analyze_function does to thunks.  It
 fixes the testcases on trunk (it does not apply to 4.8; I have not
 looked at how easily fixable that is) and passes bootstrap and testing on
 x86_64-linux.
 
 OK for trunk?
 
 Thanks,
 
 Martin
 
 
 2014-03-26  Martin Jambor  mjam...@suse.cz
 
   * cgraph.h (cgraph_clone_node): New parameter added to declaration.
   Adjust all callers.
   * cgraphclones.c (build_function_type_skip_args): Moved upwards in the
   file.
   (build_function_decl_skip_args): Likewise.
   (duplicate_thunk_for_node): New function.
   (redirect_edge_duplicating_thunks): Likewise.
   (cgraph_clone_node): New parameter args_to_skip, pass it to
   redirect_edge_duplicating_thunks which is called instead of
   cgraph_redirect_edge_callee.
   (cgraph_create_virtual_clone): Pass args_to_skip to cgraph_clone_node.
 +/* Duplicate thunk THUNK but make it to refer to NODE.  ARGS_TO_SKIP, if
 +   non-NULL, determines which parameters should be omitted.  */
 +
 +static cgraph_node *
 +duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
 +   bitmap args_to_skip)
 +{
 +  cgraph_node *new_thunk, *thunk_of;
 +  thunk_of = cgraph_function_or_thunk_node (thunk-callees-callee);
 +
 +  if (thunk_of-thunk.thunk_p)
 +node = duplicate_thunk_for_node (thunk_of, node, args_to_skip);
 +
 +  tree new_decl;
 +  if (!args_to_skip)
 +new_decl = copy_node (thunk-decl);
 +  else
 +new_decl = build_function_decl_skip_args (thunk-decl, args_to_skip, 
 false);
 +
 +  gcc_checking_assert (!DECL_STRUCT_FUNCTION (new_decl));
 +  gcc_checking_assert (!DECL_INITIAL (new_decl));
 +  gcc_checking_assert (!DECL_RESULT (new_decl));
 +  gcc_checking_assert (!DECL_RTL_SET_P (new_decl));
 +
 +  DECL_NAME (new_decl) = clone_function_name (thunk-decl, 
 artificial_thunk);
 +  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
 +  DECL_EXTERNAL (new_decl) = 0;
 +  DECL_SECTION_NAME (new_decl) = NULL;
 +  DECL_COMDAT_GROUP (new_decl) = 0;
 +  TREE_PUBLIC (new_decl) = 0;
 +  DECL_COMDAT (new_decl) = 0;
 +  DECL_WEAK (new_decl) = 0;
 +  DECL_VIRTUAL_P (new_decl) = 0;
 +  DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
 +  DECL_STATIC_DESTRUCTOR (new_decl) = 0;

We probably ought to factor this out into a common subfunction.
 +
 +  new_thunk = cgraph_create_node (new_decl);
 +  new_thunk-definition = true;
 +  new_thunk-thunk = thunk-thunk;
 +  new_thunk-unique_name = in_lto_p;
 +  new_thunk-externally_visible = 0;
 +  new_thunk-local.local = 1;
 +  new_thunk-lowered = true;
 +  new_thunk-former_clone_of = thunk-decl;
 +
 +  struct cgraph_edge *e = cgraph_create_edge (new_thunk, node, NULL, 0,
 +   CGRAPH_FREQ_BASE);
 +  e-call_stmt_cannot_inline_p = true;
 +  cgraph_call_edge_duplication_hooks (thunk-callees, e);
 +  if (!expand_thunk (new_thunk, false))
 +new_thunk-analyzed = true;
 +  cgraph_call_node_duplication_hooks (thunk, new_thunk);
 +  return new_thunk;
 +}
 +
 +/* If E does not lead to a thunk, simply redirect it to N.  Otherwise create
 +   one or more equivalent thunks for N and redirect E to the first in the
 +   chain.  */
 +
 +void
 +redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node 
 *n,
 +   bitmap args_to_skip)
 +{
 +  cgraph_node *orig_to = cgraph_function_or_thunk_node (e-callee);
 +  if (orig_to-thunk.thunk_p)
 +n = duplicate_thunk_for_node (orig_to, n, args_to_skip);

Is there anything that would prevent us from creating a new thunk for each call?

Also I think you need to avoid this logic when THIS parameter is being 
optimized out
(i.e. it is part of skip_args)
Thanks for looking into this!

Honza


[Fortran-CAF, patch, committed] Fix an offset calculation - and merge from the trunk

2014-03-28 Thread Tobias Burnus

The attached patch fixes an issue with pointer subtraction (wrong type).

Committed as Rev. 208919.

Additionally I have merged the trunk into the branch, Rev. 208922.

Tobias
Index: gcc/fortran/ChangeLog.fortran-caf
===
--- gcc/fortran/ChangeLog.fortran-caf	(Revision 208918)
+++ gcc/fortran/ChangeLog.fortran-caf	(Arbeitskopie)
@@ -1,5 +1,9 @@
 2014-03-28  Tobias Burnus  bur...@net-b.de
 
+	* trans-intrinsic.c (conv_caf_send): Fix offset calculation.
+
+2014-03-28  Tobias Burnus  bur...@net-b.de
+
 	* trans-intrinsic.c (caf_get_image_index, conv_caf_send): New.
 	(gfc_conv_intrinsic_subroutine): Call it.
 	* resolve.c (resolve_ordinary_assign): Enable coindex LHS
Index: gcc/fortran/trans-intrinsic.c
===
--- gcc/fortran/trans-intrinsic.c	(Revision 208918)
+++ gcc/fortran/trans-intrinsic.c	(Arbeitskopie)
@@ -7942,7 +7942,8 @@ conv_caf_send (gfc_code *code) {
}
 
   offset = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type,
-offset, tmp);
+			fold_convert (gfc_array_index_type, offset),
+			fold_convert (gfc_array_index_type, tmp));
 
   /* RHS - a noncoarray.  */
 


Fix various x86 tests for --with-arch=bdver3

2014-03-28 Thread Joseph S. Myers
If you build an x86_64 toolchain with --with-arch enabling various
instruction set extensions by default, this causes some tests to fail
that aren't expecting those extensions to be enabled.  This patch
fixes various tests failing like that for an x86_64-linux-gnu
toolchain configured --with-arch=bdver3, generally by using
appropriate -mno-* options in the tests, or in the case of
gcc.dg/pr45416.c by adjusting the scan-assembler to allow the
alternative instruction that gets used in this case.  It's quite
likely other such failures appear for other --with-arch choices.

Tested x86_64-linux-gnu.  OK to commit?

In addition to the failures fixed by this patch, there are many
gcc.dg/vect tests where having additional vector extensions enabled
breaks their expectations; I'm not sure of the best way to handle
those.  And you get

FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors)

which are assembler errors such as operand type mismatch for
`vfmaddpd' - it looks like the compiler isn't really prepared for the
-mavx512f -mfma4 combination, but I'm not sure what the best way to
handle it is (producing invalid output doesn't seem right, however).

If you test with -march=bdver3 in the multilib options (runtest
--target_board=unix/-march=bdver3) rather than as the configured
default, you get extra failures for the usual reason of multilib
options going after the options from dg-options (which I propose to
address in the usual way using dg-skip-if for -march= options
different from the one present in dg-options).
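That is, something along these lines in each affected test (an illustrative sketch; the architecture named here is just an example):

/* { dg-options "-O2 -march=k8" } */
/* Skip when the multilib flags supply a different -march= value from the
   one hard-coded above.  */
/* { dg-skip-if "conflicting -march" { *-*-* } { "-march=*" } { "-march=k8" } } */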

2014-03-28  Joseph Myers  jos...@codesourcery.com

* gcc.dg/pr45416.c: Allow bextr on x86.
* gcc.target/i386/fma4-builtin.c, gcc.target/i386/fma4-fma-2.c,
gcc.target/i386/fma4-fma.c, gcc.target/i386/fma4-vector-2.c,
gcc.target/i386/fma4-vector.c: Use -mno-fma.
* gcc.target/i386/l_fma_double_1.c,
gcc.target/i386/l_fma_double_2.c,
gcc.target/i386/l_fma_double_3.c,
gcc.target/i386/l_fma_double_4.c,
gcc.target/i386/l_fma_double_5.c,
gcc.target/i386/l_fma_double_6.c, gcc.target/i386/l_fma_float_1.c,
gcc.target/i386/l_fma_float_2.c, gcc.target/i386/l_fma_float_3.c,
gcc.target/i386/l_fma_float_4.c, gcc.target/i386/l_fma_float_5.c,
gcc.target/i386/l_fma_float_6.c: Use -mno-fma4.
* gcc.target/i386/pr27971.c: Use -mno-tbm.
* gcc.target/i386/pr42542-4a.c: Use -mno-avx.
* gcc.target/i386/pr59390.c: Use -mno-fma -mno-fma4.

Index: gcc/testsuite/gcc.dg/pr45416.c
===
--- gcc/testsuite/gcc.dg/pr45416.c  (revision 208882)
+++ gcc/testsuite/gcc.dg/pr45416.c  (working copy)
@@ -9,7 +9,7 @@
return 0;
 }
 
-/* { dg-final { scan-assembler andl { target i?86-*-linux* i?86-*-gnu* 
x86_64-*-linux* } } }  */
+/* { dg-final { scan-assembler andl|bextr { target i?86-*-linux* i?86-*-gnu* 
x86_64-*-linux* } } }  */
 /* { dg-final { scan-assembler-not setne { target i?86-*-linux* i?86-*-gnu* 
x86_64-*-linux* } } } */
 /* { dg-final { scan-assembler and|ubfx  { target arm*-*-* } } } */
 /* { dg-final { scan-assembler-not moveq { target arm*-*-* } } } */
Index: gcc/testsuite/gcc.target/i386/pr27971.c
===
--- gcc/testsuite/gcc.target/i386/pr27971.c (revision 208882)
+++ gcc/testsuite/gcc.target/i386/pr27971.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O2 } */
+/* { dg-options -O2 -mno-tbm } */
 
 unsigned array[4];
 
Index: gcc/testsuite/gcc.target/i386/l_fma_double_5.c
===
--- gcc/testsuite/gcc.target/i386/l_fma_double_5.c  (revision 208882)
+++ gcc/testsuite/gcc.target/i386/l_fma_double_5.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic } */
+/* { dg-options -O3 -Wno-attributes -mfpmath=sse -mfma -mtune=generic 
-mno-fma4 } */
 
 /* Test that the compiler properly optimizes floating point multiply
and add instructions into FMA3 instructions.  */
Index: 

[PATCH] [4.8 branch] PR rtl-optimization/60700: Backport revision 201326

2014-03-28 Thread H.J. Lu
Hi,

Revision 201326 fixes a shrink-wrap bug which is also a regression
on 4.8 branch.  This patch backports it to 4.8 branch.  OK for 4.8
branch?

I also include a testcase for PR rtl-optimization/60700.  OK for
trunk and 4.8 branch?

Thanks.


H.J.
--
gcc/

PR rtl-optimization/60700
2013-07-30  Zhenqiang Chen  zhenqiang.c...@linaro.org

PR rtl-optimization/57637
* function.c (move_insn_for_shrink_wrap): Also check the
GEN set of the LIVE problem for the liveness analysis
if it exists, otherwise give up.

gcc/testsuite/

PR rtl-optimization/60700
2013-07-30  Zhenqiang Chen  zhenqiang.c...@linaro.org

* gcc.target/arm/pr57637.c: New testcase.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@201326 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |  11 ++
 gcc/function.c |  49 +---
 gcc/testsuite/ChangeLog|   8 ++
 gcc/testsuite/gcc.target/arm/pr57637.c | 206 +
 4 files changed, 261 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr57637.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 63a6c98..557f922 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2014-03-28  H.J. Lu  hongjiu...@intel.com
+
+   PR rtl-optimization/60700
+   Backport from mainline
+   2013-07-30  Zhenqiang Chen  zhenqiang.c...@linaro.org
+
+   PR rtl-optimization/57637
+   * function.c (move_insn_for_shrink_wrap): Also check the
+   GEN set of the LIVE problem for the liveness analysis
+   if it exists, otherwise give up.
+
 2014-03-26  Martin Jambor  mjam...@suse.cz
 
   PR ipa/60419
diff --git a/gcc/function.c b/gcc/function.c
index e673f21..80720cb 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5509,22 +5509,45 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
 except for any part that overlaps SRC (next loop).  */
   bb_uses = DF_LR_BB_INFO (bb)-use;
   bb_defs = DF_LR_BB_INFO (bb)-def;
-  for (i = dregno; i  end_dregno; i++)
+  if (df_live)
{
- if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P (bb_defs, i))
-   next_block = NULL;
- CLEAR_REGNO_REG_SET (live_out, i);
- CLEAR_REGNO_REG_SET (live_in, i);
-   }
+ for (i = dregno; i  end_dregno; i++)
+   {
+ if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P (bb_defs, i)
+ || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)-gen, i))
+   next_block = NULL;
+ CLEAR_REGNO_REG_SET (live_out, i);
+ CLEAR_REGNO_REG_SET (live_in, i);
+   }
 
-  /* Check whether BB clobbers SRC.  We need to add INSN to BB if so.
-Either way, SRC is now live on entry.  */
-  for (i = sregno; i  end_sregno; i++)
+ /* Check whether BB clobbers SRC.  We need to add INSN to BB if so.
+Either way, SRC is now live on entry.  */
+ for (i = sregno; i  end_sregno; i++)
+   {
+ if (REGNO_REG_SET_P (bb_defs, i)
+ || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)-gen, i))
+   next_block = NULL;
+ SET_REGNO_REG_SET (live_out, i);
+ SET_REGNO_REG_SET (live_in, i);
+   }
+   }
+  else
{
- if (REGNO_REG_SET_P (bb_defs, i))
-   next_block = NULL;
- SET_REGNO_REG_SET (live_out, i);
- SET_REGNO_REG_SET (live_in, i);
+ /* DF_LR_BB_INFO (bb)-def does not comprise the DF_REF_PARTIAL and
+DF_REF_CONDITIONAL defs.  So if DF_LIVE doesn't exist, i.e.
+at -O1, just give up searching NEXT_BLOCK.  */
+ next_block = NULL;
+ for (i = dregno; i  end_dregno; i++)
+   {
+ CLEAR_REGNO_REG_SET (live_out, i);
+ CLEAR_REGNO_REG_SET (live_in, i);
+   }
+
+ for (i = sregno; i  end_sregno; i++)
+   {
+ SET_REGNO_REG_SET (live_out, i);
+ SET_REGNO_REG_SET (live_in, i);
+   }
}
 
   /* If we don't need to add the move to BB, look for a single
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f425228..50a33ee 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2014-03-28  H.J. Lu  hongjiu...@intel.com
+
+   PR rtl-optimization/60700
+   Backport from mainline
+   2013-07-30  Zhenqiang Chen  zhenqiang.c...@linaro.org
+
+   * gcc.target/arm/pr57637.c: New testcase.
+
 2014-04-28  Thomas Koenig  tkoe...@gcc.gnu.org
 
PR fortran/60522
diff --git a/gcc/testsuite/gcc.target/arm/pr57637.c 
b/gcc/testsuite/gcc.target/arm/pr57637.c
new file mode 100644
index 000..2b9bfdd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr57637.c
@@ -0,0 +1,206 @@
+/* { dg-do run } */
+/* { dg-options -O2 -fno-inline } */
+
+typedef struct _GtkCssStyleProperty GtkCssStyleProperty;
+

Re: Fix various x86 tests for --with-arch=bdver3

2014-03-28 Thread H.J. Lu
On Fri, Mar 28, 2014 at 2:46 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 If you build an x86_64 toolchain with --with-arch enabling various
 instruction set extensions by default, this causes some tests to fail
 that aren't expecting those extensions to be enabled.  This patch
 fixes various tests failing like that for an x86_64-linux-gnu
 toolchain configured --with-arch=bdver3, generally by using
 appropriate -mno-* options in the tests, or in the case of
 gcc.dg/pr45416.c by adjusting the scan-assembler to allow the
 alternative instruction that gets used in this case.  It's quite
 likely other such failures appear for other --with-arch choices.

 Tested x86_64-linux-gnu.  OK to commit?

 In addition to the failures fixed by this patch, there are many
 gcc.dg/vect tests where having additional vector extensions enabled
 breaks their expectations; I'm not sure of the best way to handle
 those.  And you get

 FAIL: gcc.target/i386/avx512f-vfmaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmaddsubXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfmsubaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmaddXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmaddXXXps-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmsubXXXpd-2.c (test for excess errors)
 FAIL: gcc.target/i386/avx512f-vfnmsubXXXps-2.c (test for excess errors)

 which are assembler errors such as operand type mismatch for
 `vfmaddpd' - it looks like the compiler isn't really prepared for the
 -mavx512f -mfma4 combination, but I'm not sure what the best way to
 handle it is (producing invalid output doesn't seem right, however).

 If you test with -march=bdver3 in the multilib options (runtest
 --target_board=unix/-march=bdver3) rather than as the configured
 default, you get extra failures for the usual reason of 
 multilib

This is

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59971

 options going after the options from dg-options (which I propose to
 address in the usual way using dg-skip-if for -march= options
 different from the one present in dg-options).

Here is a patch:

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01891.html


-- 
H.J.


patch to fix PR60697

2014-03-28 Thread Vladimir Makarov
  The following patch fixes PR60697.  The details of the PR can be 
found on


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60697

  The patch was successfully bootstrapped and tested on x86-64 and aarch64.

  Committed as rev. 208926.

2014-03-28  Vladimir Makarov  vmaka...@redhat.com

PR target/60697
* lra-constraints.c (index_part_to_reg): New.
(process_address): Use it.

2014-03-28  Vladimir Makarov  vmaka...@redhat.com

PR target/60697
* gcc.target/aarch64/pr60697.c: New.
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 208895)
+++ lra-constraints.c   (working copy)
@@ -2631,6 +2631,20 @@ base_plus_disp_to_reg (struct address_in
   return new_reg;
 }
 
+/* Make reload of index part of address AD.  Return the new
+   pseudo.  */
+static rtx
+index_part_to_reg (struct address_info *ad)
+{
+  rtx new_reg;
+
+  new_reg = lra_create_new_reg (GET_MODE (*ad-index), NULL_RTX,
+   INDEX_REG_CLASS, index term);
+  expand_mult (GET_MODE (*ad-index), *ad-index_term,
+  GEN_INT (get_index_scale (ad)), new_reg, 1);
+  return new_reg;
+}
+
 /* Return true if we can add a displacement to address AD, even if that
makes the address invalid.  The fix-up code requires any new address
to be the sum of the BASE_TERM, INDEX and DISP_TERM fields.  */
@@ -2935,7 +2949,7 @@ process_address (int nop, rtx *before, r
   emit_insn (insns);
   *ad.inner = new_reg;
 }
-  else
+  else if (ad.disp_term != NULL)
 {
   /* base + scale * index + disp = new base + scale * index,
 case (1) above.  */
@@ -2943,6 +2957,18 @@ process_address (int nop, rtx *before, r
   *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
   new_reg, *ad.index);
 }
+  else
+{
+  /* base + scale * index = base + new_reg,
+case (1) above.
+  Index part of address may become invalid.  For example, we
+  changed pseudo on the equivalent memory and a subreg of the
+  pseudo onto the memory of different mode for which the scale is
+  prohibitted.  */
+  new_reg = index_part_to_reg (ad);
+  *ad.inner = simplify_gen_binary (PLUS, GET_MODE (new_reg),
+  *ad.base_term, new_reg);
+}
   *before = get_insns ();
   end_sequence ();
   return true;
Index: testsuite/gcc.target/aarch64/pr60697.c
===
--- testsuite/gcc.target/aarch64/pr60697.c  (revision 0)
+++ testsuite/gcc.target/aarch64/pr60697.c  (working copy)
@@ -0,0 +1,638 @@
+/* { dg-do compile } */
+/* { dg-options -w -O3 -mcpu=cortex-a53 } */
+typedef struct __sFILE __FILE;
+typedef __FILE FILE;
+typedef int atom_id;
+typedef float real;
+typedef real rvec[3];
+typedef real matrix[3][3];
+enum {
+  ebCGS,ebMOLS,ebSBLOCKS,ebNR
+};
+enum {
+  efepNO, efepYES, efepNR
+};
+enum {
+  esolNO, esolMNO, esolWATER, esolWATERWATER, esolNR
+};
+typedef struct {
+  int nr;
+  atom_id *index;
+  atom_id *a;
+} t_block;
+enum {
+  F_LJ,
+  F_LJLR,
+  F_SR,
+  F_LR,
+  F_DVDL,
+};
+typedef struct {
+  t_block excl;
+} t_atoms;
+typedef struct {
+  t_atoms atoms;
+  t_block blocks[ebNR];
+} t_topology;
+typedef struct {
+} t_nsborder;
+extern FILE *debug;
+typedef struct {
+} t_nrnb;
+typedef struct {
+  int nri,maxnri;
+  int nrj,maxnrj;
+  int maxlen;
+  int solvent;
+  int *gid;
+  int *jindex;
+  atom_id *jjnr;
+  int *nsatoms;
+} t_nblist;
+typedef struct {
+  int nrx,nry,nrz;
+} t_grid;
+typedef struct {
+} t_commrec;
+enum { eNL_VDWQQ, eNL_VDW, eNL_QQ,
+   eNL_VDWQQ_FREE, eNL_VDW_FREE, eNL_QQ_FREE,
+   eNL_VDWQQ_SOLMNO, eNL_VDW_SOLMNO, eNL_QQ_SOLMNO,
+   eNL_VDWQQ_WATER, eNL_QQ_WATER,
+   eNL_VDWQQ_WATERWATER, eNL_QQ_WATERWATER,
+   eNL_NR };
+typedef struct {
+  real rlist,rlistlong;
+  real rcoulomb_switch,rcoulomb;
+  real rvdw_switch,rvdw;
+  int efep;
+  int cg0,hcg;
+  int *solvent_type;
+  int *mno_index;
+  rvec *cg_cm;
+  t_nblist nlist_sr[eNL_NR];
+  t_nblist nlist_lr[eNL_NR];
+  int bTwinRange;
+  rvec *f_twin;
+  int *eg_excl;
+} t_forcerec;
+typedef struct {
+  real *chargeA,*chargeB,*chargeT;
+  int *bPerturbed;
+  int *typeA,*typeB;
+  unsigned short *cTC,*cENER,*cACC,*cFREEZE,*cXTC,*cVCM;
+} t_mdatoms;
+enum { egCOUL, egLJ, egBHAM, egLR, egLJLR, egCOUL14, egLJ14, egNR };
+typedef struct {
+  real *ee[egNR];
+} t_grp_ener;
+typedef struct {
+  t_grp_ener estat;
+} t_groups;
+typedef unsigned long t_excl;
+static void reset_nblist(t_nblist *nl)
+{
+  nl-nri = 0;
+  nl-nrj = 0;
+  nl-maxlen = 0;
+  if (nl-maxnri  0) {
+nl-gid[0] = -1;
+if (nl-maxnrj  1) {
+  nl-jindex[0] = 0;
+  nl-jindex[1] = 0;
+}
+  }
+}
+static void reset_neighbor_list(t_forcerec *fr,int bLR,int eNL)
+{
+reset_nblist((fr-nlist_lr[eNL]));
+}
+static void close_i_nblist(t_nblist *nlist)
+{
+  int nri = nlist-nri;
+ 

Re: Fix PR ipa/60315 (inliner explosion)

2014-03-28 Thread Jan Hubicka
  Actually after some additional investigation I decided to commit this
  patch. fixup_noreturn_call already cares about the return value but
  differently than the new Jakub's code. 
 
 Thanks for the quick fix, I confirm that the ACATS failures are all gone.
 
 So we're left with the GIMPLE checking failure on opt33.adb.

Hi,
this is the patch I just committed.  It simply clears the static chain when 
needed.

Honza

* cgraph.c (cgraph_redirect_edge_call_stmt_to_callee): Clear
static chain if needed.
* gnat.dg/opt33.adb: New testcase.
Index: cgraph.c
===
--- cgraph.c(revision 208915)
+++ cgraph.c(working copy)
@@ -1488,6 +1488,14 @@ cgraph_redirect_edge_call_stmt_to_callee
  gsi_insert_before (gsi, set_stmt, GSI_SAME_STMT);
}
   gimple_call_set_lhs (new_stmt, NULL_TREE);
+  update_stmt_fn (DECL_STRUCT_FUNCTION (e-caller-decl), new_stmt);
+}
+
+  /* If new callee has no static chain, remove it.  */
+  if (gimple_call_chain (new_stmt)  !DECL_STATIC_CHAIN (e-callee-decl))
+{
+  gimple_call_set_chain (new_stmt, NULL);
+  update_stmt_fn (DECL_STRUCT_FUNCTION (e-caller-decl), new_stmt);
 }
 
   cgraph_set_call_stmt_including_clones (e-caller, e-call_stmt, new_stmt, 
false);
Index: testsuite/gnat.dg/opt33.adb
===
--- testsuite/gnat.dg/opt33.adb (revision 0)
+++ testsuite/gnat.dg/opt33.adb (revision 0)
@@ -0,0 +1,41 @@
+-- { dg-do compile }
+-- { dg-options -O }
+
+with Ada.Containers.Ordered_Sets;
+with Ada.Strings.Unbounded;
+
+procedure Opt33 is
+
+   type Rec is record
+  Name : Ada.Strings.Unbounded.Unbounded_String;
+   end record;
+
+   function "<" (Left : Rec; Right : Rec) return Boolean;
+
+   package My_Ordered_Sets is new Ada.Containers.Ordered_Sets (Rec);
+
+   protected type Data is
+  procedure Do_It;
+   private
+  Set : My_Ordered_Sets.Set;
+   end Data;
+
+   function "<" (Left : Rec; Right : Rec) return Boolean is
+   begin
+  return False;
+   end "<";
+
+   protected body Data is
+  procedure Do_It is
+ procedure Dummy (Position : My_Ordered_Sets.Cursor) is
+ begin
+null;
+ end;
+  begin
+ Set.Iterate (Dummy'Access);
+  end;
+   end Data;
+
+begin
+   null;
+end;


[RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend

2014-03-28 Thread Robert Suchanek
Hi All,

This patch enables LRA by default for MIPS. The classic reload is still
available and can be enabled via -mreload switch. 

All regressions are fixed, with one exception described below.

There was a necessary change in the LRA core as I believe there was a genuine 
unhandled case in LRA when processing addresses. It is specific to MIPS16
as store/load[unsigned] halfword/byte instructions cannot access the stack 
pointer
directly. Potentially, it can affect other architectures if they have similar
limitation. One of the problems showed an RTL that contained $frame as the base 
register
(without any offset, simple move) but LRA temporarily eliminated it to $sp 
before
calling the target hook to validate the address.
The backend rejected it because of the mode and $sp. Then, LRA tried to emit 
base+disp 
but ICEd because there never was any displacement.  Another testcase revealed the
offset not being used; unnecessary 'add' instructions were inserted, preventing
the use of offsets.
Marking an insn with STACK_POINTER_REGNUM as valid was not an option as LRA 
would 
generate an insn with $sp and fail during coherency check. The patch attempts 
to 
reload $sp into a register and re-validate the address with offset (if there is 
one). 
If this fails it sticks to the original plan inserting base+disp.
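
To make the MIPS16 restriction concrete, here is a small illustration (mine,
not part of the submission): a byte store into the current stack frame cannot
use $sp directly as the base of an sb instruction, so once $frame has been
eliminated to $sp the base has to be copied into an ordinary register first.

/* Illustrative only.  Compiled for MIPS16, the byte accesses below address
   an object in the stack frame.  MIPS16 has sp-relative forms only for word
   loads/stores, not for byte/halfword ones, so the address must be rebuilt
   with the stack pointer copied into a core register, roughly
       move  $2, $sp
       sb    $3, <offset>($2)
   rather than the invalid  sb $3, <offset>($sp).  */
unsigned char
copy_byte (unsigned char c)
{
  volatile unsigned char slot;  /* forced into the stack frame */
  slot = c;                     /* byte store through a stack address */
  return slot;                  /* byte load through a stack address  */
}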

The generated code optimized for size is fairly acceptable.  CSiBE shows a
slight advantage of LRA over reload for MIPS16, with some minor regressions
for mips32* and mips64*, on average less than 0.5%.  The code size
improvements are being investigated.

The patch has been tested on the following variations:
- cross-tested mips-mti-elf, mips-mti-linux-gnu (languages=c,c++):
  {-mips32,-mips32r2}{-EL,-EB}{-mhard-float,-msoft-float}{-mno-mips16,-mips16}
  -mips64r2 -mabi=n32 {-mhard-float,-msoft-float}
  -mips64r2 -mabi=64 {-mhard-float,-msoft-float}
- bootstrapped and regtested x86_84-unknown-linux-gnu (all languages)

There are two known DejaGNU failures on mips64 with -mabi=64, namely the
m{add,sub}-8 tests, caused by subtleties in the LRA costing model; it is not
a correctness issue.  The *mul_{add,sub}_si patterns are tuned explicitly for
LRA, and all m{add,sub}-* failures except the above have been resolved.  By
failures I mean differences between tests run with and without the -mreload
switch.  A number of failures already existed on ToT at the time of testing.
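
For reference, the kind of multiply-accumulate code those patterns and tests
exercise looks roughly like this (example mine, not the actual m{add,sub}-8
source):

/* Illustrative only.  Whether these become madd/msub or a plain mul followed
   by addu/subu depends on which alternative of the *mul_acc_si / *mul_sub_si
   patterns the register allocator selects, which is where the LRA-vs-reload
   costing difference shows up.  */
int
mul_acc (int acc, int a, int b)
{
  return acc + a * b;   /* madd candidate */
}

int
mul_sub (int acc, int a, int b)
{
  return acc - a * b;   /* msub candidate */
}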

The patch is intended for Stage 1. As for the legal part, the company-wide 
copyright assignment is in process.

Regards,
Robert

gcc/ChangeLog:

2014-03-26  Robert Suchanek  robert.sucha...@imgtec.com

* lra-constraints.c (base_to_reg): New function.
(process_address): Use new function.
* rtlanal.c (get_base_term): Add CONSTANT_P (*inner).

* config/mips/constraints.md (d): BASE_REG_CLASS
replaced by ADDR_REG_CLASS.
* config/mips/mips.c (mips_regno_mode_ok_for_base_p):
Remove use of !strict_p for MIPS16.
(mips_register_priority): New function that implements
the target hook TARGET_REGISTER_PRIORITY.
(mips_spill_class): Likewise for TARGET_SPILL_CLASS.
(mips_lra_p): Likewise for TARGET_LRA_P.
* config/mips/mips.h (reg_class): Add M16F_REGS and SPILL_REGS
classes.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(BASE_REG_CLASS): Use M16F_REGS.
(ADDR_REG_CLASS): Define.
(IRA_HARD_REGNO_ADD_COST_MULTIPLIER): Define.
* config/mips/mips.md (*mul_acc_si, *mul_sub_si): Add alternative
tuned for LRA. New set attribute to enable alternatives
depending on the register allocator used.
(*and<mode>3_mips16): Remove the load alternatives.
(*lea64): Disable pattern for MIPS16.
* config/mips/mips.opt (mreload): New option.
---
 gcc/config/mips/constraints.md |2 +-
 gcc/config/mips/mips.c |   51 +-
 gcc/config/mips/mips.h |   17 +-
 gcc/config/mips/mips.md|  112 +++-
 gcc/config/mips/mips.opt   |4 ++
 gcc/lra-constraints.c  |   44 +++-
 gcc/rtlanal.c  |3 +-
 7 files changed, 181 insertions(+), 52 deletions(-)

diff --git gcc/config/mips/constraints.md gcc/config/mips/constraints.md
index 49e4895..3810ac3 100644
--- gcc/config/mips/constraints.md
+++ gcc/config/mips/constraints.md
@@ -19,7 +19,7 @@
 
 ;; Register constraints
 
-(define_register_constraint "d" "BASE_REG_CLASS"
+(define_register_constraint "d" "ADDR_REG_CLASS"
   "An address register.  This is equivalent to @code{r} unless
    generating MIPS16 code.")
 
diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
index 143169b..f27a801 100644
--- gcc/config/mips/mips.c
+++ gcc/config/mips/mips.c
@@ -2255,7 +2255,7 @@ mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
  All in all, it seems more consistent to only enforce this restriction

[PATCH] Fixing PR60656

2014-03-28 Thread Cong Hou
This patch fixes PR60656.  Elements in a vector with the
vect_used_by_reduction property cannot be reordered if the use chain with
this property does not perform the same operation as the reduction.
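
A small self-contained illustration (mine, not the patch's testcase) of why
the reordering is only safe when the widened result feeds the reduction
directly:

/* Illustrative only; the permutation {0,2,1,3} is a stand-in for the kind of
   element reordering the widen-mult-even/odd idiom produces.  Summing fully
   reordered products is still correct, but in s += P*P*P the reordered
   widened P*P gets multiplied by an in-order P, pairing v[i]*v[i] with some
   other v[j] -- the wrong result seen in PR60656.  */
#include <stdio.h>

int
main (void)
{
  int v[4] = { 5000, 5001, 5002, 5003 };
  int perm[4] = { 0, 2, 1, 3 };   /* hypothetical even/odd result order */
  long long reference = 0, reordered_ok = 0, reordered_bad = 0;
  int i;

  for (i = 0; i < 4; i++)
    {
      reference     += (long long) v[i] * v[i] * v[i];
      reordered_ok  += (long long) v[perm[i]] * v[perm[i]] * v[perm[i]];
      reordered_bad += (long long) v[perm[i]] * v[perm[i]] * v[i];
    }

  printf ("reference=%lld ok=%lld bad=%lld\n",
          reference, reordered_ok, reordered_bad);
  return 0;
}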

Bootstrapped and tested on an x86-64 machine.

OK for trunk?


thanks,
Cong


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e1d8666..d7d5b82 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-03-28  Cong Hou  co...@google.com
+
+ PR tree-optimization/60656
+ * tree-vect-stmts.c (supportable_widening_operation):
 Fix a bug where elements in a vector with the vect_used_by_reduction
 property are incorrectly reordered when the operation on them is not
 consistent with the one in the reduction operation.
+
 2014-03-10  Jakub Jelinek  ja...@redhat.com

  PR ipa/60457
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 41b6875..414a745 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-03-28  Cong Hou  co...@google.com
+
+ PR tree-optimization/60656
+ * gcc.dg/vect/pr60656.c: New test.
+
 2014-03-10  Jakub Jelinek  ja...@redhat.com

  PR ipa/60457
diff --git a/gcc/testsuite/gcc.dg/vect/pr60656.c
b/gcc/testsuite/gcc.dg/vect/pr60656.c
new file mode 100644
index 000..ebaab62
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr60656.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+__attribute__ ((noinline)) long
+foo ()
+{
+  int v[] = {5000, 5001, 5002, 5003};
+  long s = 0;
+  int i;
+
+  for (i = 0; i < 4; ++i)
+{
+  long P = v[i];
+  s += P*P*P;
+}
+  return s;
+}
+
+long
+bar ()
+{
+  int v[] = {5000, 5001, 5002, 5003};
+  long s = 0;
+  int i;
+
+  for (i = 0; i < 4; ++i)
+{
+  long P = v[i];
+  s += P*P*P;
+  __asm__ volatile ("");
+}
+  return s;
+}
+
+int main()
+{
+  if (foo () != bar ())
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 70fb411..7442d0c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7827,7 +7827,16 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
  stmt, vectype_out, vectype_in,
  code1, code2, multi_step_cvt,
  interm_types))
-	return true;
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  use_operand_p dummy;
+	  gimple use_stmt;
+	  stmt_vec_info use_stmt_info = NULL;
+	  if (single_imm_use (lhs, &dummy, &use_stmt)
+	      && (use_stmt_info = vinfo_for_stmt (use_stmt))
+	      && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
+	    return true;
+	}
   c1 = VEC_WIDEN_MULT_LO_EXPR;
   c2 = VEC_WIDEN_MULT_HI_EXPR;
   break;


Re: [PATCH] Fix PR60505

2014-03-28 Thread Cong Hou
Ping?


thanks,
Cong


On Wed, Mar 19, 2014 at 11:39 AM, Cong Hou co...@google.com wrote:
 On Tue, Mar 18, 2014 at 4:43 AM, Richard Biener rguent...@suse.de wrote:

 On Mon, 17 Mar 2014, Cong Hou wrote:

  On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener rguent...@suse.de wrote:
   On Fri, 14 Mar 2014, Cong Hou wrote:
  
   On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener rguent...@suse.de 
   wrote:
On Fri, 14 Mar 2014, Jakub Jelinek wrote:
   
On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
  Consider this fact and if there are alias checks, we can safely remove
  the epilogue if the maximum trip count of the loop is less than or
  equal to the calculated threshold.

 You have to consider n % vf != 0, so an argument on only maximum
 trip count or threshold cannot work.
   
Well, if you only check if maximum trip count is <= vf and you know
that for n < vf the vectorized loop + its epilogue path will not be taken,
then perhaps you could, but it is a very special case.
Now, the question is when we are guaranteed we enter the scalar versioned
loop instead for n < vf, is that in the case of versioning for alias or
versioning for alignment?
   
I think neither - I have plans to do the cost model check together
with the versioning condition but didn't get around to implement that.
That would allow stronger max bounds for the epilogue loop.
  
   In vect_transform_loop(), check_profitability will be set to true if
   th >= VF-1 and the number of iterations is unknown (we only consider
   unknown trip count here), where th is calculated based on the
   parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum
   value VF-1. If the loop needs to be versioned, then
   check_profitability with true value will be passed to
   vect_loop_versioning(), in which an enhanced loop bound check
   (considering cost) will be built. So I think if the loop is versioned
   and n < VF, then we must enter the scalar version, and in this case
   removing the epilogue should be safe when the maximum trip count is <= th+1.
  
   You mean exactly in the case where the profitability check ensures
   that n % vf == 0?  Thus effectively if n == maximum trip count?
   That's quite a special case, no?
 
 
  Yes, it is a special case. But it is in this special case that those
  warnings are thrown out. Also, I think declaring an array with VF*N as
  length is not unusual.

 Ok, but then for the patch compute the cost model threshold once
 in vect_analyze_loop_2 and store it in a new
 LOOP_VINFO_COST_MODEL_THRESHOLD.


 Done.


 Also you have to check
 the return value from max_stmt_executions_int as that may return
 -1 if the number cannot be computed (or isn't representable in
 a HOST_WIDE_INT).


 It will be converted to an unsigned type so that -1 means infinity.
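
 A tiny standalone sketch (mine, not the patch) of that convention:

/* Illustrative only.  A max_stmt_executions_int-style query returns -1 when
   no bound can be computed; converted to an unsigned type that becomes the
   largest representable value, so a "max_niter <= threshold" test safely
   fails in the unknown case.  */
#include <stdio.h>

int
main (void)
{
  long max_niter = -1;            /* "cannot be computed" */
  unsigned long threshold = 15;   /* e.g. a cost-model threshold */

  if ((unsigned long) max_niter <= threshold)
    printf ("bound known and small: the epilogue may be removable\n");
  else
    printf ("bound unknown or too large: keep the epilogue\n");
  return 0;
}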


 You also should check for
 LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT which should have the
 same effect on the cost model check.


 Done.




 The existing condition is already complicated enough - adding new
 stuff warrants comments before the (sub-)checks.


 OK. Comments added.

 Below is the revised patch. Bootstrapped and tested on an x86-64 machine.


 Cong



 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index e1d8666..eceefb3 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,18 @@
 +2014-03-11  Cong Hou  co...@google.com
 +
 + PR tree-optimization/60505
 + * tree-vectorizer.h (struct _loop_vec_info): Add th field as the
 + threshold of number of iterations below which no vectorization will be
 + done.
 + * tree-vect-loop.c (new_loop_vec_info):
 + Initialize LOOP_VINFO_COST_MODEL_THRESHOLD.
 + * tree-vect-loop.c (vect_analyze_loop_operations):
 + Set LOOP_VINFO_COST_MODEL_THRESHOLD.
 + * tree-vect-loop.c (vect_transform_loop):
 + Use LOOP_VINFO_COST_MODEL_THRESHOLD.
 + * tree-vect-loop.c (vect_analyze_loop_2): Check the maximum number
 + of iterations of the loop and see if we should build the epilogue.
 +
  2014-03-10  Jakub Jelinek  ja...@redhat.com

   PR ipa/60457
 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
 index 41b6875..09ec1c0 100644
 --- a/gcc/testsuite/ChangeLog
 +++ b/gcc/testsuite/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-03-11  Cong Hou  co...@google.com
 +
 + PR tree-optimization/60505
 + * gcc.dg/vect/pr60505.c: New test.
 +
  2014-03-10  Jakub Jelinek  ja...@redhat.com

   PR ipa/60457
 diff --git a/gcc/testsuite/gcc.dg/vect/pr60505.c
 b/gcc/testsuite/gcc.dg/vect/pr60505.c
 new file mode 100644
 index 000..6940513
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vect/pr60505.c
 @@ -0,0 +1,12 @@
 +/* { dg-do compile } */
 +/* { dg-additional-options "-Wall -Werror" } */
 +
 +void foo(char *in, char *out, int num)
 +{
 +  int i;
 +  char ovec[16] = {0};
 +
 +  for (i = 0; i < num; ++i)
 +out[i] = (ovec[i] = in[i]);
 +  out[num] = ovec[num/2];
 +}
 diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
 index df6ab6f..1c78e11 100644
 ---