[PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-13 Thread Daniel Santos
I made a silly mistake in libgcc by testing the cpp macro __AVX__ to
determine rather to use movaps or vmovaps in the stubs.  This resulted
in the stubs choice of instruction being decided by the machine flags
when the compiler was built rather than those being supplied at the
command line.  This patch splits stubs into separate sse and avx
versions so that both are available.

gcc:
config/i386/i386.c: (xlogue_layout::STUB_NAME_MAX_LEN): Increase to 20
bytes.
(xlogue_layout::s_stub_names): Add an additional size-2 diminsion.
(xlogue_layout::get_stub_name): Modify to select the appropairate sse
and avx version of the stub.

gcc/testsuite:
gcc.target/i386/pr82196-1.c: New test.
gcc.target/i386/pr82196-2.c: Likewise.

libgcc:
config/i386/i386-asm.h (PASTE2): New macro.
(ASMNAME): Modify to use PASTE2.
(MS2SYSV_STUB_PREFIX): New macro for isa prefix.
(MS2SYSV_STUB_BEGIN, MS2SYSV_STUB_END): New macros for stub headers.
config/i386/resms64.S: Rename to a header file, use MS2SYSV_STUB_BEGIN
instead of HIDDEN_FUNC and MS2SYSV_STUB_END instead of FUNC_END.
config/i386/resms64f.S: Likewise.
config/i386/resms64fx.S: Likewise.
config/i386/resms64x.S: Likewise.
config/i386/savms64.S: Likewise.
config/i386/savms64f.S: Likewise.
config/i386/avx_resms64.S: New file that only defines a macro and
includes it's corresponding header file.
config/i386/avx_resms64f.S: Likewise.
config/i386/avx_resms64fx.S: Likewise.
config/i386/avx_resms64x.S: Likewise.
config/i386/avx_savms64.S: Likewise.
config/i386/avx_savms64f.S: Likewise.
config/i386/sse_resms64.S: Likewise.
config/i386/sse_resms64f.S: Likewise.
config/i386/sse_resms64fx.S: Likewise.
config/i386/sse_resms64x.S: Likewise.
config/i386/sse_savms64.S: Likewise.
config/i386/sse_savms64f.S: Likewise.
config/i386/t-msabi: Modified to add avx and sse versions of stubs.
Signed-off-by: Daniel Santos 
---
 gcc/config/i386/i386.c  | 15 ++-
 gcc/testsuite/gcc.target/i386/pr82196-1.c   | 14 ++
 gcc/testsuite/gcc.target/i386/pr82196-2.c   | 14 ++
 libgcc/config/i386/avx_resms64.S|  2 ++
 libgcc/config/i386/avx_resms64f.S   |  2 ++
 libgcc/config/i386/avx_resms64fx.S  |  2 ++
 libgcc/config/i386/avx_resms64x.S   |  2 ++
 libgcc/config/i386/avx_savms64.S|  2 ++
 libgcc/config/i386/avx_savms64f.S   |  2 ++
 libgcc/config/i386/i386-asm.h   | 34 -
 libgcc/config/i386/{resms64.S => resms64.h} | 28 ++--
 libgcc/config/i386/{resms64f.S => resms64f.h}   | 24 -
 libgcc/config/i386/{resms64fx.S => resms64fx.h} | 24 -
 libgcc/config/i386/{resms64x.S => resms64x.h}   | 28 ++--
 libgcc/config/i386/{savms64.S => savms64.h} | 28 ++--
 libgcc/config/i386/{savms64f.S => savms64f.h}   | 24 -
 libgcc/config/i386/sse_resms64.S|  2 ++
 libgcc/config/i386/sse_resms64f.S   |  2 ++
 libgcc/config/i386/sse_resms64fx.S  |  2 ++
 libgcc/config/i386/sse_resms64x.S   |  2 ++
 libgcc/config/i386/sse_savms64.S|  2 ++
 libgcc/config/i386/sse_savms64f.S   |  2 ++
 libgcc/config/i386/t-msabi  | 18 -
 23 files changed, 173 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr82196-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr82196-2.c
 create mode 100644 libgcc/config/i386/avx_resms64.S
 create mode 100644 libgcc/config/i386/avx_resms64f.S
 create mode 100644 libgcc/config/i386/avx_resms64fx.S
 create mode 100644 libgcc/config/i386/avx_resms64x.S
 create mode 100644 libgcc/config/i386/avx_savms64.S
 create mode 100644 libgcc/config/i386/avx_savms64f.S
 rename libgcc/config/i386/{resms64.S => resms64.h} (76%)
 rename libgcc/config/i386/{resms64f.S => resms64f.h} (79%)
 rename libgcc/config/i386/{resms64fx.S => resms64fx.h} (79%)
 rename libgcc/config/i386/{resms64x.S => resms64x.h} (77%)
 rename libgcc/config/i386/{savms64.S => savms64.h} (76%)
 rename libgcc/config/i386/{savms64f.S => savms64f.h} (79%)
 create mode 100644 libgcc/config/i386/sse_resms64.S
 create mode 100644 libgcc/config/i386/sse_resms64f.S
 create mode 100644 libgcc/config/i386/sse_resms64fx.S
 create mode 100644 libgcc/config/i386/sse_resms64x.S
 create mode 100644 libgcc/config/i386/sse_savms64.S
 create mode 100644 libgcc/config/i386/sse_savms64f.S

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b2b02acc58a..f0d7d0eb196 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2513,7 +2513,7 @@ public:
   static const 

Go patch committed: simplify select and channel operations

2017-09-13 Thread Ian Lance Taylor
In preparation for upgrading libgo to the 1.9 release, this patch to
the Go frontend approximately incorporates https://golang.org/cl/37661
and
https://golang.org/cl/38351 from the gc toolchain.

CL 37661 changed the gc compiler such that the select statement simply
returns an integer which is then used as the argument for a switch.
Since gccgo already worked that way, this just adjusts the switch code
to look like the gc switch code by removing the explicit case index
expression and calculating it from the order of calls to selectsend,
selectrecv, and selectdefault.

CL 38351 simplifies the channel code by not passing the unused channel
type descriptor pointer.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252748)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-0176cbc6dbd2170bfe2eb8904b80ddfe4c946997
+199f175f4239d1ca6d7e80d08639955d41c3b09f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 252748)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -14463,15 +14463,14 @@ Receive_expression::do_get_backend(Trans
   go_assert(this->channel_->type()->is_error());
   return context->backend()->error_expression();
 }
-  Expression* td = Expression::make_type_descriptor(channel_type, loc);
 
   Expression* recv_ref =
 Expression::make_temporary_reference(this->temp_receiver_, loc);
   Expression* recv_addr =
 Expression::make_temporary_reference(this->temp_receiver_, loc);
   recv_addr = Expression::make_unary(OPERATOR_AND, recv_addr, loc);
-  Expression* recv = Runtime::make_call(Runtime::CHANRECV1, loc, 3,
-   td, this->channel_, recv_addr);
+  Expression* recv = Runtime::make_call(Runtime::CHANRECV1, loc, 2,
+   this->channel_, recv_addr);
   return Expression::make_compound(recv, recv_ref, loc)->get_backend(context);
 }
 
Index: gcc/go/gofrontend/runtime.def
===
--- gcc/go/gofrontend/runtime.def   (revision 251948)
+++ gcc/go/gofrontend/runtime.def   (working copy)
@@ -137,39 +137,31 @@ DEF_GO_RUNTIME(MAPITERNEXT, "runtime.map
 DEF_GO_RUNTIME(MAKECHAN, "runtime.makechan", P2(TYPE, INT64), R1(CHAN))
 
 // Send a value on a channel.
-DEF_GO_RUNTIME(CHANSEND, "runtime.chansend1", P3(TYPE, CHAN, POINTER), R0())
+DEF_GO_RUNTIME(CHANSEND, "runtime.chansend1", P2(CHAN, POINTER), R0())
 
 // Receive a value from a channel.
-DEF_GO_RUNTIME(CHANRECV1, "runtime.chanrecv1", P3(TYPE, CHAN, POINTER), R0())
+DEF_GO_RUNTIME(CHANRECV1, "runtime.chanrecv1", P2(CHAN, POINTER), R0())
 
 // Receive a value from a channel returning whether it is closed.
-DEF_GO_RUNTIME(CHANRECV2, "runtime.chanrecv2", P3(TYPE, CHAN, POINTER),
-  R1(BOOL))
+DEF_GO_RUNTIME(CHANRECV2, "runtime.chanrecv2", P2(CHAN, POINTER), R1(BOOL))
 
 
 // Start building a select statement.
 DEF_GO_RUNTIME(NEWSELECT, "runtime.newselect", P3(POINTER, INT64, INT32), R0())
 
 // Add a default clause to a select statement.
-DEF_GO_RUNTIME(SELECTDEFAULT, "runtime.selectdefault",
-  P2(POINTER, INT32), R0())
+DEF_GO_RUNTIME(SELECTDEFAULT, "runtime.selectdefault", P1(POINTER), R0())
 
 // Add a send clause to a select statement.
-DEF_GO_RUNTIME(SELECTSEND, "runtime.selectsend",
-  P4(POINTER, CHAN, POINTER, INT32), R0())
+DEF_GO_RUNTIME(SELECTSEND, "runtime.selectsend", P3(POINTER, CHAN, POINTER),
+  R0())
 
-// Add a receive clause to a select statement, for a clause which does
-// not check whether the channel is closed.
+// Add a receive clause to a select statement.
 DEF_GO_RUNTIME(SELECTRECV, "runtime.selectrecv",
-  P4(POINTER, CHAN, POINTER, INT32), R0())
-
-// Add a receive clause to a select statement, for a clause which does
-// check whether the channel is closed.
-DEF_GO_RUNTIME(SELECTRECV2, "runtime.selectrecv2",
-  P5(POINTER, CHAN, POINTER, BOOLPTR, INT32), R0())
+  P4(POINTER, CHAN, POINTER, BOOLPTR), R0())
 
 // Run a select, returning the index of the selected clause.
-DEF_GO_RUNTIME(SELECTGO, "runtime.selectgo", P1(POINTER), R1(INT32))
+DEF_GO_RUNTIME(SELECTGO, "runtime.selectgo", P1(POINTER), R1(INT))
 
 
 // Panic.
Index: gcc/go/gofrontend/statements.cc
===
--- gcc/go/gofrontend/statements.cc (revision 251948)
+++ gcc/go/gofrontend/statements.cc (working copy)
@@ -1517,14 +1517,12 @@ Tuple_receive_assignment_statement::do_l
  NULL, loc);
   b->add_statement(closed_temp);
 
-  // 

Go patch committed: avoid compiler crash on invalid programs

2017-09-13 Thread Ian Lance Taylor
This minor patch to the Go frontend avoids crashing the compiler on
some invalid programs.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252747)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-89e46ae0cde7bebd8e97434355c5b7e57d902613
+0176cbc6dbd2170bfe2eb8904b80ddfe4c946997
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 252746)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -210,7 +210,11 @@ Expression::convert_type_to_interface(Ty
 }
 
   // This should have been checked already.
-  go_assert(lhs_interface_type->implements_interface(rhs_type, NULL));
+  if (!lhs_interface_type->implements_interface(rhs_type, NULL))
+{
+  go_assert(saw_errors());
+  return Expression::make_error(location);
+}
 
   // An interface is a tuple.  If LHS_TYPE is an empty interface type,
   // then the first field is the type descriptor for RHS_TYPE.


Go patch committed: emit type specific functions for aliases

2017-09-13 Thread Ian Lance Taylor
When the Go frontend has an alias for a struct or array that requires
a type-specific function, don't emit the function with the alias name.
Emit it with the struct/array as usual.

The test case is for this is https://golang.org/cl/62531.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252746)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-b0a46c2cdb915ddc4a4e401af9ef6eb2bcd4d4ea
+89e46ae0cde7bebd8e97434355c5b7e57d902613
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 251948)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -2498,6 +2498,8 @@ Specific_type_functions::type(Type* t)
 case Type::TYPE_NAMED:
   {
Named_type* nt = t->named_type();
+   if (nt->is_alias())
+ return TRAVERSE_CONTINUE;
if (t->needs_specific_type_functions(this->gogo_))
  t->type_functions(this->gogo_, nt, NULL, NULL, _fn, _fn);
 


Go patch committed: Fix struct field names for embedded aliases

2017-09-13 Thread Ian Lance Taylor
This patch to the Go frontend adds much of https://golang.org/cl/35731
and https://golang.org/cl/35732, patches to the gc toolchain, to the
gofrontend code.

This is a step toward updating libgo to the 1.9 release.  The
gofrontend already supports type aliases, and this is required for
correct support of type aliases when used as embedded fields.

The change to expressions.cc is to handle the << 1, used for the newly
renamed offsetAnon field, in the constant context used for type
descriptor initialization.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252745)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8c6d9ff6f60b737d1e96c0dab0b4e67402bf3316
+b0a46c2cdb915ddc4a4e401af9ef6eb2bcd4d4ea
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 251948)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -6044,35 +6044,46 @@ Binary_expression::do_get_backend(Transl
 {
   go_assert(left_type->integer_type() != NULL);
 
-  mpz_t bitsval;
   int bits = left_type->integer_type()->bits();
-  mpz_init_set_ui(bitsval, bits);
-  Bexpression* bits_expr =
-  gogo->backend()->integer_constant_expression(right_btype, bitsval);
-  Bexpression* compare =
-  gogo->backend()->binary_expression(OPERATOR_LT,
- right, bits_expr, loc);
 
-  Bexpression* zero_expr =
-  gogo->backend()->integer_constant_expression(left_btype, zero);
-  overflow = zero_expr;
-  Bfunction* bfn = context->function()->func_value()->get_decl();
-  if (this->op_ == OPERATOR_RSHIFT
- && !left_type->integer_type()->is_unsigned())
+  Numeric_constant nc;
+  unsigned long ul;
+  if (!this->right_->numeric_constant_value()
+ || nc.to_unsigned_long() != Numeric_constant::NC_UL_VALID
+ || ul >= static_cast(bits))
{
-  Bexpression* neg_expr =
-  gogo->backend()->binary_expression(OPERATOR_LT, left,
- zero_expr, loc);
-  Bexpression* neg_one_expr =
-  gogo->backend()->integer_constant_expression(left_btype, 
neg_one);
-  overflow = gogo->backend()->conditional_expression(bfn,
- btype, neg_expr,
- neg_one_expr,
- zero_expr, loc);
+ mpz_t bitsval;
+ mpz_init_set_ui(bitsval, bits);
+ Bexpression* bits_expr =
+   gogo->backend()->integer_constant_expression(right_btype, bitsval);
+ Bexpression* compare =
+   gogo->backend()->binary_expression(OPERATOR_LT,
+  right, bits_expr, loc);
+
+ Bexpression* zero_expr =
+   gogo->backend()->integer_constant_expression(left_btype, zero);
+ overflow = zero_expr;
+ Bfunction* bfn = context->function()->func_value()->get_decl();
+ if (this->op_ == OPERATOR_RSHIFT
+ && !left_type->integer_type()->is_unsigned())
+   {
+ Bexpression* neg_expr =
+   gogo->backend()->binary_expression(OPERATOR_LT, left,
+  zero_expr, loc);
+ Bexpression* neg_one_expr =
+   gogo->backend()->integer_constant_expression(left_btype,
+neg_one);
+ overflow = gogo->backend()->conditional_expression(bfn,
+btype,
+neg_expr,
+neg_one_expr,
+zero_expr,
+loc);
+   }
+ ret = gogo->backend()->conditional_expression(bfn, btype, compare,
+   ret, overflow, loc);
+ mpz_clear(bitsval);
}
-  ret = gogo->backend()->conditional_expression(bfn, btype, compare, ret,
-overflow, loc);
-  mpz_clear(bitsval);
 }
 
   // Add checks for division by zero and division overflow as needed.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 252745)
+++ gcc/go/gofrontend/types.cc  

Go patch committed: fix check for notinheap conversion

2017-09-13 Thread Ian Lance Taylor
In the Go frontend, a normal pointer may not be converted to a
notinheap pointer.  The frontend was erroneously permitting a
conversion from a normal pointer to a notinheap unsafe.Pointer, which
is useless since unsafe.Pointer is not marked notinheap.  This patch
corrects the test to permit a conversion from unsafe.Pointer to a
notinheap pointer, which is the same test that the gc compiler uses.

The test case for this is in the upcoming 1.9 runtime package.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251948)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-52ebad939927e6cbfb48dd277cef8db451e36533
+8c6d9ff6f60b737d1e96c0dab0b4e67402bf3316
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 251948)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -747,16 +747,16 @@ Type::are_convertible(const Type* lhs, c
 return true;
 
   // A pointer to a regular type may not be converted to a pointer to
-  // a type that may not live in the heap, except when converting to
+  // a type that may not live in the heap, except when converting from
   // unsafe.Pointer.
   if (lhs->points_to() != NULL
   && rhs->points_to() != NULL
-  && !rhs->points_to()->in_heap()
-  && lhs->points_to()->in_heap()
-  && !lhs->is_unsafe_pointer_type())
+  && !lhs->points_to()->in_heap()
+  && rhs->points_to()->in_heap()
+  && !rhs->is_unsafe_pointer_type())
 {
   if (reason != NULL)
-   reason->assign(_("conversion from notinheap type to normal type"));
+   reason->assign(_("conversion from normal type to notinheap type"));
   return false;
 }
 


Re: [PATCH] PR libstdc++/81468 constrain std::chrono::time_point constructor

2017-09-13 Thread Tim Song
On Wed, Sep 13, 2017 at 10:55 AM, Jonathan Wakely  wrote:
>
> +// DR 1177
> +static_assert(is_constructible{},
> +"can convert duration with one floating point rep to another");
> +static_assert(is_constructible{},
> +"can convert duration with integral rep to one with floating point rep");
> +static_assert(!is_constructible{},
> +"cannot convert duration with floating point rep to one with integral 
> rep");
> +static_assert(is_constructible{},
> +"can convert duration with one integral rep to another");
> +
> +static_assert(!is_constructible>>{},
> +"cannot convert duration to one with different period");
> +static_assert(is_constructible>>{},
> +"unless it has a floating-point representation");

"it" is a little ambiguous here unless you read the next message's
mention of "the original"...

> +static_assert(is_constructible>>{},
> +"or a period that is an integral multiple of the original");

This is backwards: duration is convertible to duration iff P1 is an integral multiple of P2, i.e., if the original's
period is an integral multiple of "its" period.

The static assert only passed because duration was used as the
destination type (presumably because of a copy/paste error).

Tim


Re: [PATCH] Fix PR 81096 (ttest failures)

2017-09-13 Thread Ian Lance Taylor via gcc-patches
On Tue, Sep 12, 2017 at 9:49 AM, Steve Ellcey  wrote:
> On Tue, 2017-09-12 at 09:39 -0700, Ian Lance Taylor wrote:
>> On Tue, Sep 12, 2017 at 3:59 AM, Wilco Dijkstra > om> wrote:
>> >
>> > Steve Ellcey wrote:
>> > >
>> > > This patch fixes the ttest failures on aarch64 by adding
>> > > AM_CFLAGS to
>> > > the test options, like btest already does and as Wilco says works
>> > > for
>> > > him in Comment #4 of the bug report.
>> > Thanks for picking this up, this looks OK.
>> >
>> > >
>> > > Tested by me on aarch64.  Ok to checkin?
>> > This counts as an obvious fix, so you can commit it.
>> Wait, what?  This patch is at best incomplete.  Makefile.in is a
>> generated file.  You need to change Makefile.am and re-run automake.
>>
>> Ian
>
> OK, here is the new patch that I will checkin.  I verified that after
> running automake on Makefile.am, the Makefile.in I got was identical
> to what I checked in earlier.

Thanks for taking care of that.

Ian


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Segher Boessenkool
On Wed, Sep 13, 2017 at 10:41:33PM +, Wilco Dijkstra wrote:
> Steve Ellcey wrote:
> 
> > And in aarch64 rtl expansion I see:
> >
> > (insn 10 9 11 (set (reg:QI 81)
> > (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) 
> > "pr77729.c":3 -1
> > (nil))​
> 
> Yes using QI/HI mode anywhere in the RTL seems perverse and incorrect given 
> AArch64
> doesn't support registers narrower than 32 bits. Shift counts seem to have 
> the same
> issue and expand into very complex sequences of ANDs with subreg - and nobody 
> seems
> to understand what subreg really means...

Right, just use SImode for shift counts and life is so much brighter.

> My feeling is that we should only use SI/DI mode in patterns and aggressively 
> widen all
> 8/16-bit operations to SI mode (which is what C requires anyway).

You don't need to do anything for this even, just don't define any
operations on smaller modes (other than mov and the extends, and the
few operations you perhaps actually do have).


Segher


Re: [PATCH version 3, rs6000] Add builtins to convert from float/double to int/long using current rounding mode

2017-09-13 Thread Carl Love
GCC maintainers:

The following patch has been updated to address Segher's comments.

-- rename define insn "fctiw" to  define_insn "lrintsfsi2" to match the
fctid implementation

-- add "TARGET_SF_FPR && TARGET_FPRND" to the define_insn "lrintsfsi2"
as mentioned it was missing on the original define_insn for fctiw.

>> +BU_P7_MISC_1 (FCTIW, "fctiw",CONST,  fctiw)

> Also, those two shouldn't be below the "1 arg BCD" heading; they also
> aren't ISA 2.06 (the instructions are in ISA 1 already; the builtins
> are new).

I looked and really couldn't find a macro that really fit for these
builtins.  I added a new macro for miscellaneous pre ISA 2.04 builtins.
I called it BU_P1_MISC_1 for lack of a better name.  Open to suggestions
on a better name.  I really don't know what the ISA numbers are prior to
2.04 so called it P1 to catch all builtins prior to isa 2.04.

Let me know if you have suggestions/comments.  Thanks.

 Carl Love


-

gcc/ChangeLog:

2017-09-14  Carl Love  

* config/rs6000/rs6000-builtin.def (BU_P1MISC_1): Add define macro.
(FCTID, FCTIW): Add BU_P1_MISC_1
macro expansion for builtins.
* config/rs6000/rs6000.md (lrintsfsi2): Add define_insn for the
fctiw instruction.

gcc/testsuite/ChangeLog:

2017-09-14 Carl Love  
* gcc.target/powerpc/builtin-fctid-fctiw-runnable.c: New test file
for the __builtin_fctid and __builtin_fctiw builtins.
---
 gcc/config/rs6000/rs6000-builtin.def   |  14 +++
 gcc/config/rs6000/rs6000.md|   8 ++
 .../powerpc/builtin-fctid-fctiw-runnable.c | 138 +
 3 files changed, 160 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 850164a..6308009 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -608,6 +608,16 @@
CODE_FOR_ ## ICODE) /* ICODE */
 
 
+/* Miscellaneous builtins for instructions added prior to ISA 2.04.  These
+  operate on general purpose registers.  */
+#define BU_P1_MISC_1(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM, /* ENUM */  \
+   "__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_ALWAYS,  /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_UNARY),   \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
 /* Miscellaneous builtins for instructions added in ISA 2.06.  These
instructions don't require either the DFP or VSX options, just the basic ISA
2.06 (popcntd) enablement since they operate on general purpose
@@ -1846,6 +1856,10 @@ BU_VSX_OVERLOAD_X (XL,"xl")
 BU_VSX_OVERLOAD_X (XL_BE,"xl_be")
 BU_VSX_OVERLOAD_X (XST, "xst")
 
+/* 1 argument builtins pre ISA 2.04.  */
+BU_P1_MISC_1 (FCTID,   "fctid",CONST,  lrintdfdi2)
+BU_P1_MISC_1 (FCTIW,   "fctiw",CONST,  lrintsfsi2)
+
 /* 2 argument CMPB instructions added in ISA 2.05. */
 BU_P6_2 (CMPB_32,"cmpb_32",CONST,  cmpbsi3)
 BU_P6_64BIT_2 (CMPB, "cmpb",   CONST,  cmpbdi3)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 20873ac..ad39dc9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5899,6 +5899,14 @@
   [(set_attr "type" "fpload")
(set_attr "length" "16")])
 
+(define_insn "lrintsfsi2"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+   (unspec:SI [(match_operand:DF 1 "gpc_reg_operand" "d")]
+  UNSPEC_FCTIW))]
+  "TARGET_SF_FPR && TARGET_FPRND"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
 ;; No VSX equivalent to fctid
 (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c
new file mode 100644
index 000..5a0e20e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c
@@ -0,0 +1,138 @@
+/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+#ifdef DEBUG
+#include 
+#endif
+
+void abort (void);
+
+long
+test_bi_lrint_1 (float __A)
+{
+   return (__builtin_fctid (__A));
+}
+long
+test_bi_lrint_2 (double __A)
+{
+   return (__builtin_fctid (__A));
+}
+
+int
+test_bi_rint_1 (float __A)
+{
+   return (__builtin_fctiw (__A));
+}
+
+int
+test_bi_rint_2 (double __A)
+{
+   

[OpenACC] Enable SIMD vectorization on vector loops

2017-09-13 Thread Cesar Philippidis
This patch enables SIMD vectorization on non-SIMT targets in acc vector
loops. It does does so by setting the force_vectorization flag in a
similar manner to OpenMP SIMD loops. Unlike OpenMP, OpenACC provides the
compiler with the flexibility to assign gang, worker and vector
parallelism to independent acc loops. At present, automatic parallelism
is assigned during the oacc device lower pass, specifically inside
oacc_loop_process. Consequently, this patch applies the
force_vectorization flag late when the GOACC_LOOP internal functions are
expanded into target-specific code.

Note that expand_oacc_for may construct two loops for each acc loop; the
outer loop represents the "chunking" factor, whereas the inner loops are
for individual gang, worker and vector threads.

Also note that OpenACC permits the user to apply any combination of
gang, worker and vector level parallelism to each loop. E.g., acc loop
gang vector. However, oacc_xform_loop does not strip-mine the acc loops
to take advantage of this on non-SIMT targets as it does for SIMT
targets. Therefore, this the force vectorization flag is only set when
the acc loop has been assigned vector partitioning.

Is this patch OK for trunk?

Cesar
2017-09-13  Cesar Philippidis  

	gcc/
	* omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on
	non-SIMT targets in acc vector loops.


diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2d4fd411680..9d5b8bef649 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "cfgloop.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
function is held in a 'NULL' loop.  */
@@ -370,6 +371,30 @@ oacc_xform_loop (gcall *call)
   break;
 
 case IFN_GOACC_LOOP_OFFSET:
+  /* Enable vectorization on non-SIMT targets.  */
+  if (!targetm.simt.vf
+	  && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+	  /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+	 the loop.  */
+	  && (flag_tree_loop_vectorize
+	  || !global_options_set.x_flag_tree_loop_vectorize))
+	{
+	  basic_block bb = gsi_bb (gsi);
+	  struct loop *parent = bb->loop_father;
+	  struct loop *body = parent->inner;
+
+	  parent->force_vectorize = true;
+	  parent->safelen = INT_MAX;
+
+	  /* "Chunking loops" may have inner loops.  */
+	  if (parent->inner)
+	{
+	  body->force_vectorize = true;
+	  body->safelen = INT_MAX;
+	}
+
+	  cfun->has_force_vectorize_loops = true;
+	}
   if (striding)
 	{
 	  r = oacc_thread_numbers (true, mask, );


Re: [PATCH version 2, rs6000] Add builtins to convert from float/double to int/long using current rounding mode

2017-09-13 Thread Segher Boessenkool
On Tue, Sep 12, 2017 at 07:17:07PM -0400, Michael Meissner wrote:
> On Tue, Sep 12, 2017 at 05:41:34PM -0500, Segher Boessenkool wrote:
> > This needs "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT" I think?  Which
> > is the same as "TARGET_DF_FPR".  "lrintdi2" also has "TARGET_FPRND"
> > but that is a mistake I think.  Cc:ing Mike for that.
> 
> TARGET_FPRND is for ISA 2.02 (power5+) which added the various round to 
> integer
> instructions.  So, unless we are going to stop supporting older machines, it 
> is
> needed.

I think you have this a bit wrong?

ISA 2.02 added fri*, but that's round to integer, not convert.

ISA 2.06 added some fc[ft]i* instructions, but most are in base PowerPC
already.

And FPRND is enabled at ISA 2.04 and later?


Segher


Re: [PATCH], Add support for __builtin_{sqrt,fma}f128 on PowerPC ISA 3.0

2017-09-13 Thread Joseph Myers
On Wed, 13 Sep 2017, Michael Meissner wrote:

> This patch adds support on PowerPC ISA 3.0 for the built-in function
> __builtin_sqrtf128 generating the XSSQRTQP hardware square root instruction 
> and
> the built-in function __builtin_fmaf128 generating XSMADDQP, XSMSUBQP,
> XSNMADDQP, and XSNMSUBQP fused multiply-add instructions.

Is there a reason for these to be architecture-specific rather than 
generic everywhere _Float128 is supported?  (With the fmaf128 / sqrtf128 
names available as well as the __builtin_* variants of those.)

Full support for _FloatN/_FloatNx variants of all the existing built-in 
functions might be complicated, and run into potential issues with startup 
cost of creating large numbers of extra built-in functions (it's 
desirable, but possibly hard, which is why I excluded it from the initial 
_FloatN / _FloatNx support patches).  But adding just these two functions 
to builtins.def and making them fold / expand appropriately ought to be 
much simpler.  (I realise sqrt goes through internal-fn.def and 
DEF_INTERNAL_FLT_FN expects a particular set of functions for standard 
types, so maybe some duplication would be involved to get the built-in 
function expanded appropriately, i.e. using an insn pattern or a call to 
an external sqrtf128 function according to whether such an insn pattern is 
available.  fma ought not to involve much more than adding an extra case 
where CASE_FLT_FN (BUILT_IN_FMA) is used.)

> While I was at it, I changed the documentation so that it no longer documents
> the 'q' built-in functions (to mirror libquadmath) but instead just documented
> the 'f128' functions that matches glibc 2.26 and the technical report that
> added the _FloatF128 date.

Those *f128 built-in functions (inf / huge_val / nan / nans / fabs / 
copysign) are not target-specific; they exist for all _FloatN / _FloatNx 
types for all targets with such types.  So it doesn't seem appropriate to 
document them in a target-specific section of the manual, beyond a brief 
cross-reference to the documentation of the functions as 
target-independent.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Steve Ellcey
On Wed, 2017-09-13 at 22:39 +, Wilco Dijkstra wrote:
> Steve Ellcey wrote:
> 
> > And in aarch64 rtl expansion I see:
> >
> > (insn 10 9 11 (set (reg:QI 81)
> > (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1
> A8])) "pr77729.c":3 -1
> > (nil))
> 
> Yes using QI/HI mode anywhere in the RTL seems perverse and incorrect given 
> AArch64
> doesn't support registers narrower than 32 bits. Shift counts seem to have 
> the same
> issue and expand into very complex sequences of ANDs with subreg - and nobody 
> seems
> to understand what subreg really means...
> 
> My feeling is that we should only use SI/DI mode in patterns and aggressively 
> widen all
> 8/16-bit operations to SI mode (which is what C requires anyway).
> 
> Wilco

I was wondering if I should widen to SImode or DImode?  Here is a
patch I am testing now that steals some code from the arm32 md
file, I put it in the "mov" define_expand.  I decided on DImode
since that is the 'natural' size of the registers but using SImode
works too for the pr77729 test case.  Which mode do you think I should
use?

Steve Ellcey
sell...@cavium.com

diff --git a/gcc/config/aarch64/aarch64.md
b/gcc/config/aarch64/aarch64.md
index f8cdb06..41f9e8c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -864,6 +864,12 @@
(match_operand:SHORT 1 "general_operand" ""))]
   ""
   "
+if (can_create_pseudo_p () && MEM_P (operands[1]) && optimize > 0)
+  {
+   rtx reg = gen_reg_rtx (DImode);
+   emit_insn (gen_zero_extenddi2 (reg, operands[1]));
+   operands[1] = gen_lowpart (mode, reg);
+  }
 if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
   operands[1] = force_reg (mode, operands[1]);


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Wilco Dijkstra
Steve Ellcey wrote:

> And in aarch64 rtl expansion I see:
>
> (insn 10 9 11 (set (reg:QI 81)
> (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) 
> "pr77729.c":3 -1
> (nil))​

Yes using QI/HI mode anywhere in the RTL seems perverse and incorrect given 
AArch64
doesn't support registers narrower than 32 bits. Shift counts seem to have the 
same
issue and expand into very complex sequences of ANDs with subreg - and nobody 
seems
to understand what subreg really means...

My feeling is that we should only use SI/DI mode in patterns and aggressively 
widen all
8/16-bit operations to SI mode (which is what C requires anyway).

Wilco

 

Re: [PATCH] Fallout from -static-pie to GCC driver to create static PIE

2017-09-13 Thread Joseph Myers
On Wed, 13 Sep 2017, Jakub Jelinek wrote:

> Ok.
> 
> So here is an updated patch (also fixed whitespace in i386/gnu-user.h
> because the indentation there didn't match the nesting).
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64,powerpc64le}-linux,
> ok for trunk?
> At least the rs6000 part is really important, as bootstrap
> is broken on powerpc*.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Steve Ellcey
On Wed, 2017-09-13 at 14:46 -0500, Segher Boessenkool wrote:
> On Wed, Sep 13, 2017 at 06:13:50PM +0100, Kyrill Tkachov wrote:
> > 
> > We are usually hesitant to add explicit subreg matching in the MD pattern
> > (though I don't remember if there's a hard rule against it).
> > In this case this looks like a missing simplification from combine 
> > (simplify-rtx) so
> > I think adding it there would be better.

> Yes, it probably belongs as a generic simplification in simplify-rtx.c;
> if there is a reason not to do that, it can be done in combine.c
> instead.

Actually, now that I look at it some more and compare it to the arm32
version (where we do not have this problem) I think the problem starts
well before combine.

In arm32 rtl expansion, when reading the QI memory location, I see
these instructions get generated:

(insn 10 3 11 2 (set (reg:SI 119)
(zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0 *string_9(D)+0 
S1 A8]))) "pr77729.c":4 -1
 (nil))
(insn 11 10 12 2 (set (reg:QI 118)
(subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
 (nil))

And in aarch64 rtl expansion I see:

(insn 10 9 11 (set (reg:QI 81)
(mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) 
"pr77729.c":3 -1
 (nil))

Both of these sequences expand to ldrb but in the arm32 case I know
that I set all 32 bits of the register (even though I only want the
bottom 8 bits), but for aarch64 I only know that I set the bottom 8
bits and I don't know anything about the higher bits, meaning I have to
keep the AND instruction to mask out the upper bits on aarch64.

I think we should change the movqi/movhi expansions on aarch64 to
recognize that the ldrb/ldrh instructions zero out the upper bits in
the register by generating rtl like arm32 does.

Steve Ellcey
sell...@cavium.com


[PATCH], Add support for __builtin_{sqrt,fma}f128 on PowerPC ISA 3.0

2017-09-13 Thread Michael Meissner
This patch adds support on PowerPC ISA 3.0 for the built-in function
__builtin_sqrtf128 generating the XSSQRTQP hardware square root instruction and
the built-in function __builtin_fmaf128 generating XSMADDQP, XSMSUBQP,
XSNMADDQP, and XSNMSUBQP fused multiply-add instructions.

While I was at it, I changed the documentation so that it no longer documents
the 'q' built-in functions (to mirror libquadmath) but instead just documented
the 'f128' functions that matches glibc 2.26 and the technical report that
added the _FloatF128 date.

I changed the tests that used __fabsq to use __fabsf128 instead.

I also added && lp64 to float128-5.c so that it doesn't cause errors when doing
the test for a 32-bit target.  This is due to the fact that if you enable
hardware IEEE 128-bit floating point, you eventually will need TImode
supported, and that is not supported on 32-bit targets.

I did a bootstrap and make check with subversion id 252033 on a little endian
power8 system.  The subversion id 252033 is one of the last svn ids that
bootstrap without additional patches on the PowerPC.  There were no regressions
in this patch, and I verified the 4 new tests were run.  Can I check this patch
into the trunk?

[gcc]
2017-09-13  Michael Meissner  

* config/rs6000/rs6000-builtin.def (BU_FLOAT128_1_HW): New macros
to support float128 built-in functions that require the ISA 3.0
hardware.
(BU_FLOAT128_3_HW): Likewise.
(SQRTF128): Add support for the IEEE 128-bit square root and fma
built-in functions.
(FMAF128): Likewise.
(FMAQ): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add
support for built-in functions that need the ISA 3.0 IEEE 128-bit
floating point instructions.
(rs6000_invalid_builtin): Likewise.
(rs6000_builtin_mask_names): Likewise.
* config/rs6000/rs6000.h (MASK_FLOAT128_HW): Likewise.
(RS6000_BTM_FLOAT128_HW): Likewise.
(RS6000_BTM_COMMON): Likewise.
* config/rs6000/rs6000.md (fma4_hw): Add a generator
function.
* doc/extend.texi (RS/6000 built-in functions): Document the
'f128' IEEE 128-bit floating point built-in functions.  Don't
document the older 'q' versions of the functions. Document the
built-in IEEE 128-bit floating point square root and fused
multiply-add built-ins.

[gcc/testsuite]
2017-09-13  Michael Meissner  

* gcc.target/powerpc/abs128-1.c: Use __builtin_fabsf128 instead of
__builtin_fabsq.
* gcc.target/powerpc/float128-5.c: Use __builtin_fabsf128 instead
of __builtin_fabsq.  Prevent the test from running on 32-bit.
* gcc.target/powerpc/float128-fma1.c: New test.
* gcc.target/powerpc/float128-fma2.c: Likewise.
* gcc.target/powerpc/float128-sqrt1.c: Likewise.
* gcc.target/powerpc/float128-sqrt2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 252730)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -667,6 +667,23 @@
 | RS6000_BTC_UNARY),   \
CODE_FOR_ ## ICODE) /* ICODE */
 
+/* IEEE 128-bit floating-point builtins that need the ISA 3.0 hardware.  */
+#define BU_FLOAT128_1_HW(ENUM, NAME, ATTR, ICODE)   \
+  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+   "__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128_HW, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_UNARY),   \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
+#define BU_FLOAT128_3_HW(ENUM, NAME, ATTR, ICODE)   \
+  RS6000_BUILTIN_3 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+   "__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128_HW, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_TERNARY), \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
 /* Miscellaneous builtins for instructions added in ISA 3.0.  These
instructions don't require either the DFP or VSX options, just the basic
ISA 3.0 enablement since they operate on general purpose registers.  */
@@ -2328,6 +2345,16 @@ BU_FLOAT128_1 (FABSQ,"fabsq",   CO
 
 /* 2 argument IEEE 128-bit 

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-09-13 Thread Wilco Dijkstra
Jeff Law wrote:
> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
> > On 08/30/2017 01:46 PM, Richard Biener wrote:

>>>   rdivtmp = 1 / (y*C);
>>>   tem = x *rdivtmp;
>>>   tem2= z * rdivtmp;
>>>
>>> instead of
>>>
>>>   rdivtmp = 1/y;
>>>   tem = x * 1/C * rdivtmp;
>>>   tem2 = z * 1/C * rdivtmp;
>> 
>> Ideally we would be able to CSE that into
>> 
>> rdivtmp = 1/y * 1/C;
>> tem = x * rdivtmp;
>> tem2 = z * rdivtmp;
> So why is your sequence significantly better than Richi's desired
> seqeuence?  They both seem to need 3 mults and a division (which in both
> cases might be a reciprocal estimation).    In Richi's sequence we have
> to mult and feed the result as an operand into the reciprocal insn.  In
> yours we feed the result of the reciprocal into the multiply.

Basically this stuff happens a lot in real code, which is exactly why I 
proposed it.
I even provided counts of how many divisions each transformation avoids:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. 

Note this transformation is a canonicalization - if you can't merge a
constant somehow, moving it out to the LHS can expose more
opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y -> (C3 * x) / y.
Same for negation as it behaves like * -1.

The key here is that it is at least an order of magnitude worse if you have
to execute an extra division than if you have an extra multiply.

> ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
> Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
> believe that one is a more likely CSE than the other?

The idea is that 1/y is much more likely a CSE than 1/(y*C).

We could make the pattern only fire in single use cases and see whether
that makes a difference. It would be easy to test old vs new vs single-use
new and count how many divisions we end up with. We could add 1/ (y * C)
to the reciprocal phase if it is unacceptable as a canonicalization, but then
you won't be able to optimize (C1 * x * C2) / y.

> I think there's a fundamental phase ordering problem here.  You want to
> CSE stuff as much as possible, then expose reciprocals, then CSE again
> because exposing reciprocals can expose new CSE opportunities.

I agree there are phase ordering issues and various problems in
reassociation, CSE and division optimizations not being able to find and
optimize complex cases that are worthwhile.

However I don't agree doing CSE before reciprocals is a good idea. We
want to expose reciprocals early on, even if that means we find fewer
CSEs as a result - again because division is so much more expensive than
any other operation. CSE is generally not smart enough to CSE a * x in
a * b * x and a * c * x, something which is likely to happen quite frequently -
unlike the made up division examples here.

> And I suspect that no matter how hard we try, there's going to be cases
> that get exposed by various transformations in the pipeline such that to
> fully optimize the cse - reciprocal - cse sequence would need to be
> repeated to fully optimize.  We may have to live with not being
> particularly good at picking up the those second order effects.
> 
> I do think that the need for cse - reciprocal - cse  sequencing suggests
> that match.pd may not be the right place for these transformations.  I
> think Richi has pointed this out a couple times already.

I don't think you can ever expect to find the optimal case by repeating
optimizations. It's quite easy to construct examples where first doing CSE
makes things significantly worse. Ultimately to get something optimal you'd
need to try lots of permutations and count for each possible permutation
how many multiplies and divisons you end up after full optimization.
Quite impractical...

> I'm not going to accept or reject at this time.  I think we need to make
> a higher level decision.  Are these transformations better suited for
> match.pd or the reciprocal transformation pass?  I realize that some
> patterns are already in match.pd, but let's try to settle the higher
> level issue before we add more.

The first question is whether you see it as a canonicalization. If so, then
match.pd should be fine.

Wilco

Re: [PATCH] Add a -Wcast-align=strict warning

2017-09-13 Thread Joseph Myers
On Wed, 13 Sep 2017, Bernd Edlinger wrote:

> So you suggest to use min_align_of_type instead of TYPE_ALIGN.
> 
> That would also make sense for the traditional -Wcast-align on
> strict-alignment targets, right?

Yes, and yes (though I'm not sure if any strict-alignment targets have 
this peculiarity).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add a -Wcast-align=strict warning

2017-09-13 Thread Bernd Edlinger
On 09/13/17 22:03, Joseph Myers wrote:
> On Wed, 13 Sep 2017, Bernd Edlinger wrote:
> 
>> On 09/13/17 19:06, Joseph Myers wrote:
>>> What does this warning do in cases where a type has different alignments
>>> inside and outside structs?  I'm thinking of something like
>>>
>>> struct s { long long x; } *p;
>>> /* ... */
>>> (long long *)p
>>>
>>> on 32-bit x86 - where long long's preferred alignment is 8 bytes, but in
>>> structures it's 4 bytes.  (Likewise for double in place of long long.)  I
>>> think a warning for a (long long *)p cast might be surprising in that
>>> case.
>>>
>>
>> Well, yes this does get a warning.  But doesn't that cast then violate
>> the underlying alignment requirement of long long* ?
> 
> That's the difference between preferred alignment (__alignof__) and
> alignment required in all contexts (C11 _Alignof).  The above seems valid,
> just like it's valid to take the address of a long long struct element.
> That is, the alignment for the target of a pointer to long long is really
> 4 bytes here, even though the alignment for a standalone long long object
> is 8 bytes.  And there's a case for the warning to look at the required
> alignment in all contexts, not TYPE_ALIGN.
> 

So you suggest to use min_align_of_type instead of TYPE_ALIGN.

That would also make sense for the traditional -Wcast-align on
strict-alignment targets, right?


Thanks,
Bernd.


Re: [PATCH] Fallout from -static-pie to GCC driver to create static PIE

2017-09-13 Thread Segher Boessenkool
On Wed, Sep 13, 2017 at 10:19:50PM +0200, Jakub Jelinek wrote:
> On Wed, Sep 13, 2017 at 05:18:00AM -0700, H.J. Lu wrote:
> > > In LINK_EH_SPEC you use %{!static|static-pie:--eh-frame-hdr}, what
> > > exactly do you mean by that?  The way that works is if -static
> > > isn't present on the command line OR if -static -static-pie OR
> > > -static-pie -static is present, you get --eh-frame-hdr.  Didn't you
> > > mean %{!static:%{!static-pie:--eh-frame-hdr}}, i.e. don't emit
> > > --eh-frame-hdr for either -static or -static-pie?
> > 
> > It means "pass --eh-frame-hdr to ld if -static isn't used or -static-pie
> > is used since -static-pie needs PT_GNU_EH_FRAME segment.
> 
> Ok.
> 
> So here is an updated patch (also fixed whitespace in i386/gnu-user.h
> because the indentation there didn't match the nesting).
> 
> Bootstrapped/regtested on {x86_64,i686,powerpc64,powerpc64le}-linux,
> ok for trunk?
> At least the rs6000 part is really important, as bootstrap
> is broken on powerpc*.

The rs6000 part looks fine, thanks!


Segher


[PATCH] Fallout from -static-pie to GCC driver to create static PIE

2017-09-13 Thread Jakub Jelinek
On Wed, Sep 13, 2017 at 05:18:00AM -0700, H.J. Lu wrote:
> > In LINK_EH_SPEC you use %{!static|static-pie:--eh-frame-hdr}, what
> > exactly do you mean by that?  The way that works is if -static
> > isn't present on the command line OR if -static -static-pie OR
> > -static-pie -static is present, you get --eh-frame-hdr.  Didn't you
> > mean %{!static:%{!static-pie:--eh-frame-hdr}}, i.e. don't emit
> > --eh-frame-hdr for either -static or -static-pie?
> 
> It means "pass --eh-frame-hdr to ld if -static isn't used or -static-pie
> is used since -static-pie needs PT_GNU_EH_FRAME segment.

Ok.

So here is an updated patch (also fixed whitespace in i386/gnu-user.h
because the indentation there didn't match the nesting).

Bootstrapped/regtested on {x86_64,i686,powerpc64,powerpc64le}-linux,
ok for trunk?
At least the rs6000 part is really important, as bootstrap
is broken on powerpc*.

2017-09-13  Jakub Jelinek  

* config/alpha/elf.h (LINK_EH_SPEC): Add -static-pie support.
* config/alpha/linux.h (LINK_GCC_C_SEQUENCE_SPEC): Likewise.
* config/netbsd.h (LINK_EH_SPEC): Likewise.
* config/sol2.h (LINK_EH_SPEC): Likewise.
* config/arm/uclinux-elf.h (LINK_GCC_C_SEQUENCE_SPEC): Likewise.
* config/s390/linux.h (LINK_SPEC): Likewise.
* config/freebsd.h (LINK_EH_SPEC): Likewise.
* config/openbsd.h (LINK_EH_SPEC): Likewise.
* config/lm32/uclinux-elf.h (LINK_GCC_C_SEQUENCE_SPEC): Likewise.
* config/aarch64/aarch64-linux.h (LINUX_TARGET_LINK_SPEC): Likewise.
* config/powerpcspe/sysv4.h (LINK_EH_SPEC): Likewise.
* config/bfin/linux.h (LINK_GCC_C_SEQUENCE_SPEC): Likewise.
* config/rs6000/sysv4.h (STARTFILE_LINUX_SPEC): Likewise.
(ENDFILE_LINUX_SPEC): Likewise.
(LINK_EH_SPEC): Likewise.
* config/rs6000/linux64.h (LINK_SHLIB_SPEC): Likewise.
(LINK_OS_LINUX_SPEC32): Likewise.
(LINK_OS_LINUX_SPEC64): Likewise.
* config/rs6000/linux.h (LINK_SHLIB_SPEC): Likewise.
(LINK_OS_LINUX_SPEC): Likewise.
* config/i386/gnu-user64.h (GNU_USER_TARGET_LINK_SPEC): Fix a typo.
* config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): Formatting fix.

--- gcc/config/alpha/elf.h.jj   2017-01-01 12:45:40.0 +0100
+++ gcc/config/alpha/elf.h  2017-09-13 11:34:37.889721802 +0200
@@ -168,5 +168,5 @@ extern int alpha_this_gpdisp_sequence_nu
I imagine that other systems will catch up.  In the meantime, it
doesn't harm to make sure that the data exists to be used later.  */
 #if defined(HAVE_LD_EH_FRAME_HDR)
-#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
+#define LINK_EH_SPEC "%{!static|static-pie:--eh-frame-hdr} "
 #endif
--- gcc/config/alpha/linux.h.jj 2017-01-01 12:45:40.063647860 +0100
+++ gcc/config/alpha/linux.h2017-09-13 11:42:01.187281636 +0200
@@ -79,7 +79,8 @@ along with GCC; see the file COPYING3.
 #define TARGET_POSIX_IO
 
 #define LINK_GCC_C_SEQUENCE_SPEC \
-  "%{static:--start-group} %G %L %{static:--end-group}%{!static:%G}"
+  "%{static|static-pie:--start-group} %G %L \
+   %{static|static-pie:--end-group}%{!static:%{!static-pie:%G}}"
 
 /* Use --as-needed -lgcc_s for eh support.  */
 #ifdef HAVE_LD_AS_NEEDED
--- gcc/config/netbsd.h.jj  2017-07-13 15:37:54.0 +0200
+++ gcc/config/netbsd.h 2017-09-13 11:33:39.105443205 +0200
@@ -125,7 +125,7 @@ along with GCC; see the file COPYING3.
 #define LIBGCC_SPEC NETBSD_LIBGCC_SPEC
 
 #if defined(HAVE_LD_EH_FRAME_HDR)
-#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
+#define LINK_EH_SPEC "%{!static|static-pie:--eh-frame-hdr} "
 #endif
 
 #undef TARGET_LIBC_HAS_FUNCTION
--- gcc/config/sol2.h.jj2017-09-01 09:26:51.0 +0200
+++ gcc/config/sol2.h   2017-09-13 11:34:04.628129989 +0200
@@ -372,7 +372,7 @@ along with GCC; see the file COPYING3.
 /* Solaris 11 build 135+ implements dl_iterate_phdr.  GNU ld needs
--eh-frame-hdr to create the required .eh_frame_hdr sections.  */
 #if defined(HAVE_LD_EH_FRAME_HDR) && defined(TARGET_DL_ITERATE_PHDR)
-#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
+#define LINK_EH_SPEC "%{!static|static-pie:--eh-frame-hdr} "
 #endif /* HAVE_LD_EH_FRAME && TARGET_DL_ITERATE_PHDR */
 #endif
 
--- gcc/config/arm/uclinux-elf.h.jj 2017-01-01 12:45:41.382630422 +0100
+++ gcc/config/arm/uclinux-elf.h2017-09-13 11:42:51.438664949 +0200
@@ -70,7 +70,8 @@
 
 #undef LINK_GCC_C_SEQUENCE_SPEC
 #define LINK_GCC_C_SEQUENCE_SPEC \
-  "%{static:--start-group} %G %L %{static:--end-group}%{!static:%G %L}"
+  "%{static|static-pie:--start-group} %G %L \
+   %{static|static-pie:--end-group}%{!static:%{!static-pie:%G %L}}"
 
 /* Use --as-needed -lgcc_s for eh support.  */
 #ifdef HAVE_LD_AS_NEEDED
--- gcc/config/s390/linux.h.jj  2017-02-06 13:32:14.0 +0100
+++ gcc/config/s390/linux.h 2017-09-13 13:21:09.209266231 +0200
@@ -82,10 +82,11 @@ along with GCC; see the file COPYING3.
%{shared:-shared} \
%{!shared: \
   

Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-13 Thread Bill Schmidt
On Sep 13, 2017, at 10:40 AM, Bill Schmidt  wrote:
> 
> On Sep 13, 2017, at 7:23 AM, Richard Biener  
> wrote:
>> 
>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>>  wrote:
>>> Hi,
>>> 
>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>>> 
>>> Folding of vector loads in GIMPLE.
>>> 
>>> Add code to handle gimple folding for the vec_ld builtins.
>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
>>> Surrounding
>>> comments have been adjusted slightly so they continue to read OK for the
>>> existing vec_st code.
>>> 
>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
>>> tests which have been posted separately.
>>> 
>>> For V2 of this patch, I've removed the chunk of code that prohibited the
>>> gimple fold from occurring in BE environments.   This had fixed an issue
>>> for me earlier during my development of the code, and turns out this was
>>> not necessary.  I've sniff-tested after removing that check and it looks
>>> OK.
>>> 
 + /* Limit folding of loads to LE targets.  */
 +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
 +return false;
>>> 
>>> I've restarted a regression test on this updated version.
>>> 
>>> OK for trunk (assuming successful regression test completion)  ?
>>> 
>>> Thanks,
>>> -Will
>>> 
>>> [gcc]
>>> 
>>>   2017-09-12  Will Schmidt  
>>> 
>>>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>>   for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>>   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>>   Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>>> 
>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>>> index fbab0a2..bb8a77d 100644
>>> --- a/gcc/config/rs6000/rs6000-c.c
>>> +++ b/gcc/config/rs6000/rs6000-c.c
>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, 
>>> tree fndecl,
>>>convert (TREE_TYPE (stmt), arg0));
>>>  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>  return stmt;
>>>}
>>> 
>>> -  /* Expand vec_ld into an expression that masks the address and
>>> - performs the load.  We need to expand this early to allow
>>> +  /* Expand vec_st into an expression that masks the address and
>>> + performs the store.  We need to expand this early to allow
>>> the best aliasing, as by the time we get into RTL we no longer
>>> are able to honor __restrict__, for example.  We may want to
>>> consider this for all memory access built-ins.
>>> 
>>> When -maltivec=be is specified, or the wrong number of arguments
>>> is provided, simply punt to existing built-in processing.  */
>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>>> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>> -  && nargs == 2)
>>> -{
>>> -  tree arg0 = (*arglist)[0];
>>> -  tree arg1 = (*arglist)[1];
>>> -
>>> -  /* Strip qualifiers like "const" from the pointer arg.  */
>>> -  tree arg1_type = TREE_TYPE (arg1);
>>> -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
>>> ARRAY_TYPE)
>>> -   goto bad;
>>> -
>>> -  tree inner_type = TREE_TYPE (arg1_type);
>>> -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>>> -   {
>>> - arg1_type = build_pointer_type (build_qualified_type (inner_type,
>>> -   0));
>>> - arg1 = fold_convert (arg1_type, arg1);
>>> -   }
>>> -
>>> -  /* Construct the masked address.  Let existing error handling take
>>> -over if we don't have a constant offset.  */
>>> -  arg0 = fold (arg0);
>>> -
>>> -  if (TREE_CODE (arg0) == INTEGER_CST)
>>> -   {
>>> - if (!ptrofftype_p (TREE_TYPE (arg0)))
>>> -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
>>> -
>>> - tree arg1_type = TREE_TYPE (arg1);
>>> - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>>> -   {
>>> - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>>> - tree const0 = build_int_cstu (sizetype, 0);
>>> - tree arg1_elt0 = build_array_ref (loc, arg1, const0);
>>> - arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
>>> -   }
>>> -
>>> - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
>>> -  arg1, arg0);
>>> - tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, 
>>> addr,
>>> - build_int_cst (arg1_type, -16));
>>> -
>>> - /* Find the built-in to get the return type so we can convert
>>> -the result properly (or fall back to default handling if the
>>> -arguments aren't compatible).  */
>>> - for (desc = altivec_overloaded_builtins;
>>> -  

Re: [PATCH] Add a -Wcast-align=strict warning

2017-09-13 Thread Joseph Myers
On Wed, 13 Sep 2017, Bernd Edlinger wrote:

> On 09/13/17 19:06, Joseph Myers wrote:
> > What does this warning do in cases where a type has different alignments
> > inside and outside structs?  I'm thinking of something like
> > 
> > struct s { long long x; } *p;
> > /* ... */
> > (long long *)p
> > 
> > on 32-bit x86 - where long long's preferred alignment is 8 bytes, but in
> > structures it's 4 bytes.  (Likewise for double in place of long long.)  I
> > think a warning for a (long long *)p cast might be surprising in that
> > case.
> > 
> 
> Well, yes this does get a warning.  But doesn't that cast then violate
> the underlying alignment requirement of long long* ?

That's the difference between preferred alignment (__alignof__) and 
alignment required in all contexts (C11 _Alignof).  The above seems valid, 
just like it's valid to take the address of a long long struct element.  
That is, the alignment for the target of a pointer to long long is really 
4 bytes here, even though the alignment for a standalone long long object 
is 8 bytes.  And there's a case for the warning to look at the required 
alignment in all contexts, not TYPE_ALIGN.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Factor out division by squares and remove division around comparisons (0/2)

2017-09-13 Thread Jeff Law
On 09/13/2017 10:25 AM, Jackson Woodruff wrote:
> 
> 
> On 09/13/2017 04:45 PM, Jeff Law wrote:
>> On 09/06/2017 03:54 AM, Jackson Woodruff wrote:
>>> Hi all,
>>>
>>> This patch is split from part (1/2). It includes the patterns that have
>>> been moved out of fold-const.c
>>>
>>>
>>> It also removes an (almost entirely) redundant pattern:
>>>
>>>  (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2)
>>>
>>> which was only used in special cases, either with combinations
>>> of flags like -fno-reciprocal-math -funsafe-math-optimizations
>>> and cases where C was sNaN, or small enough to result in infinity.
>>>
>>> This pattern is covered by:
>>>
>>> (A / C1) +- (A / C2) -> (with O1 and reciprocal math)
>>> A * (1 / C1) +- A * (1 / C2) ->
>>> A * (1 / C1 +- 1 / C2)
>>>
>>> The previous pattern required funsafe-math-optimizations.
>>>
>>> To adjust for this case, the testcase has been updated to require O1 so
>>> that the optimization is still performed.
>>>
>>>
>>> This pattern is moved verbatim into match.pd:
>>>
>>>  (A / C) +- (B / C) -> (A +- B) / C.
>>>
>>> OK for trunk?
>>>
>>> Jackson
>>>
>>> gcc/
>>>
>>> 2017-08-30  Jackson Woodruff  
>>>
>>>  PR 71026/tree-optimization
>>>  * match.pd: Move RDIV patterns from fold-const.c
>>>  * fold-const.c (distribute_real_division): Removed.
>>>  (fold_binary_loc): Remove calls to distribute_real_divison.
>>>
>>> gcc/testsuite/
>>>
>>> 2017-08-30  Jackson Woodruff  
>>>
>>>  PR 71026/tree-optimization
>>>  * gcc/testsuire/gcc.dg/fold-div-1.c: Use O1.
>> Sorry.  Just one more question here.  Does the new match.pd pattern need
>> to be conditional on flag_unsafe_math_optimizations?  ISTM as written
>> it's going to do those transformations independent of that flag.
> 
> For more context
> 
> (if (flag_unsafe_math_optimizations)
>  /* Simplify sqrt(x) * sqrt(x) -> x.  */
>  (simplify
>   (mult (SQRT@1 @0) @1) <- End mult
>   (if (!HONOR_SNANS (type))
>@0
>   ) <- End if !HONOR_SNANS
>  ) <- End simplify
> 
>  (for op (plus minus)
>   /* Simplify (A / C) +- (B / C) -> (A +- B) / C.  */
>   (simplify
>(op (rdiv @0 @1)
>(rdiv @2 @1)
>)  <- End op
>(rdiv (op @0 @2) @1) <- End rdiv
>) <- End simplify
>   ) <- End for
> 
> ... (more patterns) ...
> 
> The if (flag_unsafe_math_optimizations) covers a whole sequence of
> transformations here which mine is a part of. I've annotated the close
> parens so it is clearer.
Ah.  Missed the indention level.  I'd cut-n-pasted the new bits to
"verify" my suspicion as well and shoved it at the end of my match.pd.
WHich of course wasn't protected.

> 
> I don't have commit privileges, could you commit this on my behalf if
> this is OK?
Will do.  If you're going to contribute regularly you might as well get
that process started though :-)  List me as your sponsor.

https://gcc.gnu.org/svnwrite.html



Re: Rb_tree constructor optimization

2017-09-13 Thread François Dumont

On 08/09/2017 17:50, Jonathan Wakely wrote:


Since we know __a == __x.get_allocator() we could just do:

 _Rb_tree(_Rb_tree&& __x, _Node_allocator&&, true_type)
noexcept(is_nothrow_move_constructible<_Rb_tree_impl<_Compare>>::value)
 : _M_impl(std::move(__x._M_impl))
 { }

This means we don't need the new constructor.


You want to consider that a always equal allocator is stateless and 
so that the provided allocator rvalue reference do not need to be moved. 
IMHO you can have allocator with state that do not participate in 
comparison like some monitoring info.


I'm not confortable with that and prefer to keep current behavior 
so I propose this new patch considering all your other remarks.


I change noexcept qualification on [multi]map/set constructors to 
just rely on _Rep_type constructor noexcept qualification to not 
duplicate it.


François


diff --git a/libstdc++-v3/include/bits/stl_map.h b/libstdc++-v3/include/bits/stl_map.h
index 0e8a98a..cdd2e7c 100644
--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -235,8 +235,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   map(map&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Pair_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Pair_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_multimap.h b/libstdc++-v3/include/bits/stl_multimap.h
index 7e3cea4..d32104d 100644
--- a/libstdc++-v3/include/bits/stl_multimap.h
+++ b/libstdc++-v3/include/bits/stl_multimap.h
@@ -232,8 +232,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   multimap(multimap&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Pair_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Pair_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h
index 517e77e..9ab4ab7 100644
--- a/libstdc++-v3/include/bits/stl_multiset.h
+++ b/libstdc++-v3/include/bits/stl_multiset.h
@@ -244,8 +244,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   multiset(multiset&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Key_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Key_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
index e804a7c..6b64bcd 100644
--- a/libstdc++-v3/include/bits/stl_set.h
+++ b/libstdc++-v3/include/bits/stl_set.h
@@ -248,8 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   set(set&& __x, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Key_alloc_type>())) )
   : _M_t(std::move(__x._M_t), _Key_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index c2417f1..7aacc24 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -704,6 +704,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
 	  _Rb_tree_impl(_Rb_tree_impl&&) = default;
 
+	  _Rb_tree_impl(_Rb_tree_impl&& __x, _Node_allocator&& __a)
+	  : _Node_allocator(std::move(__a)),
+	_Base_key_compare(std::move(__x)),
+	_Rb_tree_header(std::move(__x))
+	  { }
+
 	  _Rb_tree_impl(const _Key_compare& __comp, _Node_allocator&& __a)
 	  : _Node_allocator(std::move(__a)), _Base_key_compare(__comp)
 	  { }
@@ -947,7 +953,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _Rb_tree(std::move(__x), _Node_allocator(__a))
   { }
 
-  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a);
+private:
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, true_type)
+  : _M_impl(std::move(__x._M_impl), std::move(__a))
+  { }
+
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, false_type)
+  : _M_impl(__x._M_impl._M_key_compare, std::move(__a))
+  {
+	if (__x._M_root() != nullptr)
+	  _M_move_data(__x, false_type{});
+  }
+
+public:
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a)
+  

Re: [PATCH] Add a -Wcast-align=strict warning

2017-09-13 Thread Bernd Edlinger
On 09/13/17 19:06, Joseph Myers wrote:
> What does this warning do in cases where a type has different alignments
> inside and outside structs?  I'm thinking of something like
> 
> struct s { long long x; } *p;
> /* ... */
> (long long *)p
> 
> on 32-bit x86 - where long long's preferred alignment is 8 bytes, but in
> structures it's 4 bytes.  (Likewise for double in place of long long.)  I
> think a warning for a (long long *)p cast might be surprising in that
> case.
> 

Well, yes this does get a warning.  But doesn't that cast then violate
the underlying alignment requirement of long long* ?

Of course there is probably a reason why -Wcast-align is not enabled by
default, and likewise this warning emits a fair amount of false
positives, but nevertheless I think it is often worth looking at the
places where this warning flags a possible alignment issue.

However, neither -Wcast-align nor -Wcast-align=strict are enabled unless
explicitly requested.


Bernd.


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Segher Boessenkool
Hi!

On Wed, Sep 13, 2017 at 06:13:50PM +0100, Kyrill Tkachov wrote:
> +;; Specialized OR instruction for combiner.  The AND is masking out bits
> +;; not needed in the OR (doing a zero_extend).  The zero_extend is not
> +;; needed because we know from the subreg that the upper part of the reg
> +;; is zero.
> +(define_insn "*iorqi3_uxtw"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(ior:SI (and:SI
> +   (subreg:SI (match_operand:QI 1 "register_operand" "r") 0)
> +   (match_operand:SI 2 "const_int_operand" "n"))
> + (match_operand:SI 3 "aarch64_logical_operand" "K")))]
> +  "INTVAL (operands[2]) + INTVAL (operands[3]) == 255"
> +  "orr\\t%w0, %w1, %3"
> +  [(set_attr "type" "logic_imm")]
> +)
> 
> We are usually hesitant to add explicit subreg matching in the MD pattern
> (though I don't remember if there's a hard rule against it).
> In this case this looks like a missing simplification from combine 
> (simplify-rtx) so
> I think adding it there would be better.

Yes, it probably belongs as a generic simplification in simplify-rtx.c;
if there is a reason not to do that, it can be done in combine.c instead.


Segher


Turn FUNCTION_ARG_OFFSET into a hook

2017-09-13 Thread Richard Sandiford
Nice and easy, one definition and one use :-)

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  OK to install?

Richard


2017-09-13  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.def (function_arg_offset): New hook.
* targhooks.h (default_function_arg_offset): Declare.
* targhooks.c (default_function_arg_offset): New function.
* function.c (locate_and_pad_parm): Use
targetm.calls.function_arg_offset instead of FUNCTION_ARG_OFFSET.
* doc/tm.texi.in (FUNCTION_ARG_OFFSET): Replace with...
(TARGET_FUNCTION_ARG_OFFSET): ...this.
* doc/tm.texi: Regenerate.
* config/spu/spu.h (FUNCTION_ARG_OFFSET): Delete.
* config/spu/spu.c (spu_function_arg_offset): New function.
(TARGET_FUNCTION_ARG_OFFSET): Redefine.
* system.h (FUNCTION_ARG_OFFSET): Poison.

Index: gcc/target.def
===
--- gcc/target.def  2017-09-13 20:12:24.499190740 +0100
+++ gcc/target.def  2017-09-13 20:14:46.245761652 +0100
@@ -4573,6 +4573,16 @@ used for arguments without any special h
  default_function_arg_advance)
 
 DEFHOOK
+(function_arg_offset,
+ "This hook returns the number of bytes to add to the offset of an\n\
+argument of type @var{type} and mode @var{mode} when passed in memory.\n\
+This is needed for the SPU, which passes @code{char} and @code{short}\n\
+arguments in the preferred slot that is in the middle of the quad word\n\
+instead of starting at the top.  The default implementation returns 0.",
+ HOST_WIDE_INT, (machine_mode mode, const_tree type),
+ default_function_arg_offset)
+
+DEFHOOK
 (function_arg_padding,
  "This hook determines whether, and in which direction, to pad out\n\
 an argument of mode @var{mode} and type @var{type}.  It returns\n\
Index: gcc/targhooks.h
===
--- gcc/targhooks.h 2017-09-13 18:03:51.114330107 +0100
+++ gcc/targhooks.h 2017-09-13 20:14:46.245761652 +0100
@@ -132,6 +132,7 @@ extern bool hook_bool_CUMULATIVE_ARGS_tr
   (const_tree, const_tree, const_tree);
 extern void default_function_arg_advance
   (cumulative_args_t, machine_mode, const_tree, bool);
+extern HOST_WIDE_INT default_function_arg_offset (machine_mode, const_tree);
 extern pad_direction default_function_arg_padding (machine_mode, const_tree);
 extern rtx default_function_arg
   (cumulative_args_t, machine_mode, const_tree, bool);
Index: gcc/targhooks.c
===
--- gcc/targhooks.c 2017-09-13 18:03:51.114330107 +0100
+++ gcc/targhooks.c 2017-09-13 20:14:46.245761652 +0100
@@ -734,6 +734,14 @@ default_function_arg_advance (cumulative
   gcc_unreachable ();
 }
 
+/* Default implementation of TARGET_FUNCTION_ARG_OFFSET.  */
+
+HOST_WIDE_INT
+default_function_arg_offset (machine_mode, const_tree)
+{
+  return 0;
+}
+
 /* Default implementation of TARGET_FUNCTION_ARG_PADDING: usually pad
upward, but pad short args downward on big-endian machines.  */
 
Index: gcc/function.c
===
--- gcc/function.c  2017-09-13 20:12:24.498282217 +0100
+++ gcc/function.c  2017-09-13 20:14:46.244854334 +0100
@@ -4249,9 +4249,8 @@ locate_and_pad_parm (machine_mode passed
   locate->size.constant -= part_size_in_regs;
 }
 
-#ifdef FUNCTION_ARG_OFFSET
-  locate->offset.constant += FUNCTION_ARG_OFFSET (passed_mode, type);
-#endif
+  locate->offset.constant
++= targetm.calls.function_arg_offset (passed_mode, type);
 }
 
 /* Round the stack offset in *OFFSET_PTR up to a multiple of BOUNDARY.
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  2017-09-13 20:12:24.496465170 +0100
+++ gcc/doc/tm.texi.in  2017-09-13 20:14:46.243947015 +0100
@@ -3281,13 +3281,7 @@ argument @var{libname} exists for symmet
 
 @hook TARGET_FUNCTION_ARG_ADVANCE
 
-@defmac FUNCTION_ARG_OFFSET (@var{mode}, @var{type})
-If defined, a C expression that is the number of bytes to add to the
-offset of the argument passed in memory.  This is needed for the SPU,
-which passes @code{char} and @code{short} arguments in the preferred
-slot that is in the middle of the quad word instead of starting at the
-top.
-@end defmac
+@hook TARGET_FUNCTION_ARG_OFFSET
 
 @hook TARGET_FUNCTION_ARG_PADDING
 
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2017-09-13 20:12:24.496465170 +0100
+++ gcc/doc/tm.texi 2017-09-13 20:14:46.243947015 +0100
@@ -4079,13 +4079,13 @@ on the stack.  The compiler knows how to
 used for arguments without any special help.
 @end 

Turn TRULY_NOOP_TRUNCATION into a hook

2017-09-13 Thread Richard Sandiford
I'm not sure the documentation is correct that outprec is always less
than inprec, and each non-default implementation tested for the case
in which it wasn't, but the patch leaves it as-is.

The SH port had a couple of TRULY_NOOP_TRUNCATION tests that were left
over from the old shmedia port.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  OK to install?

Richard


2017-09-13  Richard Sandiford  
Alan Hayard  
David Sherwood  

gcc/
* target.def (truly_noop_truncation): New hook.
(mode_rep_extended): Refer to TARGET_TRULY_NOOP_TRUNCATION rather
than TRULY_NOOP_TRUNCATION.
* hooks.h (hook_bool_uint_uint_true): Declare.
* hooks.c (hook_bool_uint_uint_true): New function.
* doc/tm.texi.in (TRULY_NOOP_TRUNCATION): Replace with...
(TARGET_TRULY_NOOP_TRUNCATION): ...this.
* doc/tm.texi: Regenerate.
* combine.c (make_extraction): Refer to TARGET_TRULY_NOOP_TRUNCATION
rather than TRULY_NOOP_TRUNCATION in comments.
(simplify_comparison): Likewise.
(record_truncated_value): Likewise.
* expmed.c (extract_bit_field_1): Likewise.
(extract_split_bit_field): Likewise.
* convert.c (convert_to_integer_1): Use targetm.truly_noop_truncation
instead of TRULY_NOOP_TRUNCATION.
* function.c (assign_parm_setup_block): Likewise.
* machmode.h (TRULY_NOOP_TRUNCATION_MODES_P): Likewise.
* rtlhooks.c: Include target.h.
* config/aarch64/aarch64.h (TRULY_NOOP_TRUNCATION): Delete.
* config/alpha/alpha.h (TRULY_NOOP_TRUNCATION): Delete.
* config/arc/arc.h (TRULY_NOOP_TRUNCATION): Delete.
* config/arm/arm.h (TRULY_NOOP_TRUNCATION): Delete.
* config/avr/avr.h (TRULY_NOOP_TRUNCATION): Delete.
* config/bfin/bfin.h (TRULY_NOOP_TRUNCATION): Delete.
* config/c6x/c6x.h (TRULY_NOOP_TRUNCATION): Delete.
* config/cr16/cr16.h (TRULY_NOOP_TRUNCATION): Delete.
* config/cris/cris.h (TRULY_NOOP_TRUNCATION): Delete.
* config/epiphany/epiphany.h (TRULY_NOOP_TRUNCATION): Delete.
* config/fr30/fr30.h (TRULY_NOOP_TRUNCATION): Delete.
* config/frv/frv.h (TRULY_NOOP_TRUNCATION): Delete.
* config/ft32/ft32.h (TRULY_NOOP_TRUNCATION): Delete.
* config/h8300/h8300.h (TRULY_NOOP_TRUNCATION): Delete.
* config/i386/i386.h (TRULY_NOOP_TRUNCATION): Delete.
* config/ia64/ia64.h (TRULY_NOOP_TRUNCATION): Delete.
* config/iq2000/iq2000.h (TRULY_NOOP_TRUNCATION): Delete.
* config/lm32/lm32.h (TRULY_NOOP_TRUNCATION): Delete.
* config/m32c/m32c.h (TRULY_NOOP_TRUNCATION): Delete.
* config/m32r/m32r.h (TRULY_NOOP_TRUNCATION): Delete.
* config/m68k/m68k.h (TRULY_NOOP_TRUNCATION): Delete.
* config/mcore/mcore.h (TRULY_NOOP_TRUNCATION): Delete.
* config/microblaze/microblaze.h (TRULY_NOOP_TRUNCATION): Delete.
* config/mips/mips.h (TRULY_NOOP_TRUNCATION): Delete.
* config/mips/mips.c (mips_truly_noop_truncation): New function.
(TARGET_TRULY_NOOP_TRUNCATION): Redefine.
* config/mips/mips.md: Refer to TARGET_TRULY_NOOP_TRUNCATION
rather than TRULY_NOOP_TRUNCATION in comments.
* config/mmix/mmix.h (TRULY_NOOP_TRUNCATION): Delete.
* config/mn10300/mn10300.h (TRULY_NOOP_TRUNCATION): Delete.
* config/moxie/moxie.h (TRULY_NOOP_TRUNCATION): Delete.
* config/msp430/msp430.h (TRULY_NOOP_TRUNCATION): Delete.
* config/nds32/nds32.h (TRULY_NOOP_TRUNCATION): Delete.
* config/nios2/nios2.h (TRULY_NOOP_TRUNCATION): Delete.
* config/nvptx/nvptx.h (TRULY_NOOP_TRUNCATION): Delete.
* config/pa/pa.h (TRULY_NOOP_TRUNCATION): Delete.
* config/pdp11/pdp11.h (TRULY_NOOP_TRUNCATION): Delete.
* config/powerpcspe/powerpcspe.h (TRULY_NOOP_TRUNCATION): Delete.
* config/riscv/riscv.h (TRULY_NOOP_TRUNCATION): Delete.
* config/riscv/riscv.md: Refer to TARGET_TRULY_NOOP_TRUNCATION
rather than TRULY_NOOP_TRUNCATION in comments.
* config/rl78/rl78.h (TRULY_NOOP_TRUNCATION): Delete.
* config/rs6000/rs6000.h (TRULY_NOOP_TRUNCATION): Delete.
* config/rx/rx.h (TRULY_NOOP_TRUNCATION): Delete.
* config/s390/s390.h (TRULY_NOOP_TRUNCATION): Delete.
* config/sh/sh.h (MAYBE_BASE_REGISTER_RTX_P): Remove
TRULY_NOOP_TRUNCATION condition.
(MAYBE_INDEX_REGISTER_RTX_P): Likewise.
(TRULY_NOOP_TRUNCATION): Delete.
* config/sparc/sparc.h (TRULY_NOOP_TRUNCATION): Delete.
* config/spu/spu.h (TRULY_NOOP_TRUNCATION): Delete.
* config/spu/spu.c (spu_truly_noop_truncation): New function.
(TARGET_TRULY_NOOP_TRUNCATION): Redefine.
* 

Turn CANNOT_CHANGE_MODE_CLASS into a hook

2017-09-13 Thread Richard Sandiford
This also seemed like a good opportunity to reverse the sense of the
hook to "can", to avoid the awkward double negative in !CANNOT.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  OK to install?

Richard


2017-09-13  Richard Sandiford  
Alan Hayard  
David Sherwood  

gcc/
* target.def (can_change_mode_class): New hook.
(mode_rep_extended): Refer to it instead of CANNOT_CHANGE_MODE_CLASS.
(hard_regno_nregs): Likewise.
* hooks.h (hook_bool_mode_mode_reg_class_t_true): Declare.
* hooks.c (hook_bool_mode_mode_reg_class_t_true): New function.
* doc/tm.texi.in (CANNOT_CHANGE_MODE_CLASS): Replace with...
(TARGET_CAN_CHANGE_MODE_CLASS): ...this.
(LOAD_EXTEND_OP): Update accordingly.
* doc/tm.texi: Regenerate.
* doc/rtl.texi: Refer to TARGET_CAN_CHANGE_MODE_CLASS instead of
CANNOT_CHANGE_MODE_CLASS.
* hard-reg-set.h (REG_CANNOT_CHANGE_MODE_P): Replace with...
(REG_CAN_CHANGE_MODE_P): ...this new macro.
* combine.c (simplify_set): Update accordingly.
* emit-rtl.c (validate_subreg): Likewise.
* recog.c (general_operand): Likewise.
* regcprop.c (mode_change_ok): Likewise.
* reload1.c (choose_reload_regs): Likewise.
(inherit_piecemeal_p): Likewise.
* rtlanal.c (simplify_subreg_regno): Likewise.
* postreload.c (reload_cse_simplify_set): Use REG_CAN_CHANGE_MODE_P
instead of CANNOT_CHANGE_MODE_CLASS.
(reload_cse_simplify_operands): Likewise.
* reload.c (push_reload): Use targetm.can_change_mode_class
instead of CANNOT_CHANGE_MODE_CLASS.
(push_reload): Likewise.  Also use REG_CAN_CHANGE_MODE_P instead of
REG_CANNOT_CHANGE_MODE_P.
* config/alpha/alpha.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/alpha/alpha.c (alpha_can_change_mode_class): New function.
(TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
* config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/arm/arm.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
(arm_can_change_mode_class): New function.
* config/arm/neon.md: Refer to TARGET_CAN_CHANGE_MODE_CLASS rather
than CANNOT_CHANGE_MODE_CLASS in comments.
* config/i386/i386.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/i386/i386-protos.h (ix86_cannot_change_mode_class): Delete.
* config/i386/i386.c (ix86_cannot_change_mode_class): Replace with...
(ix86_can_change_mode_class): ...this new function, inverting the
sense of the return value.
(TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
* config/ia64/ia64.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/ia64/ia64.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
(ia64_can_change_mode_class): New function.
* config/m32c/m32c.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/m32c/m32c-protos.h (m32c_cannot_change_mode_class): Delete.
* config/m32c/m32c.c (m32c_cannot_change_mode_class): Replace with...
(m32c_can_change_mode_class): ...this new function, inverting the
sense of the return value.
(TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
* config/mips/mips.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/mips/mips-protos.h (mips_cannot_change_mode_class): Delete.
* config/mips/mips.c (mips_cannot_change_mode_class): Replace with...
(mips_can_change_mode_class): ...this new function, inverting the
sense of the return value.
(TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
* config/msp430/msp430.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/msp430/msp430.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
(msp430_can_change_mode_class): New function.
* config/nvptx/nvptx.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/nvptx/nvptx.c (nvptx_can_change_mode_class): New function.
(TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
* config/pa/pa32-regs.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/pa/pa64-regs.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/pa/pa-protos.h (pa_cannot_change_mode_class): Delete.
* config/pa/pa.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.
(pa_cannot_change_mode_class): Replace with...
(pa_can_change_mode_class): ...this new function, inverting the
sense of the return value.
(pa_modes_tieable_p): Refer to TARGET_CAN_CHANGE_MODE_CLASS rather
than CANNOT_CHANGE_MODE_CLASS in comments.
* config/pdp11/pdp11.h (CANNOT_CHANGE_MODE_CLASS): Delete.
* config/pdp11/pdp11-protos.h (pdp11_cannot_change_mode_class): Delete.
* config/pdp11/pdp11.c (TARGET_CAN_CHANGE_MODE_CLASS): Redefine.

Re: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

2017-09-13 Thread Martin Liška

On 09/13/2017 08:41 PM, Jeff Law wrote:

On 09/13/2017 12:09 PM, Martin Liška wrote:

On 09/13/2017 04:22 PM, Jeff Law wrote:

On 09/13/2017 07:42 AM, Martin Liška wrote:

On 09/13/2017 03:08 PM, Martin Liška wrote:

On 09/12/2017 05:21 PM, Jeff Law wrote:

On 09/12/2017 01:43 AM, Martin Liška wrote:

Hello.

In transition to simple_case_node, I forgot to initialize m_high to m_low if a 
case

does not have CASE_HIGH.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.


Ready to be installed?
Martin

gcc/ChangeLog:

2017-09-11  Martin Liska  

 PR middle-end/82154
 * stmt.c (struct simple_case_node): Assign low to high when
 high is equal to null.

gcc/testsuite/ChangeLog:

2017-09-11  Martin Liska  

 PR middle-end/82154
 * g++.dg/torture/pr82154.C: New test.

OK.

THough I have to wonder if we should unify the HIGH handling -- having
two different conventions is bound to be confusing.

In a CASE_LABEL_EXPR the CASE_HIGH can be NULL, which means the label
refers to a singleton value that is found in CASE_LOW.

That means we end up doing stuff like this:

   if (CASE_HIGH (elt))
  maxval = fold_convert (index_type, CASE_HIGH (elt));
else
  maxval = fold_convert (index_type, CASE_LOW (elt));



You could legitimately argue for changing how this works for tree nodes

so that there's always a non-null CASE_HIGH.


Hi.

Agree with you that we have a lot of code that does what you just described.

I tent to change IL representation, where CASE_HIGH is always non-null.

$ git grep 'CASE_HIGH\>' | wc -l
125

Which is reasonable amount of code that should be changed. I'll prepare patch 
for that.



Hm, there's one question that pops up: Should we compare the values via pointer 
equality,

or tree_int_cst_eq should be done? The later one can messy the code a bit.


I'd like to say pointer equality, but it's probably safer to use
tree_int_cst_eq.

jeff



Ok, so I'll put it on my TODO list as it will take some time to do the 
transformation.

NP.  There's nothing inherently wrong with the code today, it's just a
cleanup (I hope :-)


Yes.






In order to fix the PR, I suggest following patch.

Ready for trunk?
Martin

0001-Fix-emission-of-exception-dispatch-PR-middle-end-821.patch


 From 1cd8987ad3de4a7bd3e71014a0df9ccb3be40277 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 11 Sep 2017 13:34:41 +0200
Subject: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

gcc/ChangeLog:

2017-09-11  Martin Liska  

PR middle-end/82154
* stmt.c (expand_sjlj_dispatch_table): Use CASE_LOW when
CASE_HIGH is NULL_TREE.

gcc/testsuite/ChangeLog:

2017-09-11  Martin Liska  

PR middle-end/82154
* g++.dg/torture/pr82154.C: New test.

OK.

Jeff



Thanks, I've just installed it.

Martin


Re: [PATCH] [ARC] Check the assembler for gdwarf2 support.

2017-09-13 Thread Jeff Law
On 09/12/2017 05:06 AM, Claudiu Zissulescu wrote:
> From: claziss 
> 
> This small patch enables the gcc driver to pass dwarf related options to the 
> assembler.
> 
> Ok to apply?
> Claudiu
> 
> gcc/
> 2017-06-21  Claudiu Zissulescu  
> 
>   * configure.ac: Add arc and check if assembler supports gdwarf2.
>   * configure: Regenerate.
OK.
jeff


Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-13 Thread Jeff Law
On 09/12/2017 09:40 AM, Tsimbalist, Igor V wrote:
> 
>> -Original Message-
>> From: Jeff Law [mailto:l...@redhat.com]
>> Sent: Friday, August 25, 2017 10:32 PM
>> To: Richard Biener ; Tsimbalist, Igor V
>> 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>>
>> On 08/15/2017 07:42 AM, Richard Biener wrote:
>>>
>>> Please change the names to omit 'with_', thus just notrack and
>>> GF_CALL_NOTRACK.
>>>
>>> I think 'notrack' is somewhat unspecific of a name, what prevented you
>>> to use 'nocet'?
>> I think we should look for something better than notrack.  I think "control
>> flow enforcement/CFE" is commonly used for this stuff.  CET is an Intel
>> marketing name IIRC.
>>
>> The tracking is for indirect branch/call targets.  So some combination of 
>> cfe,
>> branch/call and track should be sufficient.
> Still remaining question from me - is it ok to use 'notrack' as the attribute 
> name. I've asked Richard
> about this in this thread.
I tend to agree with Richi that "track" is a bit too generic.  no_cfe
might be better.  Or no_cfi, but cfi is commonly used to represent
call-frame-info :-)

jeff


Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-13 Thread Jeff Law
On 09/13/2017 11:07 AM, Tsimbalist, Igor V wrote:
>> -Original Message-
>> From: Tsimbalist, Igor V
>> Sent: Tuesday, September 12, 2017 5:59 PM
>> To: 'Jeff Law' ; 'gcc-patches@gcc.gnu.org' > patc...@gcc.gnu.org>
>> Cc: Tsimbalist, Igor V 
>> Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>>
>>
>>> -Original Message-
>>> From: Jeff Law [mailto:l...@redhat.com]
>>> Sent: Friday, August 25, 2017 10:50 PM
>>> To: Tsimbalist, Igor V ; 'gcc-
>>> patc...@gcc.gnu.org' 
>>> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>>>
>>> On 08/01/2017 02:56 AM, Tsimbalist, Igor V wrote:
 Part#1. Add generic part for Intel CET enabling.

>>
>>> Q. Do we need to do anything with ICF (identical code folding) and CFE?
>>> Given two functions which have the same implementation in gimple,
>>> except that one has a notrack indirect call and the other has a
>>> tracked indirect call, what is proper behavior?  I think we'd keep
>>> them separate which implies we need to make sure the notrack attribute
>>> is part of the ICF hashing implementation.  It'd probably even be
>>> worth building a test for this :-)
>> Are you talking about a case when such two functions are inlined? Or there is
>> a possibility to merge function bodies if they are identical?
>>
>> I agree with you that the functions should be kept separate. I haven't looked
>> into such optimization in gcc so I need to learn it.
> I thought over this case and my conclusion is that nothing has to be done 
> regarding ICF.
> 
> First of all let's sync on a case we are talking about. A code template could 
> look like
> 
> fn1 definition
> {
>   
> }
> 
> fn2 definition with notrack attribute
> {
>   
> }
> 
> func definition
> {
>   ...
> }
> 
> Is it the case you are talking about? Let's consider different scenarios:
> 
> 1) calls to fn1 and fn2 are direct calls. In that case 'notrack' has no 
> effect on direct calls as they are
> assumed to be save (it applies to indirect calls only). ICF can be done here;
> 2) one of calls is an indirect call or both calls are indirect calls. If 
> compiler can prove what exact functions
> are called then indirect call(s) can be replaced by direct call(s) and that 
> gives us the case 1);
> 3) if compiler cannot prove what function is called it will keep the indirect 
> call and so there is nothing
> to do for ICF here. 
No, not the case I'm worried about.  Instead

fn1()
{
  indirect call where the signature is marked with notrack
}

fn2()
{
  indirect call where the signature is not marked with notrack
}


fn1 and fn2 would be subject to ICF which I think is wrong.

Essentially we're carrying semantic information in attributes that are
part of the type of the function pointer.  I think we need to include
those attributes when we hash and compare two objects for equality
within ICF.

Jeff


Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-13 Thread Jeff Law
On 09/12/2017 09:59 AM, Tsimbalist, Igor V wrote:

> 
>> Q. Do we need to do anything with ICF (identical code folding) and CFE?
>> Given two functions which have the same implementation in gimple, except
>> that one has a notrack indirect call and the other has a tracked indirect 
>> call,
>> what is proper behavior?  I think we'd keep them separate which implies we
>> need to make sure the notrack attribute is part of the ICF hashing
>> implementation.  It'd probably even be worth building a test for this :-)
> Are you talking about a case when such two functions are inlined? Or there is 
> a possibility to merge
> function bodies if they are identical?
The latter.  The compiler has a couple strategies when it finds
identical bodies.  I'm over-simplifying, but given two functions, A and
B.  If ICF hashing finds they are identical, then only one function
definition would be emitted.

So given two functions A & B.  They are identical except that A has an
indirect call and the signature of the call target has the notrack
attribute while B has an indirect call and the signature of the call
target does not have the notrack attribute.

A & B would be subject to ICF, but I don't think that's the right/safe
thing to do.  Or am I missing something?

jeff


Re: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

2017-09-13 Thread Jeff Law
On 09/13/2017 12:09 PM, Martin Liška wrote:
> On 09/13/2017 04:22 PM, Jeff Law wrote:
>> On 09/13/2017 07:42 AM, Martin Liška wrote:
>>> On 09/13/2017 03:08 PM, Martin Liška wrote:
 On 09/12/2017 05:21 PM, Jeff Law wrote:
> On 09/12/2017 01:43 AM, Martin Liška wrote:
>> Hello.
>>
>> In transition to simple_case_node, I forgot to initialize m_high to 
>> m_low if a case
>>
>> does not have CASE_HIGH.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression 
>> tests.
>>
>>
>> Ready to be installed?
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-09-11  Martin Liska  
>>
>> PR middle-end/82154
>> * stmt.c (struct simple_case_node): Assign low to high when
>> high is equal to null.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-09-11  Martin Liska  
>>
>> PR middle-end/82154
>> * g++.dg/torture/pr82154.C: New test.
> OK.
>
> THough I have to wonder if we should unify the HIGH handling -- having
> two different conventions is bound to be confusing.
>
> In a CASE_LABEL_EXPR the CASE_HIGH can be NULL, which means the label
> refers to a singleton value that is found in CASE_LOW.
>
> That means we end up doing stuff like this:
>
>   if (CASE_HIGH (elt))
>  maxval = fold_convert (index_type, CASE_HIGH (elt));
>else
>  maxval = fold_convert (index_type, CASE_LOW (elt));
>
>
>
> You could legitimately argue for changing how this works for tree nodes
>
> so that there's always a non-null CASE_HIGH.

 Hi.

 Agree with you that we have a lot of code that does what you just 
 described.

 I tent to change IL representation, where CASE_HIGH is always non-null.

 $ git grep 'CASE_HIGH\>' | wc -l
 125

 Which is reasonable amount of code that should be changed. I'll prepare 
 patch for that.

>>>
>>> Hm, there's one question that pops up: Should we compare the values via 
>>> pointer equality,
>>>
>>> or tree_int_cst_eq should be done? The later one can messy the code a bit.
>>>
>> I'd like to say pointer equality, but it's probably safer to use
>> tree_int_cst_eq.
>>
>> jeff
>>
> 
> Ok, so I'll put it on my TODO list as it will take some time to do the 
> transformation.
NP.  There's nothing inherently wrong with the code today, it's just a
cleanup (I hope :-)

> 
> 
> In order to fix the PR, I suggest following patch.
> 
> Ready for trunk?
> Martin
> 
> 0001-Fix-emission-of-exception-dispatch-PR-middle-end-821.patch
> 
> 
> From 1cd8987ad3de4a7bd3e71014a0df9ccb3be40277 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 11 Sep 2017 13:34:41 +0200
> Subject: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).
> 
> gcc/ChangeLog:
> 
> 2017-09-11  Martin Liska  
> 
>   PR middle-end/82154
>   * stmt.c (expand_sjlj_dispatch_table): Use CASE_LOW when
>   CASE_HIGH is NULL_TREE.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-09-11  Martin Liska  
> 
>   PR middle-end/82154
>   * g++.dg/torture/pr82154.C: New test.
OK.

Jeff


Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-09-13 Thread Jeff Law
On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
> On 08/30/2017 01:46 PM, Richard Biener wrote:
>> On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff
>>  wrote:
>>> On 08/29/2017 01:13 PM, Richard Biener wrote:

 On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
  wrote:
>
> Hi all,
>
> Apologies again to those CC'ed, who (again) received this twice.
>
> Joseph: Yes you are correct. I misread the original thread, now fixed.
>
> Richard: I've moved the optimizations out of fold-const.c. One has
> been
> replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C)
> I've
> deleted as it only introduced a new optimization when running with the
> flags
> '-O0 -funsafe-math-optimizations'.


 Hmm, how did you verify that, that it only adds sth with -O0
 -funsafe-math-optimizations?
>>>
>>>
>>> By checking with various flags, although not exhaustively. I looked for
>>> reasons for the behavior in match.pd (explained below).
>>>
>>> I have also since discovered that the combinations of
>>> '-funsafe-math-optimizations -frounding-math' (at all levels) and
>>> '-fno-recriprocal-math -funsafe-math-optimizations' mean this pattern
>>> adds
>>> something.
>>>
 Is that because in GIMPLE the reassoc pass should do this transform?
>>>
>>> It's because the pattern that changes (X / C) -> X * (1 / C) is gated
>>> with
>>> O1:
>>>
>>>(for cst (REAL_CST COMPLEX_CST VECTOR_CST)
>>> (simplify
>>>  (rdiv @0 cst@1)
>>> ->(if (optimize)
>>> -> (if (flag_reciprocal_math
>>>&& !real_zerop (@1))
>>>(with
>>> { tree tem = const_binop (RDIV_EXPR, type, build_one_cst
>>> (type), @1);
>>> }
>>> (if (tem)
>>>  (mult @0 { tem; } )))
>>>(if (cst != COMPLEX_CST)
>>> (with { tree inverse = exact_inverse (type, @1); }
>>>  (if (inverse)
>>>   (mult @0 { inverse; } 
>>>
>>>
>>> I've flagged the two lines that are particularly relevant to this.
>>
>> So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y
>> why's that in any way preferable?  I suppose this is again to enable
>> the recip pass to detect / y (as opposed to / (C * y))?  What's the
>> reason to believe that / y is more "frequent"?
>>
>>> Removing this pattern, as I would expect, means that the divisions in
>>> the
>>> above optimization (and the one further down) are not removed.
>>>
>>> So then there is the question of edge cases. This pattern is
>>> (ignoring the
>>> second case) going to fail when const_binop returns null. Looking
>>> through
>>> that function says that it will fail (for reals) when:
>>>
>>> - Either argument is null (not the case)
>>>
>>> - The operation is not in the list (PLUS_EXPR,
>>>MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR)
>>>(again not the case)
>>>
>>> - We honor Signalling NaNs and one of the operands is a sNaN.
>>>
>>> - The operation is a division, and the second argument is zero
>>>and dividing by 0.0 raises an exception.
>>>
>>> - The result is infinity and neither of the operands were infinity
>>>and flag_trapping_math is set.
>>>
>>> - The result isn't exact and flag_rounding_math is set.
>>>
>>>
>>> For (x / ( y * C) -> (x / C) / y), I will add some gating for each of
>>> these
>>> so that the pattern is never executed if one of these would be the case.
>>
>> Why not transform this directly to (x * (1/C)) / y then (and only then)?
>> That makes it obvious not two divisions prevail.
> 
> Done.
> 
>>
>> That said, I'm questioning this canonicalization.  I can come up with a
>> testcase where it makes things worse:
>>
>>   tem = x / (y * C);
>>   tem2 = z / (y * C);
>>
>> should generate
>>
>>   rdivtmp = 1 / (y*C);
>>   tem = x *rdivtmp;
>>   tem2= z * rdivtmp;
>>
>> instead of
>>
>>   rdivtmp = 1/y;
>>   tem = x * 1/C * rdivtmp;
>>   tem2 = z * 1/C * rdivtmp;
> 
> Ideally we would be able to CSE that into
> 
> rdivtmp = 1/y * 1/C;
> tem = x * rdivtmp;
> tem2 = z * rdivtmp;
So why is your sequence significantly better than Richi's desired
seqeuence?  They both seem to need 3 mults and a division (which in both
cases might be a reciprocal estimation).In Richi's sequence we have
to mult and feed the result as an operand into the reciprocal insn.  In
yours we feed the result of the reciprocal into the multiply.

ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence wins.
 Similarly if 1/y is a CSE, then yours wins.  Is there some reason to
believe that one is a more likely CSE than the other?

> Although we currently do not. An equally (perhaps more?) problematic
> case is something like:
> 
> tem = x / (y * C)
> tem2 = y * C
> 
> Which becomes:
> 
> tem = x * (1 / C) / y
> tem2 = y * C
> 
> Instead of
> 
> K = y * C
> tem = x / K
> tem2 = K
> 
> Which ultimately requires context awareness to avoid. This does seem to
> be a general problem 

Re: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

2017-09-13 Thread Martin Liška

On 09/13/2017 04:22 PM, Jeff Law wrote:

On 09/13/2017 07:42 AM, Martin Liška wrote:

On 09/13/2017 03:08 PM, Martin Liška wrote:

On 09/12/2017 05:21 PM, Jeff Law wrote:

On 09/12/2017 01:43 AM, Martin Liška wrote:

Hello.

In transition to simple_case_node, I forgot to initialize m_high to m_low if a 
case
does not have CASE_HIGH.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-09-11  Martin Liska  

PR middle-end/82154
* stmt.c (struct simple_case_node): Assign low to high when
high is equal to null.

gcc/testsuite/ChangeLog:

2017-09-11  Martin Liska  

PR middle-end/82154
* g++.dg/torture/pr82154.C: New test.

OK.

THough I have to wonder if we should unify the HIGH handling -- having
two different conventions is bound to be confusing.

In a CASE_LABEL_EXPR the CASE_HIGH can be NULL, which means the label
refers to a singleton value that is found in CASE_LOW.

That means we end up doing stuff like this:

  if (CASE_HIGH (elt))
 maxval = fold_convert (index_type, CASE_HIGH (elt));
   else
 maxval = fold_convert (index_type, CASE_LOW (elt));



You could legitimately argue for changing how this works for tree nodes
so that there's always a non-null CASE_HIGH.


Hi.

Agree with you that we have a lot of code that does what you just described.
I tent to change IL representation, where CASE_HIGH is always non-null.

$ git grep 'CASE_HIGH\>' | wc -l
125

Which is reasonable amount of code that should be changed. I'll prepare patch 
for that.


Hm, there's one question that pops up: Should we compare the values via pointer 
equality,
or tree_int_cst_eq should be done? The later one can messy the code a bit.

I'd like to say pointer equality, but it's probably safer to use
tree_int_cst_eq.

jeff



Ok, so I'll put it on my TODO list as it will take some time to do the 
transformation.

In order to fix the PR, I suggest following patch.

Ready for trunk?
Martin
>From 1cd8987ad3de4a7bd3e71014a0df9ccb3be40277 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 11 Sep 2017 13:34:41 +0200
Subject: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

gcc/ChangeLog:

2017-09-11  Martin Liska  

	PR middle-end/82154
	* stmt.c (expand_sjlj_dispatch_table): Use CASE_LOW when
	CASE_HIGH is NULL_TREE.

gcc/testsuite/ChangeLog:

2017-09-11  Martin Liska  

	PR middle-end/82154
	* g++.dg/torture/pr82154.C: New test.
---
 gcc/stmt.c |  6 ++--
 gcc/testsuite/g++.dg/torture/pr82154.C | 50 ++
 2 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr82154.C

diff --git a/gcc/stmt.c b/gcc/stmt.c
index 39d29ff3da9..92bd209ad64 100644
--- a/gcc/stmt.c
+++ b/gcc/stmt.c
@@ -1063,8 +1063,10 @@ expand_sjlj_dispatch_table (rtx dispatch_index,
   for (int i = ncases - 1; i >= 0; --i)
 	{
 	  tree elt = dispatch_table[i];
-	  case_list.safe_push (simple_case_node (CASE_LOW (elt),
-		 CASE_HIGH (elt),
+	  tree high = CASE_HIGH (elt);
+	  if (high == NULL_TREE)
+	high = CASE_LOW (elt);
+	  case_list.safe_push (simple_case_node (CASE_LOW (elt), high,
 		 CASE_LABEL (elt)));
 	}
 
diff --git a/gcc/testsuite/g++.dg/torture/pr82154.C b/gcc/testsuite/g++.dg/torture/pr82154.C
new file mode 100644
index 000..f4e1c3ea139
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr82154.C
@@ -0,0 +1,50 @@
+// { dg-do compile }
+// { dg-additional-options "-Wno-deprecated" }
+
+namespace a {
+int b;
+class c
+{
+};
+}
+class g
+{
+public:
+  g ();
+};
+using a::b;
+class d
+{
+public:
+  d ();
+  void e ();
+};
+class f
+{
+  d
+  i ()
+  {
+static d j;
+  }
+  int *k () throw (a::c);
+};
+
+
+int *f::k () throw (a::c)
+{
+  static g h;
+  i ();
+  int l = 2;
+  while (l)
+{
+  --l;
+  try
+	{
+	  operator new (b);
+	}
+  catch (a::c)
+	{
+	}
+}
+  i ().e ();
+}
-- 
2.14.1



[PATCH] Fix PR target/82066 - target pragma and attribute documentation for ARM, AArch64, and S/390

2017-09-13 Thread Steve Ellcey
This patch fixes the documentation issues pointed out in PR target/82066.
It may be considered obvious enough to just check in but I'd rather have
someone look it over to make sure I didn't mess something up.

Steve Ellcey
sell...@cavium.com


2017-09-13  Steve Ellcey  

PR target/82066
* doc/extend.texi (Common Function Attributes): Add 
references to ARM, AArch64, and S/390 specific attributes.
(Function Specific Option Pragmas): Add AArch64 and S/390
to list of back ends that support the target pragma.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cd5733e..32ccace 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3299,7 +3299,8 @@ or separate the options with a comma (@samp{,}) within a 
single string.
 
 The options supported are specific to each target; refer to @ref{x86
 Function Attributes}, @ref{PowerPC Function Attributes},
-@ref{ARM Function Attributes},and @ref{Nios II Function Attributes},
+@ref{ARM Function Attributes}, @ref{AArch64 Function Attributes},
+@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
 for details.
 
 @item target_clones (@var{options})
@@ -21839,7 +21840,7 @@ function.  The parenthesis around the options is 
optional.
 @code{target} attribute and the attribute syntax.
 
 The @code{#pragma GCC target} pragma is presently implemented for
-x86, PowerPC, and Nios II targets only.
+x86, ARM, AArch64, PowerPC, S/390, and Nios II targets only.
 @end table
 
 @table @code


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-09-13 Thread Joseph Myers
On Wed, 13 Sep 2017, Martin Jambor wrote:

> I was just surprised by the glibc check, what would you consider a
> recent-enough glibc?  Or is the check mainly necessary to ensure we
> are indeed using glibc and not some other libc (and thus something
> like we do for TARGET_LIBC_PROVIDES_SSP would do)?

It looks like SSE4.1 {ceil,floor,rint}{,f} were added in glibc commit 
ad0f5cad15f1c76faf3843b3e189dead2c05cfcc, nearbyint{,f} in 
581d30e386b9567b973a65d0bc82af782ac078ed, so 2.15 or later for all those 
functions (the target glibc version is known when GCC is configured, 
whether from configure examining headers or from --with-glibc-version).

glibc does not have SSE4.1 {trunc,roundeven}{,f} at present (missing trunc 
is ).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-09-13 Thread Martin Jambor
Hello,

I apologize for not coming back to this, I keep on getting distracted.
Anyway...

On Tue, Aug 15, 2017 at 02:20:55PM +, Joseph Myers wrote:
> On Tue, 15 Aug 2017, Martin Jambor wrote:
> 
> > I am not sure what to do about this, to me it seems that the
> > -ffp-int-builtin-inexact simply has a wrong default value, at least
> > for x86_64, as it was added in order not to slow code down but does
> > exactly that (all of the slowdown of course disappears when
> > -fno-fp-int-builtin-inexact is used).
> > 
> > Or is the situation somehow more complex?
> 
> It's supposed to be that -ffp-int-builtin-inexact allows inexact to be 
> raised, and is on by default, and -fno-fp-int-builtin-inexact is the 
> nondefault option that disallows it from being raised and may result in 
> slower code generation.
> 
> As I understand it, your issue is actually with inline SSE expansions of 
> certain functions.  Before my patch, those had !flag_trapping_math 
> conditionals.  My patch changed that to the logically correct 
> (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact), that 
> being the conditions under which the expansion in question is correct.  
> Your problem is that the expansion, though correct under those conditions, 
> is slow compared to an IFUNC implementation of the library function.

...that is exactly right (modulo the fact that TARGET_ROUND meanwhile
became TARGET_SSE4_1.

> 
> Maybe that means that expansion should be disabled under some conditions 
> where it is correct but suboptimal.  It should be kept for TARGET_ROUND, 
> because then it's expanding to a single instruction.  But for 
> !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that 
> would satisfy TARGET_ROUND, or for -mtune=generic, and building with 
> recent-enough glibc, the expansion should be avoided as suboptimal, on the 
> expectation that at runtime an IFUNC is likely to be available - or given 
> the size of the generic SSE expansion, maybe it should be avoided more 
> generally than that).

This seems to me the best solution.  SSE 4.1 is 11 years old, we
should be tuning for it in generic tuning.  That is also the reason
why I do not think run-time checks for SSE 4.1 or an attempt at an
internal IFUNC are a good idea (or justified effort).

I was just surprised by the glibc check, what would you consider a
recent-enough glibc?  Or is the check mainly necessary to ensure we
are indeed using glibc and not some other libc (and thus something
like we do for TARGET_LIBC_PROVIDES_SSP would do)?

I will try to come up with a patch.

Thanks,

Martin


Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Kyrill Tkachov

Hi Steve,

On 13/09/17 17:50, Steve Ellcey wrote:

This is a patch for PR target/77729 on aarch64.  The code is doing an
unneeded zero extend ('uxtb' in the original report, 'and' in the ToT 
sources).


The patch looks a bit odd, it is a specialized define_insn for the combine
pass.  At some point in combine (I never did find out where), the 
zero_extend

is converted to an AND so my instruction looks for an OR of a constant
and an AND expression where one operand of the AND is a subreg and the 
other

is a constant.  If the two constants add up to 255 that means that the AND
is being used to mask out the upper bits of the register while not messing
up the constant we are using in the OR expression.

I also had to recognize this in the aarch64 cost function or combine would
not use the new expression even when it recognized it as it thought it 
cost

more than the original uncombined expressions.

Tested on aarch64 with a bootstrap and testsuite run that had no 
regressions.


OK to checkin?


2017-09-13  Steve Ellcey  

PR target/77729
* config/aarch64/aarch64.c (aarch64_rtx_costs):
Handle cost of *iorqi3_uxtw instruction.
* config/aarch64/aarch64.md (*iorqi3_uxtw): New
instruction for combine phase.


2017-09-13  Steve Ellcey  

* gcc.target/aarch64/pr77729.c: New test.


+;; Specialized OR instruction for combiner.  The AND is masking out bits
+;; not needed in the OR (doing a zero_extend).  The zero_extend is not
+;; needed because we know from the subreg that the upper part of the reg
+;; is zero.
+(define_insn "*iorqi3_uxtw"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(ior:SI (and:SI
+ (subreg:SI (match_operand:QI 1 "register_operand" "r") 0)
+ (match_operand:SI 2 "const_int_operand" "n"))
+   (match_operand:SI 3 "aarch64_logical_operand" "K")))]
+  "INTVAL (operands[2]) + INTVAL (operands[3]) == 255"
+  "orr\\t%w0, %w1, %3"
+  [(set_attr "type" "logic_imm")]
+)

We are usually hesitant to add explicit subreg matching in the MD pattern
(though I don't remember if there's a hard rule against it).
In this case this looks like a missing simplification from combine 
(simplify-rtx) so
I think adding it there would be better.

Also, in this pattern operands[3] has the aarch64_logical_operand predicate 
which allows registers
but in the final instruction check below you unconditionally take its INTVAL 
which will throw an error
if the compiler is configured with RTL checking enabled.

Thanks,
Kyrill




RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-13 Thread Tsimbalist, Igor V
> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Tuesday, September 12, 2017 5:59 PM
> To: 'Jeff Law' ; 'gcc-patches@gcc.gnu.org'  patc...@gcc.gnu.org>
> Cc: Tsimbalist, Igor V 
> Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> 
> 
> > -Original Message-
> > From: Jeff Law [mailto:l...@redhat.com]
> > Sent: Friday, August 25, 2017 10:50 PM
> > To: Tsimbalist, Igor V ; 'gcc-
> > patc...@gcc.gnu.org' 
> > Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >
> > On 08/01/2017 02:56 AM, Tsimbalist, Igor V wrote:
> > > Part#1. Add generic part for Intel CET enabling.
> > >
>
> > Q. Do we need to do anything with ICF (identical code folding) and CFE?
> > Given two functions which have the same implementation in gimple,
> > except that one has a notrack indirect call and the other has a
> > tracked indirect call, what is proper behavior?  I think we'd keep
> > them separate which implies we need to make sure the notrack attribute
> > is part of the ICF hashing implementation.  It'd probably even be
> > worth building a test for this :-)
> Are you talking about a case when such two functions are inlined? Or there is
> a possibility to merge function bodies if they are identical?
> 
> I agree with you that the functions should be kept separate. I haven't looked
> into such optimization in gcc so I need to learn it.
I thought over this case and my conclusion is that nothing has to be done 
regarding ICF.

First of all let's sync on a case we are talking about. A code template could 
look like

fn1 definition
{
  
}

fn2 definition with notrack attribute
{
  
}

func definition
{
  ...
}

Is it the case you are talking about? Let's consider different scenarios:

1) calls to fn1 and fn2 are direct calls. In that case 'notrack' has no effect 
on direct calls as they are
assumed to be save (it applies to indirect calls only). ICF can be done here;
2) one of calls is an indirect call or both calls are indirect calls. If 
compiler can prove what exact functions
are called then indirect call(s) can be replaced by direct call(s) and that 
gives us the case 1);
3) if compiler cannot prove what function is called it will keep the indirect 
call and so there is nothing
to do for ICF here. 

Thanks,
Igor

> 
> > >  }
> > >
> > >
> > > +/* Return true if call GS is marked as no-track.  */
> > > +
> > > +static inline bool
> > > +gimple_call_with_notrack_p (const gcall *gs) {
> > > +  return (gs->subcode & GF_CALL_WITH_NOTRACK) != 0; }
> > > +
> > > +static inline bool
> > > +gimple_call_with_notrack_p (const gimple *gs) {
> > > +  const gcall *gc = GIMPLE_CHECK2 (gs);
> > > +  return gimple_call_with_notrack_p (gc); }
> > Agree with Richi WRT avoiding gimple * overloads.
> Fixed.
> 
> Thanks,
> Igor
> 
> >
> >
> > Jeff


Re: [PATCH] Add a -Wcast-align=strict warning

2017-09-13 Thread Joseph Myers
What does this warning do in cases where a type has different alignments 
inside and outside structs?  I'm thinking of something like

struct s { long long x; } *p;
/* ... */
(long long *)p

on 32-bit x86 - where long long's preferred alignment is 8 bytes, but in 
structures it's 4 bytes.  (Likewise for double in place of long long.)  I 
think a warning for a (long long *)p cast might be surprising in that 
case.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C/C++ PATCH to add __remove_qualifiers (PR c/65455, c/39985)

2017-09-13 Thread Joseph Myers
This patch (for C) is setting c_inhibit_evaluation_warnings and 
in_remove_qualifiers and doing corresponding use of pop_maybe_used, but I 
don't see the need for that.  Expressions within the argument to 
__remove_qualifiers are evaluated exactly if they would be evaluated in 
the containing context; it's not like sizeof or typeof (expression) which 
can cause expressions not to be evaluated.  (I don't actually think any of 
that handling is needed for typeof (type) either, it simply happens that 
the initial adjustments of c_inhibit_evaluation_warnings and in_typeof are 
done in code shared by typeof (expression) and typeof (type).)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [AArch64, PATCH] Improve Neon store of zero

2017-09-13 Thread James Greenhalgh
On Wed, Sep 13, 2017 at 05:34:56PM +0100, Jackson Woodruff wrote:
> Hi,
> 
> I have addressed the issues you raised below.
> 
> Is the amended patch OK for trunk?

Yes, thanks.

Committed as revision 252387.

Cheers,
James



[PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Steve Ellcey
This is a patch for PR target/77729 on aarch64.  The code is doing an
unneeded zero extend ('uxtb' in the original report, 'and' in the ToT sources).

The patch looks a bit odd, it is a specialized define_insn for the combine
pass.  At some point in combine (I never did find out where), the zero_extend
is converted to an AND so my instruction looks for an OR of a constant
and an AND expression where one operand of the AND is a subreg and the other
is a constant.  If the two constants add up to 255 that means that the AND
is being used to mask out the upper bits of the register while not messing
up the constant we are using in the OR expression.

I also had to recognize this in the aarch64 cost function or combine would
not use the new expression even when it recognized it as it thought it cost
more than the original uncombined expressions.

Tested on aarch64 with a bootstrap and testsuite run that had no regressions.

OK to checkin?


2017-09-13  Steve Ellcey  

PR target/77729
* config/aarch64/aarch64.c (aarch64_rtx_costs):
Handle cost of *iorqi3_uxtw instruction.
* config/aarch64/aarch64.md (*iorqi3_uxtw): New
instruction for combine phase.


2017-09-13  Steve Ellcey  

* gcc.target/aarch64/pr77729.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index de1fbdc..5266347 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7433,6 +7433,19 @@ cost_plus:
 
   return true;
 }
+  /* Special cost test for *iorqi3_uxtw where the AND can be removed.  */
+  if (GET_MODE (x) == SImode
+	  && GET_CODE (XEXP (x, 0)) == AND
+	  && CONST_INT_P (XEXP (x, 1)))
+	{
+	  op0 = XEXP (XEXP (x, 0), 0);
+	  op1 = XEXP (XEXP (x, 0), 1);
+	  if (REG_P (SUBREG_REG (op0))
+	  && GET_MODE (SUBREG_REG (op0)) == QImode
+	  && CONST_INT_P (op1))
+	if (INTVAL (XEXP (x, 1)) + INTVAL (op1) == 255)
+	  return true;
+	}
 /* Fall through.  */
 case XOR:
 case AND:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8cdb06..6934c15 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3441,6 +3441,21 @@
   [(set_attr "type" "logic_reg,logic_imm")]
 )
 
+;; Specialized OR instruction for combiner.  The AND is masking out bits
+;; not needed in the OR (doing a zero_extend).  The zero_extend is not
+;; needed because we know from the subreg that the upper part of the reg
+;; is zero.
+(define_insn "*iorqi3_uxtw"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(ior:SI (and:SI
+		  (subreg:SI (match_operand:QI 1 "register_operand" "r") 0)
+		  (match_operand:SI 2 "const_int_operand" "n"))
+		(match_operand:SI 3 "aarch64_logical_operand" "K")))]
+  "INTVAL (operands[2]) + INTVAL (operands[3]) == 255"
+  "orr\\t%w0, %w1, %3"
+  [(set_attr "type" "logic_imm")]
+)
+
 (define_insn "*and3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
 	(compare:CC_NZ
diff --git a/gcc/testsuite/gcc.target/aarch64/pr77729.c b/gcc/testsuite/gcc.target/aarch64/pr77729.c
index e69de29..2fcda9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr77729.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr77729.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int TrieCase3_v1(const char *string)
+{
+if((string[0] | 32) == 't') {
+if((string[1] | 32) == 'a') {
+if((string[2] | 32) == 'g') {
+return 42;
+}
+}
+}
+return -1;
+}
+
+int TrieCase3_v2(const char *string)
+{
+switch(string[0] | 32) {
+case 't':
+switch(string[1] | 32) {
+case 'a':
+switch(string[2] | 32) {
+case 'g':
+return 42;
+}
+}
+}
+return -1;
+}
+
+/* { dg-final { scan-assembler-not "and" } } */
+/* { dg-final { scan-assembler-not "uxtb" } } */


Re: C/C++ PATCH to add __remove_qualifiers (PR c/65455, c/39985)

2017-09-13 Thread Joseph Myers
On Sat, 9 Sep 2017, Jason Merrill wrote:

> > +@code{__remove_qualifiers} takes a typename as an argument:
> 
> I think it would be better to use the term "type-id" here, to avoid
> confusion with "type-name", which is only a single identifier.

There's no such thing as a type-id in C, and type-name is the term used in 
C syntax (for what goes in a cast, etc.).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [AArch64, PATCH] Improve Neon store of zero

2017-09-13 Thread Jackson Woodruff

Hi,

I have addressed the issues you raised below.

Is the amended patch OK for trunk?

Thanks,

Jackson.

On 09/12/2017 05:28 PM, James Greenhalgh wrote:

On Wed, Sep 06, 2017 at 10:02:52AM +0100, Jackson Woodruff wrote:

Hi all,

I've attached a new patch that addresses some of the issues raised with
my original patch.

On 08/23/2017 03:35 PM, Wilco Dijkstra wrote:

Richard Sandiford wrote:


Sorry for only noticing now, but the call to aarch64_legitimate_address_p
is asking whether the MEM itself is a legitimate LDP/STP address.  Also,
it might be better to pass false for strict_p, since this can be called
before RA.  So maybe:

 if (GET_CODE (operands[0]) == MEM
&& !(aarch64_simd_imm_zero (operands[1], mode)
 && aarch64_mem_pair_operand (operands[0], mode)))


There were also some issues with the choice of mode for the call the
aarch64_mem_pair_operand.

For a 128-bit wide mode, we want to check `aarch64_mem_pair_operand
(operands[0], DImode)` since that's what the stp will be.

For a 64-bit wide mode, we don't need to do that check because a normal
`str` can be issued.

I've updated the condition as such.



Is there any reason for doing this check at all (or at least this early during
expand)?


Not doing this check means that the zero is forced into a register, so
we then carry around a bit more RTL and rely on combine to merge things.



There is a similar issue with this part:

   (define_insn "*aarch64_simd_mov"
 [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, m,  w, ?r, ?w, ?r, w")
+   "=w, Ump,  m,  w, ?r, ?w, ?r, w")

The Ump causes the instruction to always split off the address offset. Ump
cannot be used in patterns that are generated before register allocation as it
also calls laarch64_legitimate_address_p with strict_p set to true.


I've changed the constraint to a new constraint 'Umq', that acts the
same as Ump, but calls aarch64_legitimate_address_p with strict_p set to
false and uses DImode for the mode to pass.


This looks mostly OK to me, but this conditional:


+  if (GET_CODE (operands[0]) == MEM
+  && !(aarch64_simd_imm_zero (operands[1], mode)
+  && ((GET_MODE_SIZE (mode) == 16
+   && aarch64_mem_pair_operand (operands[0], DImode))
+  || GET_MODE_SIZE (mode) == 8)))


Has grown a bit too big in such a general pattern to live without a comment
explaining what is going on.


+(define_memory_constraint "Umq"
+  "@internal
+   A memory address which uses a base register with an offset small enough for
+   a load/store pair operation in DI mode."
+   (and (match_code "mem")
+   (match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0),
+  PARALLEL, 0)")))


And here you want 'false' rather than '0'.

I'll happily merge the patch with those changes, please send an update.

Thanks,
James




ChangeLog:

gcc/

2017-08-29  Jackson Woodruff  

* config/aarch64/constraints.md (Umq): New constraint.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov):
Change to use Umq.
(mov): Update condition.

gcc/testsuite

2017-08-29  Jackson Woodruff  

* gcc.target/aarch64/simd/vect_str_zero.c:
Update testcase.


diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f3e084f8778d70c82823b92fa80ff96021ad26db..c20e513f59a35f3410eae3eb0fdc2fc86352a9fc 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -23,10 +23,17 @@
 	(match_operand:VALL_F16 1 "general_operand" ""))]
   "TARGET_SIMD"
   "
-if (GET_CODE (operands[0]) == MEM
-	&& !(aarch64_simd_imm_zero (operands[1], mode)
-	 && aarch64_legitimate_address_p (mode, operands[0],
-	  PARALLEL, 1)))
+  /* Force the operand into a register if it is not an
+ immediate whose use can be replaced with xzr.
+ If the mode is 16 bytes wide, then we will be doing
+ a stp in DI mode, so we check the validity of that.
+ If the mode is 8 bytes wide, then we will do doing a
+ normal str, so the check need not apply.  */
+  if (GET_CODE (operands[0]) == MEM
+  && !(aarch64_simd_imm_zero (operands[1], mode)
+	   && ((GET_MODE_SIZE (mode) == 16
+		&& aarch64_mem_pair_operand (operands[0], DImode))
+	   || GET_MODE_SIZE (mode) == 8)))
   operands[1] = force_reg (mode, operands[1]);
   "
 )
@@ -126,7 +133,7 @@
 
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-		"=w, Ump,  m,  w, ?r, ?w, ?r, w")
+		"=w, Umq,  m,  w, ?r, ?w, ?r, w")
 	(match_operand:VQ 1 "general_operand"
 		"m,  Dz, w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 9ce3d4efaf31a301dfb7c1772a6b685fb2cbd2ee..3649fb48a33454c208a6b81e051fdd316c495710 100644
--- a/gcc/config/aarch64/constraints.md
+++ 

Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2017-09-13 Thread Kyrill Tkachov

Hi Christophe,

On 13/09/17 16:23, Christophe Lyon wrote:

Hi,

On 12 October 2016 at 11:22, Christophe Lyon  wrote:

On 12 October 2016 at 11:14, Kyrill Tkachov  wrote:

On 12/10/16 09:59, Christophe Lyon wrote:

Hi Kyrill,

On 7 October 2016 at 17:00, Kyrill Tkachov 
wrote:

Hi Christophe,


On 07/09/16 21:05, Christophe Lyon wrote:

Hi,

The attached patch is a first part to solve PR 67591: it removes
several occurrences of "IT blocks containing 32-bit Thumb
instructions are deprecated in ARMv8" messages in the
gcc/g++/libstdc++/fortran testsuites.

It does not remove them all yet. This patch only modifies the
*cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
*and_scc_scc and *and_scc_scc_cmp patterns.
Additional work is required in sub_shiftsi etc, at least.
I've started looking at these, but I decided I could already
post this self-contained patch to check if this implementation
is OK.

Regarding *cmp_and and *cmp_ior patterns, the addition of the
enabled_for_depr_it attribute is aggressive in the sense that it keeps
only the alternatives with 'l' and 'Py' constraints, while in some
cases the constraints could be relaxed. Indeed, these 2 patterns can
swap their input comparisons, meaning that any of them can be emitted
in the IT-block, and is thus subject to the ARMv8 deprecation.
The generated code is possibly suboptimal in the cases where the
operands are not swapped, since 'r' could be used.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:


http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

Bootstrapped OK on armv8l HW.

Is this OK?

Thanks,

Christophe


   (define_insn_and_split "*ior_scc_scc"
-  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
  (ior:SI (match_operator:SI 3 "arm_comparison_operator"
-[(match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "arm_add_operand" "rIL")])
+[(match_operand:SI 1 "s_register_operand" "r,l")
+ (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
  (match_operator:SI 6 "arm_comparison_operator"
-[(match_operand:SI 4 "s_register_operand" "r")
- (match_operand:SI 5 "arm_add_operand" "rIL")])))
+[(match_operand:SI 4 "s_register_operand" "r,l")
+ (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))

Can you please put the more restrictive alternatives (lPy) first?
Same with the other patterns your patch touches.
Ok with that change if a rebased testing run is ok.
Sorry for the delay in reviewing.


OK, I will update my patch accordingly.

However, when I discussed this with Ramana during the Cauldron,
he requested benchmark results. So far, I was able to run spec2006
on an APM machine, and I'm seeing performance changes in the
range 11% improvement (465.tonto) to 7% degradation (433.milc).

Would that be acceptable?


Those sound like quite large swings.

Indeed, but most are in the -1%-+1% range.


Are you sure the machine was not running anything else at the time
or playing tricks with frequency scaling?

No, I had no such guarantee. I used this machine temporarily,
first to check that bootstrap worked. I planed to use another
board with an A57 "standard" microarch for proper
benchmarking, but I'm not sure when I'll have access to it
(wrt to e/o gcc stage1), that's why I reported these early
figures.


Did all iterations of SPEC show a consistent difference?

If the changes are consistent, could you have a look at the codegen
to see if there are any clues to the differences?

I will update my patch according to your comment, re-run the bench
and have a deeper look at the codegen differences.


I have finally been able to run benchmarks with my patch updated
according to your comment, on new machines where we have
better control of the environment (frequency, etc...).

These machines use cortex-a57 CPUs and spec2006 shows little
difference with and without this patch.

Comparing several runs with and without the patch:
- gcc is about 1% slower
- povray about 1% faster
- omnetpp about 1.5% faster


Yeah, that looks line with what I'd expect for this change.


The others are in the noise level.

OK for trunk?


This is ok if a bootstrap and test run on top of a recent compiler 
configured for --with-thumb

and for armv8-a passes okay (to exercise the new code).
Thanks for persevering with this!

Kyrill



Thanks,

Christophe



I'd like to get an explanation for these differences before committing
this patch. If they are just an effect of the more restrictive constraints
then there may be not much we can do, but I'd like to make sure there's not
anything else unexpected going on.


Thanks,


Re: [PATCH] Factor out division by squares and remove division around comparisons (0/2)

2017-09-13 Thread Jackson Woodruff



On 09/13/2017 04:45 PM, Jeff Law wrote:

On 09/06/2017 03:54 AM, Jackson Woodruff wrote:

Hi all,

This patch is split from part (1/2). It includes the patterns that have
been moved out of fold-const.c


It also removes an (almost entirely) redundant pattern:

 (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2)

which was only used in special cases, either with combinations
of flags like -fno-reciprocal-math -funsafe-math-optimizations
and cases where C was sNaN, or small enough to result in infinity.

This pattern is covered by:

(A / C1) +- (A / C2) -> (with O1 and reciprocal math)
A * (1 / C1) +- A * (1 / C2) ->
A * (1 / C1 +- 1 / C2)

The previous pattern required funsafe-math-optimizations.

To adjust for this case, the testcase has been updated to require O1 so
that the optimization is still performed.


This pattern is moved verbatim into match.pd:

 (A / C) +- (B / C) -> (A +- B) / C.

OK for trunk?

Jackson

gcc/

2017-08-30  Jackson Woodruff  

 PR 71026/tree-optimization
 * match.pd: Move RDIV patterns from fold-const.c
 * fold-const.c (distribute_real_division): Removed.
 (fold_binary_loc): Remove calls to distribute_real_divison.

gcc/testsuite/

2017-08-30  Jackson Woodruff  

 PR 71026/tree-optimization
 * gcc/testsuire/gcc.dg/fold-div-1.c: Use O1.

Sorry.  Just one more question here.  Does the new match.pd pattern need
to be conditional on flag_unsafe_math_optimizations?  ISTM as written
it's going to do those transformations independent of that flag.


For more context

(if (flag_unsafe_math_optimizations)
 /* Simplify sqrt(x) * sqrt(x) -> x.  */
 (simplify
  (mult (SQRT@1 @0) @1) <- End mult
  (if (!HONOR_SNANS (type))
   @0
  ) <- End if !HONOR_SNANS
 ) <- End simplify

 (for op (plus minus) 


  /* Simplify (A / C) +- (B / C) -> (A +- B) / C.  */
  (simplify
   (op (rdiv @0 @1)
   (rdiv @2 @1)
   )  <- End op
   (rdiv (op @0 @2) @1) <- End rdiv
   ) <- End simplify
  ) <- End for

... (more patterns) ...

The if (flag_unsafe_math_optimizations) covers a whole sequence of 
transformations here which mine is a part of. I've annotated the close 
parens so it is clearer.


I don't have commit privileges, could you commit this on my behalf if 
this is OK?


Jackson.



If this has already been discussed, don't hesitate to just point me at
the discussion :-)

Jeff



Re: [doc,libgomp] Updates for www.openacc.org URLs

2017-09-13 Thread Thomas Schwinge
Hi Gerald!

On Fri, 8 Sep 2017 14:27:30 +0200, Gerald Pfeifer  wrote:
> We were nearly down to around a dozen of problematic links on 
> gcc.gnu.org and our online docs, when the following popped up: 
> www.openacc.org switched to https

I can't tell what is "problematic" here, given that a redirect is being
done (and most likely will be "forever")?

$ curl -v http://www.openacc.org/
*   Trying 23.185.0.1...
* Connected to www.openacc.org (23.185.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: www.openacc.org
> User-Agent: curl/7.50.1
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Content-Type: text/html; charset=UTF-8
< Location: https://www.openacc.org/
[...]

> (and does not require a trailing 
> slash).

I prefer the form with a trailing slash, but I'm certainly no going to
lose any sleep over that.  ;-)

> I committed the patch below, but am wondering whether we really,
> really, really need as many instances of one and the same URL in 
> one section of the documentation?

Well, that is just OpenACC doing what OpenMP had done before.

> Can we simplify this somehow?

My preference actually is to just remove all this boilerplate stuff,
which, as far as I can tell, is just text copied from the respective
specifications anyway.  That is, why do we highlight the runtime library
APIs here, when we nowhere document the directives in much detail, which
are actually way more important, usually?  And if GCC claims to implement
these standards, then the user should rely on these standards (aside from
implementation-defined aspects, of course), just like we do for other
programming language standards?


Grüße
 Thomas


Re: Fwd: Undeliverable: Re: [doc,libgomp] Updates for www.openacc.org URLs

2017-09-13 Thread Thomas Schwinge
Hi!

On Fri, 8 Sep 2017 10:25:06 -0700, Mike Stump  wrote:
> Do you know if James wants to remain on the MAINTAINERS list? If so, the 
> email address should be updated.

Jim has left the company, and I'm afraid I don't know about another email
address.  So I committed the following to trunk in r252218:

commit 608f62d5c03a1ac7fee3f9e292cef4107a956943
Author: tschwinge 
Date:   Wed Sep 13 16:12:53 2017 +

* MAINTAINERS: Remove email address of Jim Norris.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@252218 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog   | 4 
 MAINTAINERS | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git ChangeLog ChangeLog
index fc6c622..eb4eee4 100644
--- ChangeLog
+++ ChangeLog
@@ -1,3 +1,7 @@
+2017-09-13  Thomas Schwinge  
+
+   * MAINTAINERS: Remove email address of Jim Norris.
+
 2017-09-11  Kyrylo Tkachov  
 
* MAINTAINERS (Reviewers): Move myself from here...
diff --git MAINTAINERS MAINTAINERS
index e5b9bc1..ce270f7 100644
--- MAINTAINERS
+++ MAINTAINERS
@@ -515,7 +515,7 @@ Adam Nemet  

 Thomas Neumann 
 Dan Nicolaescu 
 Kelvin Nilsen  
-James Norris   
+James Norris
 Diego Novillo  
 Dorit Nuzman   
 David O'Brien  


Grüße
 Thomas


> > Begin forwarded message:
> > 
> > From: 
> > Subject: Undeliverable: Re: [doc,libgomp] Updates for www.openacc.org URLs
> > Date: September 8, 2017 at 10:13:09 AM PDT
> > To: 
> > 
> > Delivery has failed to these recipients or groups:
> > 
> > James Norris (jnor...@codesourcery.com) 
> > The e-mail address you entered couldn't be found. Please check the 
> > recipient's e-mail address and try to resend the message. If the problem 
> > continues, please contact your helpdesk.


Re: [PATCH] Factor out division by squares and remove division around comparisons (0/2)

2017-09-13 Thread Jeff Law
On 09/06/2017 03:54 AM, Jackson Woodruff wrote:
> Hi all,
> 
> This patch is split from part (1/2). It includes the patterns that have
> been moved out of fold-const.c
> 
> 
> It also removes an (almost entirely) redundant pattern:
> 
> (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2)
> 
> which was only used in special cases, either with combinations
> of flags like -fno-reciprocal-math -funsafe-math-optimizations
> and cases where C was sNaN, or small enough to result in infinity.
> 
> This pattern is covered by:
> 
>(A / C1) +- (A / C2) -> (with O1 and reciprocal math)
>A * (1 / C1) +- A * (1 / C2) ->
>A * (1 / C1 +- 1 / C2)
> 
> The previous pattern required funsafe-math-optimizations.
> 
> To adjust for this case, the testcase has been updated to require O1 so
> that the optimization is still performed.
> 
> 
> This pattern is moved verbatim into match.pd:
> 
> (A / C) +- (B / C) -> (A +- B) / C.
> 
> OK for trunk?
> 
> Jackson
> 
> gcc/
> 
> 2017-08-30  Jackson Woodruff  
> 
> PR 71026/tree-optimization
> * match.pd: Move RDIV patterns from fold-const.c
> * fold-const.c (distribute_real_division): Removed.
> (fold_binary_loc): Remove calls to distribute_real_divison.
> 
> gcc/testsuite/
> 
> 2017-08-30  Jackson Woodruff  
> 
> PR 71026/tree-optimization
> * gcc/testsuire/gcc.dg/fold-div-1.c: Use O1.
Sorry.  Just one more question here.  Does the new match.pd pattern need
to be conditional on flag_unsafe_math_optimizations?  ISTM as written
it's going to do those transformations independent of that flag.

If this has already been discussed, don't hesitate to just point me at
the discussion :-)

Jeff


Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-13 Thread Bill Schmidt
On Sep 13, 2017, at 7:23 AM, Richard Biener  wrote:
> 
> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>  wrote:
>> Hi,
>> 
>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>> 
>> Folding of vector loads in GIMPLE.
>> 
>> Add code to handle gimple folding for the vec_ld builtins.
>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding
>> comments have been adjusted slightly so they continue to read OK for the
>> existing vec_st code.
>> 
>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
>> tests which have been posted separately.
>> 
>> For V2 of this patch, I've removed the chunk of code that prohibited the
>> gimple fold from occurring in BE environments.   This had fixed an issue
>> for me earlier during my development of the code, and turns out this was
>> not necessary.  I've sniff-tested after removing that check and it looks
>> OK.
>> 
>>> + /* Limit folding of loads to LE targets.  */
>>> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>>> +return false;
>> 
>> I've restarted a regression test on this updated version.
>> 
>> OK for trunk (assuming successful regression test completion)  ?
>> 
>> Thanks,
>> -Will
>> 
>> [gcc]
>> 
>>2017-09-12  Will Schmidt  
>> 
>>* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>> 
>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>> index fbab0a2..bb8a77d 100644
>> --- a/gcc/config/rs6000/rs6000-c.c
>> +++ b/gcc/config/rs6000/rs6000-c.c
>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, 
>> tree fndecl,
>> convert (TREE_TYPE (stmt), arg0));
>>   stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>   return stmt;
>> }
>> 
>> -  /* Expand vec_ld into an expression that masks the address and
>> - performs the load.  We need to expand this early to allow
>> +  /* Expand vec_st into an expression that masks the address and
>> + performs the store.  We need to expand this early to allow
>>  the best aliasing, as by the time we get into RTL we no longer
>>  are able to honor __restrict__, for example.  We may want to
>>  consider this for all memory access built-ins.
>> 
>>  When -maltivec=be is specified, or the wrong number of arguments
>>  is provided, simply punt to existing built-in processing.  */
>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>> -  && nargs == 2)
>> -{
>> -  tree arg0 = (*arglist)[0];
>> -  tree arg1 = (*arglist)[1];
>> -
>> -  /* Strip qualifiers like "const" from the pointer arg.  */
>> -  tree arg1_type = TREE_TYPE (arg1);
>> -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != 
>> ARRAY_TYPE)
>> -   goto bad;
>> -
>> -  tree inner_type = TREE_TYPE (arg1_type);
>> -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>> -   {
>> - arg1_type = build_pointer_type (build_qualified_type (inner_type,
>> -   0));
>> - arg1 = fold_convert (arg1_type, arg1);
>> -   }
>> -
>> -  /* Construct the masked address.  Let existing error handling take
>> -over if we don't have a constant offset.  */
>> -  arg0 = fold (arg0);
>> -
>> -  if (TREE_CODE (arg0) == INTEGER_CST)
>> -   {
>> - if (!ptrofftype_p (TREE_TYPE (arg0)))
>> -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
>> -
>> - tree arg1_type = TREE_TYPE (arg1);
>> - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>> -   {
>> - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>> - tree const0 = build_int_cstu (sizetype, 0);
>> - tree arg1_elt0 = build_array_ref (loc, arg1, const0);
>> - arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
>> -   }
>> -
>> - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
>> -  arg1, arg0);
>> - tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
>> - build_int_cst (arg1_type, -16));
>> -
>> - /* Find the built-in to get the return type so we can convert
>> -the result properly (or fall back to default handling if the
>> -arguments aren't compatible).  */
>> - for (desc = altivec_overloaded_builtins;
>> -  desc->code && desc->code != fcode; desc++)
>> -   continue;
>> -
>> - for (; desc->code == fcode; desc++)
>> -   if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), 

[PATCH] Fix libstdc++ tests using invalid effective-target

2017-09-13 Thread Jonathan Wakely

The c++11 effective-target isn't supported on gcc-5-branch, so these
tests need to use dg-options instead. (I'll be glad when we close
gcc-5-branch and I don't have to remember this!)

One of the tests FAILed when this was fixed, because it was also using
a constructor of __gnu_test::test_container that doesn't exist on
gcc-5-branch. So I fixed that too.

* testsuite/20_util/reference_wrapper/80504.cc: Do not use invalid
effective-target.
* testsuite/22_locale/conversions/buffer/2.cc: Likewise.
* testsuite/28_regex/basic_regex/ctors/basic/iter.cc: Likewise. Fix
use of test_container.

Tested x86_64-linux, committed to gcc-5-branch.

commit 397c9db7d7a08ded788d5d7cc3c8bc9ec6525eec
Author: Jonathan Wakely 
Date:   Wed Sep 13 15:56:08 2017 +0100

Fix libstdc++ tests using invalid effective-target

* testsuite/20_util/reference_wrapper/80504.cc: Do not use invalid
effective-target.
* testsuite/22_locale/conversions/buffer/2.cc: Likewise.
* testsuite/28_regex/basic_regex/ctors/basic/iter.cc: Likewise. Fix
use of test_container.

diff --git a/libstdc++-v3/testsuite/20_util/reference_wrapper/80504.cc 
b/libstdc++-v3/testsuite/20_util/reference_wrapper/80504.cc
index 727a560cd17..d46ffcd056b 100644
--- a/libstdc++-v3/testsuite/20_util/reference_wrapper/80504.cc
+++ b/libstdc++-v3/testsuite/20_util/reference_wrapper/80504.cc
@@ -15,7 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-do compile { target c++11 } }
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/22_locale/conversions/buffer/2.cc 
b/libstdc++-v3/testsuite/22_locale/conversions/buffer/2.cc
index 8eda714b61d..3efb51ff1c2 100644
--- a/libstdc++-v3/testsuite/22_locale/conversions/buffer/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/conversions/buffer/2.cc
@@ -15,7 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-do run { target c++11 } }
+// { dg-options "-std=gnu++11" }
+// { dg-do run }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/basic/iter.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/basic/iter.cc
index 7776c5fd557..4c70e0d6241 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/basic/iter.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/basic/iter.cc
@@ -15,7 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-do compile { target c++11 } }
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
 
 #include 
 #include 
@@ -24,7 +25,8 @@ void
 test01()
 {
   char s[] = "";
-  __gnu_test::test_container c(s);
+  __gnu_test::test_container
+c(s, s+1);
   std::regex r1(c.begin(), c.end());
   std::regex r2(c.begin(), c.end(), std::regex_constants::grep);
 }


[PATCH][Aarch64] Improve int<->FP conversions

2017-09-13 Thread Michael Collison
Resending this patch as I did not have the correct subject line.

This patch improves the latency of code by eliminating two FP <-> integer 
register transfers.

An example:

float f1(float x)
{
  int y = x;
  return (float)y;
}

Trunk generates:

f1:
fcvtzs  w0, s0
scvtf   s0, w0
ret

With the patch we can use the neon scalar instructions and eliminate the two FP 
<-> integer register transfes.

f1:
fcvtzs  s0, s0
scvtf   s0, s0
ret

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-09-02  Michael Collison  

* config/aarch64/aarch64.md(_trunc>2):
New pattern.
(_trunchf2: New pattern.
(_trunc2: New pattern.
* config/aarch64/iterators.md (wv): New mode attribute.
(vf, VF): New mode attributes.
(vgp, VGP): New mode attributes.
(s): Update attribute with SImode and DImode prefixes.
* testsuite/gcc.target/aarch64/fix_trunc1.c: New testcase.



Re: [ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2017-09-13 Thread Christophe Lyon
Hi,

On 12 October 2016 at 11:22, Christophe Lyon  wrote:
> On 12 October 2016 at 11:14, Kyrill Tkachov  
> wrote:
>>
>> On 12/10/16 09:59, Christophe Lyon wrote:
>>>
>>> Hi Kyrill,
>>>
>>> On 7 October 2016 at 17:00, Kyrill Tkachov 
>>> wrote:

 Hi Christophe,


 On 07/09/16 21:05, Christophe Lyon wrote:
>
> Hi,
>
> The attached patch is a first part to solve PR 67591: it removes
> several occurrences of "IT blocks containing 32-bit Thumb
> instructions are deprecated in ARMv8" messages in the
> gcc/g++/libstdc++/fortran testsuites.
>
> It does not remove them all yet. This patch only modifies the
> *cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
> *and_scc_scc and *and_scc_scc_cmp patterns.
> Additional work is required in sub_shiftsi etc, at least.
> I've started looking at these, but I decided I could already
> post this self-contained patch to check if this implementation
> is OK.
>
> Regarding *cmp_and and *cmp_ior patterns, the addition of the
> enabled_for_depr_it attribute is aggressive in the sense that it keeps
> only the alternatives with 'l' and 'Py' constraints, while in some
> cases the constraints could be relaxed. Indeed, these 2 patterns can
> swap their input comparisons, meaning that any of them can be emitted
> in the IT-block, and is thus subject to the ARMv8 deprecation.
> The generated code is possibly suboptimal in the cases where the
> operands are not swapped, since 'r' could be used.
>
> Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
> and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:
>
>
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html
>
> Bootstrapped OK on armv8l HW.
>
> Is this OK?
>
> Thanks,
>
> Christophe


   (define_insn_and_split "*ior_scc_scc"
 -  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
 +  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
  (ior:SI (match_operator:SI 3 "arm_comparison_operator"
 -[(match_operand:SI 1 "s_register_operand" "r")
 - (match_operand:SI 2 "arm_add_operand" "rIL")])
 +[(match_operand:SI 1 "s_register_operand" "r,l")
 + (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
  (match_operator:SI 6 "arm_comparison_operator"
 -[(match_operand:SI 4 "s_register_operand" "r")
 - (match_operand:SI 5 "arm_add_operand" "rIL")])))
 +[(match_operand:SI 4 "s_register_operand" "r,l")
 + (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))

 Can you please put the more restrictive alternatives (lPy) first?
 Same with the other patterns your patch touches.
 Ok with that change if a rebased testing run is ok.
 Sorry for the delay in reviewing.

>>> OK, I will update my patch accordingly.
>>>
>>> However, when I discussed this with Ramana during the Cauldron,
>>> he requested benchmark results. So far, I was able to run spec2006
>>> on an APM machine, and I'm seeing performance changes in the
>>> range 11% improvement (465.tonto) to 7% degradation (433.milc).
>>>
>>> Would that be acceptable?
>>
>>
>> Those sound like quite large swings.
> Indeed, but most are in the -1%-+1% range.
>
>> Are you sure the machine was not running anything else at the time
>> or playing tricks with frequency scaling?
> No, I had no such guarantee. I used this machine temporarily,
> first to check that bootstrap worked. I planed to use another
> board with an A57 "standard" microarch for proper
> benchmarking, but I'm not sure when I'll have access to it
> (wrt to e/o gcc stage1), that's why I reported these early
> figures.
>
>> Did all iterations of SPEC show a consistent difference?
>>
>> If the changes are consistent, could you have a look at the codegen
>> to see if there are any clues to the differences?
> I will update my patch according to your comment, re-run the bench
> and have a deeper look at the codegen differences.
>

I have finally been able to run benchmarks with my patch updated
according to your comment, on new machines where we have
better control of the environment (frequency, etc...).

These machines use cortex-a57 CPUs and spec2006 shows little
difference with and without this patch.

Comparing several runs with and without the patch:
- gcc is about 1% slower
- povray about 1% faster
- omnetpp about 1.5% faster

The others are in the noise level.

OK for trunk?

Thanks,

Christophe


>> I'd like to get an explanation for these differences before committing
>> this patch. If they are just an effect of the more restrictive constraints
>> then 

Re: [PATCH] Factor out division by squares and remove division around comparisons (0/2)

2017-09-13 Thread Jeff Law
On 09/13/2017 03:22 AM, Jackson Woodruff wrote:
> On 09/12/2017 11:43 PM, Jeff Law wrote:
>> On 09/06/2017 03:54 AM, Jackson Woodruff wrote:
>>> Hi all,
>>>
>>> This patch is split from part (1/2). It includes the patterns that have
>>> been moved out of fold-const.c
>>>
>>>
>>> It also removes an (almost entirely) redundant pattern:
>>>
>>>  (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2)
>>>
>>> which was only used in special cases, either with combinations
>>> of flags like -fno-reciprocal-math -funsafe-math-optimizations
>>> and cases where C was sNaN, or small enough to result in infinity.
>>>
>>> This pattern is covered by:
>>>
>>> (A / C1) +- (A / C2) -> (with O1 and reciprocal math)
>>> A * (1 / C1) +- A * (1 / C2) ->
>>> A * (1 / C1 +- 1 / C2)
>>>
>>> The previous pattern required funsafe-math-optimizations.
>>>
>>> To adjust for this case, the testcase has been updated to require O1 so
>>> that the optimization is still performed.
>>>
>>>
>>> This pattern is moved verbatim into match.pd:
>>>
>>>  (A / C) +- (B / C) -> (A +- B) / C.
>>>
>>> OK for trunk?
>>>
>>> Jackson
>>>
>>> gcc/
>>>
>>> 2017-08-30  Jackson Woodruff  
>>>
>>>  PR 71026/tree-optimization
>>>  * match.pd: Move RDIV patterns from fold-const.c
>>>  * fold-const.c (distribute_real_division): Removed.
>>>  (fold_binary_loc): Remove calls to distribute_real_divison.
>>>
>>> gcc/testsuite/
>>>
>>> 2017-08-30  Jackson Woodruff  
>>>
>>>  PR 71026/tree-optimization
>>>  * gcc/testsuire/gcc.dg/fold-div-1.c: Use O1.
>>
>> Didn't you lose the case where the operator is MULT_EXPR rather than
>> RDIV_EXPR?
>>
>> If I'm reading things correctly, arg0 and arg1 could be a RDIV_EXPR or a
>> MULT_EXPR and we distribute both cases (as long as they are both the
>> same code). >
>> Your match.pd pattern only handle rdiv.
>>
> 
> In the fold_plusminus_mult_expr function in fold-const.c, we still do
> the simplification:
> 
> (A * C) +- (B * C) -> (A+-B) * C
> 
> So the pattern I introduced only needs to handle the division case.
OK.  I was hoping that we had code elsewhere to handle the MULT case.

> 
>> Now it may be the case that MULT_EXPR doesn't happen anymore in practice
>> -- the code in fold-const is quote old and much of it was written before
>> we regularly added tests for optimization and canonicalization.  So I'm
>> not surprised nothing in this testsuite trips over this omission.
> 
> In gcc.dg/fold-plusmult.c, we test whether 2*a + 2*a is folded into a *
> 4, which comes close.
> 
> There are also a few tests for this in gcc.dg/tree-ssa/pr23294.c,
> although again, they are use constants for A and B.
Great.  Thanks for verifying we've got at least some minimal coverage.

Do you have commit privs?  I'm going to give the patch a final looksie
shortly.

jeff


Re: C PATCH to fix ICE with -Wsizeof-array-argument (PR c/82167)

2017-09-13 Thread Jeff Law
On 09/13/2017 05:00 AM, Marek Polacek wrote:
> This fixes a segv, where -Wsizeof-array-argument crashed because
> expr.original_type was null.  It was null because for sizeof (*)
> we go to c_parser_unary_expression's case CPP_AND: and case CPP_MULT:,
> and that removes the original_type.  But we don't need the original 
> type, we can just use TREE_TYPE of expr.value.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-09-13  Marek Polacek  
> 
>   PR c/82167
>   * c-typeck.c (c_expr_sizeof_expr): Use the type of expr.value rather
>   than expr.original_type.
> 
>   * gcc.dg/pr82167.c: New test.
OK.
jeff


[PATCH][GCC][AArch64] Restrict lrint inlining on ILP32.

2017-09-13 Thread Tamar Christina
Hi All,

The inlining of lrint isn't valid in all cases on ILP32 when
-fno-math-errno is used because an inexact exception is raised in
certain circumstances.

Instead the restriction is placed such that the integer mode has to
be larger or equal to the float mode in addition to either inexacts being
allowed or not caring about trapping math.

This prevents the overflow, and the inexact errors that may arise.

Unfortunately I can't create a test for this as there is a bug where
the pattern is always passed DI as the smallest mode,
and later takes a sub-reg of it to SI. This would prevent an overflow
where one was expected.

This fixed PR/81800.

Regtested on aarch64-none-linux-gnu and no regressions.

Ok for trunk?

Thanks,
Tamar

gcc/
2017-09-13  Tamar Christina  

PR target/81800
* config/aarch64/aarch64.md (lrint2): Add 
flag_trapping_math
and flag_fp_int_builtin_inexact.

gcc/testsuite/
2017-09-13  Tamar Christina  

* gcc.target/aarch64/inline-lrint_2.c (dg-options): Add 
-fno-trapping-math.

-- 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 64b60a903ed7c0090298989333894442a277ccd4..1efc56dd40de8ab1f6ee1b743b804f0c3ed887b2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5122,7 +5122,9 @@
 (define_expand "lrint2"
   [(match_operand:GPI 0 "register_operand")
(match_operand:GPF 1 "register_operand")]
-  "TARGET_FLOAT"
+  "TARGET_FLOAT
+   && ((GET_MODE_SIZE (mode) <= GET_MODE_SIZE (mode))
+   || !flag_trapping_math || flag_fp_int_builtin_inexact)"
 {
   rtx cvt = gen_reg_rtx (mode);
   emit_insn (gen_rint2 (cvt, operands[1]));
diff --git a/gcc/testsuite/gcc.target/aarch64/inline-lrint_2.c b/gcc/testsuite/gcc.target/aarch64/inline-lrint_2.c
index 6080e186d8f0c6f5ede81c6438e059e8b976378f..bd0c73c8d34a2cb52d0a453a175bedd59bba5457 100644
--- a/gcc/testsuite/gcc.target/aarch64/inline-lrint_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/inline-lrint_2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ilp32 } */
-/* { dg-options "-O3 -fno-math-errno" } */
+/* { dg-options "-O3 -fno-math-errno -fno-trapping-math" } */
 
 #include "lrint-matherr.h"
 



Re: [PATCH] PR libstdc++/81468 constrain std::chrono::time_point constructor

2017-09-13 Thread Jonathan Wakely

On 13/09/17 15:09 +0100, Jonathan Wakely wrote:

Howard reported this bug, caused by a missing SFINAE constraint on a
std::chrono::time_point constructor.

I've also done a bit of simplification using alias templates.


Here's the backport for the branches, which just adds the constraint
(and tests) without the new alias templates.


commit 4a6c846f3203dd0ceb13b7130e686833a4150d96
Author: Jonathan Wakely 
Date:   Wed Sep 13 15:16:31 2017 +0100

PR libstdc++/81468 constrain std::chrono::time_point constructor

PR libstdc++/81468
* include/std/chrono (time_point(const time_point<_Dur2>&)): Add
missing constraint from LWG DR 1177.
* testsuite/20_util/duration/cons/dr1177.cc: New.
* testsuite/20_util/time_point/cons/81468.cc: New.
* testsuite/20_util/duration/literals/range.cc: Update dg-error line.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index c3a6ba8f873..cc63d44657b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -622,7 +622,8 @@ _GLIBCXX_END_NAMESPACE_VERSION
 	{ }
 
 	// conversions
-	template
+	template>>
 	  constexpr time_point(const time_point& __t)
 	  : __d(__t.time_since_epoch())
 	  { }
diff --git a/libstdc++-v3/testsuite/20_util/duration/cons/dr1177.cc b/libstdc++-v3/testsuite/20_util/duration/cons/dr1177.cc
new file mode 100644
index 000..28c881ccc79
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/duration/cons/dr1177.cc
@@ -0,0 +1,41 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile { target c++11 } }
+
+#include 
+#include 
+
+using namespace std;
+using namespace std::chrono;
+
+// DR 1177
+static_assert(is_constructible{},
+"can convert duration with one floating point rep to another");
+static_assert(is_constructible{},
+"can convert duration with integral rep to one with floating point rep");
+static_assert(!is_constructible{},
+"cannot convert duration with floating point rep to one with integral rep");
+static_assert(is_constructible{},
+"can convert duration with one integral rep to another");
+
+static_assert(!is_constructible>>{},
+"cannot convert duration to one with different period");
+static_assert(is_constructible>>{},
+"unless it has a floating-point representation");
+static_assert(is_constructible>>{},
+"or a period that is an integral multiple of the original");
diff --git a/libstdc++-v3/testsuite/20_util/duration/literals/range.cc b/libstdc++-v3/testsuite/20_util/duration/literals/range.cc
index c0d1a6e5885..531b53c42ec 100644
--- a/libstdc++-v3/testsuite/20_util/duration/literals/range.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/literals/range.cc
@@ -26,6 +26,6 @@ test01()
 
   // std::numeric_limits::max() == 9223372036854775807;
   auto h = 9223372036854775808h;
-  // { dg-error "cannot be represented" "" { target *-*-* } 892 }
+  // { dg-error "cannot be represented" "" { target *-*-* } 893 }
 }
 // { dg-prune-output "in constexpr expansion" } // needed for -O0
diff --git a/libstdc++-v3/testsuite/20_util/time_point/cons/81468.cc b/libstdc++-v3/testsuite/20_util/time_point/cons/81468.cc
new file mode 100644
index 000..30d1c4a5ac7
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/time_point/cons/81468.cc
@@ -0,0 +1,34 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see 

Re: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

2017-09-13 Thread Jeff Law
On 09/13/2017 07:42 AM, Martin Liška wrote:
> On 09/13/2017 03:08 PM, Martin Liška wrote:
>> On 09/12/2017 05:21 PM, Jeff Law wrote:
>>> On 09/12/2017 01:43 AM, Martin Liška wrote:
 Hello.

 In transition to simple_case_node, I forgot to initialize m_high to m_low 
 if a case
 does not have CASE_HIGH.

 Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

 Ready to be installed?
 Martin

 gcc/ChangeLog:

 2017-09-11  Martin Liska  

PR middle-end/82154
* stmt.c (struct simple_case_node): Assign low to high when
high is equal to null.

 gcc/testsuite/ChangeLog:

 2017-09-11  Martin Liska  

PR middle-end/82154
* g++.dg/torture/pr82154.C: New test.
>>> OK.
>>>
>>> THough I have to wonder if we should unify the HIGH handling -- having
>>> two different conventions is bound to be confusing.
>>>
>>> In a CASE_LABEL_EXPR the CASE_HIGH can be NULL, which means the label
>>> refers to a singleton value that is found in CASE_LOW.
>>>
>>> That means we end up doing stuff like this:
>>>
>>>  if (CASE_HIGH (elt))
>>> maxval = fold_convert (index_type, CASE_HIGH (elt));
>>>   else
>>> maxval = fold_convert (index_type, CASE_LOW (elt));
>>>
>>>
>>>
>>> You could legitimately argue for changing how this works for tree nodes
>>> so that there's always a non-null CASE_HIGH.
>>
>> Hi.
>>
>> Agree with you that we have a lot of code that does what you just described.
>> I tent to change IL representation, where CASE_HIGH is always non-null.
>>
>> $ git grep 'CASE_HIGH\>' | wc -l
>> 125
>>
>> Which is reasonable amount of code that should be changed. I'll prepare 
>> patch for that.
> 
> Hm, there's one question that pops up: Should we compare the values via 
> pointer equality,
> or tree_int_cst_eq should be done? The later one can messy the code a bit.
I'd like to say pointer equality, but it's probably safer to use
tree_int_cst_eq.

jeff


Re: [Aarch64, Patch] Update failing testcase pr62178.c

2017-09-13 Thread James Greenhalgh
On Wed, Sep 13, 2017 at 03:02:55PM +0100, Jackson Woodruff wrote:
> Hi all,
> 
> This patch changes pr62178.c so that it now scans
> for two `ldr`s, one into an `s` register, instead
> of a `ld1r` as before. Also add a scan for an mla
> instruction.
> 
> The `ld1r` was needed when this should have generated
> a mla by vector. Now that we can generate an mla by
> element instruction and can load directly into the
> simd register, it is cheaper to not do the ld1r
> which needlessly duplicates the single element used
> across the whole vector register.
> 
> The testcase passes now that 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00048.html has been committed
> 
> OK for trunk?

OK thanks, committed as revision 252086 on your behalf.

James

> 2017-09-13  Jackson Woodruff  
> 
>   * gcc.target/aarch64/pr62178.c: Updated testcase
>   to scan for two ldrs and an mla.

> diff --git a/gcc/testsuite/gcc.target/aarch64/pr62178.c 
> b/gcc/testsuite/gcc.target/aarch64/pr62178.c
> index 
> b80ce68656076864bb71c76949cef5d7b530021a..1bf6d838d3a49ed5d8ecf9ae0157bd2a9159bfb4
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/pr62178.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pr62178.c
> @@ -14,4 +14,6 @@ void foo (void) {
>  }
>  }
>  
> -/* { dg-final { scan-assembler "ld1r\\t\{v\[0-9\]+\."} } */
> +/* { dg-final { scan-assembler "ldr\\ts\[0-9\]+, \\\[x\[0-9\]+, 
> \[0-9\]+\\\]!" } } */
> +/* { dg-final { scan-assembler "ldr\\tq\[0-9\]+, \\\[x\[0-9\]+\\\], 
> \[0-9\]+" } } */
> +/* { dg-final { scan-assembler "mla\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, 
> v\[0-9\]+\.s\\\[0\\\]" } } */



[PATCH] PR libstdc++/81468 constrain std::chrono::time_point constructor

2017-09-13 Thread Jonathan Wakely

Howard reported this bug, caused by a missing SFINAE constraint on a
std::chrono::time_point constructor.

I've also done a bit of simplification using alias templates.

PR libstdc++/81468
* include/std/chrono (__enable_if_is_duration)
(__disable_if_is_duration): New alias templates to simplify SFINAE.
(duration_cast, floor, ceil): Use __enable_if_is_duration.
(duration::__is_float, duration::__is_harmonic): New alias templates
to simplify SFINAE.
(duration::duration(const _Rep2&)): Use _Require, __is_float and
__is_harmonic.
(duration::duration(const duration<_Rep2, _Period2>&)): Likewise.
(__common_rep_type): Remove, replace with ...
(__common_rep_t): New alias template.
(operator*, operator/, operator%): Use __common_rep_t and
__disable_if_is_duration.
(time_point::time_point(const time_point&)): Add missing
constraint from LWG DR 1177.
* testsuite/20_util/duration/cons/dr1177.cc: New.
* testsuite/20_util/duration/literals/range.cc: Update dg-error line.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: Likewise.
* testsuite/20_util/time_point/cons/81468.cc: New.

Tested on powerp64le-linux, committed to trunk.


commit e06db0fa47676eeb386bfd1a582a005d341fb678
Author: Jonathan Wakely 
Date:   Wed Sep 13 13:00:50 2017 +0100

PR libstdc++/81468 constrain std::chrono::time_point constructor

PR libstdc++/81468
* include/std/chrono (__enable_if_is_duration)
(__disable_if_is_duration): New alias templates to simplify SFINAE.
(duration_cast, floor, ceil): Use __enable_if_is_duration.
(duration::__is_float, duration::__is_harmonic): New alias templates
to simplify SFINAE.
(duration::duration(const _Rep2&)): Use _Require, __is_float and
__is_harmonic.
(duration::duration(const duration<_Rep2, _Period2>&)): Likewise.
(__common_rep_type): Remove, replace with ...
(__common_rep_t): New alias template.
(operator*, operator/, operator%): Use __common_rep_t and
__disable_if_is_duration.
(time_point::time_point(const time_point&)): Add 
missing
constraint from LWG DR 1177.
* testsuite/20_util/duration/cons/dr1177.cc: New.
* testsuite/20_util/duration/literals/range.cc: Update dg-error 
line.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: 
Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: 
Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: 
Likewise.
* testsuite/20_util/time_point/cons/81468.cc: New.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 1bcbf524a7b..fc058fcd8d8 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -179,10 +179,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : std::true_type
   { };
 
+template
+  using __enable_if_is_duration
+   = typename enable_if<__is_duration<_Tp>::value, _Tp>::type;
+
+template
+  using __disable_if_is_duration
+   = typename enable_if::value, _Tp>::type;
+
 /// duration_cast
 template
-  constexpr typename enable_if<__is_duration<_ToDur>::value,
-  _ToDur>::type
+  constexpr __enable_if_is_duration<_ToDur>
   duration_cast(const duration<_Rep, _Period>& __d)
   {
typedef typename _ToDur::period __to_period;
@@ -211,7 +218,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 # define __cpp_lib_chrono 201510
 
 template
-  constexpr enable_if_t<__is_duration<_ToDur>::value, _ToDur>
+  constexpr __enable_if_is_duration<_ToDur>
   floor(const duration<_Rep, _Period>& __d)
   {
auto __to = chrono::duration_cast<_ToDur>(__d);
@@ -221,7 +228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 template
-  constexpr enable_if_t<__is_duration<_ToDur>::value, _ToDur>
+  constexpr __enable_if_is_duration<_ToDur>
   ceil(const duration<_Rep, _Period>& __d)
   {
auto __to = chrono::duration_cast<_ToDur>(__d);
@@ -294,6 +301,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
   struct duration
   {
+  private:
+   template
+ using __is_float = treat_as_floating_point<_Rep2>;
+
+   // _Period2 is an exact multiple of _Period
+   template
+ using __is_harmonic
+   = __bool_constant::den == 1>;
+
+  public:
+
typedef _Reprep;
typedef _Period

[Aarch64, Patch] Update failing testcase pr62178.c

2017-09-13 Thread Jackson Woodruff

Hi all,

This patch changes pr62178.c so that it now scans
for two `ldr`s, one into an `s` register, instead
of a `ld1r` as before. Also add a scan for an mla
instruction.

The `ld1r` was needed when this should have generated
a mla by vector. Now that we can generate an mla by
element instruction and can load directly into the
simd register, it is cheaper to not do the ld1r
which needlessly duplicates the single element used
across the whole vector register.

The testcase passes now that 
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00048.html has been committed


OK for trunk?

Jackson

ChangeLog:
gcc/testsuite

2017-09-13  Jackson Woodruff  

* gcc.target/aarch64/pr62178.c: Updated testcase
to scan for two ldrs and an mla.
diff --git a/gcc/testsuite/gcc.target/aarch64/pr62178.c 
b/gcc/testsuite/gcc.target/aarch64/pr62178.c
index 
b80ce68656076864bb71c76949cef5d7b530021a..1bf6d838d3a49ed5d8ecf9ae0157bd2a9159bfb4
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr62178.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr62178.c
@@ -14,4 +14,6 @@ void foo (void) {
 }
 }
 
-/* { dg-final { scan-assembler "ld1r\\t\{v\[0-9\]+\."} } */
+/* { dg-final { scan-assembler "ldr\\ts\[0-9\]+, \\\[x\[0-9\]+, \[0-9\]+\\\]!" 
} } */
+/* { dg-final { scan-assembler "ldr\\tq\[0-9\]+, \\\[x\[0-9\]+\\\], \[0-9\]+" 
} } */
+/* { dg-final { scan-assembler "mla\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, 
v\[0-9\]+\.s\\\[0\\\]" } } */


Re: [PATCH][store-merging] Use store order as tie-breaker in sort_by_bitpos

2017-09-13 Thread Richard Biener
On Wed, Sep 13, 2017 at 3:02 PM, Kyrill  Tkachov
 wrote:
> Hi all,
>
> As Alexander pointed out in the thread starting at [1] the sort_by_bitpos
> sorting function
> was behaving badly when we had multiple stores at the same position.  He
> fixed that (thanks!)
> but we can do better by not returning zero when the bitpositions are equal
> but by falling back
> to comparing the order the stores appear in, which is guaranteed to be
> unique (barring other
> bugs elsewhere).
>
> This patch does that.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?

Ok.

Richard.

> Thanks,
> Kyrill
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00895.html
>
> 2017-09-13  Kyrylo Tkachov  
>
> * gimple-ssa-store-merging.c (sort_by_bitpos): Compare store order
> when bitposition is the same.


Re: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

2017-09-13 Thread Martin Liška
On 09/13/2017 03:08 PM, Martin Liška wrote:
> On 09/12/2017 05:21 PM, Jeff Law wrote:
>> On 09/12/2017 01:43 AM, Martin Liška wrote:
>>> Hello.
>>>
>>> In transition to simple_case_node, I forgot to initialize m_high to m_low 
>>> if a case
>>> does not have CASE_HIGH.
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-09-11  Martin Liska  
>>>
>>> PR middle-end/82154
>>> * stmt.c (struct simple_case_node): Assign low to high when
>>> high is equal to null.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2017-09-11  Martin Liska  
>>>
>>> PR middle-end/82154
>>> * g++.dg/torture/pr82154.C: New test.
>> OK.
>>
>> THough I have to wonder if we should unify the HIGH handling -- having
>> two different conventions is bound to be confusing.
>>
>> In a CASE_LABEL_EXPR the CASE_HIGH can be NULL, which means the label
>> refers to a singleton value that is found in CASE_LOW.
>>
>> That means we end up doing stuff like this:
>>
>>  if (CASE_HIGH (elt))
>> maxval = fold_convert (index_type, CASE_HIGH (elt));
>>   else
>> maxval = fold_convert (index_type, CASE_LOW (elt));
>>
>>
>>
>> You could legitimately argue for changing how this works for tree nodes
>> so that there's always a non-null CASE_HIGH.
> 
> Hi.
> 
> Agree with you that we have a lot of code that does what you just described.
> I tent to change IL representation, where CASE_HIGH is always non-null.
> 
> $ git grep 'CASE_HIGH\>' | wc -l
> 125
> 
> Which is reasonable amount of code that should be changed. I'll prepare patch 
> for that.

Hm, there's one question that pops up: Should we compare the values via pointer 
equality,
or tree_int_cst_eq should be done? The later one can messy the code a bit.

Martin

> 
> Martin
> 
>>
>> jeff
>>
> 



Re: [AArch64, patch] Refactor of aarch64-ldpstp.md

2017-09-13 Thread Jackson Woodruff

Hi,

Since I rebased the patch that this is based on, I have also rebased 
this patch.


Jackson.

On 09/12/2017 07:15 PM, Jackson Woodruff wrote:

Hi all,

This patch removes a lot of duplicated code in aarch64-ldpstp.md.

The patterns that did not previously generate a base register now
do not check for aarch64_mem_pair_operand in the pattern. This has
been extracted to a check in aarch64_operands_ok_for_ldpstp.

All patterns in the file used to have explicit switching code to
swap loads and stores that were in the wrong order.

This has been extracted into aarch64_ldp_str_operands
and aarch64_gen_adjusted_ldp_stp.

This patch is based on my patch here: 
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00346.html so should go in 
after it.



Bootstrap and regtest OK on AArch64.

OK for trunk?

Jackson.

gcc/

2017-09-07  Jackson Woodruff  

 * config/aarch64/aarch64-ldpstp.md: Replace uses of
 aarch64_mem_pair_operand with memory_operand and delete
 operand swapping code.
 * config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
 Add check for legitimate_address.
 (aarch64_gen_adjusted_ldpstp): Add swap.
 (aarch64_swap_ldrstr_operands): New.
 * config/aarch64/aarch64-protos.h: Add
 aarch64_swap_ldrstr_operands.
diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index 14e860d258e548d4118d957675f8bdbb74615337..126bb702f6399d13ab2dc6c8b99bcbbf3b3a7516 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -20,26 +20,18 @@
 
 (define_peephole2
   [(set (match_operand:GPI 0 "register_operand" "")
-	(match_operand:GPI 1 "aarch64_mem_pair_operand" ""))
+	(match_operand:GPI 1 "memory_operand" ""))
(set (match_operand:GPI 2 "register_operand" "")
 	(match_operand:GPI 3 "memory_operand" ""))]
   "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	  (set (match_dup 2) (match_dup 3))])]
 {
-  rtx base, offset_1, offset_2;
-
-  extract_base_offset_in_addr (operands[1], , _1);
-  extract_base_offset_in_addr (operands[3], , _2);
-  if (INTVAL (offset_1) > INTVAL (offset_2))
-{
-  std::swap (operands[0], operands[2]);
-  std::swap (operands[1], operands[3]);
-}
+  aarch64_swap_ldrstr_operands (operands, 1);
 })
 
 (define_peephole2
-  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand" "")
+  [(set (match_operand:GPI 0 "memory_operand" "")
 	(match_operand:GPI 1 "aarch64_reg_or_zero" ""))
(set (match_operand:GPI 2 "memory_operand" "")
 	(match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
@@ -47,39 +39,23 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
 	  (set (match_dup 2) (match_dup 3))])]
 {
-  rtx base, offset_1, offset_2;
-
-  extract_base_offset_in_addr (operands[0], , _1);
-  extract_base_offset_in_addr (operands[2], , _2);
-  if (INTVAL (offset_1) > INTVAL (offset_2))
-{
-  std::swap (operands[0], operands[2]);
-  std::swap (operands[1], operands[3]);
-}
+  aarch64_swap_ldrstr_operands (operands, 0);
 })
 
 (define_peephole2
   [(set (match_operand:GPF 0 "register_operand" "")
-	(match_operand:GPF 1 "aarch64_mem_pair_operand" ""))
+	(match_operand:GPF 1 "memory_operand" ""))
(set (match_operand:GPF 2 "register_operand" "")
 	(match_operand:GPF 3 "memory_operand" ""))]
   "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	  (set (match_dup 2) (match_dup 3))])]
 {
-  rtx base, offset_1, offset_2;
-
-  extract_base_offset_in_addr (operands[1], , _1);
-  extract_base_offset_in_addr (operands[3], , _2);
-  if (INTVAL (offset_1) > INTVAL (offset_2))
-{
-  std::swap (operands[0], operands[2]);
-  std::swap (operands[1], operands[3]);
-}
+  aarch64_swap_ldrstr_operands (operands, 1);
 })
 
 (define_peephole2
-  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand" "")
+  [(set (match_operand:GPF 0 "memory_operand" "")
 	(match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
(set (match_operand:GPF 2 "memory_operand" "")
 	(match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
@@ -87,39 +63,23 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
 	  (set (match_dup 2) (match_dup 3))])]
 {
-  rtx base, offset_1, offset_2;
-
-  extract_base_offset_in_addr (operands[0], , _1);
-  extract_base_offset_in_addr (operands[2], , _2);
-  if (INTVAL (offset_1) > INTVAL (offset_2))
-{
-  std::swap (operands[0], operands[2]);
-  std::swap (operands[1], operands[3]);
-}
+  aarch64_swap_ldrstr_operands (operands, 0);
 })
 
 (define_peephole2
   [(set (match_operand:DREG 0 "register_operand" "")
-	(match_operand:DREG 1 "aarch64_mem_pair_operand" ""))
+	(match_operand:DREG 1 "memory_operand" ""))
(set (match_operand:DREG2 2 "register_operand" "")
 	(match_operand:DREG2 3 "memory_operand" ""))]
   "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
   [(parallel [(set (match_dup 0) 

Re: [AArch64] Merge stores of D register values of different modes

2017-09-13 Thread Jackson Woodruff

On 09/12/2017 07:32 PM, Richard Sandiford wrote:

Thanks for doing this, looks good to me FWIW.  I was just wondering:

Jackson Woodruff  writes:

@@ -14712,6 +14712,11 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool 
load,
if (!rtx_equal_p (base_1, base_2))
  return false;
  
+  /* Check that the operands are of the same size.  */

+  if (GET_MODE_SIZE (GET_MODE (mem_1))
+  != GET_MODE_SIZE (GET_MODE (mem_2)))
+return false;
+
offval_1 = INTVAL (offset_1);
offval_2 = INTVAL (offset_2);
msize = GET_MODE_SIZE (mode);


when can this trigger?  Your iterators always seem to enforce correct
pairings, so maybe this should be an assert instead.


Yes, it's true that this should never be triggered. I've changed it to 
an assert.


I have also rebased on top of the renaming of load/store attributes 
patch https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00702.html which had 
some conflicts with this.


Is the updated patch OK for trunk?

Thanks,
Jackson.



Thanks,
Richard

diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index e8dda42c2dd1e30c4607c67a2156ff7813bd89ea..14e860d258e548d4118d957675f8bdbb74615337 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -99,10 +99,10 @@
 })
 
 (define_peephole2
-  [(set (match_operand:VD 0 "register_operand" "")
-	(match_operand:VD 1 "aarch64_mem_pair_operand" ""))
-   (set (match_operand:VD 2 "register_operand" "")
-	(match_operand:VD 3 "memory_operand" ""))]
+  [(set (match_operand:DREG 0 "register_operand" "")
+	(match_operand:DREG 1 "aarch64_mem_pair_operand" ""))
+   (set (match_operand:DREG2 2 "register_operand" "")
+	(match_operand:DREG2 3 "memory_operand" ""))]
   "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	  (set (match_dup 2) (match_dup 3))])]
@@ -119,11 +119,12 @@
 })
 
 (define_peephole2
-  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "")
-	(match_operand:VD 1 "register_operand" ""))
-   (set (match_operand:VD 2 "memory_operand" "")
-	(match_operand:VD 3 "register_operand" ""))]
-  "TARGET_SIMD && aarch64_operands_ok_for_ldpstp (operands, false, mode)"
+  [(set (match_operand:DREG 0 "aarch64_mem_pair_operand" "")
+	(match_operand:DREG 1 "register_operand" ""))
+   (set (match_operand:DREG2 2 "memory_operand" "")
+	(match_operand:DREG2 3 "register_operand" ""))]
+  "TARGET_SIMD
+   && aarch64_operands_ok_for_ldpstp (operands, false, mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	  (set (match_dup 2) (match_dup 3))])]
 {
@@ -138,7 +139,6 @@
 }
 })
 
-
 ;; Handle sign/zero extended consecutive load/store.
 
 (define_peephole2
@@ -181,6 +181,30 @@
 }
 })
 
+;; Handle storing of a floating point zero.
+;; We can match modes that won't work for a stp instruction
+;; as aarch64_operands_ok_for_ldpstp checks that the modes are
+;; compatible.
+(define_peephole2
+  [(set (match_operand:DSX 0 "aarch64_mem_pair_operand" "")
+	(match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" ""))
+   (set (match_operand: 2 "memory_operand" "")
+	(match_operand: 3 "aarch64_reg_zero_or_fp_zero" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, false, DImode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+{
+  rtx base, offset_1, offset_2;
+
+  extract_base_offset_in_addr (operands[0], , _1);
+  extract_base_offset_in_addr (operands[2], , _2);
+  if (INTVAL (offset_1) > INTVAL (offset_2))
+{
+  std::swap (operands[0], operands[2]);
+  std::swap (operands[1], operands[3]);
+}
+})
+
 ;; Handle consecutive load/store whose offset is out of the range
 ;; supported by ldp/ldpsw/stp.  We firstly adjust offset in a scratch
 ;; register, then merge them into ldp/ldpsw/stp by using the adjusted
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 8f045c210502330af9d47f6adfd46a9e36328b74..90f9415b3986eb737ecdfeed43fe798cdbb8334e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -172,11 +172,11 @@
   [(set_attr "type" "neon_store1_1reg")]
 )
 
-(define_insn "load_pair"
-  [(set (match_operand:VD 0 "register_operand" "=w")
-	(match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:VD 2 "register_operand" "=w")
-	(match_operand:VD 3 "memory_operand" "m"))]
+(define_insn "load_pair"
+  [(set (match_operand:DREG 0 "register_operand" "=w")
+	(match_operand:DREG 1 "aarch64_mem_pair_operand" "Ump"))
+   (set (match_operand:DREG2 2 "register_operand" "=w")
+	(match_operand:DREG2 3 "memory_operand" "m"))]
   "TARGET_SIMD
&& rtx_equal_p (XEXP (operands[3], 0),
 		   plus_constant (Pmode,
@@ -186,11 +186,11 @@
   [(set_attr "type" "neon_ldp")]
 )
 
-(define_insn "store_pair"
-  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "=Ump")
-	(match_operand:VD 1 "register_operand" "w"))
-   (set 

[Ada] Review dependency tracking for Ada sources without -gnatd.n support

2017-09-13 Thread Pierre-Marie de Rodat
This patch will fix build glitches for parallelized Ada builds, which 
started to appear after the Makefile changes that came with the recent 
libgnat/libgnarl reorganization. Specifically, it fixes the detection of 
dependencies between Ada units for builds based on Ada compilers that 
don’t support the -gnatd.n flag.


Tested on x86_64-pc-linux-gnu, committed on trunk.

2017-09-13  Nicolas Roche  

* Make-lang.in: In the fallback mechanim, parse the associated
.ali file and try to guess the locations of dependencies.

--
Pierre-Marie de Rodat
Index: gcc/ada/gcc-interface/Make-lang.in
===
--- gcc/ada/gcc-interface/Make-lang.in  (revision 252081)
+++ gcc/ada/gcc-interface/Make-lang.in  (working copy)
@@ -106,14 +106,20 @@
 
 # Function that dumps the dependencies of an Ada object. Dependency only work
 # fully if the compiler support -gnatd.n. Otherwise a fallback mechanism is
-# used. The fallback mechanism add dependency on all ada sources in the same
-# directory as the original source.
+# used. The fallback mechanism parse the ali files to get the list of
+# dependencies and try to guess their location. If the location cannot be found
+# then the dependency is ignored.
 ifeq ($(findstring -gnatd.n,$(ALL_ADAFLAGS)),)
 ADA_DEPS=\
mkdir -p $(dir $@)/$(DEPDIR); \
(o="$@: $<"; \
-for d in $(dir $<)/*.ad[sb]; do \
-   o="$$o $$d"; \
+a="`echo $@ | sed -e 's/.o$$/.ali/'`"; \
+for d in `cat $$a | sed -ne 's;^D \([a-z0-9_\.-]*\).*;\1;gp'`; do \
+   for l in ada $(srcdir)/ada ada/libgnat $(srcdir)/ada/libgnat; do \
+  if test -f $$l/$$d; then \
+ o="$$o $$l/$$d"; \
+  fi; \
+   done; \
 done; \
 echo "$$o"; echo) \
 >$(dir $@)/$(DEPDIR)/$(patsubst %.o,%.Po,$(notdir $@))
@@ -121,11 +127,9 @@
 else
 ADA_DEPS=\
mkdir -p $(dir $@)/$(DEPDIR); \
-   (o="$@: $<"; \
-for d in `cat $@.gnatd.n`; do \
-   o="$$o $$d"; \
-done; \
-echo "$$o"; echo) \
+   (echo "$@: $< " | tr -d '\015' | tr -d '\n'; \
+cat $@.gnatd.n | tr -d '\015' | tr '\n' ' '; \
+echo; echo) \
>$(dir $@)/$(DEPDIR)/$(patsubst %.o,%.Po,$(notdir $@))
 ADA_OUTPUT_OPTION = $(OUTPUT_OPTION) > $@.gnatd.n
 endif
@@ -861,9 +865,11 @@
 
 ada.mostlyclean:
-$(RM) ada/*$(objext) ada/*.ali ada/b_gnat*.ads ada/b_gnat*.adb
+   -$(RM) ada/*$(objext).gnatd.n
-$(RM) ada/*$(coverageexts)
-$(RM) ada/sdefault.adb ada/stamp-sdefault ada/stamp-snames
-$(RMDIR) ada/tools
+   -$(RMDIR) ada/libgnat
-$(RM) gnatbind$(exeext) gnat1$(exeext)
 ada.clean:
 ada.distclean:


Re: [PATCH] Fix emission of exception dispatch (PR middle-end/82154).

2017-09-13 Thread Martin Liška
On 09/12/2017 05:21 PM, Jeff Law wrote:
> On 09/12/2017 01:43 AM, Martin Liška wrote:
>> Hello.
>>
>> In transition to simple_case_node, I forgot to initialize m_high to m_low if 
>> a case
>> does not have CASE_HIGH.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-09-11  Martin Liska  
>>
>>  PR middle-end/82154
>>  * stmt.c (struct simple_case_node): Assign low to high when
>>  high is equal to null.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-09-11  Martin Liska  
>>
>>  PR middle-end/82154
>>  * g++.dg/torture/pr82154.C: New test.
> OK.
> 
> THough I have to wonder if we should unify the HIGH handling -- having
> two different conventions is bound to be confusing.
> 
> In a CASE_LABEL_EXPR the CASE_HIGH can be NULL, which means the label
> refers to a singleton value that is found in CASE_LOW.
> 
> That means we end up doing stuff like this:
> 
>  if (CASE_HIGH (elt))
> maxval = fold_convert (index_type, CASE_HIGH (elt));
>   else
> maxval = fold_convert (index_type, CASE_LOW (elt));
> 
> 
> 
> You could legitimately argue for changing how this works for tree nodes
> so that there's always a non-null CASE_HIGH.

Hi.

Agree with you that we have a lot of code that does what you just described.
I tent to change IL representation, where CASE_HIGH is always non-null.

$ git grep 'CASE_HIGH\>' | wc -l
125

Which is reasonable amount of code that should be changed. I'll prepare patch 
for that.

Martin

> 
> jeff
> 



[PATCH][store-merging] Use store order as tie-breaker in sort_by_bitpos

2017-09-13 Thread Kyrill Tkachov

Hi all,

As Alexander pointed out in the thread starting at [1] the 
sort_by_bitpos sorting function
was behaving badly when we had multiple stores at the same position.  He 
fixed that (thanks!)
but we can do better by not returning zero when the bitpositions are 
equal but by falling back
to comparing the order the stores appear in, which is guaranteed to be 
unique (barring other

bugs elsewhere).

This patch does that.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00895.html

2017-09-13  Kyrylo Tkachov  

* gimple-ssa-store-merging.c (sort_by_bitpos): Compare store order
when bitposition is the same.
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index c60d56a..3260c56 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -521,7 +521,9 @@ sort_by_bitpos (const void *x, const void *y)
   else if ((*tmp)->bitpos > (*tmp2)->bitpos)
 return 1;
   else
-return 0;
+/* If they are the same let's use the order which is guaranteed to
+   be different.  */
+return (*tmp)->order - (*tmp2)->order;
 }
 
 /* Sorting function for store_immediate_info objects.


[C++ PATCH] don't put conv-op names in identifier hash

2017-09-13 Thread Nathan Sidwell
This patch changes the conv-op identifier node creation.  Conv-ops are 
never looked up by name, and there's no need for them to be in the 
identifier table.  Which also means there's no need for them to have 
distinct names.


This patch implements that.  conv-op names are directly cloned from the 
regular 'conv_op_identifier'. They are only findable via make_conv_op_name.


Jason, you asked about whether we could dispense with stashing conv-op 
target types in the identifier_node.  That looked more complicated, we'd 
need to (at least) address
1) pretty printing.  That passes DECL_NAME, expecting to print 'operator 
TYPE' from it.
2) parsing.  That passes the identifier as part of declarator_name into 
grokdeclarator and friends.
3) class-scope name lookup.  That expects an identifier and from that 
find the right conversion function


I'm sure all that can be fixed with additional args or whatever, but I'm 
not going there now.


nathan
--
Nathan Sidwell
2017-09-13  Nathan Sidwell  

	Conv-op identifers not in identifier hash table
	* lex.c (conv_type_hasher): Make member fns inline.
	(make_conv_op_name): Directly clone conv_op_identifier.

Index: lex.c
===
--- lex.c	(revision 252076)
+++ lex.c	(working copy)
@@ -531,10 +531,24 @@ unqualified_fn_lookup_error (cp_expr nam
   return unqualified_name_lookup_error (name, loc);
 }
 
+
+/* Hasher for the conversion operator name hash table.  */
 struct conv_type_hasher : ggc_ptr_hash
 {
-  static hashval_t hash (tree);
-  static bool equal (tree, tree);
+  /* Hash NODE, an identifier node in the table.  TYPE_UID is
+ suitable, as we're not concerned about matching canonicalness
+ here.  */
+  static hashval_t hash (tree node)
+  {
+return (hashval_t) TYPE_UID (TREE_TYPE (node));
+  }
+
+  /* Compare NODE, an identifier node in the table, against TYPE, an
+ incoming TYPE being looked up.  */
+  static bool equal (tree node, tree type)
+  {
+return TREE_TYPE (node) == type;
+  }
 };
 
 /* This hash table maps TYPEs to the IDENTIFIER for a conversion
@@ -543,57 +557,41 @@ struct conv_type_hasher : ggc_ptr_hash *conv_type_names;
 
-/* Hash a node (VAL1) in the table.  */
-
-hashval_t
-conv_type_hasher::hash (tree val)
-{
-  return (hashval_t) TYPE_UID (TREE_TYPE (val));
-}
-
-/* Compare VAL1 (a node in the table) with VAL2 (a TYPE).  */
-
-bool
-conv_type_hasher::equal (tree val1, tree val2)
-{
-  return TREE_TYPE (val1) == val2;
-}
-
-/* Return an identifier for a conversion operator to TYPE.  We can
-   get from the returned identifier to the type.  */
+/* Return an identifier for a conversion operator to TYPE.  We can get
+   from the returned identifier to the type.  We store TYPE, which is
+   not necessarily the canonical type,  which allows us to report the
+   form the user used in error messages.  All these identifiers are
+   not in the identifier hash table, and have the same string name.
+   These IDENTIFIERS are not in the identifier hash table, and all
+   have the same IDENTIFIER_STRING.  */
 
 tree
 make_conv_op_name (tree type)
 {
-  tree *slot;
-  tree identifier;
-
   if (type == error_mark_node)
 return error_mark_node;
 
   if (conv_type_names == NULL)
 conv_type_names = hash_table::create_ggc (31);
 
-  slot = conv_type_names->find_slot_with_hash (type,
-	   (hashval_t) TYPE_UID (type),
-	   INSERT);
-  identifier = *slot;
+  tree *slot = conv_type_names->find_slot_with_hash
+(type, (hashval_t) TYPE_UID (type), INSERT);
+  tree identifier = *slot;
   if (!identifier)
 {
-  char buffer[64];
-
-   /* Create a unique name corresponding to TYPE.  */
-  sprintf (buffer, "operator %lu",
-	   (unsigned long) conv_type_names->elements ());
-  identifier = get_identifier (buffer);
-  *slot = identifier;
+  /* Create a raw IDENTIFIER outside of the identifier hash
+	 table.  */
+  identifier = copy_node (conv_op_identifier);
+
+  /* Just in case something managed to bind.  */
+  IDENTIFIER_BINDING (identifier) = NULL;
+  IDENTIFIER_LABEL_VALUE (identifier) = NULL_TREE;
 
   /* Hang TYPE off the identifier so it can be found easily later
 	 when performing conversions.  */
   TREE_TYPE (identifier) = type;
 
-  /* Set the identifier kind so we know later it's a conversion.  */
-  set_identifier_kind (identifier, cik_conv_op);
+  *slot = identifier;
 }
 
   return identifier;


Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params

2017-09-13 Thread Wilco Dijkstra
Hi Charlie,

I can't see any use for adding a bus width to tune params. There are many
different buses in a modern CPU, so there is no such thing as a single
"bus width".

What we need is to add separate costs for the different kinds of loads and
stores. The timings for these depend mostly on the micro architecture.

It would be difficult to tune for an MCU which supports 64-bit accesses
to SRAM, 16-bit accesses to DRAM and which uses 32-bit buses otherwise, 
with different wait-states to ROM, flash and device memory...

My feeling is we shouldn't try to cater for all possibilities as they are
infinite and just focus on instruction timings as those are well defined.

Wilco


Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE

2017-09-13 Thread Richard Biener
On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
 wrote:
> Hi,
>
> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>
> Folding of vector loads in GIMPLE.
>
> Add code to handle gimple folding for the vec_ld builtins.
> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding
> comments have been adjusted slightly so they continue to read OK for the
> existing vec_st code.
>
> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c
> tests which have been posted separately.
>
> For V2 of this patch, I've removed the chunk of code that prohibited the
> gimple fold from occurring in BE environments.   This had fixed an issue
> for me earlier during my development of the code, and turns out this was
> not necessary.  I've sniff-tested after removing that check and it looks
> OK.
>
>>+ /* Limit folding of loads to LE targets.  */
>> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>> +return false;
>
> I've restarted a regression test on this updated version.
>
> OK for trunk (assuming successful regression test completion)  ?
>
> Thanks,
> -Will
>
> [gcc]
>
> 2017-09-12  Will Schmidt  
>
> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
> for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index fbab0a2..bb8a77d 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, 
> tree fndecl,
>  convert (TREE_TYPE (stmt), arg0));
>stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>return stmt;
>  }
>
> -  /* Expand vec_ld into an expression that masks the address and
> - performs the load.  We need to expand this early to allow
> +  /* Expand vec_st into an expression that masks the address and
> + performs the store.  We need to expand this early to allow
>   the best aliasing, as by the time we get into RTL we no longer
>   are able to honor __restrict__, for example.  We may want to
>   consider this for all memory access built-ins.
>
>   When -maltivec=be is specified, or the wrong number of arguments
>   is provided, simply punt to existing built-in processing.  */
> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> -  && nargs == 2)
> -{
> -  tree arg0 = (*arglist)[0];
> -  tree arg1 = (*arglist)[1];
> -
> -  /* Strip qualifiers like "const" from the pointer arg.  */
> -  tree arg1_type = TREE_TYPE (arg1);
> -  if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)
> -   goto bad;
> -
> -  tree inner_type = TREE_TYPE (arg1_type);
> -  if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
> -   {
> - arg1_type = build_pointer_type (build_qualified_type (inner_type,
> -   0));
> - arg1 = fold_convert (arg1_type, arg1);
> -   }
> -
> -  /* Construct the masked address.  Let existing error handling take
> -over if we don't have a constant offset.  */
> -  arg0 = fold (arg0);
> -
> -  if (TREE_CODE (arg0) == INTEGER_CST)
> -   {
> - if (!ptrofftype_p (TREE_TYPE (arg0)))
> -   arg0 = build1 (NOP_EXPR, sizetype, arg0);
> -
> - tree arg1_type = TREE_TYPE (arg1);
> - if (TREE_CODE (arg1_type) == ARRAY_TYPE)
> -   {
> - arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
> - tree const0 = build_int_cstu (sizetype, 0);
> - tree arg1_elt0 = build_array_ref (loc, arg1, const0);
> - arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
> -   }
> -
> - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
> -  arg1, arg0);
> - tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
> - build_int_cst (arg1_type, -16));
> -
> - /* Find the built-in to get the return type so we can convert
> -the result properly (or fall back to default handling if the
> -arguments aren't compatible).  */
> - for (desc = altivec_overloaded_builtins;
> -  desc->code && desc->code != fcode; desc++)
> -   continue;
> -
> - for (; desc->code == fcode; desc++)
> -   if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
> -   && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),
> -   desc->op2)))
> - {
> -   tree ret_type = 

Re: [PATCH][AArch64] Remove '*' from movsi/di/ti patterns

2017-09-13 Thread Wilco Dijkstra
Andrew Pinski wrote:
>
> Note this caused a few testsuite failures:

Confirmed. The diffs show the new sequence is always better. I've
committed this:


Update vmov_n_1.c now we are generating better code for dup:

ldr s0, [x0]
dup v0.2s, v0.s[0]
ret

Instead of:

ldr w0, [x0]
dup v0.2s, w0
ret

ChangeLog: 
2017-09-13  Wilco Dijkstra  

* gcc.target/aarch64/vmov_n_1.c: Update dup scan-assembler.
--

diff --git a/gcc/testsuite/gcc.target/aarch64/vmov_n_1.c 
b/gcc/testsuite/gcc.target/aarch64/vmov_n_1.c
index 
485a1a970bbcd30cb45a2f69bbd9f62f8258d3df..d0c284296a5e256be5cf5f859b113cdd62f929ba
 100644
--- a/gcc/testsuite/gcc.target/aarch64/vmov_n_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vmov_n_1.c
@@ -190,8 +190,9 @@ TESTFUNC_NAME (reg_len, data_type, data_len) () \
 
 OBSCURE_FUNC (64, 32, f)
 TESTFUNC (64, 32, f)
-/* "dup  Vd.2s, Rn" is less preferable then "dup  Vd.2s, Vn.s[lane]".  */
-/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 1 } } */
+/* "dup  Vd.2s, Rn" is less preferable than "dup  Vd.2s, Vn.s[lane]".  */
+/* { dg-final { scan-assembler-not "dup\\tv\[0-9\]+\.2s, w\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 3 } } */
 
 OBSCURE_FUNC (64, 64, f)
 TESTFUNC (64, 64, f)
@@ -216,7 +217,9 @@ TESTFUNC (64, 16, s)
 
 OBSCURE_FUNC (64, 32, s)
 TESTFUNC (64, 32, s)
-/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2s, w\[0-9\]+" 2 } } */
+/* "dup  Vd.2s, Rn" is less preferable than "dup  Vd.2s, Vn.s[lane]".  */
+/* { dg-final { scan-assembler-not "dup\\tv\[0-9\]+\.2s, w\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 3 } } */
 
 OBSCURE_FUNC (64, 64, s)
 TESTFUNC (64, 64, s)
@@ -242,13 +245,15 @@ TESTFUNC (64, 64, u)
 
 OBSCURE_FUNC (128, 32, f)
 TESTFUNC (128, 32, f)
-/* "dup  Vd.4s, Rn" is less preferable then "dup  Vd.4s, Vn.s[lane]".  */
-/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.4s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 1 } } */
+/* "dup  Vd.4s, Rn" is less preferable than "dup  Vd.4s, Vn.s[lane]".  */
+/* { dg-final { scan-assembler-not "dup\\tv\[0-9\]+\.4s, w\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.4s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 3 } } */
 
 OBSCURE_FUNC (128, 64, f)
 TESTFUNC (128, 64, f)
-/* "dup  Vd.2d, Rn" is less preferable then "dup  Vd.2d, Vn.d[lane]".  */
-/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2d, 
v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 1 } } */
+/* "dup  Vd.2d, Rn" is less preferable than "dup  Vd.2d, Vn.d[lane]".  */
+/* { dg-final { scan-assembler-not "dup\\tv\[0-9\]+\.2d, x\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2d, 
v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
 
 OBSCURE_FUNC (128, 8, p)
 TESTFUNC (128, 8, p)
@@ -268,11 +273,15 @@ TESTFUNC (128, 16, s)
 
 OBSCURE_FUNC (128, 32, s)
 TESTFUNC (128, 32, s)
-/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.4s, w\[0-9\]+" 2 } } */
+/* "dup  Vd.4s, Rn" is less preferable than "dup  Vd.4s, Vn.s[lane]".  */
+/* { dg-final { scan-assembler-not "dup\\tv\[0-9\]+\.4s, w\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.4s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 3 } } */
 
 OBSCURE_FUNC (128, 64, s)
 TESTFUNC (128, 64, s)
-/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2d, x\[0-9\]+" 2 } } */
+/* "dup  Vd.2d, Rn" is less preferable than "dup  Vd.2d, Vn.d[lane]".  */
+/* { dg-final { scan-assembler-not "dup\\tv\[0-9\]+\.2d, x\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "dup\\tv\[0-9\]+\.2d, 
v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
 
 OBSCURE_FUNC (128, 8, u)
 TESTFUNC (128, 8, u)


[C++ PATCH] rename METHOD_VEC to MEMBER_VEC

2017-09-13 Thread Nathan Sidwell
Now that we hold all members on the sorted METHOD_VEC, that's no longer 
a suitable name.  This patch renames the macro and all associated 
functions from method_vec_foo to member_vec_foo.


Applied to trunk.

This wraps up my current effort of class member lookup cleaning.

nathan
--
Nathan Sidwell
2017-09-13  Nathan Sidwell  

	Rename CLASSTYPE_METHOD_VEC to CLASSTYPE_MEMBER_VEC.
	* cp-tree.h (struct lang_type): Rename methods to members.
	(CLASSTYPE_METHOD_VEC): Rename to ...
	(CLASSTYPE_MEMBER_VEC): ... this.
	* name-lookup.h (get_method_slot): Rename to ...
	(get_member_slot): ... this.
	(resort_type_method_vec): Rename to ...
	(resort_type_member_vec): ... this.
	* class.c (add_method, warn_hidden): Adjust.
	* search.c (dfs_locate_field_accessor_pre): Adjust.
	* name-lookup.c (method_vec_binary_search): Rename to ...
	(member_vec_binary_search): ... this and adjust.
	(method_vec_linear_search): Rename to ...
	(member_vec_linear_search): ... this and adjust.
	(fields_linear_search, get_class_binding_direct): Adjust.
	(get_method_slot): Rename to ...
	(get_member_slot): ... this and adjust.
	(method_name_slot): Rename to ...
	(member_name_slot): ... this and adjust.
	(resort_type_method_vec): Rename to ...
	(resort_type_member_vec): ... this and adjust.
	(method_vec_append_class_fields): Rename to ...
	(member_vec_append_class_fields): ... this and adjust.
	(method_vec_append_enum_values): Rename to ...
	(member_vec_append_enum_values): ... this and adjust.
	(method_vec_dedup): Rename to ...
	(member_vec_dedup): ... this and adjust.
	(set_class_bindings, insert_late_enum_def_bindings): Adjust.
	
Index: class.c
===
--- class.c	(revision 252076)
+++ class.c	(working copy)
@@ -1014,7 +1014,7 @@ add_method (tree type, tree method, bool
   /* Maintain TYPE_HAS_USER_CONSTRUCTOR, etc.  */
   grok_special_member_properties (method);
 
-  tree *slot = get_method_slot (type, DECL_NAME (method));
+  tree *slot = get_member_slot (type, DECL_NAME (method));
   tree current_fns = *slot;
 
   gcc_assert (!DECL_EXTERN_C_P (method));
@@ -2818,10 +2818,10 @@ check_for_override (tree decl, tree ctyp
 static void
 warn_hidden (tree t)
 {
-  if (vec *method_vec = CLASSTYPE_METHOD_VEC (t))
-for (unsigned ix = method_vec->length (); ix--;)
+  if (vec *member_vec = CLASSTYPE_MEMBER_VEC (t))
+for (unsigned ix = member_vec->length (); ix--;)
   {
-	tree fns = (*method_vec)[ix];
+	tree fns = (*member_vec)[ix];
 
 	if (!OVL_P (fns))
 	  continue;
@@ -4594,7 +4594,7 @@ decl_cloned_function_p (const_tree decl,
 
 /* Produce declarations for all appropriate clones of FN.  If
UPDATE_METHODS is true, the clones are added to the
-   CLASSTYPE_METHOD_VEC.  */
+   CLASSTYPE_MEMBER_VEC.  */
 
 void
 clone_function_decl (tree fn, bool update_methods)
Index: cp-tree.h
===
--- cp-tree.h	(revision 252076)
+++ cp-tree.h	(working copy)
@@ -2000,7 +2000,7 @@ struct GTY(()) lang_type {
   tree as_base;
   vec *pure_virtuals;
   tree friend_classes;
-  vec * GTY((reorder ("resort_type_method_vec"))) methods;
+  vec * GTY((reorder ("resort_type_member_vec"))) members;
   tree key_method;
   tree decl_list;
   tree befriending_classes;
@@ -2125,19 +2125,11 @@ struct GTY(()) lang_type {
if there is no key function or if this is a class template */
 #define CLASSTYPE_KEY_METHOD(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->key_method)
 
-/* Vector member functions defined in this class.  Each element is
-   either a FUNCTION_DECL, a TEMPLATE_DECL, or an OVERLOAD.  All
-   functions with the same name end up in the same slot.  The first
-   two elements are for constructors, and destructors, respectively.
-   All template conversion operators to innermost template dependent
-   types are overloaded on the next slot, if they exist.  Note, the
-   names for these functions will not all be the same.  The
-   non-template conversion operators & templated conversions to
-   non-innermost template types are next, followed by ordinary member
-   functions.  There may be empty entries at the end of the vector.
-   The conversion operators are unsorted. The ordinary member
-   functions are sorted, once the class is complete.  */
-#define CLASSTYPE_METHOD_VEC(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->methods)
+/* Vector of members.  During definition, it is unordered and only
+   member functions are present.  After completion it is sorted and
+   contains both member functions and non-functions.  STAT_HACK is
+   involved to preserve oneslot per name invariant.  */
+#define CLASSTYPE_MEMBER_VEC(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->members)
 
 /* For class templates, this is a TREE_LIST of all member data,
functions, types, and friends in the order of declaration.
Index: name-lookup.c

Re: [PATCH] Fix -fcompare-debug issues in find_many_sub_basic_blocks (PR target/81325)

2017-09-13 Thread Richard Biener
On Wed, 13 Sep 2017, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase fails on powerpc64le with -fcompare-debug.
> The problem is that find_many_sub_basic_blocks is called during
> expansion on a basic block where originally two __atomic_fetch_add
> calls have some debug stmts in between.  The rs6000 backend expands
> those atomic calls into a loop that starts with a CODE_LABEL and
> ends with a JUMP_INSN, so find_many_sub_basic_blocks must
> end a basic block after the JUMP_INSN from the first call and
> start a new basic block at the start of the second call.
> If there are no DEBUG_INSNs in between, it creates just those 2
> basic blocks, which is correct, but it doesn't ignore DEBUG_INSNs when
> deciding what to do and thus when there are DEBUG_ISNSs in between
> the flow_transfer_insn (JUMP_INSN) and CODE_LABEL, it creates yet
> another basic block that has just the DEBUG_INSNs in it.  That means
> different cfg between -g and -g0 and more differences later on.
> 
> As I wrote in the PR, if we have 2 loops and some code in between
> at GIMPLE, and in that code in between DCE all the non-debug stmts and
> have only debug stmts in there, we consider it a forwarder block and delete
> it with all those debug stmts (it is one of the unfortunate cases where we
> drop them on the floor, but there isn't really much we can do with those
> without adding new extensions like conditional debug stmts or debug phis).
> 
> The following patch attempts to do the same thing on RTL during
> find_many_sub_basic_blocks, i.e. ignore debug stmts when deciding what to
> do, trying to preserve debug insns somewhere if that is easily possible
> (if there is say a conditional jump with a fallthrough into debug insns
> followed by some real insn (not a CODE_LABEL nor BARRIER), the debug
> insns can go at the start of the fallthrough bb that would be created
> anyway).  If we have an insn that must end a bb and another one that must
> start a bb (CODE_LABEL) and only debug insns in between, we remove them.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (in these two cases
> I've also gathered statistics and no debug insns during the
> bootstraps/regtests were actually removed) and powerpc64le-linux (forgot to
> apply the statistics patch, so no details, but it surely triggers at least
> on the testcase).  Ok for trunk?

Ok.

Thanks,
Richard.

> 2017-09-13  Jakub Jelinek  
> 
>   PR target/81325
>   * cfgbuild.c (find_bb_boundaries): Ignore debug insns in decisions
>   if and where to split a bb, except for splitting before debug insn
>   sequences followed by non-label real insn.  Delete debug insns
>   in between basic blocks.
> 
>   * g++.dg/cpp0x/pr81325.C: New test.
> 
> --- gcc/cfgbuild.c.jj 2017-09-12 21:57:53.084514168 +0200
> +++ gcc/cfgbuild.c2017-09-13 09:15:57.701521179 +0200
> @@ -442,9 +442,10 @@ find_bb_boundaries (basic_block bb)
>rtx_insn *end = BB_END (bb), *x;
>rtx_jump_table_data *table;
>rtx_insn *flow_transfer_insn = NULL;
> +  rtx_insn *debug_insn = NULL;
>edge fallthru = NULL;
>  
> -  if (insn == BB_END (bb))
> +  if (insn == end)
>  return;
>  
>if (LABEL_P (insn))
> @@ -455,22 +456,43 @@ find_bb_boundaries (basic_block bb)
>  {
>enum rtx_code code = GET_CODE (insn);
>  
> +  if (code == DEBUG_INSN)
> + {
> +   if (flow_transfer_insn && !debug_insn)
> + debug_insn = insn;
> + }
>/* In case we've previously seen an insn that effects a control
>flow transfer, split the block.  */
> -  if ((flow_transfer_insn || code == CODE_LABEL)
> -   && inside_basic_block_p (insn))
> +  else if ((flow_transfer_insn || code == CODE_LABEL)
> +&& inside_basic_block_p (insn))
>   {
> -   fallthru = split_block (bb, PREV_INSN (insn));
> +   rtx_insn *prev = PREV_INSN (insn);
> +
> +   /* If the first non-debug inside_basic_block_p insn after a control
> +  flow transfer is not a label, split the block before the debug
> +  insn instead of before the non-debug insn, so that the debug
> +  insns are not lost.  */
> +   if (debug_insn && code != CODE_LABEL && code != BARRIER)
> + prev = PREV_INSN (debug_insn);
> +   fallthru = split_block (bb, prev);
> if (flow_transfer_insn)
>   {
> BB_END (bb) = flow_transfer_insn;
>  
> +   rtx_insn *next;
> /* Clean up the bb field for the insns between the blocks.  */
> for (x = NEXT_INSN (flow_transfer_insn);
>  x != BB_HEAD (fallthru->dest);
> -x = NEXT_INSN (x))
> - if (!BARRIER_P (x))
> -   set_block_for_insn (x, NULL);
> +x = next)
> + {
> +   next = NEXT_INSN (x);
> +   /* Debug insns should not be in between basic blocks,
> +  drop them on the floor.  */
> +   if 

C PATCH to fix ICE with -Wsizeof-array-argument (PR c/82167)

2017-09-13 Thread Marek Polacek
This fixes a segv, where -Wsizeof-array-argument crashed because
expr.original_type was null.  It was null because for sizeof (*)
we go to c_parser_unary_expression's case CPP_AND: and case CPP_MULT:,
and that removes the original_type.  But we don't need the original 
type, we can just use TREE_TYPE of expr.value.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-09-13  Marek Polacek  

PR c/82167
* c-typeck.c (c_expr_sizeof_expr): Use the type of expr.value rather
than expr.original_type.

* gcc.dg/pr82167.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 91996c95ed0..f45fd3cfbbf 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -2909,7 +2909,7 @@ c_expr_sizeof_expr (location_t loc, struct c_expr expr)
  if (warning_at (loc, OPT_Wsizeof_array_argument,
  "% on array function parameter %qE will "
  "return size of %qT", expr.value,
- expr.original_type))
+ TREE_TYPE (expr.value)))
inform (DECL_SOURCE_LOCATION (expr.value), "declared here");
}
   tree folded_expr = c_fully_fold (expr.value, require_constant_value,
diff --git gcc/testsuite/gcc.dg/pr82167.c gcc/testsuite/gcc.dg/pr82167.c
index e69de29bb2d..af3b3a5a1c9 100644
--- gcc/testsuite/gcc.dg/pr82167.c
+++ gcc/testsuite/gcc.dg/pr82167.c
@@ -0,0 +1,14 @@
+/* PR c/82167 */
+/* { dg-do compile } */
+
+void
+fn1 (int a[])
+{
+  __builtin_printf ("%zu\n", sizeof (*)); /* { dg-warning ".sizeof. on array 
function parameter .a. will return size of .int \\*." } */
+}
+
+void
+fn2 (int *a[])
+{
+  __builtin_printf ("%zu\n", sizeof (*)); /* { dg-warning ".sizeof. on array 
function parameter .a. will return size of .int \\*\\*." } */
+}

Marek


Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-13 Thread Kyrill Tkachov

Hi Wilco,

On 04/09/17 20:56, Wilco Dijkstra wrote:

Kyrill Tkachov wrote:


After Bernd's change almost all DI mode instructions are split before register
allocation. So instructions using DI mode no longer exist and thus these
extend variants can never be matched and are thus redundant.

Bernd's patch splits them when we don't have NEON. When NEON is
available though
they still maintain the DImode so we'd still benefit from these
transformations, no?

While you're right it may be possible to trigger these instructions, ORN is 
already
so rare that it is hardly beneficial to have an instruction for it, and ORN of 
an extended
value never ever happens. So there is absolutely no benefit in keeping these 
versions
temporarily until we fix Neon too.


Well, it was deemed useful enough to be added in the first place. I'd 
rather not remove it

if it can still be used. For the extended value I use the testcase:

unsigned long long
foo (unsigned long long a, unsigned long b)
{
  return a | ~(unsigned long long) b;
}

I compile with -O2 -S -mthumb -mcpu=cortex-a57 and I get:
orn r0, r0, r2
mov r1, #-1
bx  lr

with the *iordi_notzesidi_di pattern being present in the postreload RTL 
dump.
Bernd's patches don't split DImode operations for NEON so I think we 
should still get the same
codegen without them. So this shows an example where the pattern can be 
used.





For the Neon case my proposal is to use the VFP early expansion (so you get an 
efficient
expansion by default in all cases). You can then use -mneon-for-64bits to 
enable the use
of Neon instructions (which may be even better in some cases). There are quite 
a few
patches in this series already and more to come soon!


So I think we should first move the NEON expansion to use the early VFP 
expansion
(with proper evaluation) and then once that's in and we can guarantee 
that these patterns
can indeed never match (due to having been expanded as SImode operations 
early on) we can remove

these patterns.

So I think this is overall ok, but we need to enable the early expansion 
everywhere first before going ahead

with this patch.

Thanks,
Kyrill



Wilco




Re: [PATCH] Better fix for the x86_64 -mcmodel=large ICEs (PR target/82145)

2017-09-13 Thread Richard Biener
On Wed, 13 Sep 2017, Jakub Jelinek wrote:

> Hi!
> 
> The testcase below (and others) still ICEs with my PR81766 fix.
> If there is a cfg cleanup in between ix86_init_pic_reg (during RA)
> and postreload, the label which my fix moved to the right spot is
> turned into NOTE_INSN_DELETED_LABEL note and moved back where it
> originally used to be emitted.
> 
> The bug is actually in generic code.
> cselib.c has code to handle LABEL_REF properly during hashing, by not
> hashing the instructions/notes that are operand thereof, but just the
> label number.  Postreload though calls cselib_lookup directly on the
> operands of an instruction, and it is very common in many backends to have
> (label_ref (match_operand ...)) in many patterns, and thus cselib
> doesn't use the LABEL_REF handling in that case.
> To avoid crashing postreload.c has:
>/* cselib blows up on CODE_LABELs.  Trying to fix that doesn't seem
>   right, so avoid the problem here.  Likewise if we have a constant
>   and the insn pattern doesn't tell us the mode we need.  */
>if (LABEL_P (recog_data.operand[i])
>|| (CONSTANT_P (recog_data.operand[i])
>&& recog_data.operand_mode[i] == VOIDmode))
>  continue;
> - we won't look up anything useful for those operands anyway.
> The problem is that a valid LABEL_REF operand doesn't have to be only
> a CODE_LABEL, it can be a NOTE with NOTE_INSN_DELETED_LABEL if all we need
> is the label's address and nothing else.
> 
> So, the following patch handles NOTE_INSN_DELETED_LABEL like CODE_LABEL
> in postreload.c.
> 
> Once that is done, my earlier PR81766 fix can be effectively reverted
> and instead the CODE_LABEL can be immediately turned into
> NOTE_INSN_DELETED_LABEL, something that a cfgcleanup would do.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The postreload change is ok.

Thanks,
Richard.

> 2017-09-13  Jakub Jelinek  
> 
>   PR target/82145
>   * postreload.c (reload_cse_simplify_operands): Skip
>   NOTE_INSN_DELETED_LABEL similarly to skipping CODE_LABEL.
>   * config/i386/i386.c (ix86_init_large_pic_reg): Revert 2017-09-01
>   changes.  Turn CODE_LABEL into NOTE_INSN_DELETED_LABEL immediately.
>   (ix86_init_pic_reg): Revert 2017-09-01 changes.
> 
>   * gcc.target/i386/pr82145.c: New test.
> 
> --- gcc/config/i386/i386.c.jj 2017-09-08 09:13:59.0 +0200
> +++ gcc/config/i386/i386.c2017-09-11 14:48:50.532094255 +0200
> @@ -8885,7 +8885,7 @@ ix86_use_pseudo_pic_reg (void)
>  
>  /* Initialize large model PIC register.  */
>  
> -static rtx_code_label *
> +static void
>  ix86_init_large_pic_reg (unsigned int tmp_regno)
>  {
>rtx_code_label *label;
> @@ -8902,7 +8902,10 @@ ix86_init_large_pic_reg (unsigned int tm
>emit_insn (gen_set_got_offset_rex64 (tmp_reg, label));
>emit_insn (ix86_gen_add3 (pic_offset_table_rtx,
>   pic_offset_table_rtx, tmp_reg));
> -  return label;
> +  const char *name = LABEL_NAME (label);
> +  PUT_CODE (label, NOTE);
> +  NOTE_KIND (label) = NOTE_INSN_DELETED_LABEL;
> +  NOTE_DELETED_LABEL_NAME (label) = name;
>  }
>  
>  /* Create and initialize PIC register if required.  */
> @@ -8911,7 +8914,6 @@ ix86_init_pic_reg (void)
>  {
>edge entry_edge;
>rtx_insn *seq;
> -  rtx_code_label *label = NULL;
>  
>if (!ix86_use_pseudo_pic_reg ())
>  return;
> @@ -8921,7 +8923,7 @@ ix86_init_pic_reg (void)
>if (TARGET_64BIT)
>  {
>if (ix86_cmodel == CM_LARGE_PIC)
> - label = ix86_init_large_pic_reg (R11_REG);
> + ix86_init_large_pic_reg (R11_REG);
>else
>   emit_insn (gen_set_got_rex64 (pic_offset_table_rtx));
>  }
> @@ -8945,22 +8947,6 @@ ix86_init_pic_reg (void)
>entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>insert_insn_on_edge (seq, entry_edge);
>commit_one_edge_insertion (entry_edge);
> -
> -  if (label)
> -{
> -  basic_block bb = BLOCK_FOR_INSN (label);
> -  rtx_insn *bb_note = PREV_INSN (label);
> -  /* If the note preceding the label starts a basic block, and the
> -  label is a member of the same basic block, interchange the two.  */
> -  if (bb_note != NULL_RTX
> -   && NOTE_INSN_BASIC_BLOCK_P (bb_note)
> -   && bb != NULL
> -   && bb == BLOCK_FOR_INSN (bb_note))
> - {
> -   reorder_insns_nobb (bb_note, bb_note, label);
> -   BB_HEAD (bb) = label;
> - }
> -}
>  }
>  
>  /* Initialize a variable CUM of type CUMULATIVE_ARGS
> --- gcc/postreload.c.jj   2017-09-08 09:13:54.0 +0200
> +++ gcc/postreload.c  2017-09-11 14:49:04.977945563 +0200
> @@ -409,9 +409,12 @@ reload_cse_simplify_operands (rtx_insn *
>CLEAR_HARD_REG_SET (equiv_regs[i]);
>  
>/* cselib blows up on CODE_LABELs.  Trying to fix that doesn't seem
> -  right, so avoid the problem here.  Likewise if we have a constant
> - and the insn pattern doesn't tell 

[PATCH] Fix -fcompare-debug issues in find_many_sub_basic_blocks (PR target/81325)

2017-09-13 Thread Jakub Jelinek
Hi!

The following testcase fails on powerpc64le with -fcompare-debug.
The problem is that find_many_sub_basic_blocks is called during
expansion on a basic block where originally two __atomic_fetch_add
calls have some debug stmts in between.  The rs6000 backend expands
those atomic calls into a loop that starts with a CODE_LABEL and
ends with a JUMP_INSN, so find_many_sub_basic_blocks must
end a basic block after the JUMP_INSN from the first call and
start a new basic block at the start of the second call.
If there are no DEBUG_INSNs in between, it creates just those 2
basic blocks, which is correct, but it doesn't ignore DEBUG_INSNs when
deciding what to do and thus when there are DEBUG_ISNSs in between
the flow_transfer_insn (JUMP_INSN) and CODE_LABEL, it creates yet
another basic block that has just the DEBUG_INSNs in it.  That means
different cfg between -g and -g0 and more differences later on.

As I wrote in the PR, if we have 2 loops and some code in between
at GIMPLE, and in that code in between DCE all the non-debug stmts and
have only debug stmts in there, we consider it a forwarder block and delete
it with all those debug stmts (it is one of the unfortunate cases where we
drop them on the floor, but there isn't really much we can do with those
without adding new extensions like conditional debug stmts or debug phis).

The following patch attempts to do the same thing on RTL during
find_many_sub_basic_blocks, i.e. ignore debug stmts when deciding what to
do, trying to preserve debug insns somewhere if that is easily possible
(if there is say a conditional jump with a fallthrough into debug insns
followed by some real insn (not a CODE_LABEL nor BARRIER), the debug
insns can go at the start of the fallthrough bb that would be created
anyway).  If we have an insn that must end a bb and another one that must
start a bb (CODE_LABEL) and only debug insns in between, we remove them.

Bootstrapped/regtested on x86_64-linux and i686-linux (in these two cases
I've also gathered statistics and no debug insns during the
bootstraps/regtests were actually removed) and powerpc64le-linux (forgot to
apply the statistics patch, so no details, but it surely triggers at least
on the testcase).  Ok for trunk?

2017-09-13  Jakub Jelinek  

PR target/81325
* cfgbuild.c (find_bb_boundaries): Ignore debug insns in decisions
if and where to split a bb, except for splitting before debug insn
sequences followed by non-label real insn.  Delete debug insns
in between basic blocks.

* g++.dg/cpp0x/pr81325.C: New test.

--- gcc/cfgbuild.c.jj   2017-09-12 21:57:53.084514168 +0200
+++ gcc/cfgbuild.c  2017-09-13 09:15:57.701521179 +0200
@@ -442,9 +442,10 @@ find_bb_boundaries (basic_block bb)
   rtx_insn *end = BB_END (bb), *x;
   rtx_jump_table_data *table;
   rtx_insn *flow_transfer_insn = NULL;
+  rtx_insn *debug_insn = NULL;
   edge fallthru = NULL;
 
-  if (insn == BB_END (bb))
+  if (insn == end)
 return;
 
   if (LABEL_P (insn))
@@ -455,22 +456,43 @@ find_bb_boundaries (basic_block bb)
 {
   enum rtx_code code = GET_CODE (insn);
 
+  if (code == DEBUG_INSN)
+   {
+ if (flow_transfer_insn && !debug_insn)
+   debug_insn = insn;
+   }
   /* In case we've previously seen an insn that effects a control
 flow transfer, split the block.  */
-  if ((flow_transfer_insn || code == CODE_LABEL)
- && inside_basic_block_p (insn))
+  else if ((flow_transfer_insn || code == CODE_LABEL)
+  && inside_basic_block_p (insn))
{
- fallthru = split_block (bb, PREV_INSN (insn));
+ rtx_insn *prev = PREV_INSN (insn);
+
+ /* If the first non-debug inside_basic_block_p insn after a control
+flow transfer is not a label, split the block before the debug
+insn instead of before the non-debug insn, so that the debug
+insns are not lost.  */
+ if (debug_insn && code != CODE_LABEL && code != BARRIER)
+   prev = PREV_INSN (debug_insn);
+ fallthru = split_block (bb, prev);
  if (flow_transfer_insn)
{
  BB_END (bb) = flow_transfer_insn;
 
+ rtx_insn *next;
  /* Clean up the bb field for the insns between the blocks.  */
  for (x = NEXT_INSN (flow_transfer_insn);
   x != BB_HEAD (fallthru->dest);
-  x = NEXT_INSN (x))
-   if (!BARRIER_P (x))
- set_block_for_insn (x, NULL);
+  x = next)
+   {
+ next = NEXT_INSN (x);
+ /* Debug insns should not be in between basic blocks,
+drop them on the floor.  */
+ if (DEBUG_INSN_P (x))
+   delete_insn (x);
+ else if (!BARRIER_P (x))
+   set_block_for_insn (x, NULL);
+   }
}
 
  bb = 

[Ada] vxworks: auto-registration of foreign threads

2017-09-13 Thread Pierre-Marie de Rodat
To make Ada tasks and C threads interoperate better, we have added some
functionality to Self. Suppose a C main program (with threads) calls an
Ada procedure and the Ada procedure calls the tasking runtime system.
Eventually, a call will be made to self. Since the call is not coming
from an Ada task, there will be no corresponding ATCB.

What we do in Self is to catch references that do not come from
recognized Ada tasks, and create an ATCB for the calling thread.

The new ATCB will be "detached" from the normal Ada task master
hierarchy, much like the existing implicitly created signal-server
tasks.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-13  Jerome Guitton  

* libgnarl/s-tpopsp__vxworks-tls.adb,
libgnarl/s-tpopsp__vxworks-rtp.adb, libgnarl/s-tpopsp__vxworks.adb
(Self): Register thread if task id is null.

Index: libgnarl/s-tpopsp__vxworks-tls.adb
===
--- libgnarl/s-tpopsp__vxworks-tls.adb  (revision 252075)
+++ libgnarl/s-tpopsp__vxworks-tls.adb  (working copy)
@@ -71,9 +71,29 @@
-- Self --
--
 
+   --  To make Ada tasks and C threads interoperate better, we have added some
+   --  functionality to Self. Suppose a C main program (with threads) calls an
+   --  Ada procedure and the Ada procedure calls the tasking runtime system.
+   --  Eventually, a call will be made to self. Since the call is not coming
+   --  from an Ada task, there will be no corresponding ATCB.
+
+   --  What we do in Self is to catch references that do not come from
+   --  recognized Ada tasks, and create an ATCB for the calling thread.
+
+   --  The new ATCB will be "detached" from the normal Ada task master
+   --  hierarchy, much like the existing implicitly created signal-server
+   --  tasks.
+
function Self return Task_Id is
+  Result : constant Task_Id := ATCB;
begin
-  return ATCB;
+  if Result /= null then
+ return Result;
+  else
+ --  If the value is Null then it is a non-Ada task
+
+ return Register_Foreign_Thread;
+  end if;
end Self;
 
 end Specific;
Index: libgnarl/s-tpopsp__vxworks-rtp.adb
===
--- libgnarl/s-tpopsp__vxworks-rtp.adb  (revision 252075)
+++ libgnarl/s-tpopsp__vxworks-rtp.adb  (working copy)
@@ -72,9 +72,29 @@
-- Self --
--
 
+   --  To make Ada tasks and C threads interoperate better, we have added some
+   --  functionality to Self. Suppose a C main program (with threads) calls an
+   --  Ada procedure and the Ada procedure calls the tasking runtime system.
+   --  Eventually, a call will be made to self. Since the call is not coming
+   --  from an Ada task, there will be no corresponding ATCB.
+
+   --  What we do in Self is to catch references that do not come from
+   --  recognized Ada tasks, and create an ATCB for the calling thread.
+
+   --  The new ATCB will be "detached" from the normal Ada task master
+   --  hierarchy, much like the existing implicitly created signal-server
+   --  tasks.
+
function Self return Task_Id is
+  Result : constant Task_Id := To_Task_Id (tlsValueGet (ATCB_Key));
begin
-  return To_Task_Id (tlsValueGet (ATCB_Key));
+  if Result /= null then
+ return Result;
+  else
+ --  If the value is Null then it is a non-Ada task
+
+ return Register_Foreign_Thread;
+  end if;
end Self;
 
 end Specific;
Index: libgnarl/s-tpopsp__vxworks.adb
===
--- libgnarl/s-tpopsp__vxworks.adb  (revision 252075)
+++ libgnarl/s-tpopsp__vxworks.adb  (working copy)
@@ -121,9 +121,29 @@
-- Self --
--
 
+   --  To make Ada tasks and C threads interoperate better, we have added some
+   --  functionality to Self. Suppose a C main program (with threads) calls an
+   --  Ada procedure and the Ada procedure calls the tasking runtime system.
+   --  Eventually, a call will be made to self. Since the call is not coming
+   --  from an Ada task, there will be no corresponding ATCB.
+
+   --  What we do in Self is to catch references that do not come from
+   --  recognized Ada tasks, and create an ATCB for the calling thread.
+
+   --  The new ATCB will be "detached" from the normal Ada task master
+   --  hierarchy, much like the existing implicitly created signal-server
+   --  tasks.
+
function Self return Task_Id is
+  Result : constant Task_Id := To_Task_Id (ATCB_Key);
begin
-  return To_Task_Id (ATCB_Key);
+  if Result /= null then
+ return Result;
+  else
+ --  If the value is Null then it is a non-Ada task
+
+ return Register_Foreign_Thread;
+  end if;
end Self;
 
 end Specific;


[Ada] Ineffective pragma Suppress (Alignment_Check) on warning

2017-09-13 Thread Pierre-Marie de Rodat
On platforms that require strict alignment of memory accesses, the per-object
form of pragma Suppress (Alignment_Check) also disables the alignment warning
associated with the check.  That's not the case for the global form and this
change fixes the inconsistency.

Here's an example on a small package compiled with -gnatl:

Compiling: p.ads
Source file time stamp: 2017-08-07 10:41:19
Compiled at: 2017-08-07 15:19:52

 1. package P is
 2.
 3.   type Arr is array (1 .. 16) of Short_Integer;
 4.
 5.   A : Arr;
 6.
 7.   pragma Suppress (Alignment_Check);
 8.
 9.   F1 : Float;
10.   for F1 use at A'Address;  -- no warning
11.
12.   F2 : Float;
13.   for F2 use at A'Address;  -- warning
  |
>>> warning: specified address for "F2" may be inconsistent with
 alignment
>>> warning: program execution may be erroneous (RM 13.3(27))
>>> warning: alignment of "F2" is 4
>>> warning: alignment of "A" is 2

14.   pragma Unsuppress (Alignment_Check, F2);
15.
16.   pragma Unsuppress (Alignment_Check);
17.
18.   F3 : Float;
19.   for F3 use at A'Address;  -- warning
  |
>>> warning: specified address for "F3" may be inconsistent with
 alignment
>>> warning: program execution may be erroneous (RM 13.3(27))
>>> warning: alignment of "F3" is 4
>>> warning: alignment of "A" is 2

20.
21.   F4 : Float;
22.   for F4 use at A'Address;  -- no warning
23.   pragma Suppress (Alignment_Check, F4);
24.
25. end P;

 25 lines: No errors, 8 warnings

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-13  Eric Botcazou  

* sem_ch13.adb (Register_Address_Clause_Check): New procedure to save
the suppression status of Alignment_Check on the current scope.
(Alignment_Checks_Suppressed): New function to use the saved instead of
the current suppression status of Alignment_Check.
(Address_Clause_Check_Record): Add Alignment_Checks_Suppressed field.
(Analyze_Attribute_Definition_Clause): Instead of manually appending to
the table, call Register_Address_Clause_Check.
(Validate_Address_Clauses): Call Alignment_Checks_Suppressed on the
recorded address clause instead of its entity.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 252075)
+++ sem_ch13.adb(working copy)
@@ -203,6 +203,15 @@
--  renaming_as_body. For tagged types, the specification is one of the
--  primitive specs.
 
+   procedure Register_Address_Clause_Check
+ (N   : Node_Id;
+  X   : Entity_Id;
+  A   : Uint;
+  Y   : Entity_Id;
+  Off : Boolean);
+   --  Register a check for the address clause N. The rest of the parameters
+   --  are in keeping with the components of Address_Clause_Check_Record below.
+
procedure Resolve_Iterable_Operation
  (N  : Node_Id;
   Cursor : Entity_Id;
@@ -318,6 +327,11 @@
 
   Off : Boolean;
   --  Whether the address is offset within Y in the second case
+
+  Alignment_Checks_Suppressed : Boolean;
+  --  Whether alignment checks are suppressed by an active scope suppress
+  --  setting. We need to save the value in order to be able to reuse it
+  --  after the back end has been run.
end record;
 
package Address_Clause_Checks is new Table.Table (
@@ -328,6 +342,26 @@
  Table_Increment  => 200,
  Table_Name   => "Address_Clause_Checks");
 
+   function Alignment_Checks_Suppressed
+ (ACCR : Address_Clause_Check_Record) return Boolean;
+   --  Return whether the alignment check generated for the address clause
+   --  is suppressed.
+
+   -
+   -- Alignment_Checks_Suppressed --
+   -
+
+   function Alignment_Checks_Suppressed
+ (ACCR : Address_Clause_Check_Record) return Boolean
+   is
+   begin
+  if Checks_May_Be_Suppressed (ACCR.X) then
+ return Is_Check_Suppressed (ACCR.X, Alignment_Check);
+  else
+ return ACCR.Alignment_Checks_Suppressed;
+  end if;
+   end Alignment_Checks_Suppressed;
+
-
-- Adjust_Record_For_Reverse_Bit_Order --
-
@@ -5047,8 +5081,8 @@
and then not Is_Generic_Type (Etype (U_Ent))
and then Address_Clause_Overlay_Warnings
  then
-Address_Clause_Checks.Append
-  ((N, U_Ent, No_Uint, O_Ent, Off));
+Register_Address_Clause_Check
+  (N, 

[PATCH] Better fix for the x86_64 -mcmodel=large ICEs (PR target/82145)

2017-09-13 Thread Jakub Jelinek
Hi!

The testcase below (and others) still ICEs with my PR81766 fix.
If there is a cfg cleanup in between ix86_init_pic_reg (during RA)
and postreload, the label which my fix moved to the right spot is
turned into NOTE_INSN_DELETED_LABEL note and moved back where it
originally used to be emitted.

The bug is actually in generic code.
cselib.c has code to handle LABEL_REF properly during hashing, by not
hashing the instructions/notes that are operand thereof, but just the
label number.  Postreload though calls cselib_lookup directly on the
operands of an instruction, and it is very common in many backends to have
(label_ref (match_operand ...)) in many patterns, and thus cselib
doesn't use the LABEL_REF handling in that case.
To avoid crashing postreload.c has:
   /* cselib blows up on CODE_LABELs.  Trying to fix that doesn't seem
  right, so avoid the problem here.  Likewise if we have a constant
  and the insn pattern doesn't tell us the mode we need.  */
   if (LABEL_P (recog_data.operand[i])
   || (CONSTANT_P (recog_data.operand[i])
   && recog_data.operand_mode[i] == VOIDmode))
 continue;
- we won't look up anything useful for those operands anyway.
The problem is that a valid LABEL_REF operand doesn't have to be only
a CODE_LABEL, it can be a NOTE with NOTE_INSN_DELETED_LABEL if all we need
is the label's address and nothing else.

So, the following patch handles NOTE_INSN_DELETED_LABEL like CODE_LABEL
in postreload.c.

Once that is done, my earlier PR81766 fix can be effectively reverted
and instead the CODE_LABEL can be immediately turned into
NOTE_INSN_DELETED_LABEL, something that a cfgcleanup would do.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-09-13  Jakub Jelinek  

PR target/82145
* postreload.c (reload_cse_simplify_operands): Skip
NOTE_INSN_DELETED_LABEL similarly to skipping CODE_LABEL.
* config/i386/i386.c (ix86_init_large_pic_reg): Revert 2017-09-01
changes.  Turn CODE_LABEL into NOTE_INSN_DELETED_LABEL immediately.
(ix86_init_pic_reg): Revert 2017-09-01 changes.

* gcc.target/i386/pr82145.c: New test.

--- gcc/config/i386/i386.c.jj   2017-09-08 09:13:59.0 +0200
+++ gcc/config/i386/i386.c  2017-09-11 14:48:50.532094255 +0200
@@ -8885,7 +8885,7 @@ ix86_use_pseudo_pic_reg (void)
 
 /* Initialize large model PIC register.  */
 
-static rtx_code_label *
+static void
 ix86_init_large_pic_reg (unsigned int tmp_regno)
 {
   rtx_code_label *label;
@@ -8902,7 +8902,10 @@ ix86_init_large_pic_reg (unsigned int tm
   emit_insn (gen_set_got_offset_rex64 (tmp_reg, label));
   emit_insn (ix86_gen_add3 (pic_offset_table_rtx,
pic_offset_table_rtx, tmp_reg));
-  return label;
+  const char *name = LABEL_NAME (label);
+  PUT_CODE (label, NOTE);
+  NOTE_KIND (label) = NOTE_INSN_DELETED_LABEL;
+  NOTE_DELETED_LABEL_NAME (label) = name;
 }
 
 /* Create and initialize PIC register if required.  */
@@ -8911,7 +8914,6 @@ ix86_init_pic_reg (void)
 {
   edge entry_edge;
   rtx_insn *seq;
-  rtx_code_label *label = NULL;
 
   if (!ix86_use_pseudo_pic_reg ())
 return;
@@ -8921,7 +8923,7 @@ ix86_init_pic_reg (void)
   if (TARGET_64BIT)
 {
   if (ix86_cmodel == CM_LARGE_PIC)
-   label = ix86_init_large_pic_reg (R11_REG);
+   ix86_init_large_pic_reg (R11_REG);
   else
emit_insn (gen_set_got_rex64 (pic_offset_table_rtx));
 }
@@ -8945,22 +8947,6 @@ ix86_init_pic_reg (void)
   entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   insert_insn_on_edge (seq, entry_edge);
   commit_one_edge_insertion (entry_edge);
-
-  if (label)
-{
-  basic_block bb = BLOCK_FOR_INSN (label);
-  rtx_insn *bb_note = PREV_INSN (label);
-  /* If the note preceding the label starts a basic block, and the
-label is a member of the same basic block, interchange the two.  */
-  if (bb_note != NULL_RTX
- && NOTE_INSN_BASIC_BLOCK_P (bb_note)
- && bb != NULL
- && bb == BLOCK_FOR_INSN (bb_note))
-   {
- reorder_insns_nobb (bb_note, bb_note, label);
- BB_HEAD (bb) = label;
-   }
-}
 }
 
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
--- gcc/postreload.c.jj 2017-09-08 09:13:54.0 +0200
+++ gcc/postreload.c2017-09-11 14:49:04.977945563 +0200
@@ -409,9 +409,12 @@ reload_cse_simplify_operands (rtx_insn *
   CLEAR_HARD_REG_SET (equiv_regs[i]);
 
   /* cselib blows up on CODE_LABELs.  Trying to fix that doesn't seem
-right, so avoid the problem here.  Likewise if we have a constant
- and the insn pattern doesn't tell us the mode we need.  */
+right, so avoid the problem here.  Similarly NOTE_INSN_DELETED_LABEL.
+Likewise if we have a constant and the insn pattern doesn't tell us
+the mode we need.  */
   if (LABEL_P (recog_data.operand[i])
+ || (NOTE_P 

[wwwdocs] Tweak a link in gcc-4.3/porting_to.html

2017-09-13 Thread Gerald Pfeifer
Just another server moving to https, this time crustytoothpaste.ath.cx.

Applied.

Gerald

Index: gcc-4.3/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.3/porting_to.html,v
retrieving revision 1.14
diff -u -r1.14 porting_to.html
--- gcc-4.3/porting_to.html 25 Feb 2017 15:37:46 -  1.14
+++ gcc-4.3/porting_to.html 9 Sep 2017 06:44:55 -
@@ -527,7 +527,7 @@
 
 
 
-Brian M. Carlson, http://crustytoothpaste.ath.cx/~bmc/blog/entries/;>GCC 4.3: Declaration 
of...Changes Meaning of...
+Brian M. Carlson, https://crustytoothpaste.ath.cx/~bmc/blog/entries/;>GCC 4.3: Declaration 
of...Changes Meaning of...
 
 
 


Re: [PATCH] PR libstdc++/81835 fix broken URLs in libstdc++ docs

2017-09-13 Thread Jonathan Wakely

On 13/09/17 11:06 +0100, Jonathan Wakely wrote:

PR libstdc++/81835
* doc/xml/manual/extensions.xml: Replace unstable URL.
* doc/html/manual/ext_demangling.html: Regenerate.
* libsupc++/cxxabi.h (__cxa_demangle): Fix broken URL.


More of the same.

Committed to trunk, will backport too.

commit 8b68d9fc404ee72eb06281b003d9c17066bc3e80
Author: Jonathan Wakely 
Date:   Wed Sep 13 11:16:29 2017 +0100

Fix broken URLs in libstdc++ API docs

* doc/doxygen/mainpage.html: Fix broken URLs.

diff --git a/libstdc++-v3/doc/doxygen/mainpage.html b/libstdc++-v3/doc/doxygen/mainpage.html
index aa650bafeda..b54482a74e9 100644
--- a/libstdc++-v3/doc/doxygen/mainpage.html
+++ b/libstdc++-v3/doc/doxygen/mainpage.html
@@ -28,7 +28,7 @@
 
 There are two types of documentation for libstdc++.  One is the
distribution documentation, which can be read online
-   http://gcc.gnu.org/onlinedocs/libstdc++/index.html;>here
+   https://gcc.gnu.org/onlinedocs/libstdc++/index.html;>here
or offline from the file doc/html/index.html in the library source
directory.
 
@@ -78,11 +78,11 @@
pages.  See the section "Documentation Style"
in doc/xml/manual/appendix_contributing.xml in the
source tree for how to create (and write) the doxygen markup.
-  This style guide can also be viewed on the http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01apas04.html;>web.
+  This style guide can also be viewed on the https://gcc.gnu.org/onlinedocs/libstdc++/manual/documentation_hacking.html;>web.
 
 License, Copyright, and Other Lawyerly Verbosity
 The libstdc++ documentation is released under
-   http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt01ch01s02.html;>
+   https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_gpl.html;>
these terms.
 
 Part of the generated documentation involved comments and notes from


[PATCH] PR libstdc++/81835 fix broken URLs in libstdc++ docs

2017-09-13 Thread Jonathan Wakely

PR libstdc++/81835
* doc/xml/manual/extensions.xml: Replace unstable URL.
* doc/html/manual/ext_demangling.html: Regenerate.
* libsupc++/cxxabi.h (__cxa_demangle): Fix broken URL.

Committed to trunk, will backport too.

commit ad7d4f59d81d948255ae5d7aaee3f4147a1c3df7
Author: Jonathan Wakely 
Date:   Wed Sep 13 11:02:11 2017 +0100

PR libstdc++/81835 fix broken URLs in libstdc++ docs

PR libstdc++/81835
* doc/xml/manual/extensions.xml: Replace unstable URL.
* doc/html/manual/ext_demangling.html: Regenerate.
* libsupc++/cxxabi.h (__cxa_demangle): Fix broken URL.

diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml 
b/libstdc++-v3/doc/xml/manual/extensions.xml
index 41b1a801325..a6e4db2b6f7 100644
--- a/libstdc++-v3/doc/xml/manual/extensions.xml
+++ b/libstdc++-v3/doc/xml/manual/extensions.xml
@@ -502,7 +502,7 @@ get_temporary_buffer(5, (int*)0);
 demangling.
   
   
-If you have read the http://www.w3.org/1999/xlink; 
xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01115.html;>source
+If you have read the http://www.w3.org/1999/xlink; 
xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/namespaces.html;>source
 documentation for namespace abi then you are
 aware of the cross-vendor C++ ABI in use by GCC.  One of the
 exposed functions is used for demangling,
diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index b66d6d194bc..6e6b21ddbde 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -182,7 +182,7 @@ namespace __cxxabiv1
*  with GNU extensions. For example, this function is used in
*  __gnu_cxx::__verbose_terminate_handler.
*
-   *  See http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt12ch39.html
+   *  See https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
*  for other examples of use.
*
*  @note The same demangling functionality is available via


[Ada] Undefined symbol at link time due to Disable_Controlled

2017-09-13 Thread Pierre-Marie de Rodat
This patch reimplements aspect Disable_Controlled to plug the following holes
in its original implementation:

   * The aspect may appear without an expression in which case the aspect
 defaults to True, however the compiler would crash due to the lack of
 expression.

   * If the expression is present, then it should be static, however the
 compiler would silently accept a non-static expression.

   * Various types that derive and/or contain a component of a type subject
 to the aspect are now properly handled.

The patch also modifies predicate Is_Controlled to indicate whether a type is
derived from [Limited_]Controlled AND NOT subject to aspect Disable_Controlled.
This modification allows the semantics of the aspect to automatically perculate
to derived types and/or composite types with components subject to the aspect.
As a result, the finalization mechanism now properly handles such types and
generates the appropriate Deep_Adjust, Deep_Initialize, and Deep_Finalize
primitives.


-- Source --


--  factorial.ads

function Factorial (Val : Natural) return Natural;

--  factorial.adb

function Factorial (Val : Natural) return Natural is
begin
   if Val > 1 then
  return Val * Factorial (Val - 1);
   end if;

   return 1;
end Factorial;

--  semantics.ads

with Ada.Finalization; use Ada.Finalization;
with Factorial;

package Semantics is
   generic
  Flag : Boolean;
  Int  : Integer;

   package Nested_Gen is
  type Ctrl_Rec_1 is new Controlled with null record
with Disable_Controlled => Int;  --  Error

  type Ctrl_Rec_2 is new Limited_Controlled with null record
with Disable_Controlled => Factorial (3) = 6;--  N/A

  type Ctrl_Rec_3 is new Controlled with null record
with Disable_Controlled => Flag; --  OK
   end Nested_Gen;

   subtype Small_Int is Integer range 1 .. 10
 with Disable_Controlled;--  Error

   type Rec is null record
 with Disable_Controlled => False;   --  Error

   type Ctrl_Rec_1 is new Controlled with null record
 with Disable_Controlled => "what?"; --  Error

   type Ctrl_Rec_2 is new Limited_Controlled with null record
 with Disable_Controlled => Factorial (3) = 6;   --  Error

   type Ctrl_Rec_3 is new Controlled with null record
 with Disable_Controlled => True;--  OK

   Is_True : constant Boolean := True;

   type Ctrl_Rec_4 is new Limited_Controlled with null record
 with Disable_Controlled => Is_True; --  OK
end Semantics;

--  types.ads

with Ada.Finalization; use Ada.Finalization;

package Types is
   generic
  Flag : Boolean;

   package Gen is
  type Ctrl is new Controlled with record
 Id : Natural;
  end record;

  procedure Adjust (Obj : in out Ctrl);
  procedure Finalize (Obj : in out Ctrl);
  procedure Initialize (Obj : in out Ctrl);

  type Ctrl_DC is new Controlled with record
 Id : Natural;
  end record
with Disable_Controlled => Flag;

  procedure Adjust (Obj : in out Ctrl_DC);
  procedure Finalize (Obj : in out Ctrl_DC);
  procedure Initialize (Obj : in out Ctrl_DC);

  type Ctrl_Ctrl_DC is new Controlled with record
 Id   : Natural;
 Comp : Ctrl_DC;
  end record;

  procedure Adjust (Obj : in out Ctrl_Ctrl_DC);
  procedure Finalize (Obj : in out Ctrl_Ctrl_DC);
  procedure Initialize (Obj : in out Ctrl_Ctrl_DC);

  type Ctrl_DC_Ctrl is new Controlled with record
 Id   : Natural;
 Comp : Ctrl;
  end record
with Disable_Controlled => True;

  procedure Adjust (Obj : in out Ctrl_DC_Ctrl);
  procedure Finalize (Obj : in out Ctrl_DC_Ctrl);
  procedure Initialize (Obj : in out Ctrl_DC_Ctrl);

  type Ctrl_DC_Ctrl_DC is new Controlled with record
 Id   : Natural;
 Comp : Ctrl_DC;
  end record
with Disable_Controlled;

  procedure Adjust (Obj : in out Ctrl_DC_Ctrl_DC);
  procedure Finalize (Obj : in out Ctrl_DC_Ctrl_DC);
  procedure Initialize (Obj : in out Ctrl_DC_Ctrl_DC);

  type Rec_Ctrl_DC is record
 Comp : Ctrl_DC;
  end record;
   end Gen;

   generic
  Typ_Name : String;
  type Typ is private;
   procedure Test;

   type Ctrl is new Controlled with record
  Id : Natural;
   end record;

   procedure Adjust (Obj : in out Ctrl);
   procedure Finalize (Obj : in out Ctrl);
   procedure Initialize (Obj : in out Ctrl);

   type Ctrl_DC is new Controlled with record
  Id : Natural;
   end record
 with Disable_Controlled => True;

   procedure Adjust (Obj : in out Ctrl_DC);
   procedure Finalize (Obj : in out Ctrl_DC);
   procedure Initialize (Obj : in out Ctrl_DC);

   type Ctrl_Ctrl_DC is new 

Re: [PATCH][GCC][ARM] Dot Product commandline options [Patch (1/8)]

2017-09-13 Thread Kyrill Tkachov

Hi Tamar,

On 01/09/17 14:19, Tamar Christina wrote:

Hi All,

This patch adds support for the +dotprod extension to ARM.
Dot Product requires Adv.SIMD to work and so enables this option
by default when enabled.

It is available from ARMv8.2-a and onwards and is enabled by
default on Cortex-A55 and Cortex-A75.

Regtested and bootstrapped on arm-none-eabi and no issues.


I'm assuming you mean arm-none-linux-gnueabihf :)


Ok for trunk?

gcc/
2017-09-01  Tamar Christina  

* config/arm/arm.h (TARGET_DOTPROD): New.
* config/arm/arm.c (arm_arch_dotprod): New.
(arm_option_reconfigure_globals): Add arm_arch_dotprod.
* config/arm/arm-c.c (__ARM_FEATURE_DOTPROD): New.
* config/arm/arm-cpus.in (cortex-a55, cortex-75): Enabled +dotprod.
(armv8.2-a, cortex-a75.cortex-a55): Likewise.
* config/arm/arm-isa.h (isa_bit_dotprod, ISA_DOTPROD): New.


arm-isa.h is now autogenerated after r251799 so you'll need to rebase on 
top of that.
That being said, that patch was temporarily reverted [1] so you'll have 
to apply it manually in your

tree to rebase, or wait until it is reapplied.

[1] https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00579.html

The patch looks ok to me otherwise with a documentation nit below.


* config/arm/t-multilib (v8_2_a_simd_variants): Add dotprod.
* doc/invoke.texi (armv8.2-a): Document dotprod



--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15492,6 +15492,10 @@ The ARMv8.1 Advanced SIMD and floating-point 
instructions.
 The cryptographic instructions.  This also enables the Advanced SIMD and
 floating-point instructions.
 
+@item +dotprod

+Enable the Dot Product extension.  This also enables Advanced SIMD instructions
+and allows auto vectorization of dot products to the Dot Product instructions.

This should be "auto-vectorization"

Thanks,
Kyrill





new Ada merger

2017-09-13 Thread Arnaud Charlet
This is a heads up that I am working with Pierre-Marie (in cc:) to
transition my work as "AdaCore merger" - aka the person who submits most
of AdaCore front-end changes in GCC - over to him.

Pierre-Marie has already write-after-approval and has submitted and
committed many patches already in the past years, and will now start
sending emails to this list about changes on the Ada front-end similarly
to what I've done so far (emails whose subject starts with "[Ada]").

We will initially work together on these changes and once Pierre-Marie is
familiar enough with the process, he will take over on his own.

In particular the next emails and commits that will come from Pierre-Marie
will be done together (with my implicit approval).

Arno


Re: [PATCH][GCC][ARM] Dot Product NEON patterns [Patch (2/8)]

2017-09-13 Thread Kyrill Tkachov

Hi Tamar,

On 01/09/17 14:33, Tamar Christina wrote:

Hi All,

This patch adds the instructions for Dot Product to ARM along
with the intrinsics and vectorizer pattern.

Armv8.2-a dot product supports 8-bit element values both
signed and unsigned.

Dot product is available from Armv8.2-a and onwards.

Regtested and bootstrapped on arm-none-eabi and no issues.

Ok for trunk?


This is ok once the prerequisites are approved with one ChangeLog nit.

Kyrill


gcc/
2017-09-01  Tamar Christina  

* config/arm/arm-builtins.c (arm_unsigned_uternop_qualifiers): New.
(UTERNOP_QUALIFIERS, arm_umac_lane_qualifiers, UMAC_LANE_QUALIFIERS): 
New.
* config/arm/arm_neon_builtins.def (sdot, udot, sdot_lane, udot_lane): 
new.
* config/arm/iterators.md (DOTPROD, DOT_MODE, dot_mode): New.
(UNSPEC_DOT_S, UNSPEC_DOT_U, opsuffix): New.
* config/arm/neon.md (neon_dot): New.
(neon_dot_lane, dot_prod): New.
* config/arm/types.md (neon_dot, neon_dot_q): New.
* config/arm/unspecs.md (UNSPEC_DOT_S, UNSPEC_DOT_U): New.



diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 
7acbaf1bb40a4f270e75968804546508f7839e49..139e09fd929e17216ad9383505f1453a73d071fb
 100644
--- a/gcc/config/arm/iterators.md

--snip---

 ;;
 ;; Code attributes
 ;;
@@ -816,6 +822,7 @@
   (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
   (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
+  (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
 ])

In your ChangeLog you list this as "New" whereas your patch just adds them to the 
"sup" int_attr.



Re: [PATCH] Factor out division by squares and remove division around comparisons (0/2)

2017-09-13 Thread Jackson Woodruff

On 09/12/2017 11:43 PM, Jeff Law wrote:

On 09/06/2017 03:54 AM, Jackson Woodruff wrote:

Hi all,

This patch is split from part (1/2). It includes the patterns that have
been moved out of fold-const.c


It also removes an (almost entirely) redundant pattern:

 (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2)

which was only used in special cases, either with combinations
of flags like -fno-reciprocal-math -funsafe-math-optimizations
and cases where C was sNaN, or small enough to result in infinity.

This pattern is covered by:

(A / C1) +- (A / C2) -> (with O1 and reciprocal math)
A * (1 / C1) +- A * (1 / C2) ->
A * (1 / C1 +- 1 / C2)

The previous pattern required funsafe-math-optimizations.

To adjust for this case, the testcase has been updated to require O1 so
that the optimization is still performed.


This pattern is moved verbatim into match.pd:

 (A / C) +- (B / C) -> (A +- B) / C.

OK for trunk?

Jackson

gcc/

2017-08-30  Jackson Woodruff  

 PR 71026/tree-optimization
 * match.pd: Move RDIV patterns from fold-const.c
 * fold-const.c (distribute_real_division): Removed.
 (fold_binary_loc): Remove calls to distribute_real_divison.

gcc/testsuite/

2017-08-30  Jackson Woodruff  

 PR 71026/tree-optimization
 * gcc/testsuire/gcc.dg/fold-div-1.c: Use O1.


Didn't you lose the case where the operator is MULT_EXPR rather than
RDIV_EXPR?

If I'm reading things correctly, arg0 and arg1 could be a RDIV_EXPR or a
MULT_EXPR and we distribute both cases (as long as they are both the
same code). >
Your match.pd pattern only handle rdiv.



In the fold_plusminus_mult_expr function in fold-const.c, we still do 
the simplification:


(A * C) +- (B * C) -> (A+-B) * C

So the pattern I introduced only needs to handle the division case.


Now it may be the case that MULT_EXPR doesn't happen anymore in practice
-- the code in fold-const is quote old and much of it was written before
we regularly added tests for optimization and canonicalization.  So I'm
not surprised nothing in this testsuite trips over this omission.


In gcc.dg/fold-plusmult.c, we test whether 2*a + 2*a is folded into a * 
4, which comes close.


There are also a few tests for this in gcc.dg/tree-ssa/pr23294.c, 
although again, they are use constants for A and B.


I verified that

x * y + z * y

Gets folded into

y * (x + z)



ISTM you need to either argue that the MULT_EXPR case is handled
elsewhere or that it is no longer relevant.

jeff



Re: [PATCH, GCC/testsuite/ARM, ping3] Fix coprocessor intrinsic test failures on ARMv8-A

2017-09-13 Thread Kyrill Tkachov


On 05/09/17 10:04, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 23/08/17 11:59, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 17/07/17 09:51, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 12/07/17 14:31, Thomas Preudhomme wrote:

Coprocessor intrinsic tests in gcc.target/arm/acle test whether
__ARM_FEATURE_COPROC has the right bit defined before calling the
intrinsic. This allows to test both the correct setting of that macro
and the availability and correct working of the intrinsic. However the
__ARM_FEATURE_COPROC macro is no longer defined for ARMv8-A since
r249399.

This patch changes the testcases to skip that test for ARMv8-A and
ARMv8-R targets.  It also fixes some irregularity in the coprocessor
effective targets:
- add ldcl and stcl to the list of instructions listed as guarded by
   arm_coproc1_ok
- enable tests guarded by arm_coproc2_ok, arm_coproc3_ok and
   arm_coproc4_ok for Thumb-2 capable targets but disable for Thumb-1
   targets.



Ok.
Thanks,
Kyrill



ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-07-04  Thomas Preud'homme 

 * gcc.target/arm/acle/cdp.c: Skip __ARM_FEATURE_COPROC check for
 ARMv8-A and ARMv8-R.
 * gcc.target/arm/acle/cdp2.c: Likewise.
 * gcc.target/arm/acle/ldc.c: Likewise.
 * gcc.target/arm/acle/ldc2.c: Likewise.
 * gcc.target/arm/acle/ldc2l.c: Likewise.
 * gcc.target/arm/acle/ldcl.c: Likewise.
 * gcc.target/arm/acle/mcr.c: Likewise.
 * gcc.target/arm/acle/mcr2.c: Likewise.
 * gcc.target/arm/acle/mcrr.c: Likewise.
 * gcc.target/arm/acle/mcrr2.c: Likewise.
 * gcc.target/arm/acle/mrc.c: Likewise.
 * gcc.target/arm/acle/mrc2.c: Likewise.
 * gcc.target/arm/acle/mrrc.c: Likewise.
 * gcc.target/arm/acle/mrrc2.c: Likewise.
 * gcc.target/arm/acle/stc.c: Likewise.
 * gcc.target/arm/acle/stc2.c: Likewise.
 * gcc.target/arm/acle/stc2l.c: Likewise.
 * gcc.target/arm/acle/stcl.c: Likewise.
 * lib/target-supports.exp:
 (check_effective_target_arm_coproc1_ok_nocache): Mention ldcl
 and stcl in the comment.
 (check_effective_target_arm_coproc2_ok_nocache): Allow Thumb-2 
targets

 and disable Thumb-1 targets.
 (check_effective_target_arm_coproc3_ok_nocache): Likewise.
 (check_effective_target_arm_coproc4_ok_nocache): Likewise.

Tested by running all tests in gcc.target/arm/acle before and after 
this

patch for ARMv6-M, ARMv7-M, ARMv7E-M, ARMv3, ARMv4 (ARM state), ARMv4T
(Thumb state), ARMv5 (ARM state), ARMv5TE (ARM state), ARMv6 (ARM
state), ARMv6T2 (Thumb state) and and ARMv8-A (both state). The only
changes are for ARMv8-A where tests FAILing are now PASSing again.

Is this ok for trunk?

Best regards,

Thomas




Re: [patch] Fix wrong code with small structure return on PowerPC

2017-09-13 Thread Richard Biener
On Mon, Sep 11, 2017 at 9:59 PM, Eric Botcazou  wrote:
>> I think the issue is that we set SUBREG_PROMOTED_* on something that is
>> possibly not so (aka uninitialized in this case).
>
> Yes, that's what I called inherent weakness of the promoted subregs mechanism.
>
>> We may only set it if either the ABI or a previous operation forced it to.
>> Maybe this is also the reason why we need this zero init pass in some cases
>> (though that isn't a full solution either).
>
> Do you think that we should zero-init all the unsigned promoted subregs (and
> sign-extend-init all the signed promoted subregs)?  That sounds like a big
> hammer to me, but I can give it a try.

My suggestion would be to not set SUBREG_PROMOTED_* on "everything"
(which we seem to do).  zero-initing looks like the easier way to deal with
the situation though.

ISTR SUBREG_PROMOTED_* gets set by RTL expansion, the problematic
one should be the one in expr.c setting it on all SSA_NAMEs, maybe it is
enough to avoid setting it for SSA_NAME_IS_DEFAULT_DEF ones?

Richard.

> --
> Eric Botcazou


Re: [PATCH] Add -static-pie to GCC driver to create static PIE

2017-09-13 Thread Jakub Jelinek
On Wed, Sep 13, 2017 at 08:57:39AM +0200, Markus Trippelsdorf wrote:
> On 2017.09.12 at 13:48 -0500, Aaron Sawdey wrote:
> > On Tue, 2017-09-12 at 16:20 +, Joseph Myers wrote:
> > > On Mon, 28 Aug 2017, H.J. Lu wrote:
> > > 
> > > > Here is the updated patch.   OK for trunk?
> > > 
> > > OK.
> > 
> > This seems to be causing an issue for me on powerpc:
> > 
> > ../../trunk/gcc/config/rs6000/sysv4.h:819:0: error: "LINK_EH_SPEC" 
> > redefined [-Werror]
> >  # define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
> 
> It will cause problems on other platforms as well:
> 
> gcc/config/alpha/elf.h:171:#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
> gcc/config/alpha/vms.h:209:#define LINK_EH_SPEC "vms-dwarf2eh.o%s "
> gcc/config/dragonfly.h:64:#define LINK_EH_SPEC "--eh-frame-hdr"
> gcc/config/freebsd.h:48:#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
> gcc/config/gnu-user.h:135:#define LINK_EH_SPEC 
> "%{!static|static-pie:--eh-frame-hdr} "
> gcc/config/netbsd.h:128:#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
> gcc/config/openbsd.h:139:#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "
> gcc/config/powerpcspe/sysv4.h:808:# define LINK_EH_SPEC 
> "%{!static:--eh-frame-hdr} "
> gcc/config/rs6000/sysv4.h:819:# define LINK_EH_SPEC 
> "%{!static:--eh-frame-hdr} "
> gcc/config/sol2.h:375:#define LINK_EH_SPEC "%{!static:--eh-frame-hdr} "

Most of these aren't including gnu-user.h and then the other header.
It is only rs6000/sysv4.h and powerpcspe/sysv4.h where gnu-user.h is
(sometimes, which is why you can't remove it from there altogether)
included first, and alpha/elf.h where gnu-user.h is not included, yet it is
a linux target where you probably want to handle it similarly.
Although, perhaps -static-pie should be supported by other ELF targets like
Solaris, NetBSD, OpenBSD, FreeBSD and therefore the other LINK_EH_SPEC that
have !static:--eh-frame-hdr should be tweaked too.

I'd say the urgent thing is to patch rs6000/sysv4.h to match the new
gnu-user.h definition.  Reorganizing headers so that for linux and hurd
it is solely gnu-user.h that defines this kind of macros is lots of work.

Jakub


Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

2017-09-13 Thread Kyrill Tkachov

Hi Charles,

On 12/09/17 09:34, charles.bay...@linaro.org wrote:

From: Charles Baylis 

This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.

In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.



Can you please mention how has this series been tested?
A bootstrap and test run on arm-none-linux-gnueabihf is required at least.
Also, do you have any benchmarking results for this?
I agree that generating the addressing modes in the new tests is desirable.
So I'm not objecting to the goal of this patch, but a check to make sure 
that this doesn't regress SPEC

would be great.  Further comments on the patch inline.


gcc/ChangeLog:

  Charles Baylis 

* config/arm/arm.c (arm_mem_costs): New function.
(arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

  Charles Baylis 

* gcc.target/arm/addr-modes-float.c: New test.
* gcc.target/arm/addr-modes-int.c: New test.
* gcc.target/arm/addr-modes.h: New header.

Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
 gcc/config/arm/arm.c| 67 
-

 gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 
 gcc/testsuite/gcc.target/arm/addr-modes-int.c   | 46 +
 gcc/testsuite/gcc.target/arm/addr-modes.h   | 53 +++
 4 files changed, 183 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 32001e5..b8dbed6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9228,8 +9228,48 @@ arm_unspec_cost (rtx x, enum rtx_code /* 
outer_code */, bool speed_p, int *cost)

} \
 while (0);

+/* Helper function for arm_rtx_costs_internal.  Calculates the cost 
of a MEM,

+   considering the costs of the addressing mode and memory access
+   separately.  */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+  int *cost, bool speed_p)
+{
+  machine_mode mode = GET_MODE (x);
+  if (flag_pic
+  && GET_CODE (XEXP (x, 0)) == PLUS
+  && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+/* This will be split into two instructions.  Add the cost of the
+   additional instruction here.  The cost of the memory access is 
computed

+   below.  See arm.md:calculate_pic_address.  */
+*cost = COSTS_N_INSNS (1);
+  else
+*cost = 0;


For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a 
each insn)
plus the appropriate field in extra_cost. So you should unconditionally 
initialise the cost
to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) 
with the condition above.



+
+  /* Calculate cost of the addressing mode.  */
+  if (speed_p)
+{
+  /* TODO: Add table-driven costs for addressing modes.  (See 
patch 2) */

+}


You mean "patch 3". I recommend you just remove this conditional from 
this patch and add the logic

in patch 3 entirely.


+
+  /* Calculate cost of memory access.  */
+  if (speed_p)
+{
+  /* data transfer is transfer size divided by bus width.  */
+  int bus_width_bytes = current_tune->bus_width / 4;


This should be bus_width / BITS_PER_UNIT to get the size in bytes.
BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
cost calculation and generated code is still appropriate.


+  *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
+  *cost += extra_cost->ldst.load;
+}
+  else
+{
+  *cost += COSTS_N_INSNS (1);
+}


Given my first comment above this else would be deleted.

Thanks,
Kyrill


+
+  return true;
+}
+
 /* RTX costs.  Make an estimate of the cost of executing the operation
-   X, which is contained with an operation with code OUTER_CODE.
+   X, which is contained within an operation with code OUTER_CODE.
SPEED_P indicates whether the cost desired is the performance cost,
or the size cost.  The estimate is stored in COST and the return
value is TRUE if the cost calculation is final, or FALSE if the
@@ -9308,30 +9348,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code 
code, enum rtx_code outer_code,

   return false;

 case MEM:
-  /* A memory access costs 1 insn if the mode is small, or the 
address is

-a single register, otherwise it costs one insn per word.  */
-  if (REG_P (XEXP (x, 0)))
-   *cost = COSTS_N_INSNS (1);
-  else if (flag_pic
-  && GET_CODE (XEXP (x, 0)) == PLUS
-  && 

  1   2   >