Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Marcel Böhme
Hi Ian,

Stack overflows are a security concern and must be addressed. The Libiberty 
demangler is part of several tools, including binutils, gdb, valgrind, and many 
other libbfd-based tools that are used by the security community for the 
analysis of program binaries. Without a patch, the reverse engineering of 
untrusted binaries as well as determining whether an untrusted binary is 
malicious could cause serious damage. More details here: 
http://www.openwall.com/lists/oss-security/2016/05/05/3

> On 7 May 2016, at 12:16 AM, Ian Lance Taylor  wrote:
> 
> The function cplus_demangle_v3_callback must not call malloc.  The
> whole point of that function is to work when nothing else works.  That
> is why d_demangle_callback does not, and must not, call malloc.

Point taken. In fact, I tracked down the patch submitted by Google's Simon 
Baldwin and the ensuing discussion from 2007: 
https://gcc.gnu.org/ml/gcc-patches/2007-01/msg01116.html (committed as revision 
121305).

In that thread, Mark Mitchell raised concerns about small stacks and large 
mangled names and suggested focusing on an allocation interface in which the
caller provides "alloc" and "dealloc" functions (i.e., C++ allocators): 
https://gcc.gnu.org/ml/gcc-patches/2007-01/msg01904.html

In the later libstdc++ patch that makes vterminate use the malloc-less 
demangler, Benjamin Kosnik raised similar concerns: 
https://gcc.gnu.org/ml/libstdc++/2007-03/msg00181.html

Perhaps the allocation interface is the way to go?

Best regards,
- Marcel






[SH][committed] Improve utilization of zero-displacement conditional branches

2016-05-06 Thread Oleg Endo
Hi,

On SH a conditional branch with a (physical) zero displacement jumps
over the next instruction.  On some SH hardware implementations these
branches are handled in a special way which allows using it for
conditional execution.  A while ago I added some hardcoded asm
patterns to utilize this.  It seems there was an attempt to do
something about this a long time ago with the "branch_zero" attribute.
However, I could not find any uses of it and wasn't sure how to
utilize it.

The attached patch disables the delay slot (for the DBR pass) of
conditional branches which branch over only one 2-byte instruction.
This results in more zero-displacement branches being emitted.

Having this in place, it would also be possible to simplify those
hardcoded asm patterns (e.g. the SH abs patterns).  However, while that
works, doing so exposes the actual CFG to other passes, and in quite
a few cases basic blocks get reordered in a counterproductive way,
resulting in worse code.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235993.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh-protos.h (sh_cbranch_distance): Declare new function.
* config/sh/sh.c (sh_cbranch_distance): Implement it.
* config/sh/sh.md (branch_zero): Remove define_attr.
(define_delay): Disable delay slot if branch distance is one insn.
diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h
index c47e2ea..d302394 100644
--- a/gcc/config/sh/sh-protos.h
+++ b/gcc/config/sh/sh-protos.h
@@ -348,6 +348,18 @@ private:
 
 extern sh_treg_insns sh_split_treg_set_expr (rtx x, rtx_insn* curr_insn);
 
+enum
+{
+  /* An effective conditional branch distance of zero bytes is impossible.
+     Hence we can use it to designate an unknown value.  */
+  unknown_cbranch_distance = 0u,
+  infinite_cbranch_distance = ~0u
+};
+
+unsigned int
+sh_cbranch_distance (rtx_insn* cbranch_insn,
+		 unsigned int max_dist = infinite_cbranch_distance);
+
 #endif /* RTX_CODE */
 
 extern void sh_cpu_cpp_builtins (cpp_reader* pfile);
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 809f679..6d1d1a3 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -1928,6 +1928,52 @@ sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2)
   return true;
 }
 
+/* Try to calculate the branch distance of a conditional branch in bytes.
+
+   FIXME: Because of PR 59189 we can't use the CFG here.  Instead just
+   walk from this insn into the next (fall-through) basic block and see if
+   we hit the label.  */
+unsigned int
+sh_cbranch_distance (rtx_insn* _cbranch_insn, unsigned int max_dist)
+{
+  rtx_jump_insn* cbranch_insn = safe_as_a <rtx_jump_insn*> (_cbranch_insn);
+
+  if (dump_file)
+{
+  fprintf (dump_file, "sh_cbranch_distance insn = \n");
+  print_rtl_single (dump_file, cbranch_insn);
+}
+
+  unsigned int dist = 0;
+
+  for (rtx_insn* i = next_nonnote_insn (cbranch_insn);
+   i != NULL && dist < max_dist; i = next_nonnote_insn (i))
+{
+  const unsigned int i_len = get_attr_length (i);
+  dist += i_len;
+
+  if (dump_file)
+	fprintf (dump_file, "  insn %d  length = %u  dist = %u\n",
+		 INSN_UID (i), i_len, dist);
+
+  if (rtx_code_label* l = dyn_cast <rtx_code_label*> (i))
+	{
+	  if (l == cbranch_insn->jump_target ())
+	{
+	  if (dump_file)
+		fprintf (dump_file, "  cbranch dist = %u\n", dist);
+	  return dist;
+	}
+	  break;
+	}
+}
+
+  if (dump_file)
+fprintf (dump_file, "  cbranch dist = unknown\n");
+
+  return unknown_cbranch_distance;
+}
+
 enum rtx_code
 prepare_cbranch_operands (rtx *operands, machine_mode mode,
 			  enum rtx_code comparison)
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 39270ce..406721d 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -477,16 +477,6 @@
 (define_attr "is_sfunc" ""
   (if_then_else (eq_attr "type" "sfunc") (const_int 1) (const_int 0)))
 
-(define_attr "branch_zero" "yes,no"
-  (cond [(eq_attr "type" "!cbranch") (const_string "no")
-	 (ne (symbol_ref "(next_active_insn (insn)\
-			   == (prev_active_insn\
-			   (XEXP (SET_SRC (PATTERN (insn)), 1\
-			  && get_attr_length (next_active_insn (insn)) == 2")
-	 (const_int 0))
-	 (const_string "yes")]
-	(const_string "no")))
-
 ;; SH4 Double-precision computation with double-precision result -
 ;; the two halves are ready at different times.
 (define_attr "dfp_comp" "yes,no"
@@ -539,8 +529,13 @@
 	(eq_attr "type" "!pstore,prget")) (nil) (nil)])
 
 ;; Conditional branches with delay slots are available starting with SH2.
+;; If zero displacement conditional branches are fast, disable the delay
+;; slot if the branch jumps over only one 2-byte insn.
 (define_delay
-  (and (eq_attr "type" "cbranch") (match_test "TARGET_SH2"))
+  (and (eq_attr "type" "cbranch")
+   (match_test "TARGET_SH2")
+   (not (and (match_test 

Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-06 Thread David Wohlferd



A few questions:

1) I'm not clear precisely what problem this patch fixes.  It's true
that some people have incorrectly assumed that basic asm clobbers memory
and this change would fix their code.  But some people also incorrectly
assume it clobbers registers.  I assume that's why Jeff Law proposed
making basic asm "an opaque blob that read/write/clobber any register or
memory location."  Do we have enough problem reports from users to know
which is the real solution here?


Whenever I do something for gcc I do it actually for myself, in my own
best interest.  And this is no exception.


Seems fair.  You are the one putting the time in to change it.

But do you have actual code that is affected by this?  You can't really 
be planning to wait until v7 is released to have your projects work 
correctly?



The way I see it, is this: in simple cases a basic asm behaves as if
it would clobber memory, because of the way Jeff implemented the
asm handling in sched-deps.c some 20 years ago.

Look for ASM_INPUT where we have this comment:
"Traditional and volatile asm instructions must be considered to use
   and clobber all hard registers, all pseudo-registers and all of
   memory."

The assumption that it is OK to clobber memory in a basic asm will only
break if the asm statement is inlined in a loop, and that may happen
unexpectedly, when gcc rolls out new optimizations.
That's why I consider this to be security relevant.


I'm not sure I follow.  Do you fear that gcc could mistakenly move the 
asm into a nearby loop during optimization (resulting in who-knows-what 
results)?  Or is there some way that any basic asm in a loop could have 
some kind of a problem?



But OTOH you see immediately that all general registers are in use
by gcc, unless you declare a variable like
register int eax __asm__("rax");
then it is perfectly OK to use rax in a basic asm of course.


According to the docs, that is only supported for global registers. The 
docs for local register variables explicitly say that it can't be used 
as input/outputs for basic asm.



And if we want to have implicitly clobbered registers, like the
diab compiler handles the basic asm, then this patch will
make it possible to add a target hook that clobbers additional
registers for basic asm.


I think we should try to avoid changing the semantics in v7 for memory 
and then changing them again in v8 for registers.


IOW, if I see some basic asm in a project (or on stack overflow/blog as 
a code fragment), should I assume it was intended for v6 semantics? v7? 
v8?  People often copy this stuff without understanding what it does.  
The more often the semantics change, the harder it is to use correctly 
and maintain.



2) The -Wbasic-asm warning patch wasn't approved for v6.  If we are
going to change this behavior now, is it time?


Yes. We have stage1 for gcc-7 development, I can't think of a better
time for it.
I would even say, the -Wbasic-asm warning patch makes more sense now,
because we could warn that the basic asm clobbers memory, which it
did not previously.


After your patch has been checked in, I'll re-submit this.


4) There are more basic asm docs that need to change: "It also does not
know about side effects of the assembler code, such as modifications to
memory or registers. Unlike some compilers, GCC assumes that no changes
to either memory or registers occur. This assumption may change in a
future release."


Yes, I should change that sentence too.

Maybe this way:

"Unlike some compilers, GCC assumes that no changes to registers
occur.  This assumption may change in a future release."


Is it worth describing the fact that the semantics have changed here?  
Something like "Before v7, gcc assumed no changes were made to memory."  
I guess users can "figure it out" by reading the v6 docs and comparing 
it to v7.  But if the semantic change has introduced a problem that 
someone is trying to debug, this could help them track it down.


Also, I'm kind of hoping that v7 is the 'future release.'  If we are 
going to change the clobbers, I'd hope that we'd do it all at one time, 
rather than spreading it out across several releases.


If no one is prepared to step up and implement (or at least defend) the 
idea of clobbering registers, I'd like to see the "This assumption may 
change in a future release" part removed.  Since (theoretically) 
anything can change at any time, the only reason this text made sense 
was because a change was imminent.  If that's no longer true, it's time 
for it to go.


dw


Re: tuple move constructor

2016-05-06 Thread Ville Voutilainen
On 7 May 2016 at 00:39, Marc Glisse  wrote:
> Assuming we want the copy constructor to be defaulted, I think we still
> could with concepts:
>
> tuple(tuple const&)
> requires(__and_<is_copy_constructible<_Elements>...>::value)
> = default;
>
> While there is precedent for enabling C++11 features in C++03 mode inside
> system headers, I guess maintainers might be more reluctant for something
> that is only heading for a TS for now.

Much as I'd like to go in that direction, I don't think we can yet,
at least not as our default implementation, because front-ends like
clang wouldn't be able to compile our library.

>> I think the patch is ok, but I think it would be a good idea to have a
>> comment on the added tag type and its purpose.
> Indeed. I wasn't sure if people preferred more tags or more enable_if...

I don't have a strong opinion if there's implementation choice between those.


Re: tuple move constructor

2016-05-06 Thread Marc Glisse

On Fri, 6 May 2016, Ville Voutilainen wrote:


On 6 May 2016 at 20:51, Marc Glisse  wrote:

Hi Ville,

since you wrote the latest patches on tuple constructors, do you have an
opinion on this patch, or alternate strategies to achieve the same goal?

https://gcc.gnu.org/ml/libstdc++/2016-04/msg00041.html


I have fairly mixed feelings about the approach; it's adding a tag type 
and more enable_ifs into the base classes of tuple, which I'd rather not 
do unless absolutely necessary. Then again, the testcase you add looks 
like something we want to support properly. I haven't analyzed your 
patch in a very detailed manner; my initial thought was "can't we do 
this in the constraints of tuple's constructors", but looking at the 
patch and knowing the constructors of tuple, I don't think we can.


Assuming we want the copy constructor to be defaulted, I think we still 
could with concepts:


tuple(tuple const&)
requires(__and_<is_copy_constructible<_Elements>...>::value)
= default;

While there is precedent for enabling C++11 features in C++03 mode inside 
system headers, I guess maintainers might be more reluctant for something 
that is only heading for a TS for now.


I think the patch is ok, but I think it would be a good idea to have a 
comment on the added tag type and its purpose.


Indeed. I wasn't sure if people preferred more tags or more enable_if...


Minor point: the technique that looks like

typename enable_if<
!is_same<_UHead, _Head>::value, bool>::type = false

isn't necessary unless we have another overload that we need to
distinguish with true/false. For a single overload,
just using typename = typename enable_if<
!is_same<_UHead, _Head>::value, bool>::type
works equally well. The amount of boilerplate is more or less the
same, so that's really not a significant matter,
just an fyi. :)


Thanks. There are several variants on how to use enable_if, I copied one 
from some neighboring code without checking why this specific variant was 
used there.


--
Marc Glisse


Go patch committed: Add escape graph nodes

2016-05-06 Thread Ian Lance Taylor
This patch by Chris Manghane adds nodes to the escape analysis graph
in the Go frontend.  They still aren't used for anything.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 235982)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-33f1d1d151721305ba37f3e23652d21310f868af
+7f5a9fde801eb755a5252fd4ff588b0a47475bd3
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 235982)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -4,9 +4,263 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
+#include 
+#include 
+
 #include "gogo.h"
+#include "types.h"
+#include "expressions.h"
+#include "statements.h"
 #include "escape.h"
 
+// class Node.
+
+// Return the node's type, if it makes sense for it to have one.
+
+Type*
+Node::type() const
+{
+  if (this->object() != NULL
+  && this->object()->is_variable())
+return this->object()->var_value()->type();
+  else if (this->object() != NULL
+  && this->object()->is_function())
+return this->object()->func_value()->type();
+  else if (this->expr() != NULL)
+return this->expr()->type();
+  else
+return NULL;
+}
+
+// A helper for reporting; return this node's location.
+
+Location
+Node::location() const
+{
+  if (this->object() != NULL && !this->object()->is_sink())
+return this->object()->location();
+  else if (this->expr() != NULL)
+return this->expr()->location();
+  else if (this->statement() != NULL)
+return this->statement()->location();
+  else
+return Linemap::unknown_location();
+}
+
+// Return this node's state, creating it if it has not been initialized.
+
+Node::Escape_state*
+Node::state(Escape_context* context, Named_object* fn)
+{
+  if (this->state_ == NULL)
+{
+  if (this->expr() != NULL && this->expr()->var_expression() != NULL)
+   {
+ // Tie state of variable references to underlying variables.
+ Named_object* var_no = this->expr()->var_expression()->named_object();
+ Node* var_node = Node::make_node(var_no);
+ this->state_ = var_node->state(context, fn);
+   }
+  else
+   {
+ this->state_ = new Node::Escape_state;
+ if (fn == NULL)
+   fn = context->current_function();
+
+ this->state_->fn = fn;
+   }
+}
+  go_assert(this->state_ != NULL);
+  return this->state_;
+}
+
+void
+Node::set_encoding(int enc)
+{
+  this->encoding_ = enc;
+  if (this->expr() != NULL
+  && this->expr()->var_expression() != NULL)
+{
+  // Set underlying object as well.
+  Named_object* no = this->expr()->var_expression()->named_object();
+  Node::make_node(no)->set_encoding(enc);
+}
+}
+
+bool
+Node::is_sink() const
+{
+  if (this->object() != NULL
+  && this->object()->is_sink())
+return true;
+  else if (this->expr() != NULL
+  && this->expr()->is_sink_expression())
+return true;
+  return false;
+}
+
+std::map<Named_object*, Node*> Node::objects;
+std::map<Expression*, Node*> Node::expressions;
+std::map<Statement*, Node*> Node::statements;
+
+// Make an object node or return a cached node for this object.
+
+Node*
+Node::make_node(Named_object* no)
+{
+  if (Node::objects.find(no) != Node::objects.end())
+return Node::objects[no];
+
+  Node* n = new Node(no);
+  std::pair<Named_object*, Node*> val(no, n);
+  Node::objects.insert(val);
+  return n;
+}
+
+// Make an expression node or return a cached node for this expression.
+
+Node*
+Node::make_node(Expression* e)
+{
+  if (Node::expressions.find(e) != Node::expressions.end())
+return Node::expressions[e];
+
+  Node* n = new Node(e);
+  std::pair<Expression*, Node*> val(e, n);
+  Node::expressions.insert(val);
+  return n;
+}
+
+// Make a statement node or return a cached node for this statement.
+
+Node*
+Node::make_node(Statement* s)
+{
+  if (Node::statements.find(s) != Node::statements.end())
+return Node::statements[s];
+
+  Node* n = new Node(s);
+  std::pair<Statement*, Node*> val(s, n);
+  Node::statements.insert(val);
+  return n;
+}
+
+// Returns the maximum of an existing escape value
+// (and its additional parameter flow flags) and a new escape type.
+
+int
+Node::max_encoding(int e, int etype)
+{
+  if ((e & ESCAPE_MASK) >= etype)
+return e;
+  if (etype == Node::ESCAPE_NONE || etype == Node::ESCAPE_RETURN)
+return (e & ~ESCAPE_MASK) | etype;
+  return etype;
+}
+
+// Return a modified encoding for an input parameter that flows into an
+// output parameter.
+
+// Class Escape_context.
+
+Escape_context::Escape_context(Gogo* gogo, bool recursive)
+  : 

[RFA] Remove useless test in bitmap_find_bit.

2016-05-06 Thread Jeff Law
I was looking at a performance regression with some threading changes 
I'm working on and spotted this trivial cleanup.


in bitmap_find_bit:

 /* `element' is the nearest to the one we want.  If it's not the one we
want, the one we want doesn't exist.  */
 head->current = element;
 head->indx = element->indx;
 if (element != 0 && element->indx != indx)
   element = 0;

ELEMENT will always be non-NULL at the conditional as it was 
dereferenced in the prior statement.  And if we look up further (not 
shown here), we can deduce that ELEMENT will always be non-NULL at the 
dereference point as well.


Things have been like this since the introduction of bitmap.c in 1997.

VRP will catch this, but it's kind of silly not to clean this nit up at
the source level.


Bootstrapped and regression tested on x86_64 linux.

OK for the trunk?

* bitmap.c (bitmap_find_bit): Remove useless test.

diff --git a/gcc/bitmap.c b/gcc/bitmap.c
index 0c05512..010cf75 100644
--- a/gcc/bitmap.c
+++ b/gcc/bitmap.c
@@ -556,7 +556,7 @@ bitmap_find_bit (bitmap head, unsigned int bit)
  want, the one we want doesn't exist.  */
   head->current = element;
   head->indx = element->indx;
-  if (element != 0 && element->indx != indx)
+  if (element->indx != indx)
 element = 0;
 
   return element;


[PATCH, i386]: Cleanup LEA splitters

2016-05-06 Thread Uros Bizjak
2016-05-06  Uros Bizjak  

* config/i386/i386.md (LEAMODE): New mode attribute.
(plus to LEA splitter): Rewrite splitter using LEAMODE mode attribute.
(ashift to LEA splitter): Rewrite splitter using SWI mode iterator
and LEAMODE mode attribute.  Use VOIDmode const_0_to_3_operand as
operand 2 predicate.
(*lea_general_2): Use VOIDmode for const248_operand.
(*lea_general_3): Ditto.
(*lea_general_4): Use VOIDmode for const_0_to_3_operand.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 235982)
+++ i386.md (working copy)
@@ -1024,6 +1024,9 @@
 (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI")])
 (define_mode_attr dwi [(QI "hi") (HI "si") (SI "di") (DI "ti")])
 
+;; LEA mode corresponding to an integer mode
+(define_mode_attr LEAMODE [(QI "SI") (HI "SI") (SI "SI") (DI "DI")])
+
 ;; Half mode for double word integer modes.
 (define_mode_iterator DWIH [(SI "!TARGET_64BIT")
(DI "TARGET_64BIT")])
@@ -5710,32 +5713,6 @@
(parallel [(set (match_dup 0) (plus:SWI48 (match_dup 0) (match_dup 2)))
  (clobber (reg:CC FLAGS_REG))])])
 
-;; Convert add to the lea pattern to avoid flags dependency.
-(define_split
-  [(set (match_operand:SWI 0 "register_operand")
-   (plus:SWI (match_operand:SWI 1 "register_operand")
- (match_operand:SWI 2 "<general_operand>")))
-   (clobber (reg:CC FLAGS_REG))]
-  "reload_completed && ix86_lea_for_add_ok (insn, operands)" 
-  [(const_int 0)]
-{
-  machine_mode mode = <MODE>mode;
-  rtx pat;
-
-  if (<MODE_SIZE> < GET_MODE_SIZE (SImode))
-{ 
-  mode = SImode; 
-  operands[0] = gen_lowpart (mode, operands[0]);
-  operands[1] = gen_lowpart (mode, operands[1]);
-  operands[2] = gen_lowpart (mode, operands[2]);
-}
-
-  pat = gen_rtx_PLUS (mode, operands[1], operands[2]);
-
-  emit_insn (gen_rtx_SET (operands[0], pat));
-  DONE;
-})
-
 ;; Split non destructive adds if we cannot use lea.
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -5753,6 +5730,24 @@
 
 ;; Convert add to the lea pattern to avoid flags dependency.
 (define_split
+  [(set (match_operand:SWI 0 "register_operand")
+   (plus:SWI (match_operand:SWI 1 "register_operand")
+ (match_operand:SWI 2 "<general_operand>")))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed && ix86_lea_for_add_ok (insn, operands)" 
+  [(set (match_dup 0)
+	(plus:<LEAMODE> (match_dup 1) (match_dup 2)))]
+{
+  if (<MODE>mode != <LEAMODE>mode)
+{
+  operands[0] = gen_lowpart (<LEAMODE>mode, operands[0]);
+  operands[1] = gen_lowpart (<LEAMODE>mode, operands[1]);
+  operands[2] = gen_lowpart (<LEAMODE>mode, operands[2]);
+}
+})
+
+;; Convert add to the lea pattern to avoid flags dependency.
+(define_split
   [(set (match_operand:DI 0 "register_operand")
(zero_extend:DI
  (plus:SI (match_operand:SI 1 "register_operand")
@@ -6237,7 +6232,7 @@
   [(set (match_operand:SWI12 0 "register_operand" "=r")
(plus:SWI12
  (mult:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
- (match_operand:SWI12 2 "const248_operand" "n"))
+ (match_operand 2 "const248_operand" "n"))
  (match_operand:SWI12 3 "nonmemory_operand" "ri")))]
   "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "#"
@@ -6259,7 +6254,7 @@
(plus:SWI12
  (plus:SWI12
(mult:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
-   (match_operand:SWI12 2 "const248_operand" "n"))
+   (match_operand 2 "const248_operand" "n"))
(match_operand:SWI12 3 "register_operand" "r"))
  (match_operand:SWI12 4 "immediate_operand" "i")))]
   "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
@@ -6285,8 +6280,8 @@
(any_or:SWI12
  (ashift:SWI12
(match_operand:SWI12 1 "index_register_operand" "l")
-   (match_operand:SWI12 2 "const_0_to_3_operand" "n"))
- (match_operand:SWI12 3 "const_int_operand" "n")))]
+   (match_operand 2 "const_0_to_3_operand" "n"))
+ (match_operand 3 "const_int_operand" "n")))]
   "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
&& ((unsigned HOST_WIDE_INT) INTVAL (operands[3])
< (HOST_WIDE_INT_1U << INTVAL (operands[2])))"
@@ -6309,8 +6304,8 @@
(any_or:SWI48
  (ashift:SWI48
(match_operand:SWI48 1 "index_register_operand" "l")
-   (match_operand:SWI48 2 "const_0_to_3_operand" "n"))
- (match_operand:SWI48 3 "const_int_operand" "n")))]
+   (match_operand 2 "const_0_to_3_operand" "n"))
+ (match_operand 3 "const_int_operand" "n")))]
   "(unsigned HOST_WIDE_INT) INTVAL (operands[3])
< (HOST_WIDE_INT_1U << INTVAL (operands[2]))"
   "#"
@@ -10063,31 +10058,21 @@
 
 ;; 

Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread Jason Merrill
On Fri, May 6, 2016 at 1:56 PM, Pedro Alves  wrote:
> On 05/06/2016 05:40 PM, David Malcolm wrote:
>> +#if __cplusplus >= 201103
>> +/* C++11 claims to be available: use it: */
>> +#define OVERRIDE override
>> +#define FINAL final
>> +#else
>> +/* No C++11 support; leave the macros empty: */
>> +#define OVERRIDE
>> +#define FINAL
>> +#endif
>> +
>
> Is there a reason this is preferred over using override/final in
> the sources directly, and then define them away as empty
> on pre-C++11?
>
> I mean:
>
> #if __cplusplus < 201103
> # define override
> # define final
> #endif
>
> then use override/final throughout instead of OVERRIDE/FINAL.

This would break any existing use of those identifiers; they are not
keywords, so a variable named "final" is perfectly valid C++11.

Jason


Re: [PATCH 1/4] Make argv const char ** in read_md_files etc

2016-05-06 Thread Jakub Jelinek
On Wed, May 04, 2016 at 04:49:27PM -0400, David Malcolm wrote:
> This patch makes the argv param to read_md_files const, needed
> so that the RTL frontend can call it on a const char *.
> 
> While we're at it, it similarly makes const the argv for all
> of the "main" functions of the various gen*.

Just noticed this broke make mddump.

Fixed thusly, committed as obvious:

2016-05-06  Jakub Jelinek  

* genmddump.c (main): Convert argv from char ** to const char **.

--- gcc/genmddump.c.jj  2016-01-04 14:55:53.0 +0100
+++ gcc/genmddump.c 2016-05-06 22:40:01.537097183 +0200
@@ -35,10 +35,10 @@
 #include "gensupport.h"
 
 
-extern int main (int, char **);
+extern int main (int, const char **);
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   progname = "genmddump";
 
@@ -57,4 +57,3 @@ main (int argc, char **argv)
   fflush (stdout);
   return (ferror (stdout) != 0 ? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE);
 }
-


Jakub


[gomp4.5] Parsing of most of OpenMP 4.5 clauses

2016-05-06 Thread Jakub Jelinek
Hi!

This patch adds parsing of most of the OpenMP 4.5 clause changes,
though it doesn't do anything with them during resolve or later yet.
Still missing are the depend clause parsing changes (sink and source)
and the link and to clauses for the declare target construct.

2016-05-06  Jakub Jelinek  

* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_RELEASE,
OMP_MAP_ALWAYS_TO, OMP_MAP_ALWAYS_FROM and OMP_MAP_ALWAYS_TOFROM.
(OMP_LIST_IS_DEVICE_PTR, OMP_LIST_USE_DEVICE_PTR): New.
(enum gfc_omp_if_kind): New.
(struct gfc_omp_clauses): Add orderedc, defaultmap, nogroup,
sched_simd, sched_monotonic, sched_nonmonotonic, simd, threads,
grainsize, hint, num_tasks, priority and if_exprs fields.
* openmp.c (gfc_free_omp_clauses): Free grainsize, hint, num_tasks,
priority and if_exprs.
(enum omp_mask1): Add OMP_CLAUSE_DEFAULTMAP, OMP_CLAUSE_GRAINSIZE,
OMP_CLAUSE_HINT, OMP_CLAUSE_IS_DEVICE_PTR, OMP_CLAUSE_LINK,
OMP_CLAUSE_NOGROUP, OMP_CLAUSE_NUM_TASKS, OMP_CLAUSE_PRIORITY,
OMP_CLAUSE_SIMD, OMP_CLAUSE_THREADS, OMP_CLAUSE_USE_DEVICE_PTR
and OMP_CLAUSE_NOWAIT.
(enum omp_mask2): Remove OMP_CLAUSE_OACC_DEVICE and OMP_CLAUSE_LINK.
(gfc_match_omp_clauses): Move delete clause handling to where it
alphabetically belongs.  Parse defaultmap, grainsize, hint,
is_device_ptr, nogroup, nowait, num_tasks, priority, simd, threads
and use_device_ptr clauses.  Parse if clause modifier.  Parse map
clause always modifier, and release and delete kinds.  Parse ordered
clause with argument.  Parse schedule clause modifiers.  Differentiate
device clause parsing based on openacc flag.  Guard link clause
parsing with openacc flag.
(OACC_UPDATE_CLAUSES): Replace OMP_CLAUSE_OACC_DEVICE with
OMP_CLAUSE_DEVICE.
(OMP_TASK_CLAUSES): Add OMP_CLAUSE_PRIORITY.
(OMP_TARGET_CLAUSES): Add OMP_CLAUSE_DEPEND, OMP_CLAUSE_NOWAIT,
OMP_CLAUSE_PRIVATE, OMP_CLAUSE_FIRSTPRIVATE, OMP_CLAUSE_DEFAULTMAP
and OMP_CLAUSE_IS_DEVICE_PTR. 
(OMP_TARGET_DATA_CLAUSES): Add OMP_CLAUSE_USE_DEVICE_PTR.
(OMP_TARGET_UPDATE_CLAUSES): Add OMP_CLAUSE_DEPEND and
OMP_CLAUSE_NOWAIT.
(resolve_omp_clauses): Add dummy OMP_LIST_IS_DEVICE_PTR and
OMP_LIST_USE_DEVICE_PTR cases.
* frontend-passes.c (gfc_code_walker): Handle new OpenMP 4.5
expressions.
* dump-parse-tree.c (show_omp_clauses): Adjust for OpenMP 4.5
clause changes.  

--- gcc/fortran/gfortran.h.jj   2016-05-04 18:37:26.0 +0200
+++ gcc/fortran/gfortran.h  2016-05-06 19:01:13.813857650 +0200
@@ -1120,7 +1120,11 @@ enum gfc_omp_map_op
   OMP_MAP_FORCE_PRESENT,
   OMP_MAP_FORCE_DEVICEPTR,
   OMP_MAP_DEVICE_RESIDENT,
-  OMP_MAP_LINK
+  OMP_MAP_LINK,
+  OMP_MAP_RELEASE,
+  OMP_MAP_ALWAYS_TO,
+  OMP_MAP_ALWAYS_FROM,
+  OMP_MAP_ALWAYS_TOFROM
 };
 
 /* For use in OpenMP clauses in case we need extra information
@@ -1165,6 +1169,8 @@ enum
   OMP_LIST_LINK,
   OMP_LIST_USE_DEVICE,
   OMP_LIST_CACHE,
+  OMP_LIST_IS_DEVICE_PTR,
+  OMP_LIST_USE_DEVICE_PTR,
   OMP_LIST_NUM
 };
 
@@ -1207,6 +1213,19 @@ enum gfc_omp_cancel_kind
   OMP_CANCEL_TASKGROUP
 };
 
+enum gfc_omp_if_kind
+{
+  OMP_IF_PARALLEL,
+  OMP_IF_TASK,
+  OMP_IF_TASKLOOP,
+  OMP_IF_TARGET,
+  OMP_IF_TARGET_DATA,
+  OMP_IF_TARGET_UPDATE,
+  OMP_IF_TARGET_ENTER_DATA,
+  OMP_IF_TARGET_EXIT_DATA,
+  OMP_IF_LAST
+};
+
 typedef struct gfc_omp_clauses
 {
   struct gfc_expr *if_expr;
@@ -1216,9 +1235,11 @@ typedef struct gfc_omp_clauses
   enum gfc_omp_sched_kind sched_kind;
   struct gfc_expr *chunk_size;
   enum gfc_omp_default_sharing default_sharing;
-  int collapse;
+  int collapse, orderedc;
   bool nowait, ordered, untied, mergeable;
-  bool inbranch, notinbranch;
+  bool inbranch, notinbranch, defaultmap, nogroup;
+  bool sched_simd, sched_monotonic, sched_nonmonotonic;
+  bool simd, threads;
   enum gfc_omp_cancel_kind cancel;
   enum gfc_omp_proc_bind_kind proc_bind;
   struct gfc_expr *safelen_expr;
@@ -1226,6 +1247,11 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *num_teams;
   struct gfc_expr *device;
   struct gfc_expr *thread_limit;
+  struct gfc_expr *grainsize;
+  struct gfc_expr *hint;
+  struct gfc_expr *num_tasks;
+  struct gfc_expr *priority;
+  struct gfc_expr *if_exprs[OMP_IF_LAST];
   enum gfc_omp_sched_kind dist_sched_kind;
   struct gfc_expr *dist_chunk_size;
 
--- gcc/fortran/openmp.c.jj 2016-05-06 11:25:50.322794151 +0200
+++ gcc/fortran/openmp.c	2016-05-06 19:00:17.642600199 +0200
@@ -76,6 +76,12 @@ gfc_free_omp_clauses (gfc_omp_clauses *c
   gfc_free_expr (c->device);
   gfc_free_expr (c->thread_limit);
   gfc_free_expr (c->dist_chunk_size);
+  gfc_free_expr (c->grainsize);
+  gfc_free_expr (c->hint);
+  gfc_free_expr (c->num_tasks);
+  gfc_free_expr (c->priority);
+  for (i = 0; i < OMP_IF_LAST; i++)
+gfc_free_expr 

Re: tuple move constructor

2016-05-06 Thread Ville Voutilainen
On 6 May 2016 at 20:51, Marc Glisse  wrote:
> Hi Ville,
>
> since you wrote the latest patches on tuple constructors, do you have an
> opinion on this patch, or alternate strategies to achieve the same goal?
>
> https://gcc.gnu.org/ml/libstdc++/2016-04/msg00041.html

I have fairly mixed feelings about the approach; it's adding a tag
type and more enable_ifs into the
base classes of tuple, which I'd rather not do unless absolutely
necessary. Then again, the testcase
you add looks like something we want to support properly. I haven't
analyzed your patch in a very detailed
manner; my initial thought was "can't we do this in the constraints of
tuple's constructors", but looking
at the patch and knowing the constructors of tuple, I don't think we can.

I think the patch is ok, but I think it would be a good idea to have a
comment on the added tag type and
its purpose.

Minor point: the technique that looks like

typename enable_if<
!is_same<_UHead, _Head>::value, bool>::type = false

isn't necessary unless we have another overload that we need to
distinguish with true/false. For a single overload,
just using typename = typename enable_if<
!is_same<_UHead, _Head>::value, bool>::type
works equally well. The amount of boilerplate is more or less the
same, so that's really not a significant matter,
just an fyi. :)


Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread Pedro Alves
On 05/06/2016 07:33 PM, Trevor Saunders wrote:
> On Fri, May 06, 2016 at 07:10:33PM +0100, Pedro Alves wrote:

>> I like your names without the GCC_ prefix better though,
>> for the same reason of standardizing binutils-gdb + gcc
>> on the same symbols.
> 
> I agree, though I'm not really sure when gdb / binutils stuff will
> support building as C++11.

gdb already builds as a C++ compiler by default today, and will
switch to C++-only right after the next release (couple months),
the latest.

Thanks,
Pedro Alves



Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread Trevor Saunders
On Fri, May 06, 2016 at 07:10:33PM +0100, Pedro Alves wrote:
> On 05/06/2016 06:56 PM, Pedro Alves wrote:
> 
> > If building gcc as a C++11 program is supported, then it
> > won't be possible to use these names as symbols for
> > anything else anyway?
> 
> Just found out the above is not true.  Apparently I've
> been stuck in C++98 for too long...  Sorry about the noise.
> 
> I was going to suggest to put this in include/ansidecl.h,
> so that all C++ libraries / programs in binutils-gdb use the same
> thing, instead of each reinventing the wheel, and I found
> there's already something there:
> 
> /* This is used to mark a class or virtual function as final.  */
> #if __cplusplus >= 201103L
> #define GCC_FINAL final
> #elif GCC_VERSION >= 4007
> #define GCC_FINAL __final
> #else
> #define GCC_FINAL
> #endif
> 
> From:
> 
>  https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00455.html
> 
> Apparently the patch that actually uses that was reverted,
> as I can't find any use.

Yeah, I wanted to use it to work around gdb not dealing well with stuff
in the anon namespace, but somehow that broke aix, and some people
objected and I haven't gotten back to it.

> I like your names without the GCC_ prefix better though,
> for the same reason of standardizing binutils-gdb + gcc
> on the same symbols.

I agree, though I'm not really sure when gdb / binutils stuff will
support building as C++11.

Trev

> 
> 
> -- 
> Thanks,
> Pedro Alves


Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread Pedro Alves
On 05/06/2016 06:56 PM, Pedro Alves wrote:

> If building gcc as a C++11 program is supported, then it
> won't be possible to use these names as symbols for
> anything else anyway?

Just found out the above is not true.  Apparently I've
been stuck in C++98 for too long...  Sorry about the noise.

I was going to suggest to put this in include/ansidecl.h,
so that all C++ libraries / programs in binutils-gdb use the same
thing, instead of each reinventing the wheel, and I found
there's already something there:

/* This is used to mark a class or virtual function as final.  */
#if __cplusplus >= 201103L
#define GCC_FINAL final
#elif GCC_VERSION >= 4007
#define GCC_FINAL __final
#else
#define GCC_FINAL
#endif

From:

 https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00455.html

Apparently the patch that actually uses that was reverted,
as I can't find any use.

I like your names without the GCC_ prefix better though,
for the same reason of standardizing binutils-gdb + gcc
on the same symbols.


-- 
Thanks,
Pedro Alves


Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread Pedro Alves
On 05/06/2016 05:40 PM, David Malcolm wrote:
> +#if __cplusplus >= 201103
> +/* C++11 claims to be available: use it: */
> +#define OVERRIDE override
> +#define FINAL final
> +#else
> +/* No C++11 support; leave the macros empty: */
> +#define OVERRIDE
> +#define FINAL
> +#endif
> +

Is there a reason this is preferred over using override/final in
the sources directly, and then define them away as empty
on pre-C++11?

I mean:

#if __cplusplus < 201103
# define override
# define final
#endif

then use override/final throughout instead of OVERRIDE/FINAL.

If building gcc as a C++11 program is supported, then it
won't be possible to use these names as symbols for
anything else anyway?

Thanks,
Pedro Alves



Re: tuple move constructor

2016-05-06 Thread Marc Glisse

Hi Ville,

since you wrote the latest patches on tuple constructors, do you have an 
opinion on this patch, or alternate strategies to achieve the same goal?


https://gcc.gnu.org/ml/libstdc++/2016-04/msg00041.html


On Thu, 21 Apr 2016, Marc Glisse wrote:


On Thu, 21 Apr 2016, Jonathan Wakely wrote:


On 20 April 2016 at 21:42, Marc Glisse wrote:

Hello,

does anyone remember why the move constructor of _Tuple_impl is not
defaulted? The attached patch does not cause any test to fail (whitespace
kept to avoid line number changes). Maybe something about tuples of
references?


I don't know/remember why. It's possible it was to workaround a
front-end bug that required it, or maybe just a mistake and it should
always have been defaulted.


Ok, then how about something like this? In order to suppress the move
constructor in tuple (when there is a non-movable element), we need to
either declare it with suitable constraints, or keep it defaulted and
ensure that we don't bypass a missing move constructor anywhere along
the way (_Tuple_impl, _Head_base). There is a strange mix of 2
strategies in the patch, I prefer the tag class, but I started using
enable_if before I realized how many places needed those horrors.

Bootstrap+regtest on powerpc64le-unknown-linux-gnu.


2016-04-22  Marc Glisse  

* include/std/tuple (__element_arg_t): New class.
	(_Head_base(const _Head&), _Tuple_impl(const _Head&, const 
_Tail&...):

Remove.
(_Head_base(_UHead&&)): Add __element_arg_t argument...
(_Tuple_impl): ... and adjust callers.
(_Tuple_impl(_Tuple_impl&&)): Default.
(_Tuple_impl(const _Tuple_impl&),
_Tuple_impl(_Tuple_impl&&), _Tuple_impl(_UHead&&): Constrain.
* testsuite/20_util/tuple/nomove.cc: New.


--
Marc Glisse


Go patch committed: Escape analysis framework

2016-05-06 Thread Ian Lance Taylor
This patch by Chris Manghane implements the basic framework for the
new escape analysis.  It doesn't really do anything at this point,
this is just a skeleton.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

2016-05-06  Chris Manghane  

* Make-lang.in (GO_OBJS): Add go/escape.o (based on an entirely
new escape.cc).
Index: gcc/go/Make-lang.in
===
--- gcc/go/Make-lang.in (revision 235649)
+++ gcc/go/Make-lang.in (working copy)
@@ -50,6 +50,7 @@ go-warn = $(STRICT_WARN)
 
 GO_OBJS = \
go/ast-dump.o \
+   go/escape.o \
go/export.o \
go/expressions.o \
go/go-backend.o \
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 235649)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-46b108136c0d102f181f0cc7c398e3db8c4d08a3
+33f1d1d151721305ba37f3e23652d21310f868af
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 0)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -0,0 +1,95 @@
+// escape.cc -- Go escape analysis (based on Go compiler algorithm).
+
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "gogo.h"
+#include "escape.h"
+
+// Analyze the program flow for escape information.
+
+void
+Gogo::analyze_escape()
+{
+  // Discover strongly connected groups of functions to analyze for escape
+  // information in this package.
+  this->discover_analysis_sets();
+
+  for (std::vector<Analysis_set>::iterator p = this->analysis_sets_.begin();
+   p != this->analysis_sets_.end();
+   ++p)
+{
+  std::vector<Named_object*> stack = p->first;
+  Escape_context* context = new Escape_context(p->second);
+
+  // Analyze the flow of each function; build the connection graph.
+  // This is the assign phase.
+  for (std::vector<Named_object*>::reverse_iterator fn = stack.rbegin();
+   fn != stack.rend();
+   ++fn)
+   {
+ context->set_current_function(*fn);
+ this->assign_connectivity(context, *fn);
+   }
+
+  // TODO(cmang): Introduce escape node.
+  // Propagate levels across each dst.  This is the flood phase.
+  // std::vector<Node*> dsts = context->dsts();
+  // for (std::vector<Node*>::iterator n = dsts.begin();
+  //  n != dsts.end();
+  //  ++n)
+  //   this->propagate_escape(context, *n);
+
+  // Tag each exported function's parameters with escape information.
+  for (std::vector<Named_object*>::iterator fn = stack.begin();
+   fn != stack.end();
+   ++fn)
+this->tag_function(context, *fn);
+
+  delete context;
+}
+}
+
+// Discover strongly connected groups of functions to analyze.
+
+void
+Gogo::discover_analysis_sets()
+{
+  // TODO(cmang): Implement Analysis_set discovery traversal.
+  // Escape_analysis_discover(this);
+  // this->traverse();
+}
+
+// Build a connectivity graph between nodes in the function being analyzed.
+
+void
+Gogo::assign_connectivity(Escape_context*, Named_object*)
+{
+  // TODO(cmang): Model the flow analysis of input parameters and results for a
+  // function.
+  // TODO(cmang): Analyze the current function's body.
+}
+
+// Propagate escape information across the nodes modeled in this Analysis_set,
+// TODO(cmang): Introduce escape analysis node.
+
+void
+Gogo::propagate_escape(Escape_context*)
+{
+  // TODO(cmang): Do a breadth-first traversal of a node's upstream, adjusting
+  // the Level appropriately.
+}
+
+
+// Tag each top-level function with escape information that will be used to
+// retain analysis results across imports.
+
+void
+Gogo::tag_function(Escape_context*, Named_object*)
+{
+  // TODO(cmang): Create escape information notes for each input and output
+  // parameter in a given function.
+  // Escape_analysis_tag eat(context, fn);
+  // this->traverse();
+}
Index: gcc/go/gofrontend/escape.h
===
--- gcc/go/gofrontend/escape.h  (revision 0)
+++ gcc/go/gofrontend/escape.h  (working copy)
@@ -0,0 +1,44 @@
+// escape.h -- Go escape analysis (based on Go compiler algorithm).
+
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#ifndef GO_ESCAPE_H
+#define GO_ESCAPE_H
+
+class Named_object;
+
+// The escape context for a set of functions being analyzed.
+
+class Escape_context
+{
+ public:
+  Escape_context(bool recursive)
+: current_function_(NULL), recursive_(recursive)
+  { }
+
+  

Re: [PATCH v2] add support for placing variables in shared memory

2016-05-06 Thread Alexander Monakov
Allow using __attribute__((shared)) to place static variables in '.shared'
memory space.

Changes in v2:
- reword diagnostic message in nvptx_handle_shared_attribute to follow other
  backends ("... attribute not allowed with auto storage class");
- reject explicit initialization of ".shared" memory variables;
- add testcases.

testsuite/

2016-05-06  Alexander Monakov  

* gcc.target/nvptx/decl-shared.c: New test.
* gcc.target/nvptx/decl-shared-init.c: New test.

gcc/

2016-05-06  Alexander Monakov  

* config/nvptx/nvptx.c (nvptx_encode_section_info): Diagnose explicit
static initialization of variables in .shared memory. 
(nvptx_handle_shared_attribute): Reword diagnostic message.   

2016-04-19  Alexander Monakov  

* doc/extend.texi (Nvidia PTX Variable Attributes): New section.

2016-01-17  Alexander Monakov  

* config/nvptx/nvptx.c (nvptx_encode_section_info): Handle "shared"
attribute.
(nvptx_handle_shared_attribute): New.  Use it...
(nvptx_attribute_table): ... here (new entry).


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2d4dad1..e9e4d06 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -234,9 +224,17 @@ nvptx_encode_section_info (tree decl, rtx rtl, int first)
   if (TREE_CONSTANT (decl))
area = DATA_AREA_CONST;
   else if (TREE_CODE (decl) == VAR_DECL)
-   /* TODO: This would be a good place to check for a .shared or
-  other section name.  */
-   area = TREE_READONLY (decl) ? DATA_AREA_CONST : DATA_AREA_GLOBAL;
+   {
+ if (lookup_attribute ("shared", DECL_ATTRIBUTES (decl)))
+   {
+ area = DATA_AREA_SHARED;
+ if (DECL_INITIAL (decl))
+   error ("static initialization of variable %q+D in %<.shared%>"
+  " memory is not supported", decl);
+   }
+ else
+   area = TREE_READONLY (decl) ? DATA_AREA_CONST : DATA_AREA_GLOBAL;
+   }

   SET_SYMBOL_DATA_AREA (XEXP (rtl, 0), area);
 }
@@ -3805,12 +4025,36 @@ nvptx_handle_kernel_attribute (tree *node, tree name, 
tree ARG_UNUSED (args),
   return NULL_TREE;
 }
 
+/* Handle a "shared" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+nvptx_handle_shared_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  tree decl = *node;
+
+  if (TREE_CODE (decl) != VAR_DECL)
+{
+  error ("%qE attribute only applies to variables", name);
+  *no_add_attrs = true;
+}
+  else if (current_function_decl && !TREE_STATIC (decl))
+{
+  error ("%qE attribute not allowed with auto storage class", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Table of valid machine attributes.  */
 static const struct attribute_spec nvptx_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
affects_type_identity } */
   { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false 
},
+  { "shared", 0, 0, true, false,  false, nvptx_handle_shared_attribute, false 
},
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
diff --git a/gcc/testsuite/gcc.target/nvptx/decl-shared-init.c 
b/gcc/testsuite/gcc.target/nvptx/decl-shared-init.c
new file mode 100644
index 000..6a99b1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/decl-shared-init.c
@@ -0,0 +1 @@
+int var __attribute__((shared)) = 0; /* { dg-error "static initialization .* 
not supported" } */
diff --git a/gcc/testsuite/gcc.target/nvptx/decl-shared.c 
b/gcc/testsuite/gcc.target/nvptx/decl-shared.c
new file mode 100644
index 000..367075c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/decl-shared.c
@@ -0,0 +1,14 @@
+static int v_internal __attribute__((shared,used));
+int v_common __attribute__((shared));
+int v_extdef __attribute__((shared,nocommon));
+extern int v_extdecl __attribute__((shared));
+
+int use()
+{
+  return v_extdecl;
+}
+
+/* { dg-final { scan-assembler "\[\r\n\]\[\t \]*.shared \[^,\r\n\]*v_internal" 
} } */
+/* { dg-final { scan-assembler "\[\r\n\]\[\t \]*.weak .shared 
\[^,\r\n\]*v_common" } } */
+/* { dg-final { scan-assembler "\[\r\n\]\[\t \]*.visible .shared 
\[^,\r\n\]*v_extdef" } } */
+/* { dg-final { scan-assembler "\[\r\n\]\[\t \]*.extern .shared 
\[^,\r\n\]*v_extdecl" } } */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e11ce4d..5eeb179 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5469,6 +5469,7 @@ attributes.
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
 * MSP430 Variable Attributes::
+* Nvidia PTX Variable Attributes::
 * PowerPC Variable Attributes::
 * RL78 Variable Attributes::
 * SPU Variable Attributes::
@@ -6099,6 +6100,20 @@ same name 

Re: CONSTEXPR macro (was "Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h")

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 12:32:47PM -0400, David Malcolm wrote:
> Perhaps, but CONSTEXPR seems to be more awkward than OVERRIDE and
> FINAL.  The meanings of "final" and "override" are consistent between
> C++11 and C++14, but C++14 allows more things to be marked as
> "constexpr" than C++11.  Hence having a single "CONSTEXPR" macro might
> not be sufficient.  Perhaps there'd be CONSTEXPR_11 and CONSTEXPR_14
> macros for things that are constexpr in C++11 onwards and constexpr in
> C++14 onwards, respectively? (seems ugly to me).

Yeah, or CONSTEXPR and CONSTEXPR14 could work, sure.

> Are the OVERRIDE and FINAL macros OK for trunk?

Yes.

Jakub


CONSTEXPR macro (was "Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h")

2016-05-06 Thread David Malcolm
On Fri, 2016-05-06 at 18:20 +0200, Jakub Jelinek wrote:
> On Fri, May 06, 2016 at 12:40:45PM -0400, David Malcolm wrote:
> > C++11 adds the ability to add "override" after an implementation of
> > a
> > virtual function in a subclass, to:
> > (A) document that this is an override of a virtual function
> > (B) allow the compiler to issue a warning if it isn't (e.g. a
> > mismatch
> > of the type signature).
> > 
> > Similarly, it allows us to add a "final" to indicate that no
> > subclass
> > may subsequently override the vfunc.
> > 
> > We use virtual functions in a few places (e.g. in the jit), so it
> > would
> > be good to get this extra checking.
> > 
> > This patch adds OVERRIDE and FINAL as macros to coretypes.h
> > allowing us to get this extra checking when compiling with a
> > compiler
> > that implements C++11 or later (e.g. gcc 6 by default),
> > but without requiring C++11.
> 
> Don't we also want CONSTEXPR similarly defined to constexpr for C++11
> and
> above and nothing otherwise?

Perhaps, but CONSTEXPR seems to be more awkward than OVERRIDE and
FINAL.  The meanings of "final" and "override" are consistent between
C++11 and C++14, but C++14 allows more things to be marked as
"constexpr" than C++11.  Hence having a single "CONSTEXPR" macro might
not be sufficient.  Perhaps there'd be CONSTEXPR_11 and CONSTEXPR_14
macros for things that are constexpr in C++11 onwards and constexpr in
C++14 onwards, respectively? (seems ugly to me).

Are the OVERRIDE and FINAL macros OK for trunk?

Thanks
Dave


Re: [PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 12:40:45PM -0400, David Malcolm wrote:
> C++11 adds the ability to add "override" after an implementation of a
> virtual function in a subclass, to:
> (A) document that this is an override of a virtual function
> (B) allow the compiler to issue a warning if it isn't (e.g. a mismatch
> of the type signature).
> 
> Similarly, it allows us to add a "final" to indicate that no subclass
> may subsequently override the vfunc.
> 
> We use virtual functions in a few places (e.g. in the jit), so it would
> be good to get this extra checking.
> 
> This patch adds OVERRIDE and FINAL as macros to coretypes.h
> allowing us to get this extra checking when compiling with a compiler
> that implements C++11 or later (e.g. gcc 6 by default),
> but without requiring C++11.

Don't we also want CONSTEXPR similarly defined to constexpr for C++11 and
above and nothing otherwise?

Jakub


Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Ian Lance Taylor
On Fri, May 6, 2016 at 2:51 AM, Jakub Jelinek  wrote:
>
> Anyway, perhaps I'm misremembering, if there is a mode that really can't
> fail due to allocation failures or not, we need to deal with that.
> Ian or Jason, can all the demangle users allocate heap memory or not?
> And, if __cxa_demangle can fail, there is some allocation_failure stuff
> in the file.

The function cplus_demangle_v3_callback must not call malloc.  The
whole point of that function is to work when nothing else works.  That
is why d_demangle_callback does not, and must not, call malloc.

Ian


Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Jakub Jelinek
On Sat, May 07, 2016 at 12:05:11AM +0800, Marcel Böhme wrote:
> This patch also removes the following part of the comment for method 
> cplus_demangle_print_callback:
> "It does not use heap memory to build an output string, so cannot encounter 
> memory allocation failure”.

But that exactly is the thing I've talked about.  Removing the comment
doesn't make it right, supposedly it has been done that way for a reason.

The file has lots of different entrypoints, some of them depend on various
macros on what is it built for (libstdc++, libgcc, binutils/gdb/gcc in
libiberty, ...).

And some of them clearly can cope with memory allocation failures, but
they should be turned into the allocation_failure flag setting.

Others don't want any allocations.

E.g. if you read the description of __cxa_demangle, there is
   *STATUS is set to one of the following values:
  0: The demangling operation succeeded.
 -1: A memory allocation failure occurred.
 -2: MANGLED_NAME is not a valid name under the C++ ABI mangling rules.
 -3: One of the arguments is invalid.
and thus, it should be ensured that we end up with *STATUS -1 even for
the cases where malloc failed on those.

But then look at e.g. __gcclibcxx_demangle_callback (but there are various
others).

Jakub


[PATCH 2/2] jit: use FINAL and OVERRIDE throughout

2016-05-06 Thread David Malcolm
Mark most virtual functions in gcc/jit as being FINAL OVERRIDE.
gcc::jit::recording::lvalue::access_as_rvalue is the sole OVERRIDE
that isn't a FINAL.

Successfully bootstrapped on x86_64-pc-linux-gnu.

I can self-approve this, but as asked in patch 1,
does "final" imply "override"?  Is "final override" a tautology?

gcc/jit/ChangeLog:
* jit-playback.h: Within namespace gcc:jit::playback...
(compile_to_memory::postprocess): Mark with FINAL OVERRIDE.
(compile_to_file::postprocess): Likewise.
(function::finalizer): Likewise.
(block::finalizer): Likewise.
(source_file::finalizer): Likewise.
(source_line::finalizer): Likewise.
* jit-recording.c (gcc::jit::rvalue_usage_validator):: Likewise.
* jit-recording.h: Within namespace gcc::jit::recording...
(string::replay_into): Mark with FINAL OVERRIDE.
(string::make_debug_string): Likewise.
(string::write_reproducer): Likewise.
(location::replay_into): Likewise.
(location::dyn_cast_location): Likewise.
(location::make_debug_string): Likewise.
(location::write_reproducer): Likewise.
(memento_of_get_type::dereference): Likewise.
(memento_of_get_type::accepts_writes_from): Likewise.
(memento_of_get_type::is_int): Likewise.
(memento_of_get_type::is_float): Likewise.
(memento_of_get_type::is_bool): Likewise.
(memento_of_get_type::is_pointer): Likewise.
(memento_of_get_type::is_array): Likewise.
(memento_of_get_type::is_void): Likewise.
(memento_of_get_type::replay_into): Likewise.
(memento_of_get_type::make_debug_string): Likewise.
(memento_of_get_type::write_reproducer): Likewise.
(memento_of_get_pointer::dereference): Likewise.
(memento_of_get_pointer::accepts_writes_from): Likewise.
(memento_of_get_pointer::replay_into): Likewise.
(memento_of_get_pointer::is_int): Likewise.
(memento_of_get_pointer::is_float): Likewise.
(memento_of_get_pointer::is_bool): Likewise.
(memento_of_get_pointer::is_pointer): Likewise.
(memento_of_get_pointer::is_array): Likewise.
(memento_of_get_pointer::make_debug_string): Likewise.
(memento_of_get_pointer::write_reproducer): Likewise.
(memento_of_get_const::dereference): Likewise.
(memento_of_get_const::accepts_writes_from): Likewise.
(memento_of_get_const::unqualified): Likewise.
(memento_of_get_const::is_int): Likewise.
(memento_of_get_const::is_float): Likewise.
(memento_of_get_const::is_bool): Likewise.
(memento_of_get_const::is_pointer): Likewise.
(memento_of_get_const::is_array): Likewise.
(memento_of_get_const::void replay_into): Likewise;
(memento_of_get_const::make_debug_string): Likewise.
(memento_of_get_const::write_reproducer): Likewise.
(memento_of_get_volatile::dereference): Likewise.
(memento_of_get_volatile::unqualified): Likewise.
(memento_of_get_volatile::is_int): Likewise.
(memento_of_get_volatile::is_float): Likewise.
(memento_of_get_volatile::is_bool): Likewise.
(memento_of_get_volatile::is_pointer): Likewise.
(memento_of_get_volatile::is_array): Likewise.
(memento_of_get_volatile::replay_into): Likewise;
(memento_of_get_volatile::make_debug_string): Likewise.
(memento_of_get_volatile::write_reproducer): Likewise.
(array_type::dereference): Likewise.
(array_type::is_int): Likewise.
(array_type::is_float): Likewise.
(array_type::is_bool): Likewise.
(array_type::is_pointer): Likewise.
(array_type::is_array): Likewise.
(array_type::replay_into): Likewise;
(array_type::make_debug_string): Likewise.
(array_type::write_reproducer): Likewise.
(function_type::dereference): Likewise.
(function_type::function_dyn_cast_function_type): Likewise.
(function_type::function_as_a_function_type): Likewise.
(function_type::is_int): Likewise.
(function_type::is_float): Likewise.
(function_type::is_bool): Likewise.
(function_type::is_pointer): Likewise.
(function_type::is_array): Likewise.
(function_type::replay_into): Likewise;
(function_type::make_debug_string): Likewise.
(function_type::write_reproducer): Likewise.
(field::replay_into): Likewise;
(field::write_to_dump): Likewise.
(field::make_debug_string): Likewise.
(field::write_reproducer): Likewise.
(compound_type::dereference): Likewise.
(compound_type::is_int): Likewise.
(compound_type::is_float): Likewise.
(compound_type::is_bool): Likewise.
(compound_type::is_pointer): Likewise.
(compound_type::is_array): Likewise.
(compound_type::has_known_size): Likewise.

[PATCH 1/2] Add OVERRIDE and FINAL macros to coretypes.h

2016-05-06 Thread David Malcolm
C++11 adds the ability to add "override" after an implementation of a
virtual function in a subclass, to:
(A) document that this is an override of a virtual function
(B) allow the compiler to issue a warning if it isn't (e.g. a mismatch
of the type signature).

Similarly, it allows us to add a "final" to indicate that no subclass
may subsequently override the vfunc.

We use virtual functions in a few places (e.g. in the jit), so it would
be good to get this extra checking.

This patch adds OVERRIDE and FINAL as macros to coretypes.h
allowing us to get this extra checking when compiling with a compiler
that implements C++11 or later (e.g. gcc 6 by default),
but without requiring C++11.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

Does "final" imply "override"?  Is "final override" a tautology?

gcc/ChangeLog:
* coretypes.h (OVERRIDE): New macro.
(FINAL): New macro.
---
 gcc/coretypes.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 2932d73..b3a91a6 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -361,6 +361,31 @@ typedef void (*gt_pointer_operator) (void *, void *);
 typedef unsigned char uchar;
 #endif
 
+/* C++11 adds the ability to add "override" after an implementation of a
+   virtual function in a subclass, to:
+ (A) document that this is an override of a virtual function
+ (B) allow the compiler to issue a warning if it isn't (e.g. a mismatch
+ of the type signature).
+
+   Similarly, it allows us to add a "final" to indicate that no subclass
+   may subsequently override the vfunc.
+
+   Provide OVERRIDE and FINAL as macros, allowing us to get these benefits
+   when compiling with C++11 support, but without requiring C++11.
+
+   For gcc, use "-std=c++11" to enable C++11 support; gcc 6 onwards enables
+   this by default (actually GNU++14).  */
+
+#if __cplusplus >= 201103
+/* C++11 claims to be available: use it: */
+#define OVERRIDE override
+#define FINAL final
+#else
+/* No C++11 support; leave the macros empty: */
+#define OVERRIDE
+#define FINAL
+#endif
+
 /* Most host source files will require the following headers.  */
 #if !defined (GENERATOR_FILE) && !defined (USED_FOR_TARGET)
 #include "machmode.h"
-- 
1.8.5.3



Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Marcel Böhme
Hi,

This patch also removes the following part of the comment for method 
cplus_demangle_print_callback:
"It does not use heap memory to build an output string, so cannot encounter 
memory allocation failure”.

> On 6 May 2016, at 11:11 PM, Marcel Böhme  wrote:
> 
> 
>> If one malloc succeeds and the other fails, you leak memory.
>> 
>>  Jakub
> Nice catch. Thanks!
> 
> Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Best - Marcel

Index: libiberty/ChangeLog
===
--- libiberty/ChangeLog (revision 235962)
+++ libiberty/ChangeLog (working copy)
@@ -1,3 +1,14 @@
+2016-05-06  Marcel Böhme  
+
+   PR c++/68159
+   * cp-demangle.c: Allocate arrays of user-defined size on the heap,
+   not on the stack. Do not include <alloca.h>.
+   (CP_DYNAMIC_ARRAYS): Remove definition.
+   (cplus_demangle_print_callback): Allocate memory for two arrays on
+   the heap. Free memory before return / exit.
+   (d_demangle_callback): Likewise.
+   (is_ctor_or_dtor): Likewise.
+   * testsuite/demangle-expected: Add regression test cases.
+
2016-05-02  Marcel Böhme  

PR c++/70498
Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 235962)
+++ libiberty/cp-demangle.c (working copy)
@@ -116,18 +116,6 @@
 #include 
 #endif
 
-#ifdef HAVE_ALLOCA_H
-# include <alloca.h>
-#else
-# ifndef alloca
-#  ifdef __GNUC__
-#   define alloca __builtin_alloca
-#  else
-extern char *alloca ();
-#  endif /* __GNUC__ */
-# endif /* alloca */
-#endif /* HAVE_ALLOCA_H */
-
 #ifdef HAVE_LIMITS_H
 #include <limits.h>
 #endif
@@ -186,20 +174,6 @@ static void d_init_info (const char *, int, size_t
 #define CP_STATIC_IF_GLIBCPP_V3
 #endif /* ! defined(IN_GLIBCPP_V3) */
 
-/* See if the compiler supports dynamic arrays.  */
-
-#ifdef __GNUC__
-#define CP_DYNAMIC_ARRAYS
-#else
-#ifdef __STDC__
-#ifdef __STDC_VERSION__
-#if __STDC_VERSION__ >= 199901L
-#define CP_DYNAMIC_ARRAYS
-#endif /* __STDC__VERSION >= 199901L */
-#endif /* defined (__STDC_VERSION__) */
-#endif /* defined (__STDC__) */
-#endif /* ! defined (__GNUC__) */
-
 /* We avoid pulling in the ctype tables, to prevent pulling in
additional unresolved symbols when this code is used in a library.
FIXME: Is this really a valid reason?  This comes from the original
@@ -4112,9 +4086,7 @@ d_last_char (struct d_print_info *dpi)
CALLBACK is a function to call to flush demangled string segments
as they fill the intermediate buffer, and OPAQUE is a generalized
callback argument.  On success, this returns 1.  On failure,
-   it returns 0, indicating a bad parse.  It does not use heap
-   memory to build an output string, so cannot encounter memory
-   allocation failure.  */
+   it returns 0, indicating a bad parse.  */
 
 CP_STATIC_IF_GLIBCPP_V3
 int
@@ -4126,25 +4098,32 @@ cplus_demangle_print_callback (int options,
 
  d_print_init (&dpi, callback, opaque, dc);
 
-  {
-#ifdef CP_DYNAMIC_ARRAYS
-__extension__ struct d_saved_scope scopes[dpi.num_saved_scopes];
-__extension__ struct d_print_template temps[dpi.num_copy_templates];
+  dpi.copy_templates
+= (struct d_print_template *) malloc (((size_t) dpi.num_copy_templates) 
+ * sizeof (*dpi.copy_templates));
+  if (! dpi.copy_templates)
+{
+  d_print_error (&dpi);
+  return 0;
+}
 
-dpi.saved_scopes = scopes;
-dpi.copy_templates = temps;
-#else
-dpi.saved_scopes = alloca (dpi.num_saved_scopes
-  * sizeof (*dpi.saved_scopes));
-dpi.copy_templates = alloca (dpi.num_copy_templates
-* sizeof (*dpi.copy_templates));
-#endif
+  dpi.saved_scopes
+= (struct d_saved_scope *) malloc (((size_t) dpi.num_saved_scopes) 
+  * sizeof (*dpi.saved_scopes));  
+  if (! dpi.saved_scopes)
+{
+  free (dpi.copy_templates);
+  d_print_error (&dpi);
+  return 0;
+}
 
-    d_print_comp (&dpi, options, dc);
-  }
+  d_print_comp (&dpi, options, dc);
 
  d_print_flush (&dpi);
 
+  free (dpi.copy_templates);
+  free (dpi.saved_scopes);
+
  return ! d_print_saw_error (&dpi);
 }
 
@@ -5945,57 +5924,61 @@ d_demangle_callback (const char *mangled, int opti
 
  cplus_demangle_init_info (mangled, options, strlen (mangled), &di);
 
-  {
-#ifdef CP_DYNAMIC_ARRAYS
-__extension__ struct demangle_component comps[di.num_comps];
-__extension__ struct demangle_component *subs[di.num_subs];
+  di.comps = (struct demangle_component *) malloc (((size_t) di.num_comps) 
+  * sizeof (*di.comps));
+  if (! di.comps)
+return 0;
 
-di.comps = comps;
-di.subs = subs;
-#else
-di.comps = alloca (di.num_comps * sizeof (*di.comps));
-di.subs = alloca (di.num_subs * sizeof (*di.subs));
-#endif

[gomp4] Improve loop partitioning

2016-05-06 Thread Nathan Sidwell

This patch improves the auto loop partitioning algorithm in  2 ways.

1) Rather than trying to assign just the outer loop to the outermost partition and 
then all innermost loops to the innermost partitioning axis, this changes the algorithm to 
assign the innermost loop to the innermost partition and then the outermost loop 
nests to the outermost partitions.  The difference will be seen when the loop 
nest exceeds the number of available partitioning axes.  Now the unpartitioned 
loops will be the loops just outside the innermost loop -- rather than the loops 
just inside the outermost loop.


2) If the loop nest is shallower than the number of available partitions, we 
attempt to assign an outer loop to two partitions.  This piece of the algorithm 
isn't fully generalized, as there are only 3 partitioning axes.  The interesting 
cases are nests of 1 or 2 loops.  In the latter case, the outer loop will get 
gang+worker partitioning and the inner loop will get vector partitioning.  In 
the former case the loop will get gang+vector partitioning.  Whether it's worth 
extending this case to assign all 3 axes is something to investigate further.


This patch gives a 5-fold speedup of 304.olbm and a 16-fold speedup of 360.ilbdc 
over the current implementation.


committed to gomp4.

nathan
2016-05-06  Nathan Sidwell  

	gcc/
	* omp-low.c (lower_oacc_head_mark): Ensure 2 levels for auto
	loops.
	(oacc_loop_auto_partitions): Add outer_assign parm. Assign all but
	vector partitioning to outer loops.  Assign 2 partitions to loops
	when available.
	(oacc_loop_partition): Adjust oacc_loop_auto_partitions call.

	gcc/testsuite/
	* c-c++-common/goacc/loop-auto-1.c: Adjust and add additional
	case.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust and
	add additional case.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 235775)
+++ gcc/omp-low.c	(working copy)
@@ -6396,9 +6396,9 @@ lower_oacc_head_mark (location_t loc, tr
 	   | OLF_SEQ)))
   tag |= OLF_AUTO;
 
-  /* Ensure at least one level.  */
-  if (!levels)
-levels++;
+  /* Ensure at least one level, or 2 for AUTO partitioning  */
+  if (levels < 1 + ((tag & OLF_AUTO) != 0))
+levels = 1 + ((tag & OLF_AUTO) != 0);
 
   args.quick_push (build_int_cst (integer_type_node, levels));
   args.quick_push (build_int_cst (integer_type_node, tag));
@@ -19682,11 +19682,13 @@ oacc_loop_fixed_partitions (oacc_loop *l
 
 /* Walk the OpenACC loop heirarchy to assign auto-partitioned loops.
OUTER_MASK is the partitioning this loop is contained within.
+   OUTER_ASSIGN is true if an outer loop is being auto-partitioned.
Return the cumulative partitioning used by this loop, siblings and
children.  */
 
 static unsigned
-oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask)
+oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
+			   bool outer_assign)
 {
   bool assign = (loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT);
   bool noisy = true;
@@ -19697,31 +19699,34 @@ oacc_loop_auto_partitions (oacc_loop *lo
   noisy = false;
 #endif
 
-  if (assign && outer_mask < GOMP_DIM_MASK (GOMP_DIM_MAX - 1))
+  if (assign && (!outer_assign | loop->inner))
 {
-  /* Allocate the outermost loop at the outermost available
-	 level.  */
+  /* Allocate outermost and non-innermost loops at the outermost
+	 non-innermost available level.  */
   unsigned this_mask = outer_mask + 1;
 
-  if (!(this_mask & loop->inner))
+  /* Make sure it's the single outermost available partition.  */
+  while (this_mask != (this_mask & -this_mask))
+	this_mask += this_mask & -this_mask;
+
+  if (!(this_mask & (loop->inner | GOMP_DIM_MASK (GOMP_DIM_MAX)
+			 | GOMP_DIM_MASK (GOMP_DIM_MAX - 1
 	loop->mask = this_mask;
 }
 
   if (loop->child)
-{
-  unsigned child_mask = outer_mask | loop->mask;
-
-  if (loop->mask || assign)
-	child_mask |= GOMP_DIM_MASK (GOMP_DIM_MAX);
-
-  loop->inner = oacc_loop_auto_partitions (loop->child, child_mask);
-}
-
-  if (assign && !loop->mask)
-{
-  /* Allocate the loop at the innermost available level.  */
+loop->inner = oacc_loop_auto_partitions (loop->child,
+	 outer_mask | loop->mask,
+	 outer_assign | assign);
+
+  if (assign && (!loop->mask || !outer_assign))
+{
+  /* Allocate the loop at the innermost available level.  Note
+	 that we do this even if we already assigned this loop the
+	 outermost available level above.  That way we'll partition
+	 this along 2 axes, if they are available.  */
   unsigned this_mask = 0;
-  
+
   /* Determine the outermost partitioning used within this loop. */
   this_mask = loop->inner | GOMP_DIM_MASK (GOMP_DIM_MAX);
   this_mask = (this_mask & -this_mask);
@@ -19732,11 +19737,11 @@ oacc_loop_auto_partitions (oacc_loop *lo
   /* And avoid picking one use by an outer 

Re: [PATCH] Fix PR70941

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 09:37:46AM +0200, Richard Biener wrote:
> 
> The following completes the fix for PR67921 now that we have a testcase
> for the non-pointer case.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

The testcase (for obvious reasons) fails on -funsigned-char defaulting
targets.  Plus, just theoretically, if int is 32-bits or larger, but
char is not 8-bit, it could fail as well.

Fixed thusly, committed as obvious.

2016-05-06  Jakub Jelinek  

PR middle-end/70941
* gcc.dg/torture/pr70941.c (abort): Remove prototype.
(a, b, c, d): Change type from char to signed char.
(main): Compare against (signed char) -1634678893 instead of
hardcoded -109.  Use __builtin_abort instead of abort.

--- gcc/testsuite/gcc.dg/torture/pr70941.c.jj   2016-05-06 15:09:06.0 +0200
+++ gcc/testsuite/gcc.dg/torture/pr70941.c  2016-05-06 17:11:25.0 +0200
@@ -1,14 +1,12 @@
 /* { dg-do run } */
 /* { dg-require-effective-target int32plus } */
 
-extern void abort (void);
-
-char a = 0, b = 0, c = 0, d = 0;
+signed char a = 0, b = 0, c = 0, d = 0;
 
 int main()
 {
   a = -(b - 405418259) - ((d && c) ^ 2040097152);
-  if (a != -109)
-abort();
+  if (a != (signed char) -1634678893)
+__builtin_abort ();
   return 0;
 }


Jakub


RE: [PATCH 2/4] [MIPS] Add pipeline description for MSA

2016-05-06 Thread Matthew Fortune
Robert Suchanek  writes:
> 
> gcc/ChangeLog:
> 
>   * config/mips/i6400.md (i6400_fpu_intadd, i6400_fpu_logic)
>   (i6400_fpu_div, i6400_fpu_cmp, i6400_fpu_float, i6400_fpu_store)
>   (i6400_fpu_long_pipe, i6400_fpu_logic_l, i6400_fpu_float_l)
>   (i6400_fpu_mult): New cpu units.
>   (i6400_msa_add_d, i6400_msa_int_add, i6400_msa_short_logic3)
>   (i6400_msa_short_logic2, i6400_msa_short_logic, i6400_msa_move)
>   (i6400_msa_cmp, i6400_msa_short_float2, i6400_msa_div_d)
>   (i6400_msa_div_w, i6400_msa_div_h, i6400_msa_div_b, i6400_msa_copy)
>   (i6400_msa_branch, i6400_fpu_msa_store, i6400_fpu_msa_load)
>   (i6400_fpu_msa_move, i6400_msa_long_logic1, i6400_msa_long_logic2)
>   (i6400_msa_mult, i6400_msa_long_float2, i6400_msa_long_float4)
>   (i6400_msa_long_float5, i6400_msa_long_float8, i6400_msa_fdiv_df)
>   (i6400_msa_fdiv_sf): New reservations.
>   * config/mips/p5600.md (p5600_fpu_intadd, p5600_fpu_cmp)
>   (p5600_fpu_float, p5600_fpu_logic_a, p5600_fpu_logic_b,
> p5600_fpu_div)
>   (p5600_fpu_logic, p5600_fpu_float_a, p5600_fpu_float_b,)

Typo with "," at the end of the list

>   (p5600_fpu_float_c, p5600_fpu_float_d, p5600_fpu_mult,
> p5600_fpu_fdiv)
>   (p5600_fpu_load): New cpu units.
>   (msa_short_int_add, msa_short_logic, msa_short_logic_move_v)
>   (msa_short_cmp, msa_short_float2, msa_short_logic3,
> msa_short_store4)
>   (msa_long_load, msa_short_store, msa_long_logic, msa_long_float2)
>   (msa_long_float4, msa_long_float5, msa_long_float8, msa_long_mult)
>   (msa_long_fdiv, msa_long_div): New reservations.

I assume this patch has not changed since it was posted.

OK to commit.

Thanks,
Matthew


Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 11:11:29PM +0800, Marcel Böhme wrote:
> +  dpi.copy_templates
> += (struct d_print_template *) malloc (((size_t) dpi.num_copy_templates) 
> +   * sizeof (*dpi.copy_templates));
> +  if (! dpi.copy_templates)
> +{
> +  d_print_error ();
> +  return 0;
> +}

Another thing to consider: if the common values of dpi.num_*
(and similarly in the other block) are small enough, it might be desirable
to just use an automatic fixed-size array (or even alloca) and only
fall back to malloc if the count is too large.
It would be nice to, say, grab all _Z* symbols from all binaries and shared
libraries on some distro using nm and nm -D, and run the demangler over them
with some statistics gathering.  If, say, dpi.num_saved_scopes is <= 16 in
99.5% of cases (completely random guess), it might be a useful optimization.

Anyway, that is all from me, I'll defer to the demangler maintainers for the
rest.

Jakub


Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Marcel Böhme

> If one malloc succeeds and the other fails, you leak memory.
> 
>   Jakub
Nice catch. Thanks!

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Best - Marcel

Index: libiberty/ChangeLog
===
--- libiberty/ChangeLog (revision 235962)
+++ libiberty/ChangeLog (working copy)
@@ -1,3 +1,14 @@
+2016-05-06  Marcel Böhme  
+
+   PR c++/68159
+   * cp-demangle.c: Allocate arrays of user-defined size on the heap,
+   not on the stack. Do not include <alloca.h>.
+   (CP_DYNAMIC_ARRAYS): Remove definition.
+   (cplus_demangle_print_callback): Allocate memory for two arrays on
+   the heap. Free memory before return / exit.
+   (d_demangle_callback): Likewise.
+   (is_ctor_or_dtor): Likewise.
+   * testsuite/demangle-expected: Add regression test cases.
+
2016-05-02  Marcel Böhme  

PR c++/70498
Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 235962)
+++ libiberty/cp-demangle.c (working copy)
@@ -116,18 +116,6 @@
 #include <stdlib.h>
 #endif
 
-#ifdef HAVE_ALLOCA_H
-# include <alloca.h>
-#else
-# ifndef alloca
-#  ifdef __GNUC__
-#   define alloca __builtin_alloca
-#  else
-extern char *alloca ();
-#  endif /* __GNUC__ */
-# endif /* alloca */
-#endif /* HAVE_ALLOCA_H */
-
 #ifdef HAVE_LIMITS_H
 #include <limits.h>
 #endif
@@ -186,20 +174,6 @@ static void d_init_info (const char *, int, size_t
 #define CP_STATIC_IF_GLIBCPP_V3
 #endif /* ! defined(IN_GLIBCPP_V3) */
 
-/* See if the compiler supports dynamic arrays.  */
-
-#ifdef __GNUC__
-#define CP_DYNAMIC_ARRAYS
-#else
-#ifdef __STDC__
-#ifdef __STDC_VERSION__
-#if __STDC_VERSION__ >= 199901L
-#define CP_DYNAMIC_ARRAYS
-#endif /* __STDC__VERSION >= 199901L */
-#endif /* defined (__STDC_VERSION__) */
-#endif /* defined (__STDC__) */
-#endif /* ! defined (__GNUC__) */
-
 /* We avoid pulling in the ctype tables, to prevent pulling in
additional unresolved symbols when this code is used in a library.
FIXME: Is this really a valid reason?  This comes from the original
@@ -4126,25 +4100,31 @@ cplus_demangle_print_callback (int options,
 
  d_print_init (&dpi, callback, opaque, dc);
 
-  {
-#ifdef CP_DYNAMIC_ARRAYS
-__extension__ struct d_saved_scope scopes[dpi.num_saved_scopes];
-__extension__ struct d_print_template temps[dpi.num_copy_templates];
+  dpi.copy_templates
+= (struct d_print_template *) malloc (((size_t) dpi.num_copy_templates) 
+ * sizeof (*dpi.copy_templates));
+  if (! dpi.copy_templates)
+{
+  d_print_error (&dpi);
+  return 0;
+}
 
-dpi.saved_scopes = scopes;
-dpi.copy_templates = temps;
-#else
-dpi.saved_scopes = alloca (dpi.num_saved_scopes
-  * sizeof (*dpi.saved_scopes));
-dpi.copy_templates = alloca (dpi.num_copy_templates
-* sizeof (*dpi.copy_templates));
-#endif
+  dpi.saved_scopes
+= (struct d_saved_scope *) malloc (((size_t) dpi.num_saved_scopes) 
+  * sizeof (*dpi.saved_scopes));  
+  if (! dpi.saved_scopes)
+{
+  d_print_error (&dpi);
+  return 0;
+}
 
-d_print_comp (&dpi, options, dc);
-  }
+  d_print_comp (&dpi, options, dc);

  d_print_flush (&dpi);
 
+  free (dpi.copy_templates);
+  free (dpi.saved_scopes);
+
  return ! d_print_saw_error (&dpi);
 }
 
@@ -5945,57 +5925,58 @@ d_demangle_callback (const char *mangled, int opti
 
  cplus_demangle_init_info (mangled, options, strlen (mangled), &di);
 
-  {
-#ifdef CP_DYNAMIC_ARRAYS
-__extension__ struct demangle_component comps[di.num_comps];
-__extension__ struct demangle_component *subs[di.num_subs];
+  di.comps = (struct demangle_component *) malloc (((size_t) di.num_comps) 
+  * sizeof (*di.comps));
+  if (! di.comps)
+return 0;
 
-di.comps = comps;
-di.subs = subs;
-#else
-di.comps = alloca (di.num_comps * sizeof (*di.comps));
-di.subs = alloca (di.num_subs * sizeof (*di.subs));
-#endif
+  di.subs = (struct demangle_component **) malloc (((size_t) di.num_subs) 
+  * sizeof (*di.subs));  
+  if (! di.subs)
+return 0;
+
+  switch (type)
+{
+case DCT_TYPE:
+  dc = cplus_demangle_type (&di);
+  break;
+case DCT_MANGLED:
+  dc = cplus_demangle_mangled_name (&di, 1);
+  break;
+case DCT_GLOBAL_CTORS:
+case DCT_GLOBAL_DTORS:
+  d_advance (&di, 11);
+  dc = d_make_comp (&di,
+   (type == DCT_GLOBAL_CTORS
+? DEMANGLE_COMPONENT_GLOBAL_CONSTRUCTORS
+: DEMANGLE_COMPONENT_GLOBAL_DESTRUCTORS),
+   d_make_demangle_mangled_name (&di, d_str (&di)),
+   NULL);
+  d_advance (&di, strlen (d_str (&di)));
+  break;
+default:
+  

RE: [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)

2016-05-06 Thread Matthew Fortune
Hi Robert,

Robert Suchanek  writes:
> Revised patch attached.
> 
> Tested with mips-img-linux-gnu and bootstrapped x86_64-unknown-linux-
> gnu.

One small tweak, ChangeLog should wrap at 74 columns. Please consider the
full list of authors for this patch as it has had many major contributors
now. I believe this includes at least the following for the implementation
but fewer for the testsuite updates:

Robert Suchanek
Sameera Deshpande
Matthew Fortune
Graham Stott
Chao-ying Fu

Otherwise, OK to commit!

Matthew

> 
> > > mips_gen_const_int_vector
> > This should use gen_int_for_mode instead of GEN_INT to avoid the
> > issues that msa_ldi is trying to handle.
> 
> gen_int_mode cannot be used to generate a vector of constants as it
> expects a scalar mode.
> AFAICS, there isn't any generic helper to replace this.
> 
> >
> > > mips_const_vector_same_bytes_p
> > comment on this function is same as previous function
> 
> Corrected.
> 
> >
> > > mips_msa_idiv_insns
> > Why not just update mips_idiv_insns and add a mode argument?
> 
> Done.
> 
> >
> > > Implement TARGET_PRINT_OPERAND.
> > Comment spacing between 'E' 'B' and description is different to
> > existing
> 
> Updated.
> 
> >
> > > mips_print_operand
> > case 'v' subcases V4SImode and V4SFmode are identical. same for DI/DF.
> 
> Updated.
> 
> >
> > >@@ -12272,13 +12837,25 @@ mips_class_max_nregs (enum reg_class
> > >rclass,
> > machine_mode mode)
> > >   if (hard_reg_set_intersect_p (left, reg_class_contents[(int)
> ST_REGS]))
> > > {
> > >   if (HARD_REGNO_MODE_OK (ST_REG_FIRST, mode))
> > >-  size = MIN (size, 4);
> > >+  {
> > >+if (MSA_SUPPORTED_MODE_P (mode))
> > >+  size = MIN (size, UNITS_PER_MSA_REG);
> > >+else
> > >+  size = MIN (size, UNITS_PER_FPREG);
> > >+  }
> > >+
> >
> > This hunk should be removed. MSA modes are not supported in ST_REGS.
> 
> Indeed.  Removed.
> 
> >
> > >@@ -12299,6 +12876,10 @@ mips_cannot_change_mode_class (machine_mode
> from,
> > >   && INTEGRAL_MODE_P (from) && INTEGRAL_MODE_P (to))
> > > return false;
> > >
> > >+  /* Allow conversions between different MSA vector modes and
> > >+ TImode.  */
> >
> > Remove 'and TImode' we do not support it.
> 
> Done.
> 
> >
> > >@@ -19497,9 +21284,64 @@ mips_expand_vec_unpack (rtx operands[2],
> > >bool
> > unsigned_p, bool high_p)
> > >+if (!unsigned_p)
> > >+{
> > >+  /* Extract sign extention for each element comparing each
> element with
> > >+   immediate zero.  */
> > >+  tmp = gen_reg_rtx (imode);
> > >+  emit_insn (cmpFunc (tmp, operands[1], CONST0_RTX (imode)));
> > >+}
> > >+else
> > >+{
> > >+  tmp = force_reg (imode, CONST0_RTX (imode));
> > >+}
> >
> > Indentation and unnecessary braces on the else.
> 
> Fixed.
> 
> >
> > +   A single N-word move is usually the same cost as N single-word
> moves.
> > +   For MSA, we set MOVE_MAX to 16 bytes.
> > +   Then, MAX_MOVE_MAX is 16 unconditionally.  */ #define MOVE_MAX
> > +(TARGET_MSA ? 16 : UNITS_PER_WORD) #define MAX_MOVE_MAX 16
> >
> > The 16 here should be UNITS_PER_MSA_REG
> >
> 
> The changes have been reverted because of link to MAX_FIXED_MODE_SIZE
> macro causing failures in the by_pieces logic if MOVE_MAX_PIECES is
> larger than MAX_FIXED_MODE_SIZE.
> As it stands, vector modes appear to be handled explicitly in the common
> code so it's unlikely we need to modify any of these.
> If they do then it will be included in the follow up.
> 
> > > mips_expand_builtin_insn
> >
> > General comment about operations that take an immediate. There is code
> > to perform range checking but it does not seem to leave any trail when
> > the maybe_expand_insn fails to tell the user it was an out of range
> > immediate that was the problem. (follow up
> > work)
> 
> Will do.
> 
> >
> > >+case CODE_FOR_msa_andi_b:
> > >+case CODE_FOR_msa_ori_b:
> > >+case CODE_FOR_msa_nori_b:
> > >+case CODE_FOR_msa_xori_b:
> > >+  gcc_assert (has_target_p && nops == 3);
> > >+  if (!CONST_INT_P (ops[2].value))
> > >+  break;
> > >+  ops[2].mode = ops[0].mode;
> > >+  /* We need to convert the unsigned value to signed.  */
> > >+  val = sext_hwi (INTVAL (ops[2].value),
> > >+GET_MODE_UNIT_PRECISION (ops[2].mode));
> > >+  ops[2].value = mips_gen_const_int_vector (ops[2].mode, val);
> > >+  break
> >
> > Isn't the sext_hwi just effectively doing what gen_int_for_mode would?
> > Fixing mips_gen_const_int_vector would eliminate all of them.
> 
> That's correct. I've moved it to mips_gen_const_int_vector and used
> gen_int_mode.
> 
> >
> > >@@ -527,7 +551,9 @@ (define_attr "insn_count" ""
> > >(const_int 2)
> > >
> > >(eq_attr "type" "idiv,idiv3")
> > >-   (symbol_ref "mips_idiv_insns ()")
> > >+   (cond [(eq_attr "mode" "TI")
> > >+  (symbol_ref "mips_msa_idiv_insns () * 4")]
> > >+  (symbol_ref "mips_idiv_insns () * 4"))
> >
> > Why *4?
> 
> I'm not sure but it appears 

[AArch64][3/4] Don't generate redundant checks when there is no composite arg

2016-05-06 Thread Jiong Wang

The AArch64 va_arg gimplify hook generates redundant instructions.

The current va_arg fetch logic is:

1  if (va_arg offset shows the arg is saved at reg_save_area)
2 if ((va_arg_offset + va_arg_type_size) <= 0)
3fetch va_arg from reg_save_area.
4 else
5fetch va_arg from incoming_stack.
6  else
7fetch va_arg from incoming_stack.

The logic hunk "fetch va_arg from incoming_stack" will be generated
*twice*, causing redundancy.

There is a particular further "if" check at line 2 because, for composite
arguments, we don't support argument splitting, so each argument is passed
either entirely from reg_save_area, or entirely from incoming_stack.

Thus, we need the further check at line 2 to decide whether the space left in
reg_save_area is enough; if not, the argument is passed from incoming_stack.

This more complex logic is only necessary for composite types, not for
others.

This patch thus *generates that redundancy only for composite types*,
while for basic types like "int", "float", etc., we can simplify it
into:

  if (va_arg_offset < 0)
fetch va_arg from reg_save_area.
  else
fetch va_arg from incoming_stack.

And this simplified version is actually the most common case.

For example, this patch reduces the instruction count from about 130 to
100 for the included testcase.

ok for trunk?

2016-05-06  Jiong Wang  

gcc/
  * config/aarch64/aarch64.c (aarch64_gimplify_va_arg_expr): Avoid
  duplicated check code.

gcc/testsuite/
  * gcc.target/aarch64/va_arg_4.c: New testcase.
>From b92a4c4b8e52a9a952e91f307836022f667ab403 Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Fri, 6 May 2016 14:37:37 +0100
Subject: [PATCH 3/4] 3

---
 gcc/config/aarch64/aarch64.c| 94 -
 gcc/testsuite/gcc.target/aarch64/va_arg_4.c | 23 +++
 2 files changed, 87 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/va_arg_4.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1a0287..06904d5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9587,6 +9587,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
   bool indirect_p;
   bool is_ha;		/* is HFA or HVA.  */
   bool dw_align;	/* double-word align.  */
+  bool composite_type_p;
   machine_mode ag_mode = VOIDmode;
   int nregs;
   machine_mode mode;
@@ -9594,13 +9595,14 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
   tree f_stack, f_grtop, f_vrtop, f_groff, f_vroff;
   tree stack, f_top, f_off, off, arg, roundup, on_stack;
   HOST_WIDE_INT size, rsize, adjust, align;
-  tree t, u, cond1, cond2;
+  tree t, t1, u, cond1, cond2;
 
   indirect_p = pass_by_reference (NULL, TYPE_MODE (type), type, false);
   if (indirect_p)
 type = build_pointer_type (type);
 
   mode = TYPE_MODE (type);
+  composite_type_p = aarch64_composite_type_p (type, mode);
 
   f_stack = TYPE_FIELDS (va_list_type_node);
   f_grtop = DECL_CHAIN (f_stack);
@@ -9671,35 +9673,38 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
 	  build_int_cst (TREE_TYPE (off), 0));
   cond1 = build3 (COND_EXPR, ptr_type_node, t, NULL_TREE, NULL_TREE);
 
-  if (dw_align)
+  if (composite_type_p)
 {
-  /* Emit: offs = (offs + 15) & -16.  */
-  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
-		  build_int_cst (TREE_TYPE (off), 15));
-  t = build2 (BIT_AND_EXPR, TREE_TYPE (off), t,
-		  build_int_cst (TREE_TYPE (off), -16));
-  roundup = build2 (MODIFY_EXPR, TREE_TYPE (off), off, t);
-}
-  else
-roundup = NULL;
+  if (dw_align)
+	{
+	  /* Emit: offs = (offs + 15) & -16.  */
+	  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
+		  build_int_cst (TREE_TYPE (off), 15));
+	  t = build2 (BIT_AND_EXPR, TREE_TYPE (off), t,
+		  build_int_cst (TREE_TYPE (off), -16));
+	  roundup = build2 (MODIFY_EXPR, TREE_TYPE (off), off, t);
+	}
+  else
+	roundup = NULL;
 
-  /* Update ap.__[g|v]r_offs  */
-  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
-	  build_int_cst (TREE_TYPE (off), rsize));
-  t = build2 (MODIFY_EXPR, TREE_TYPE (f_off), unshare_expr (f_off), t);
+  /* Update ap.__[g|v]r_offs  */
+  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
+		  build_int_cst (TREE_TYPE (off), rsize));
+  t = build2 (MODIFY_EXPR, TREE_TYPE (f_off), unshare_expr (f_off), t);
 
-  /* String up.  */
-  if (roundup)
-t = build2 (COMPOUND_EXPR, TREE_TYPE (t), roundup, t);
+  /* String up.  */
+  if (roundup)
+	t = build2 (COMPOUND_EXPR, TREE_TYPE (t), roundup, t);
 
-  /* [cond2] if (ap.__[g|v]r_offs > 0)  */
-  u = build2 (GT_EXPR, boolean_type_node, unshare_expr (f_off),
-	  build_int_cst (TREE_TYPE (f_off), 0));
-  cond2 = build3 (COND_EXPR, ptr_type_node, u, NULL_TREE, NULL_TREE);
+  /* [cond2] if (ap.__[g|v]r_offs > 0)  */
+  u = build2 (GT_EXPR, boolean_type_node, unshare_expr (f_off),
+		 

[AArch64][4/4] Simplify cfg during vaarg gimplification

2016-05-06 Thread Jiong Wang

Based on patch [3/4], we can further optimize the va_arg gimplification
logic, this time not for redundant checks but for redundant basic
blocks.  This simplifies the control flow graph and eventually generates
fewer branch instructions.

The current gimplification logic requires three basic blocks:

 // check if we already stepped into stack area
 if (vaarg_offset >= 0)
   {
 // we still in register area, but composite type will not
 // be passed partly in registers and partly on stack, make
 // sure the left register area is not left empty by composite
 // type. if it is, then skip them, and fetch from stack.
 if (vaarg_offset + arg_size > 0)
   fetch from stack
 else
   fetch from register
   }
else
  fetch from register

We can further optimize the logic into the following, reducing the number
of basic blocks to two:

if (vaarg_offset < 0 || (vaarg_offset + arg_size > 0))
   fetch from stack
 else
   fetch from register

OK for trunk?

2016-05-06 Alan Lawrence  
   Jiong Wang  

gcc/
  * config/aarch64/aarch64.c (aarch64_gimplify_va_arg_expr): Use
  TRUTH_ORIF_EXPR.

gcc/testsuite/
  * gcc.target/aarch64/va_arg_5.c: New test.

>From d742eaa3469f28e4207034f3fe4ebd4d54b3dd42 Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Fri, 6 May 2016 14:38:00 +0100
Subject: [PATCH 4/4] 4

---
 gcc/config/aarch64/aarch64.c| 53 +
 gcc/testsuite/gcc.target/aarch64/va_arg_5.c | 20 +++
 2 files changed, 58 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/va_arg_5.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 06904d5..bd4a9fe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9577,7 +9577,32 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 }
 
-/* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
+/* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.
+   The VA_ARG gimplify logic was:
+
+ // check if we already stepped into stack area
+ if (vaarg_offset >= 0)
+   {
+	 // we still in register area, but composite type will not
+	 // be passed partly in registers and partly on stack, make
+	 // sure the left register area is not left empty by composite
+	 // type. if it is, then skip them, and fetch from stack.
+	 if (vaarg_offset + arg_size > 0)
+	   fetch from stack
+	 else
+	   fetch from register
+   }
+else
+  fetch from register
+
+   we can further optimize the logic into the following to reduce BB.
+
+ if (vaarg_offset < 0 || (vaarg_offset + arg_size > 0))
+   fetch from stack
+ else
+   fetch from register
+
+   the tree node TRUTH_ORIF_EXPR can express the condition we want.  */
 
 static tree
 aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
@@ -9595,7 +9620,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
   tree f_stack, f_grtop, f_vrtop, f_groff, f_vroff;
   tree stack, f_top, f_off, off, arg, roundup, on_stack;
   HOST_WIDE_INT size, rsize, adjust, align;
-  tree t, t1, u, cond1, cond2;
+  tree t, t1, u, cond1, pred1, pred2;
 
   indirect_p = pass_by_reference (NULL, TYPE_MODE (type), type, false);
   if (indirect_p)
@@ -9669,9 +9694,8 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
   off = get_initialized_tmp_var (f_off, pre_p, NULL);
 
   /* Emit code to branch if off >= 0.  */
-  t = build2 (GE_EXPR, boolean_type_node, off,
-	  build_int_cst (TREE_TYPE (off), 0));
-  cond1 = build3 (COND_EXPR, ptr_type_node, t, NULL_TREE, NULL_TREE);
+  pred1 = build2 (GE_EXPR, boolean_type_node, off,
+		  build_int_cst (TREE_TYPE (off), 0));
 
   if (composite_type_p)
 {
@@ -9696,16 +9720,16 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, gimple_seq *pre_p,
   if (roundup)
 	t = build2 (COMPOUND_EXPR, TREE_TYPE (t), roundup, t);
 
-  /* [cond2] if (ap.__[g|v]r_offs > 0)  */
-  u = build2 (GT_EXPR, boolean_type_node, unshare_expr (f_off),
-		  build_int_cst (TREE_TYPE (f_off), 0));
-  cond2 = build3 (COND_EXPR, ptr_type_node, u, NULL_TREE, NULL_TREE);
+  /* [pred2] if (ap.__[g|v]r_offs > 0)  */
+  pred2 = build2 (GT_EXPR, boolean_type_node, unshare_expr (f_off),
+		  build_int_cst (TREE_TYPE (f_off), 0));
+  pred2 = build2 (COMPOUND_EXPR, TREE_TYPE (pred2), t, pred2);
 
-  /* String up: make sure the assignment happens before the use.  */
-  t = build2 (COMPOUND_EXPR, TREE_TYPE (cond2), t, cond2);
-  COND_EXPR_ELSE (cond1) = t;
+  pred1 = build2 (TRUTH_ORIF_EXPR, boolean_type_node, pred1, pred2);
 }
 
+  cond1 = build3 (COND_EXPR, ptr_type_node, pred1, NULL_TREE, NULL_TREE);
+
   /* Prepare the trees handling the argument that is passed on the stack;
  the top level node will store in ON_STACK.  */
   arg = 

[AArch64][2/4] PR63596, honor tree-stdarg analysis result to improve VAARG codegen

2016-05-06 Thread Jiong Wang

This patch fixes PR63596.

There is no need to push/pop all argument registers.  We only need to
push and pop those registers that are actually used.  This usage info is
calculated by a dedicated va_arg optimization tree pass, "tree-stdarg";
the backend should honor its analysis results.

For a simple testcase where varargs are declared but actually not used:

int
f (int a, ...)
{
  return a;
}

before this patch, we are generating:

f:
sub sp, sp, #192
stp x1, x2, [sp, 136]
stp x3, x4, [sp, 152]
stp x5, x6, [sp, 168]
str x7, [sp, 184]
str q0, [sp]
str q1, [sp, 16]
str q2, [sp, 32]
str q3, [sp, 48]
str q4, [sp, 64]
str q5, [sp, 80]
str q6, [sp, 96]
str q7, [sp, 112]
add sp, sp, 192
ret

after this patch, it's optimized into:

f:
ret

OK for trunk?

2016-05-06  Jiong Wang  
gcc/
  PR63596
  * config/aarch64/aarch64.c (aarch64_expand_builtin_va_start): Honor
  tree-stdarg analysis results.
  (aarch64_setup_incoming_varargs): Likewise.

gcc/testsuite/
  PR63596
  * gcc.target/aarch64/va_arg_1.c: New testcase.
  * gcc.target/aarch64/va_arg_2.c: Likewise.
  * gcc.target/aarch64/va_arg_3.c: Likewise.

>From dfcfe78511047501ed4b2f323b190c1290314104 Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Fri, 6 May 2016 14:36:42 +0100
Subject: [PATCH 2/4] 2

---
 gcc/config/aarch64/aarch64.c| 35 ++---
 gcc/testsuite/gcc.target/aarch64/va_arg_1.c | 11 +
 gcc/testsuite/gcc.target/aarch64/va_arg_2.c | 18 +++
 gcc/testsuite/gcc.target/aarch64/va_arg_3.c | 26 +
 4 files changed, 77 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/va_arg_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/va_arg_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/va_arg_3.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aff4a95..b1a0287 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9502,15 +9502,17 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   tree f_stack, f_grtop, f_vrtop, f_groff, f_vroff;
   tree stack, grtop, vrtop, groff, vroff;
   tree t;
-  int gr_save_area_size;
-  int vr_save_area_size;
+  int gr_save_area_size = cfun->va_list_gpr_size;
+  int vr_save_area_size = cfun->va_list_fpr_size;
   int vr_offset;
 
  cum = &crtl->args.info;
-  gr_save_area_size
-= (NUM_ARG_REGS - cum->aapcs_ncrn) * UNITS_PER_WORD;
-  vr_save_area_size
-= (NUM_FP_ARG_REGS - cum->aapcs_nvrn) * UNITS_PER_VREG;
+  if (cfun->va_list_gpr_size)
+gr_save_area_size = MIN ((NUM_ARG_REGS - cum->aapcs_ncrn) * UNITS_PER_WORD,
+			 cfun->va_list_gpr_size);
+  if (cfun->va_list_fpr_size)
+vr_save_area_size = MIN ((NUM_FP_ARG_REGS - cum->aapcs_nvrn)
+			 * UNITS_PER_VREG, cfun->va_list_fpr_size);
 
   if (!TARGET_FLOAT)
 {
@@ -9844,7 +9846,8 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
   CUMULATIVE_ARGS local_cum;
-  int gr_saved, vr_saved;
+  int gr_saved = cfun->va_list_gpr_size;
+  int vr_saved = cfun->va_list_fpr_size;
 
   /* The caller has advanced CUM up to, but not beyond, the last named
  argument.  Advance a local copy of CUM past the last "real" named
@@ -9852,9 +9855,14 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
   local_cum = *cum;
  aarch64_function_arg_advance (pack_cumulative_args (&local_cum), mode, type, true);
 
-  /* Found out how many registers we need to save.  */
-  gr_saved = NUM_ARG_REGS - local_cum.aapcs_ncrn;
-  vr_saved = NUM_FP_ARG_REGS - local_cum.aapcs_nvrn;
+  /* Found out how many registers we need to save.
+ Honor tree-stdvar analysis results.  */
+  if (cfun->va_list_gpr_size)
+gr_saved = MIN (NUM_ARG_REGS - local_cum.aapcs_ncrn,
+		cfun->va_list_gpr_size / UNITS_PER_WORD);
+  if (cfun->va_list_fpr_size)
+vr_saved = MIN (NUM_FP_ARG_REGS - local_cum.aapcs_nvrn,
+		cfun->va_list_fpr_size / UNITS_PER_VREG);
 
   if (!TARGET_FLOAT)
 {
@@ -9882,7 +9890,7 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
 	  /* We can't use move_block_from_reg, because it will use
 	 the wrong mode, storing D regs only.  */
 	  machine_mode mode = TImode;
-	  int off, i;
+	  int off, i, vr_start;
 
 	  /* Set OFF to the offset from virtual_incoming_args_rtx of
 	 the first vector register.  The VR save area lies below
@@ -9891,14 +9899,15 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
 			   STACK_BOUNDARY / BITS_PER_UNIT);
 	  off -= vr_saved * UNITS_PER_VREG;
 
-	  for (i = local_cum.aapcs_nvrn; i < NUM_FP_ARG_REGS; ++i)
+	  vr_start = V0_REGNUM + local_cum.aapcs_nvrn;
+	  for (i = 0; i < vr_saved; 

[AArch64][1/4] Enable tree-stdarg pass for AArch64 by defining counter fields

2016-05-06 Thread Jiong Wang

This patch initializes va_list_gpr_counter_field and
va_list_fpr_counter_field properly for the AArch64 backend so that the
tree-stdarg pass will be enabled.

The "required register" analysis is largely target independent, but the
user might operate on the inner offset field in vaarg structure directly,
for example:

  d = __builtin_va_arg (ap, int);
  ap.__gr_offs += 0x20;
  e = __builtin_va_arg (ap, int);

in which case tree-stdarg requires us to tell it the backend offset
field inside the va_arg structure so that it can still figure out that
we actually need to save 6 general registers.

ok for upstream?

2016-05-06  Jiong Wang  
gcc/
  * config/aarch64/aarch64.c (aarch64_build_builtin_va_list): Initialize
  va_list_gpr_counter_field and va_list_fpr_counter_field.

gcc/testsuite/
  * gcc.dg/tree-ssa/stdarg-2.c: Enable all testcases for AArch64.
  * gcc.dg/tree-ssa/stdarg-3.c: Likewise.
  * gcc.dg/tree-ssa/stdarg-4.c: Likewise.
  * gcc.dg/tree-ssa/stdarg-5.c: Likewise.
  * gcc.dg/tree-ssa/stdarg-6.c: Likewise.

>From 93485b0163bbaddf7fdf472aac2d3a96823bd63a Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Fri, 6 May 2016 14:36:12 +0100
Subject: [PATCH 1/4] 1

---
 gcc/config/aarch64/aarch64.c |  7 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c | 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-3.c | 11 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-4.c |  4 
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-5.c |  7 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-6.c |  1 +
 6 files changed, 45 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9995494..aff4a95 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9463,6 +9463,13 @@ aarch64_build_builtin_va_list (void)
 			FIELD_DECL, get_identifier ("__vr_offs"),
 			integer_type_node);
 
+  /* Tell tree-stdarg pass what our internal offset fields are.
+ NOTE: va_list_gpr/fpr_counter_field are only used for tree comparison
+ purpose to identify whether the code is updating va_list internal
+ offset fields through irregular way.  */
+  va_list_gpr_counter_field = f_groff;
+  va_list_fpr_counter_field = f_vroff;
+
   DECL_ARTIFICIAL (f_stack) = 1;
   DECL_ARTIFICIAL (f_grtop) = 1;
   DECL_ARTIFICIAL (f_vrtop) = 1;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c b/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
index c73294a..0224997 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
@@ -25,6 +25,7 @@ f1 (int i, ...)
 /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
 /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target alpha*-*-linux* } } } */
 /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
+/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
 /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
 /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target ia64-*-* } } } */
 /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
@@ -45,6 +46,7 @@ f2 (int i, ...)
 /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
 /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 8 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
 /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
+/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
 /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
 /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
 /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
@@ -60,6 +62,7 @@ f3 (int i, ...)
 /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
 /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } 

[AArch64][0/4] Improve variable argument (vaarg) support

2016-05-06 Thread Jiong Wang

Currently, there are three major issues in AArch64 variable argument
(vaarg) support.

  * tree-stdarg pass is not enabled, thus we are doing unnecessary
register pushes/pops.  This is PR63596.

  * va_arg gimplification hook is generating sub-optimal code because the
runtime boundary check always considers composite types, while the
check could be made lighter if there is no composite type.

  * Even when there is a composite type, we can simplify the cfg generated
during va_arg gimplification to avoid creating unnecessary basic
blocks.

This patch set fixes above issues.

AArch64 bootstrap OK, no regressions, new testcases passed.

---
Jiong Wang (4)
  Enable tree-stdarg pass for AArch64 by defining counter fields
  PR63596, honor tree-stdarg analysis result to improve VAARG codegen
  Don't generate redundant checks when there is no composite arg
  Simplify cfg during vaarg gimplification

 gcc/config/aarch64/aarch64.c| 165 
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c|  15 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-3.c|  11 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-4.c|   4 
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-5.c|   7 +++
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-6.c|   1 +
 gcc/testsuite/gcc.target/aarch64/va_arg_1.c |  11 +++
 gcc/testsuite/gcc.target/aarch64/va_arg_2.c |  18 ++
 gcc/testsuite/gcc.target/aarch64/va_arg_3.c |  26 
++

 gcc/testsuite/gcc.target/aarch64/va_arg_4.c |  23 +++
 gcc/testsuite/gcc.target/aarch64/va_arg_5.c |  20 
 11 files changed, 255 insertions(+), 46 deletions(-)


Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 10:46:12PM +0800, Marcel Böhme wrote:
>d_print_init (&dpi, callback, opaque, dc);
>  
> -  {
> -#ifdef CP_DYNAMIC_ARRAYS
> -__extension__ struct d_saved_scope scopes[dpi.num_saved_scopes];
> -__extension__ struct d_print_template temps[dpi.num_copy_templates];
> +  dpi.copy_templates
> += (struct d_print_template *) malloc (((size_t) dpi.num_copy_templates) 
> +   * sizeof (*dpi.copy_templates));
> +  dpi.saved_scopes
> += (struct d_saved_scope *) malloc (((size_t) dpi.num_saved_scopes) 
> +* sizeof (*dpi.saved_scopes));
> +  
> +  if (! dpi.copy_templates || ! dpi.saved_scopes)
> +{
> +  d_print_error (&dpi);
> +  return 0;
> +}

If one malloc succeeds and the other fails, you leak memory.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 04:41:41PM +0200, Martin Liška wrote:
> On 05/06/2016 03:25 PM, Jakub Jelinek wrote:
> > Well, we already have the gimple poisoning/unpoisoning code on RTL (emitted
> > after the prologue and before the epilogue), so it shouldn't be that hard.
> > I'd only do the most common/easy cases inline though, like 1/2/4/8/16/32
> > bytes long variables.
> > 
> > Jakub
> 
> You are right, I didn't realize it earlier.
> As I've collected statistics for tramp3d, the poisoning code has the
> following distribution:
> 
> 4:1.62%
> 8:3.53%
> 12:94.76%
> 
> which is quite interesting that 12B are such a common size :)
> Probably due to a lot of time spent in ::evaluate (MultiArgEvaluator and 
> MultiArgEvaluator).
> Considering just variables which needs_to_live_in_memory, tramp3d is still 
> ~15x slower.

Please look at other testcases, not just tramp3d - we in the end don't want
to tune it to just tramp3d.  Pick up some 3-4 C/C++ benchmarks, tramp3d can
be one of them ;)

Jakub


Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Marcel Böhme
Hi Jakub,

> On 6 May 2016, at 5:51 PM, Jakub Jelinek  wrote:
>> 
> 
> If you just want an array, restricting the size including the sizeof
> to fit into int makes no sense, you want to guard it against overflows
> during the multiplication.
Okay, done. (Someone might want to substitute size_t with unsigned int if the 
former causes any problems).

> Ian or Jason, can all the demangle users allocate heap memory or not?
This question remains.

> But much more importantly, you don't handle the allocation failure in
> anyway, so if malloc fails, you'll just segfault.
It is handled now. No abort. No overflow.

I also checked: even if num_saved_scopes or num_copy_templates happen to
overflow in d_count_templates_scopes, that integer overflow won’t lead to a
buffer overflow because of the checks in lines 4292 and 4307 (their only
uses).
4292:  if (dpi->next_saved_scope >= dpi->num_saved_scopes)
4293: {
4294:   d_print_error (dpi);
4295:   return;
4296: }

4307:  if (dpi->next_copy_template >= dpi->num_copy_templates)
4308:{
4309:  d_print_error (dpi);
4310:  return;
4311:}

As for your previous email:
> On 6 May 2016, at 3:09 PM, Jakub Jelinek  wrote:
> 
> Furthermore, if I apply your patch and rebuild libstdc++, it stops working
> altogether:
> ldd -d -r ./libstdc++.so.6.0.22 
>   linux-vdso.so.1 (0x7ffe6053c000)
>   libm.so.6 => /lib64/libm.so.6 (0x7f9de23fb000)
>   libc.so.6 => /lib64/libc.so.6 (0x7f9de203a000)
>   /lib64/ld-linux-x86-64.so.2 (0x5585ecc1d000)
>   libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f9de1e22000)
> undefined symbol: xmalloc (./libstdc++.so.6.0.22)
> undefined symbol: xmalloc_failed  (./libstdc++.so.6.0.22)

Hmm. Just checked: xmalloc should be available through libiberty.h, which is
included by cp-demangle.c.
That earlier patch was successfully bootstrapped and regression tested on
x86_64-pc-linux-gnu from the sources in trunk.

BTW: If I configure libstdc++-v3 directly, I receive an error message:
...
./config.status: creating include/Makefile
./config.status: line 2950: ./../../config-ml.in: No such file or directory 
  

Best regards,
- Marcel


Index: libiberty/ChangeLog
===
--- libiberty/ChangeLog (revision 235962)
+++ libiberty/ChangeLog (working copy)
@@ -1,3 +1,14 @@
+2016-05-06  Marcel Böhme  
+
+   PR c++/68159
+   * cp-demangle.c: Allocate arrays of user-defined size on the heap,
+   not on the stack. Do not include .
+   (CP_DYNAMIC_ARRAYS): Remove definition.
+   (cplus_demangle_print_callback): Allocate memory for two arrays on
+   the heap. Free memory before return / exit.
+   (d_demangle_callback): Likewise.
+   (is_ctor_or_dtor): Likewise. 
+
 2016-05-02  Marcel Böhme  
 
PR c++/70498
Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 235962)
+++ libiberty/cp-demangle.c (working copy)
@@ -116,18 +116,6 @@
 #include 
 #endif
 
-#ifdef HAVE_ALLOCA_H
-# include 
-#else
-# ifndef alloca
-#  ifdef __GNUC__
-#   define alloca __builtin_alloca
-#  else
-extern char *alloca ();
-#  endif /* __GNUC__ */
-# endif /* alloca */
-#endif /* HAVE_ALLOCA_H */
-
 #ifdef HAVE_LIMITS_H
 #include 
 #endif
@@ -186,20 +174,6 @@ static void d_init_info (const char *, int, size_t
 #define CP_STATIC_IF_GLIBCPP_V3
 #endif /* ! defined(IN_GLIBCPP_V3) */
 
-/* See if the compiler supports dynamic arrays.  */
-
-#ifdef __GNUC__
-#define CP_DYNAMIC_ARRAYS
-#else
-#ifdef __STDC__
-#ifdef __STDC_VERSION__
-#if __STDC_VERSION__ >= 199901L
-#define CP_DYNAMIC_ARRAYS
-#endif /* __STDC__VERSION >= 199901L */
-#endif /* defined (__STDC_VERSION__) */
-#endif /* defined (__STDC__) */
-#endif /* ! defined (__GNUC__) */
-
 /* We avoid pulling in the ctype tables, to prevent pulling in
additional unresolved symbols when this code is used in a library.
FIXME: Is this really a valid reason?  This comes from the original
@@ -4126,25 +4100,26 @@ cplus_demangle_print_callback (int options,
 
  d_print_init (&dpi, callback, opaque, dc);
 
-  {
-#ifdef CP_DYNAMIC_ARRAYS
-__extension__ struct d_saved_scope scopes[dpi.num_saved_scopes];
-__extension__ struct d_print_template temps[dpi.num_copy_templates];
+  dpi.copy_templates
+= (struct d_print_template *) malloc (((size_t) dpi.num_copy_templates) 
+ * sizeof (*dpi.copy_templates));
+  dpi.saved_scopes
+= (struct d_saved_scope *) malloc (((size_t) dpi.num_saved_scopes) 
+  * sizeof (*dpi.saved_scopes));
+  
+  if (! dpi.copy_templates || ! dpi.saved_scopes)
+{
+  d_print_error (&dpi);
+  return 0;
+}
 
-dpi.saved_scopes = scopes;
-dpi.copy_templates 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
On 05/06/2016 03:25 PM, Jakub Jelinek wrote:
> Well, we already have the gimple poisoning/unpoisoning code on RTL (emitted
> after the prologue and before the epilogue), so it shouldn't be that hard.
> I'd only do the most common/easy cases inline though, like 1/2/4/8/16/32
> bytes long variables.
> 
>   Jakub

You are right, I didn't realize it earlier.
As I've collected statistics for tramp3d, the poisoning code has the
following distribution:

4:1.62%
8:3.53%
12:94.76%

which is quite interesting that 12B are such a common size :)
Probably due to a lot of time spent in ::evaluate (MultiArgEvaluator and 
MultiArgEvaluator).
Considering just variables which needs_to_live_in_memory, tramp3d is still ~15x 
slower.

Anyway profile report tells:
26.51%  a.out  libasan.so.3.0.0  [.] __asan::PoisonShadow
18.49%  a.out  libasan.so.3.0.0  [.] PoisonAlignedStackMemory
 5.61%  a.out  libc-2.22.so      [.] __memset_avx2
 5.41%  a.out  a.out             [.] MultiArgEvaluator::evaluate
 3.56%  a.out  libasan.so.3.0.0  [.] __asan_unpoison_stack_memory
 2.69%  a.out  libasan.so.3.0.0  [.] __asan_poison_stack_memory

I'll continue working on that after weekend.

Martin


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 05:22:46PM +0300, Yury Gribov wrote:
> On 05/06/2016 03:38 PM, Jakub Jelinek wrote:
> >On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:
> >>>6) As the use-after-scope stuff is already included in libsanitizer, no 
> >>>change is needed for the library
> >>
> >>Note that upstream seems to use a different cmdline interface. They don't
> >>have a dedicated -fsanitize=use-after-scope and instead consider it to be a
> >>part of -fsanitize=address (disabled by default, enabled via -mllvm
> >>-asan-use-after-scope=1). I'd suggest to keep this interface (or at least
> >>discuss with them) and use GCC's --param.
> >
> >I personally think -fsanitize=use-after-scope (which implies address
> >sanitization in it) is better, can upstream be convinved not to change it?
> 
> Will that work with -fsanitize=kernel-address?

Depends on how exactly it is defined.  It could be enabling just its own
sanitizer bit and nothing else, then users would need to use
-fsanitize=address,use-after-scope
or
-fsanitize=kernel-address,use-after-scope
(order doesn't matter), or it could enable the SANITIZE_ADDRESS
bit together with its own, and then we'd just post-option processing
(where we e.g. reject address,kernel-address) default to
SANITIZE_USER_ADDRESS if SANITIZE_ADDRESS is on together with
SANITIZE_USE_AFTER_SCOPE, but neither SANITIZE_{USER,KERNEL}_ADDRESS
is defined.
-fsanitize=address -fno-sanitize=use-after-scope
obviously shouldn't in any case disable SANITIZE_ADDRESS, similarly
-fsanitize=kernel-address -fno-sanitize=use-after-scope

Jakub


[PATCH, i386]: Consolidate and remove unused register_and_not_{,any_}fp_reg_operand predicates

2016-05-06 Thread Uros Bizjak
Hello!

2016-05-06  Uros Bizjak  

* config/i386/i386.md (int cmove peephole2s): Use general_reg_operand
instead of register_and_not_any_fp_reg_operand as operand 0 predicate.
* config/i386/predicates.md (register_and_not_any_fp_reg_operand):
Remove unused predicate.
(register_and_not_fp_reg_operand): Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 235936)
+++ i386.md (working copy)
@@ -17211,7 +17211,7 @@
(set_attr "mode" "DF,DF,DI,DI,DI,DI")])
 
 (define_split
-  [(set (match_operand:DF 0 "register_and_not_any_fp_reg_operand")
+  [(set (match_operand:DF 0 "general_reg_operand")
(if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
[(reg FLAGS_REG) (const_int 0)])
  (match_operand:DF 2 "nonimmediate_operand")
@@ -17267,7 +17267,7 @@
 ;; Don't do conditional moves with memory inputs
 (define_peephole2
   [(match_scratch:MODEF 4 "r")
-   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+   (set (match_operand:MODEF 0 "general_reg_operand")
(if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
  [(reg FLAGS_REG) (const_int 0)])
  (match_operand:MODEF 2 "nonimmediate_operand")
Index: predicates.md
===
--- predicates.md   (revision 235932)
+++ predicates.md   (working copy)
@@ -27,11 +27,6 @@
   (and (match_code "reg")
(match_test "STACK_REGNO_P (REGNO (op))")))
 
-;; Return true if OP is a non-fp register_operand.
-(define_predicate "register_and_not_any_fp_reg_operand"
-  (and (match_code "reg")
-   (not (match_test "ANY_FP_REGNO_P (REGNO (op))"
-
 ;; True if the operand is a GENERAL class register.
 (define_predicate "general_reg_operand"
   (and (match_code "reg")
@@ -43,11 +38,6 @@
 (match_test "GENERAL_REGNO_P (REGNO (op))")
 (match_operand 0 "nonimmediate_operand")))
 
-;; Return true if OP is a register operand other than an i387 fp register.
-(define_predicate "register_and_not_fp_reg_operand"
-  (and (match_code "reg")
-   (not (match_test "STACK_REGNO_P (REGNO (op))"
-
 ;; True if the operand is an MMX register.
 (define_predicate "mmx_reg_operand"
   (and (match_code "reg")


[PATCH, rs6000] Add support for int versions of vec_addec

2016-05-06 Thread Bill Seurer
This patch adds support for the signed and unsigned int versions of the
vec_addec altivec builtins from the Power Architecture 64-Bit ELF V2 ABI
OpenPOWER ABI for Linux Supplement (16 July 2015, Version 1.1). Many of
the built-ins are still missing, and this is part of a series of patches
to add them.

There are no instructions for the int versions of vec_addec, so the
output code is built from other built-ins that do have instructions,
which in this case is the following.

vec_addec (va, vb, carryv) == vec_or (vec_addc (va, vb),
                                      vec_addc (vec_add (va, vb),
                                                vec_and (carryv, 0x1)))

The new test cases are executable tests which verify that the generated
code produces expected values. C macros were used so that the same
test case could be used for both the signed and unsigned versions. An
extra executable test case is also included to ensure that the modified
support for the __int128 versions of vec_addec is not broken. The same
test case could not be used for both int and __int128 because of some
differences in loading and storing the vectors.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu with no regressions. Is this ok for trunk?

[gcc]

2016-05-06  Bill Seurer  

* config/rs6000/rs6000-builtin.def (vec_addec): Change vec_addec to a
special case builtin.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
support for ALTIVEC_BUILTIN_VEC_ADDEC.
* config/rs6000/rs6000.c (altivec_init_builtins): Add definition
for __builtin_vec_addec.

[gcc/testsuite]

2016-05-06  Bill Seurer  

* gcc.target/powerpc/vec-addec.c: New test.
* gcc.target/powerpc/vec-addec-int128.c: New test.

Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 235962)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -951,7 +951,6 @@ BU_ALTIVEC_X (VEC_EXT_V4SF, "vec_ext_v4sf", CO
before we get to the point about classifying the builtin type.  */
 
 /* 3 argument Altivec overloaded builtins.  */
-BU_ALTIVEC_OVERLOAD_3 (ADDEC, "addec")
 BU_ALTIVEC_OVERLOAD_3 (MADD,   "madd")
 BU_ALTIVEC_OVERLOAD_3 (MADDS,  "madds")
 BU_ALTIVEC_OVERLOAD_3 (MLADD,  "mladd")
@@ -1137,6 +1136,7 @@ BU_ALTIVEC_OVERLOAD_P (VCMPGE_P,   "vcmpge_p")
 
 /* Overloaded Altivec builtins that are handled as special cases.  */
 BU_ALTIVEC_OVERLOAD_X (ADDE,  "adde")
+BU_ALTIVEC_OVERLOAD_X (ADDEC, "addec")
 BU_ALTIVEC_OVERLOAD_X (CTF,   "ctf")
 BU_ALTIVEC_OVERLOAD_X (CTS,   "cts")
 BU_ALTIVEC_OVERLOAD_X (CTU,   "ctu")
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 235962)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -4661,6 +4661,79 @@ assignment for unaligned loads and stores");
}
 }
 
+  if (fcode == ALTIVEC_BUILTIN_VEC_ADDEC)
+{
+  /* vec_addec needs to be special cased because there is no instruction
+   for the {un}signed int version.  */
+  if (nargs != 3)
+   {
+ error ("vec_addec only accepts 3 arguments");
+ return error_mark_node;
+   }
+
+  tree arg0 = (*arglist)[0];
+  tree arg0_type = TREE_TYPE (arg0);
+  tree arg1 = (*arglist)[1];
+  tree arg1_type = TREE_TYPE (arg1);
+  tree arg2 = (*arglist)[2];
+  tree arg2_type = TREE_TYPE (arg2);
+
+  /* All 3 arguments must be vectors of (signed or unsigned) (int or
+   __int128) and the types must match.  */
+  if ((arg0_type != arg1_type) || (arg1_type != arg2_type))
+   goto bad; 
+  if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+   goto bad; 
+
+  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+   {
+ /* For {un}signed ints, 
+ vec_addec (va, vb, carryv) == vec_or (vec_addc (va, vb),
+   vec_addc(vec_add(va, vb),
+vec_and (carryv, 0x1))).  */
+ case SImode:
+   {
+   /* Use save_expr to ensure that operands used more than once
+   that may have side effects (like calls) are only evaluated
+   once.  */
+   arg0 = save_expr(arg0);
+   arg1 = save_expr(arg1);
+   vec<tree, va_gc> *params = make_tree_vector();
+   vec_safe_push (params, arg0);
+   vec_safe_push (params, arg1);
+   tree call1 = altivec_resolve_overloaded_builtin
+   (loc, rs6000_builtin_decls[ALTIVEC_BUILTIN_VEC_ADDC], params);
+   params = make_tree_vector();
+   vec_safe_push (params, arg0);
+   vec_safe_push (params, arg1);
+   tree call2 = altivec_resolve_overloaded_builtin
+   

Re: [PATCH, ARM] use vmov.i64 to load 0 into FP reg if neon enabled

2016-05-06 Thread Kyrill Tkachov

Hi Jim,

On 05/05/16 22:37, Jim Wilson wrote:

For this simple testcase

double
sub (void)
{
   return 0.0;
}

Without the attached patch, an ARM compiler with neon support enabled, gives
  vldr.64 d0, .L2
With the attached patch, an ARM compiler with neon enabled, gives
  vmov.i64 d0, #0@ float
which is faster and smaller, as there is no load from a constant pool entry.

There are a few ways to implement this.  I added a neon enabled
attribute.  Another way to do this would be a new constraint, like Dg,
that tests for both neon and 0.
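Such a constraint might look roughly like this in the machine description. This is a hypothetical sketch only: "Dg" is the name Jim suggests, and the exact match_test is illustrative, not an existing ARM constraint.

```lisp
;; Hypothetical sketch: match floating-point zero only when NEON is enabled.
(define_constraint "Dg"
 "@internal
  In ARM/Thumb-2 state, a CONST_DOUBLE zero when NEON is available."
 (and (match_code "const_double")
      (match_test "TARGET_NEON && op == CONST0_RTX (GET_MODE (op))")))
```

With such a constraint the zero-load alternative could be expressed directly in the constraint string instead of adding a separate "neon enabled" attribute.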


Good idea.


I don't see any mention of targets that only support single-float in
the ARM ARM, so it isn't obvious how to handle that.  I see no targets
that support both neon and single-float, but maybe I need to check for
that anyways?

I don't think we have any.

I think adding a gcc_assert (TARGET_VFP_DOUBLE); to the
alternative you're adding would be the way to go.
We already have case 2 in the *movdf_vfp pattern that does that.


Most of the patch involves renumbering constraints and matching
attributes.  The new alternative w/G must come before w/UvF or else we
still get a constant pool reference.  Otherwise the patch is pretty
small and simple.

We can do the same thing in the movdi pattern.  I haven't tried
writing that yet.

This patch was tested with a bootstrap and make check in an armhf
schroot on an xgene box.  There were no regressions.


Since you're modifying both the ARM and Thumb2 patterns
can you please do two bootstraps and tests, one with --with-mode=arm
and one with --with-mode=thumb.


OK to check in?


Ok after adding the assert mentioned above, the arm/thumb testing and fixing
a minor nit below...


@@ -410,16 +410,18 @@
   case 2:
gcc_assert (TARGET_VFP_DOUBLE);
 return \"vmov%?.f64\\t%P0, %1\";
-  case 3: case 4:
+  case 3:
+   return \"vmov.i64\\t%P0, #0@ float\";
+  case 4: case 5:


Please add a tab before the "@float" comment i.e. "\\t%@ float".

Thanks for working on this,
Kyrill


Re: [PING][PATCH] New plugin event when evaluating a constexpr call

2016-05-06 Thread Andres Tiraboschi
Hi
 I made the corrections to the patch.

Changelog 2016-5-6  Andres Tiraboschi

*gcc/plugin.c (PLUGIN_EVAL_CALL_CONSTEXPR): New event.
*gcc/plugin.def (PLUGIN_EVAL_CALL_CONSTEXPR): New event.
*gcc/cp/constexpr.c (constexpr_fundef): Moved to gcc/cp/constexpr.h.
*gcc/cp/constexpr.c (constexpr_call): Ditto.
*gcc/cp/constexpr.c (constexpr_ctx): Ditto.
*gcc/cp/constexpr.c (eval_call_pugin_callback): New Function.
*gcc/cp/constexpr.c (cxx_eval_constant_expression): Added a call
to eval_call_pugin_callback.
*gcc/cp/constexpr.c (cxx_eval_constant_expression): Not static anymore.
*gcc/cp/constexpr.c (cxx_bind_parameters_in_call): Ditto.
*gcc/cp/constexpr.h: New file.
*gcc/cp/constexpr.h (constexpr_call_info): New Type.
*gcc/cp/constexpr.h (constexpr_fundef): Moved type from gcc/cp/constexpr.c.
*gcc/cp/constexpr.h (constexpr_call): Ditto.
*gcc/cp/constexpr.h (constexpr_ctx): Ditto.
*gcc/cp/constexpr.h (cxx_eval_constant_expression): Declared.
*gcc/cp/constexpr.h (cxx_bind_parameters_in_call): Declared
*gcc/cp/config-lang.in (gtfiles): Added \$(srcdir)/cp/constexpr.h
*gcc/cp/Make-lang.in (CP_PLUGIN_HEADERS): Added constexpr.h.

2016-05-05 10:29 GMT-03:00 Andres Tiraboschi
:
> Hi,
> thanks for the feedback, I'll do the changes.
>
> 2016-05-04 13:16 GMT-03:00 Jason Merrill :
>> On 05/02/2016 03:28 PM, Andres Tiraboschi wrote:
>>>
>>> +  constexpr_call_info call_info;
>>> +  call_info.function = t;
>>> +  call_info.call_stack = call_stack;
>>> +  call_info.ctx = ctx;
>>> +  call_info.lval_p = lval;
>>> +  call_info.non_constant_p = non_constant_p;
>>> +  call_info.overflow_p = overflow_p;
>>> +  call_info.result = NULL_TREE;
>>> +
>>> +  invoke_plugin_callbacks (PLUGIN_EVAL_CALL_CONSTEXPR, _info);
>>
>>
>> Let's move this into a separate function so that it doesn't increase the
>> stack footprint of cxx_eval_call_expression.
>>
>> Jason
>>
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 625a77c..025ebc1 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -39,7 +39,7 @@ CXX_INSTALL_NAME := $(shell echo c++|sed 
'$(program_transform_name)')
 GXX_INSTALL_NAME := $(shell echo g++|sed '$(program_transform_name)')
 CXX_TARGET_INSTALL_NAME := $(target_noncanonical)-$(shell echo c++|sed 
'$(program_transform_name)')
 GXX_TARGET_INSTALL_NAME := $(target_noncanonical)-$(shell echo g++|sed 
'$(program_transform_name)')
-CP_PLUGIN_HEADERS := cp-tree.h cxx-pretty-print.h name-lookup.h type-utils.h
+CP_PLUGIN_HEADERS := cp-tree.h cxx-pretty-print.h name-lookup.h type-utils.h 
constexpr.h
 
 #
 # Define the names for selecting c++ in LANGUAGES.
diff --git a/gcc/cp/config-lang.in b/gcc/cp/config-lang.in
index 276fc1d..2ca4d03 100644
--- a/gcc/cp/config-lang.in
+++ b/gcc/cp/config-lang.in
@@ -29,4 +29,4 @@ compilers="cc1plus\$(exeext)"
 
 target_libs="target-libstdc++-v3"
 
-gtfiles="\$(srcdir)/cp/rtti.c \$(srcdir)/cp/mangle.c 
\$(srcdir)/cp/name-lookup.h \$(srcdir)/cp/name-lookup.c \$(srcdir)/cp/cp-tree.h 
\$(srcdir)/cp/decl.h \$(srcdir)/cp/call.c \$(srcdir)/cp/decl.c 
\$(srcdir)/cp/decl2.c \$(srcdir)/cp/pt.c \$(srcdir)/cp/repo.c 
\$(srcdir)/cp/semantics.c \$(srcdir)/cp/tree.c \$(srcdir)/cp/parser.h 
\$(srcdir)/cp/parser.c \$(srcdir)/cp/method.c \$(srcdir)/cp/typeck2.c 
\$(srcdir)/c-family/c-common.c \$(srcdir)/c-family/c-common.h 
\$(srcdir)/c-family/c-objc.h \$(srcdir)/c-family/c-lex.c 
\$(srcdir)/c-family/c-pragma.h \$(srcdir)/c-family/c-pragma.c 
\$(srcdir)/cp/class.c \$(srcdir)/cp/cp-objcp-common.c \$(srcdir)/cp/cp-lang.c 
\$(srcdir)/cp/except.c \$(srcdir)/cp/vtable-class-hierarchy.c 
\$(srcdir)/cp/constexpr.c \$(srcdir)/cp/cp-gimplify.c"
+gtfiles="\$(srcdir)/cp/rtti.c \$(srcdir)/cp/mangle.c 
\$(srcdir)/cp/name-lookup.h \$(srcdir)/cp/name-lookup.c \$(srcdir)/cp/cp-tree.h 
\$(srcdir)/cp/decl.h \$(srcdir)/cp/call.c \$(srcdir)/cp/decl.c 
\$(srcdir)/cp/decl2.c \$(srcdir)/cp/pt.c \$(srcdir)/cp/repo.c 
\$(srcdir)/cp/semantics.c \$(srcdir)/cp/tree.c \$(srcdir)/cp/parser.h 
\$(srcdir)/cp/parser.c \$(srcdir)/cp/method.c \$(srcdir)/cp/typeck2.c 
\$(srcdir)/c-family/c-common.c \$(srcdir)/c-family/c-common.h 
\$(srcdir)/c-family/c-objc.h \$(srcdir)/c-family/c-lex.c 
\$(srcdir)/c-family/c-pragma.h \$(srcdir)/c-family/c-pragma.c 
\$(srcdir)/cp/class.c \$(srcdir)/cp/cp-objcp-common.c \$(srcdir)/cp/cp-lang.c 
\$(srcdir)/cp/except.c \$(srcdir)/cp/vtable-class-hierarchy.c 
\$(srcdir)/cp/constexpr.h \$(srcdir)/cp/constexpr.c \$(srcdir)/cp/cp-gimplify.c"
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6054d1a..7c50b06 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -31,6 +31,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-inline.h"
 #include "ubsan.h"
+#include "constexpr.h"
+#include "plugin-api.h"
+#include "plugin.h"
 
 static bool verify_constant (tree, bool, bool *, bool *);
 

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Yury Gribov

On 05/06/2016 03:38 PM, Jakub Jelinek wrote:

On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:

6) As the use-after-scope stuff is already included in libsanitizer, no change 
is needed for the library


Note that upstream seems to use a different cmdline interface. They don't
have a dedicated -fsanitize=use-after-scope and instead consider it to be a
part of -fsanitize=address (disabled by default, enabled via -mllvm
-asan-use-after-scope=1). I'd suggest to keep this interface (or at least
discuss with them) and use GCC's --param.


I personally think -fsanitize=use-after-scope (which implies address
sanitization in it) is better, can upstream be convinved not to change it?


Will that work with -fsanitize=kernel-address?




FTR here's the upstream work on this: http://reviews.llvm.org/D19347


Example:

int
main (void)
{
   char *ptr;
   {
 char my_char[9];
 ptr = &my_char[0];
   }

   *(ptr+9) = 'c';
}


Well, this testcase shows not just use after scope, but also out of bound
access.  Would be better not to combine it, at least in the majority of
testcases.

Jakub






Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 03:17:23PM +0200, Martin Liška wrote:
> On 05/06/2016 01:48 PM, Yury Gribov wrote:
> > On 05/06/2016 02:04 PM, Martin Liška wrote:
> >> I've started working on the patch a couple of months ago, basically after
> >> a brief discussion with Jakub on IRC.
> >>
> >> I'm sending the initial version which can successfully run instrumented
> >> tramp3d, postgresql server and Inkscape. It catches the basic set of
> >> examples which are added in following patch.
> >>
> >> The implementation is quite straightforward, as it works in the following steps:
> >>
> >> 1) Every local variable stack slot is poisoned at the very beginning of a 
> >> function (RTL emission)
> >> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
> >> emitting ASAN_MARK builtin)
> >> and the variable is marked as addressable
> >> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope 
> >> exit)
> >> 4) At the very end of the function, we clean up the poisoned memory
> >> 5) The builtins are expanded to call to libsanitizer run-time library 
> >> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
> > 
> > Can we inline these?
> 
> Currently not as libasan is a shared library that an instrumented executable 
> is linked with.
> Possible solution would be to directly emit gimple instruction that would 
> poison/unpoison the memory.
> But it's not a trivial job which is done in the poisoning code (ALWAYS_INLINE 
> void FastPoisonShadow(uptr aligned_beg, uptr aligned_size, u8 value)

Well, we already have the gimple poisoning/unpoisoning code on RTL (emitted
after the prologue and before the epilogue), so it shouldn't be that hard.
I'd only do the most common/easy cases inline though, like 1/2/4/8/16/32
bytes long variables.

Jakub


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-06 Thread Nathan Sidwell

On 02/24/16 16:52, Aaron Conole wrote:

The previous gcov behavior was to always output errors on the stderr channel.
This is fine for most uses, but some programs will require stderr to be
untouched by libgcov for certain tests. This change allows configuring
the gcov output via an environment variable which will be used to open
the appropriate file.


this is ok in principle.  I have a couple of questions & nits below though.

I don't see a previous commit from you -- do you have a copyright assignment 
with the FSF? (although this patch is simple, my guess is the idea it implements 
is sufficiently novel to need one).  We can handle that off list.




diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
index 4e3b244..0eb9755 100644
--- a/libgcc/libgcov-driver-system.c
+++ b/libgcc/libgcov-driver-system.c
@@ -23,6 +23,24 @@ a copy of the GCC Runtime Library Exception along with this 
program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
<http://www.gnu.org/licenses/>.  */

+FILE *__gcov_error_file = NULL;


Unless I'm missing something, isn't this only accessed from this file? (So could 
 be static with a non-underbarred name)




@@ -30,12 +48,27 @@ gcov_error (const char *fmt, ...)
 {
   int ret;
   va_list argp;
+
+  if (!__gcov_error_file)
+__gcov_error_file = get_gcov_error_file();


Needs space before ()


+
   va_start (argp, fmt);
-  ret = vfprintf (stderr, fmt, argp);
+  ret = vfprintf (__gcov_error_file, fmt, argp);
   va_end (argp);
   return ret;
 }

+#if !IN_GCOV_TOOL


And this protection here, makes me wonder what happens if one is IN_GCOV_TOOL. 
Does it pay attention to GCOV_ERROR_FILE?  That would seem incorrect, and thus 
the above should be changed so that stderr is unconditionally used when 
IN_GCOV_TOOL?



+static void
+gcov_error_exit(void)
+{
+  if (__gcov_error_file && __gcov_error_file != stderr)
+{


Braces are not needed here.



--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -46,6 +46,10 @@ void __gcov_init (struct gcov_info *p __attribute__ 
((unused))) {}



+  gcov_error_exit();


Needs space before ().

nathan


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
On 05/06/2016 01:48 PM, Yury Gribov wrote:
> On 05/06/2016 02:04 PM, Martin Liška wrote:
>> Hello.
>>
>> I've started working on the patch a couple of months ago, basically after
>> a brief discussion with Jakub on IRC.
>>
>> I'm sending the initial version which can successfully run instrumented
>> tramp3d, postgresql server and Inkscape. It catches the basic set of
>> examples which are added in following patch.
>>
>> The implementation is quite straightforward, as it works in the following steps:
>>
>> 1) Every local variable stack slot is poisoned at the very beginning of a 
>> function (RTL emission)
>> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
>> emitting ASAN_MARK builtin)
>> and the variable is marked as addressable
>> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope 
>> exit)
>> 4) At the very end of the function, we clean up the poisoned memory
>> 5) The builtins are expanded to call to libsanitizer run-time library 
>> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
> 
> Can we inline these?

Currently not as libasan is a shared library that an instrumented executable is 
linked with.
Possible solution would be to directly emit gimple instruction that would 
poison/unpoison the memory.
But it's not a trivial job which is done in the poisoning code (ALWAYS_INLINE 
void FastPoisonShadow(uptr aligned_beg, uptr aligned_size, u8 value)

> 
>> 6) As the use-after-scope stuff is already included in libsanitizer, no 
>> change is needed for the library
> 
> Note that upstream seems to use a different cmdline interface. They don't 
> have a dedicated -fsanitize=use-after-scope and instead consider it to be a 
> part of -fsanitize=address (disabled by default, enabled via -mllvm 
> -asan-use-after-scope=1). I'd suggest to keep this interface (or at least 
> discuss with them) and use GCC's --param.
> 
> FTR here's the upstream work on this: http://reviews.llvm.org/D19347

Thanks for the link, I will adapt part of the test to our test-suite.
Some of them are really interesting.

Martin

> 
>> Example:
>>
>> int
>> main (void)
>> {
>>char *ptr;
>>{
>>  char my_char[9];
>>  ptr = &my_char[0];
>>}
>>
>>*(ptr+9) = 'c';
>> }
>>
>> ./a.out
>> =
>> ==12811==ERROR: AddressSanitizer: stack-use-after-scope on address 
>> 0x7ffec9bcff69 at pc 0x00400a73 bp 0x7ffec9bcfef0 sp 0x7ffec9bcfee8
>> WRITE of size 1 at 0x7ffec9bcff69 thread T0
>>  #0 0x400a72 in main (/tmp/a.out+0x400a72)
>>  #1 0x7f100824860f in __libc_start_main (/lib64/libc.so.6+0x2060f)
>>  #2 0x400868 in _start (/tmp/a.out+0x400868)
>>
>> Address 0x7ffec9bcff69 is located in stack of thread T0 at offset 105 in 
>> frame
>>  #0 0x400945 in main (/tmp/a.out+0x400945)
>>
>>This frame has 2 object(s):
>>  [32, 40) 'ptr'
>>  [96, 105) 'my_char' <== Memory access at offset 105 overflows this 
>> variable
>> HINT: this may be a false positive if your program uses some custom stack 
>> unwind mechanism or swapcontext
>>(longjmp and C++ exceptions *are* supported)
>> SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400a72) in 
>> main
>> Shadow bytes around the buggy address:
>>0x100059371f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059371fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> =>0x100059371fe0: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 f8[f8]f4 f4
>>0x100059371ff0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>0x100059372030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> Shadow byte legend (one shadow byte represents 8 application bytes):
>>Addressable:   00
>>Partially addressable: 01 02 03 04 05 06 07
>>Heap left redzone:   fa
>>Heap right redzone:  fb
>>Freed heap region:   fd
>>Stack left redzone:  f1
>>Stack mid redzone:   f2
>>Stack right redzone: f3
>>Stack partial redzone:   f4
>>Stack after return:  f5
>>Stack use after scope:   f8
>>Global redzone:  f9
>>Global init order:   f6
>>Poisoned by user:f7
>>Container overflow:  fc
>>Array cookie:ac
>>Intra object redzone:bb
>>ASan internal:   fe
>>Left alloca redzone: ca
>>Right alloca redzone:cb
>> ==12811==ABORTING
>>
>> As mentioned, it's request for comment as it still has couple of limitations:
>> a) VLA are not supported, which should make sense as we are unable to 
>> allocate a 

Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-06 Thread Bernd Edlinger
On 05/06/16 08:35, David Wohlferd wrote:
> On 5/5/2016 10:29 AM, Bernd Edlinger wrote:
>> Hi!
>>
>> this patch is inspired by recent discussion about basic asm:
>>
>> Currently a basic asm is an instruction scheduling barrier,
>> but not a memory barrier, and most surprising, basic asm
>> does _not_ implicitly clobber CC on targets where
>> extended asm always implicitly clobbers CC, even if
>> nothing is in the clobber section.
>>
>> This patch makes basic asm implicitly clobber CC on certain
>> targets, and makes the basic asm implicitly clobber memory,
>> but no general registers, which is what could be expected.
>>
>> This is however only done for basic asm with non-empty
>> assembler string, which is in sync with David's proposed
>> basic asm warnings patch.
>>
>> Due to the change in the tree representation, where
>> ASM_INPUT can now be the first element of a
>> PARALLEL block with the implicit clobber elements,
>> there are some changes necessary.
>>
>> Most of the changes in the middle end, were necessary
>> because extract_asm_operands can not be used to find out
>> if a PARALLEL block is an asm statement, but in most cases
>> asm_noperands can be used instead.
>>
>> There are also changes necessary in two targets: pa, and ia64.
>> I have successfully built cross-compilers for these targets.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
>> OK for trunk?
>
> A few questions:
>
> 1) I'm not clear precisely what problem this patch fixes.  It's true
> that some people have incorrectly assumed that basic asm clobbers memory
> and this change would fix their code.  But some people also incorrectly
> assume it clobbers registers.  I assume that's why Jeff Law proposed
> making basic asm "an opaque blob that read/write/clobber any register or
> memory location."  Do we have enough problem reports from users to know
> which is the real solution here?
>

Whenever I do something for gcc I do it actually for myself, in my own
best interest.  And this is no exception.

The way I see it is this: in simple cases a basic asm behaves as if
it clobbered memory, because of the way Jeff implemented the
asm handling in sched-deps.c some 20 years ago.

Look for ASM_INPUT where we have this comment:
"Traditional and volatile asm instructions must be considered to use
  and clobber all hard registers, all pseudo-registers and all of
  memory."

The assumption that it is OK to clobber memory in a basic asm will only
break if the asm statement is inlined in a loop, and that may happen
unexpectedly, when gcc rolls out new optimizations.
That's why I consider this to be security relevant.

But OTOH you see immediately that all general registers are in use
by gcc, unless you declare a variable like
register int eax __asm__("rax");
then it is perfectly OK to use rax in a basic asm of course.

And if we want to have implicitly clobbered registers, like the
diab compiler handles the basic asm, then this patch will
make it possible to add a target hook that clobbers additional
registers for basic asm.


> 2) The -Wbasic-asm warning patch wasn't approved for v6.  If we are
> going to change this behavior now, is it time?
>

Yes. We have stage1 for gcc-7 development, I can't think of a better
time for it.
I would even say the -Wbasic-asm warning patch makes more sense now,
because we could warn that the basic asm clobbers memory, which it
did not previously.

> 3) I assume there are good reasons why extended asm can't be used at top
> level.  Will adding these clobbers cause those problems in basic asm too?
>

No, these don't come along here, and nothing should change for them.

> 4) There are more basic asm docs that need to change: "It also does not
> know about side effects of the assembler code, such as modifications to
> memory or registers. Unlike some compilers, GCC assumes that no changes
> to either memory or registers occur. This assumption may change in a
> future release."
>

Yes, I should change that sentence too.

Maybe this way:

"Unlike some compilers, GCC assumes that no changes to registers
occur.  This assumption may change in a future release."



Thanks
Bernd.

> dw


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
On 05/06/2016 02:38 PM, Jakub Jelinek wrote:
> On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:
>>> 6) As the use-after-scope stuff is already included in libsanitizer, no 
>>> change is needed for the library
>>
>> Note that upstream seems to use a different cmdline interface. They don't
>> have a dedicated -fsanitize=use-after-scope and instead consider it to be a
>> part of -fsanitize=address (disabled by default, enabled via -mllvm
>> -asan-use-after-scope=1). I'd suggest to keep this interface (or at least
>> discuss with them) and use GCC's --param.
> 
> I personally think -fsanitize=use-after-scope (which implies address
> sanitization in it) is better, can upstream be convinced not to change it?

I am also inclined toward the original -fsanitize=use-after-scope, which is
consistent with the remaining -fsanitize=... options we have in GCC.

> 
>> FTR here's the upstream work on this: http://reviews.llvm.org/D19347
>>
>>> Example:
>>>
>>> int
>>> main (void)
>>> {
>>>   char *ptr;
>>>   {
>>> char my_char[9];
>>> ptr = &my_char[0];
>>>   }
>>>
>>>   *(ptr+9) = 'c';
>>> }
> 
> Well, this testcase shows not just use after scope, but also out of bound
> access.  Would be better not to combine it, at least in the majority of
> testcases.

Sure, that's a typo, should be:
  *(ptr+8) = 'c';

with:
[96, 105) 'my_char' <== Memory access at offset 104 is inside this variable

Intention was to touch the second shadow byte for the array.

Martin

> 
>   Jakub
> 



Re: [PATCH] Fix memory leak in tree-if-conv.c

2016-05-06 Thread Richard Biener
On Fri, May 6, 2016 at 2:40 PM, Martin Liška  wrote:
> On 05/03/2016 11:07 AM, Bin.Cheng wrote:
>> Patch applied as suggested at r235808.
>>
>> Thanks,
>> bin
>
> Hi.
>
> Following patch introduces memory leak:
> /home/marxin/Programming/gcc2/objdir/gcc/xgcc 
> -B/home/marxin/Programming/gcc2/objdir/gcc/ -fno-diagnostics-show-caret 
> -fdiagnostics-color=never -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  -w -c -o 920928-2.o 
> /home/marxin/Programming/gcc2/gcc/testsuite/gcc.c-torture/compile/920928-2.c
>
> ==5714== 40 bytes in 1 blocks are definitely lost in loss record 38 of 905
> ==5714==    at 0x4C2A00F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5714==    by 0x109C76F: xrealloc (xmalloc.c:178)
> ==5714==    by 0xA89696: reserve (vec.h:288)
> ==5714==    by 0xA89696: reserve (vec.h:1438)
> ==5714==    by 0xA89696: safe_push (vec.h:1547)
> ==5714==    by 0xA89696: ifcvt_split_critical_edges (tree-if-conv.c:2397)
> ==5714==    by 0xA89696: tree_if_conversion (tree-if-conv.c:2725)
> ==5714==    by 0xA89696: (anonymous namespace)::pass_if_conversion::execute(function*) (tree-if-conv.c:2832)
> ==5714==    by 0x98A9D3: execute_one_pass(opt_pass*) (passes.c:2348)
> ==5714==    by 0x98AF17: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2432)
> ==5714==    by 0x98AF29: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2433)
> ==5714==    by 0x98AF29: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2433)
> ==5714==    by 0x98AF74: execute_pass_list(function*, opt_pass*) (passes.c:2443)
> ==5714==    by 0x72DEB2: cgraph_node::expand() (cgraphunit.c:1982)
> ==5714==    by 0x72F3A3: expand_all_functions (cgraphunit.c:2118)
> ==5714==    by 0x72F3A3: symbol_table::compile() [clone .part.49] (cgraphunit.c:2474)
> ==5714==    by 0x730D47: compile (cgraphunit.c:2538)
> ==5714==    by 0x730D47: symbol_table::finalize_compilation_unit() (cgraphunit.c:2564)
> ==5714==    by 0xA392B7: compile_file() (toplev.c:488)
> ==5714==    by 0x616117: do_compile (toplev.c:1987)
> ==5714==    by 0x616117: toplev::main(int, char**) (toplev.c:2095)
> ==5714==    by 0x6182B6: main (main.c:39)
>
>
> Following patch fixes that, ready after it bootstraps and survives regtests?

Ok.

Richard.

> Thanks,
> Martin


[PATCH] Fix memory leak in tree-if-conv.c

2016-05-06 Thread Martin Liška
On 05/03/2016 11:07 AM, Bin.Cheng wrote:
> Patch applied as suggested at r235808.
> 
> Thanks,
> bin

Hi.

Following patch introduces memory leak:
/home/marxin/Programming/gcc2/objdir/gcc/xgcc 
-B/home/marxin/Programming/gcc2/objdir/gcc/ -fno-diagnostics-show-caret 
-fdiagnostics-color=never -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  -w -c -o 920928-2.o 
/home/marxin/Programming/gcc2/gcc/testsuite/gcc.c-torture/compile/920928-2.c

==5714== 40 bytes in 1 blocks are definitely lost in loss record 38 of 905
==5714==    at 0x4C2A00F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5714==    by 0x109C76F: xrealloc (xmalloc.c:178)
==5714==    by 0xA89696: reserve (vec.h:288)
==5714==    by 0xA89696: reserve (vec.h:1438)
==5714==    by 0xA89696: safe_push (vec.h:1547)
==5714==    by 0xA89696: ifcvt_split_critical_edges (tree-if-conv.c:2397)
==5714==    by 0xA89696: tree_if_conversion (tree-if-conv.c:2725)
==5714==    by 0xA89696: (anonymous namespace)::pass_if_conversion::execute(function*) (tree-if-conv.c:2832)
==5714==    by 0x98A9D3: execute_one_pass(opt_pass*) (passes.c:2348)
==5714==    by 0x98AF17: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2432)
==5714==    by 0x98AF29: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2433)
==5714==    by 0x98AF29: execute_pass_list_1(opt_pass*) [clone .constprop.84] (passes.c:2433)
==5714==    by 0x98AF74: execute_pass_list(function*, opt_pass*) (passes.c:2443)
==5714==    by 0x72DEB2: cgraph_node::expand() (cgraphunit.c:1982)
==5714==    by 0x72F3A3: expand_all_functions (cgraphunit.c:2118)
==5714==    by 0x72F3A3: symbol_table::compile() [clone .part.49] (cgraphunit.c:2474)
==5714==    by 0x730D47: compile (cgraphunit.c:2538)
==5714==    by 0x730D47: symbol_table::finalize_compilation_unit() (cgraphunit.c:2564)
==5714==    by 0xA392B7: compile_file() (toplev.c:488)
==5714==    by 0x616117: do_compile (toplev.c:1987)
==5714==    by 0x616117: toplev::main(int, char**) (toplev.c:2095)
==5714==    by 0x6182B6: main (main.c:39)


Following patch fixes that, ready after it bootstraps and survives regtests?

Thanks,
Martin
From 6ea5e00ce55e8e22fe9429f0cbd942f4938643a6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 6 May 2016 14:20:28 +0200
Subject: [PATCH] Fix memory leak in tree-if-conv.c

gcc/ChangeLog:

2016-05-06  Martin Liska  

	* tree-if-conv.c (ifcvt_split_critical_edges): Use auto_vec
	instead of vec as the vector is local to the function.
---
 gcc/tree-if-conv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 3d7c613..3ad8e87 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2361,7 +2361,7 @@ ifcvt_split_critical_edges (struct loop *loop, bool aggressive_if_conv)
   gimple *stmt;
   edge e;
   edge_iterator ei;
-  vec<edge> critical_edges = vNULL;
+  auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
   if (num <= 2 || loop->inner || !single_exit (loop))
@@ -2381,7 +2381,6 @@ ifcvt_split_critical_edges (struct loop *loop, bool aggressive_if_conv)
 		 bb->index, MAX_PHI_ARG_NUM);
 
 	  free (body);
-	  critical_edges.release ();
 	  return false;
 	}
   if (bb == loop->latch || bb_with_exit_edge_p (loop, bb))
-- 
2.8.1



Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 02:48:30PM +0300, Yury Gribov wrote:
> >6) As the use-after-scope stuff is already included in libsanitizer, no 
> >change is needed for the library
> 
> Note that upstream seems to use a different cmdline interface. They don't
> have a dedicated -fsanitize=use-after-scope and instead consider it to be a
> part of -fsanitize=address (disabled by default, enabled via -mllvm
> -asan-use-after-scope=1). I'd suggest to keep this interface (or at least
> discuss with them) and use GCC's --param.

I personally think -fsanitize=use-after-scope (which implies address
sanitization in it) is better, can upstream be convinced not to change it?

> FTR here's the upstream work on this: http://reviews.llvm.org/D19347
> 
> >Example:
> >
> >int
> >main (void)
> >{
> >   char *ptr;
> >   {
> > char my_char[9];
> > ptr = &my_char[0];
> >   }
> >
> >   *(ptr+9) = 'c';
> >}

Well, this testcase shows not just use after scope, but also out of bound
access.  Would be better not to combine it, at least in the majority of
testcases.

Jakub


Re: [PATCH] Fix memory leak in tree-inliner

2016-05-06 Thread Martin Liška
On 05/06/2016 12:56 PM, Richard Biener wrote:
> Hmmm.  But this means debug stmt remapping calls
> remap_dependence_clique which may end up bumping
> cfun->last_clique and thus may change code generation.
> 
> So what debug stmts contain MEM_REFs?  If you put an assert
> processing_debug_stmt == 0 in
> remap_dependence_clique I'd like to see a testcase that triggers it.
> 
> Richard.

Ok, I've placed the suggested assert, which is triggered for the following debug
statement:

(gdb) p debug_gimple_stmt(stmt)
# DEBUG D#21 => a_1(D)->dim[0].ubound

(gdb) p debug_tree(*tp)
 
unit size 
align 64 symtab -160828560 alias set -1 canonical type 0x76a4f000
fields 
unsigned DI file 
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/actual_array_constructor_1.f90
 line 21 col 0
size 
unit size 
align 64 offset_align 128
offset 
bit offset  context 
 chain >
pointer_to_this  reference_to_this 
 chain >
   
arg 0 
public unsigned restrict DI size  
unit size 
align 64 symtab 0 alias set -1 canonical type 0x76a53150>
var def_stmt GIMPLE_NOP

version 1>
arg 1  
constant 0>>

for the following test-case:
gfortran 
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/actual_array_constructor_1.f90
 -O3 -g

Martin





[PATCH] Fix PR70948

2016-05-06 Thread Richard Biener

The following fixes PR70948, a failure of PTA considering all
fields of va_list being clobbered (assigned from NONLOCAL) for
__builtin_va_start.  With the new pointer-vs.-decl comparison
optimization this bug manifests as a miscompile of
gcc.c-torture/execute/va-arg-pack-1.c on AARCH64.

Bootstrapped and tested on x86_64-unknown-linux-gnu, Jiong Wang
verified it fixes the aarch64 issue (I verified by dump inspection only).

Applied to trunk.

Richard.

2016-05-06  Richard Biener  

PR tree-optimization/70948
* tree-ssa-structalias.c (find_func_aliases_for_builtin_call):
Properly clobber all fields of va_list for __builtin_va_start.

Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 235945)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -4492,7 +4492,7 @@ find_func_aliases_for_builtin_call (stru
  tree valist = gimple_call_arg (t, 0);
  struct constraint_expr rhs, *lhsp;
  unsigned i;
- get_constraint_for (valist, &lhsc);
+ get_constraint_for_ptr_offset (valist, NULL_TREE, &lhsc);
  do_deref (&lhsc);
  /* The va_list gets access to pointers in variadic
 arguments.  Which we know in the case of IPA analysis


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 01:04:30PM +0200, Martin Liška wrote:
> I've started working on the patch a couple of months ago, basically after
> a brief discussion with Jakub on IRC.
> 
> I'm sending the initial version which can successfully run instrumented
> tramp3d, postgresql server and Inkscape. It catches the basic set of
> examples which are added in following patch.
> 
> The implementation is quite straightforward, as it works in the following steps:
> 
> 1) Every local variable stack slot is poisoned at the very beginning of a 
> function (RTL emission)
> 2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
> emitting ASAN_MARK builtin)
> and the variable is marked as addressable

Not all vars have DECL_EXPRs though.

> 3) Similarly, BIND_EXPR is the place where we poison the variable (scope exit)
> 4) At the very end of the function, we clean up the poisoned memory
> 5) The builtins are expanded to call to libsanitizer run-time library 
> (__asan_poison_stack_memory, __asan_unpoison_stack_memory)
> 6) As the use-after-scope stuff is already included in libsanitizer, no 
> change is needed for the library

> As mentioned, it's request for comment as it still has couple of limitations:
> a) VLA are not supported, which should make sense as we are unable to 
> allocate a stack slot for that
> b) we can possibly strip some instrumentation in situations where a variable 
> is introduced in a very first BB (RTL poisoning is superfluous).
> Similarly for a very last BB of a function, we can strip end of scope 
> poisoning (and RTL unpoisoning). I'll do that incrementally.

Yeah.

> c) We require -fstack-reuse=none option, maybe it worth to warn a user if 
> -fsanitize=use-after-scope is provided without the option?

This should be implicitly set by -fsanitize=use-after-scope.  Only if some
other -fstack-reuse= option is explicitly set together with
-fsanitize=use-after-scope, we should warn and reset it anyway.

> d) An instrumented binary is quite slow (~20x for tramp3d) as every function 
> call produces many memory read/writes. I'm wondering whether
> we should provide a faster alternative (like instrument just variables that 
> have address taken) ?

I don't see any point in instrumenting !needs_to_live_in_memory vars,
at least not those that don't need to live in memory at gimplification time.
How could one use those after scope?  They can't be accessed by
dereferencing some pointer, and the symbol itself should be unavailable for
symbol lookup after the symbol goes out of scope.
Plus obviously ~20x slowdown isn't acceptable.

Another thing is what to do with variables that are addressable at
gimplification time, but generally are made non-addressable afterwards,
such as due to optimizing away the taking of their address, inlining, etc.

Perhaps depending on how big slowdown you get after just instrumenting
needs_to_live_in_memory vars from ~ gimplification time and/or with the
possible inlining of the poisoning/unpoisoning (again, should be another
knob), at least with small sized vars, there might be another knob,
which would tell if vars that are made non-addressable during optimizations
later on should be instrumented or not.
E.g. if you ASAN_MARK all needs_to_live_in_memory vars early, you could
during the addressable determination when the knob says stuff should be
faster, but less precise, ignore the vars that are addressable just because
of the ASAN_MARK calls, and if they'd then turn to be non-addressable,
remove the corresponding ASAN_MARK calls.

> 2016-05-04  Martin Liska  
> 
>   * asan/asan_poisoning.cc: Do not call PoisonShadow in case
>   of zero size of aligned size.

Generally, libsanitizer changes would need to go through upstream.

> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "varasm.h"
>  #include "stor-layout.h"
>  #include "tree-iterator.h"
> +#include "params.h"
>  #include "asan.h"
>  #include "dojump.h"
>  #include "explow.h"
> @@ -54,7 +55,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgloop.h"
>  #include "gimple-builder.h"
>  #include "ubsan.h"
> -#include "params.h"
>  #include "builtins.h"
>  #include "fnmatch.h"

Why do you need to move params.h around?  Does asan.h now depend on
params.h?

> +  gimplify_seq_add_stmt
> +(seq_p, gimple_build_call_internal (IFN_ASAN_MARK, 3,
> + build_int_cst (integer_type_node,
> +flags),
> + base, unit_size));

Formatting, better introduce some temporary variables, like
  gimple *g = gimple_build_call_internal (...);
  gimplify_seq_add_stmt (seq_p, g);

> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -3570,7 +3570,8 @@ vect_recog_mask_conversion_pattern (vec<gimple *> *stmts, tree *type_in,
>  {
>gimple *last_stmt = stmts->pop ();
>enum tree_code rhs_code;

Re: [PATCH] Improve min/max

2016-05-06 Thread Kirill Yukhin
On 04 May 21:53, Jakub Jelinek wrote:
> Hi!
> 
> AVX512BW has EVEX insns for these.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (*<code>v8hi3, *<code>v16qi3): Add
>   avx512bw alternative.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md 2016-05-04 15:16:44.180894303 +0200
> @@ -10442,19 +10459,20 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
>  (set_attr "mode" "TI")])
>  
>  (define_insn "*<code>v8hi3"
> -  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
> +  [(set (match_operand:V8HI 0 "register_operand" "=x,x,v")
>   (smaxmin:V8HI
> -   (match_operand:V8HI 1 "vector_operand" "%0,x")
> -   (match_operand:V8HI 2 "vector_operand" "xBm,xm")))]
> +   (match_operand:V8HI 1 "vector_operand" "%0,x,v")
> +   (match_operand:V8HI 2 "vector_operand" "xBm,xm,vm")))]
>    "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, V8HImode, operands)"
>    "@
>    p<maxmin_int>w\t{%2, %0|%0, %2}
> +   vp<maxmin_int>w\t{%2, %1, %0|%0, %1, %2}
>    vp<maxmin_int>w\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,avx,avx512bw")
> (set_attr "type" "sseiadd")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix_extra" "*,1")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix_data16" "1,*,*")
> +   (set_attr "prefix_extra" "*,1,1")
> +   (set_attr "prefix" "orig,vex,evex")
> (set_attr "mode" "TI")])
>  
>  (define_expand "3"
> @@ -10526,19 +10544,20 @@ (define_insn "*sse4_1_<code><mode>3<mask_name>"
>  (set_attr "mode" "TI")])
>  
>  (define_insn "*<code>v16qi3"
> -  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
> +  [(set (match_operand:V16QI 0 "register_operand" "=x,x,v")
>   (umaxmin:V16QI
> -   (match_operand:V16QI 1 "vector_operand" "%0,x")
> -   (match_operand:V16QI 2 "vector_operand" "xBm,xm")))]
> +   (match_operand:V16QI 1 "vector_operand" "%0,x,v")
> +   (match_operand:V16QI 2 "vector_operand" "xBm,xm,vm")))]
>    "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, V16QImode, operands)"
>    "@
>    p<maxmin_int>b\t{%2, %0|%0, %2}
> +   vp<maxmin_int>b\t{%2, %1, %0|%0, %1, %2}
>    vp<maxmin_int>b\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,avx,avx512bw")
> (set_attr "type" "sseiadd")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix_extra" "*,1")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix_data16" "1,*,*")
> +   (set_attr "prefix_extra" "*,1,1")
> +   (set_attr "prefix" "orig,vex,evex")
> (set_attr "mode" "TI")])
>  
>  ;
> 
>   Jakub


Re: [PATCH] Improve whole vector right shift

2016-05-06 Thread Kirill Yukhin
On 04 May 21:51, Jakub Jelinek wrote:
> Hi!
> 
> In this case the situation is more complicated, because for
> V*HI we need avx512bw and avx512vl, while for V*SI only avx512vl
> is needed and both are in the same pattern.  But we already have
> a pattern that does the right thing right after the "ashr<mode>3"
> - but as it is after it, the "ashr<mode>3" will win during recog
> and will limit RA decisions.
> 
> The testcase shows that moving the pattern improves it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (ashr<mode>3<mask_name>): Move
>   before the ashr<mode>3 pattern.
> 
>   * gcc.target/i386/avx512bw-vpsraw-3.c: New test.
>   * gcc.target/i386/avx512vl-vpsrad-3.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 16:54:31.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 16:55:31.155848054 +0200
> @@ -10088,6 +10088,20 @@ (define_expand "usadv32qi"
>DONE;
>  })
>  
> +(define_insn "ashr3"
> +  [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v")
> + (ashiftrt:VI24_AVX512BW_1
> +   (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm")
> +   (match_operand:SI 2 "nonmemory_operand" "v,N")))]
> +  "TARGET_AVX512VL"
> +  "vpsra\t{%2, %1, %0|%0, %1, 
> %2}"
> +  [(set_attr "type" "sseishft")
> +   (set (attr "length_immediate")
> + (if_then_else (match_operand 2 "const_int_operand")
> +   (const_string "1")
> +   (const_string "0")))
> +   (set_attr "mode" "")])
> +
>  (define_insn "ashr3"
>[(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x")
>   (ashiftrt:VI24_AVX2
> @@ -10107,20 +10121,6 @@ (define_insn "ashr3"
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "")])
>  
> -(define_insn "ashr3"
> -  [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v")
> - (ashiftrt:VI24_AVX512BW_1
> -   (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm")
> -   (match_operand:SI 2 "nonmemory_operand" "v,N")))]
> -  "TARGET_AVX512VL"
> -  "vpsra\t{%2, %1, %0|%0, %1, 
> %2}"
> -  [(set_attr "type" "sseishft")
> -   (set (attr "length_immediate")
> - (if_then_else (match_operand 2 "const_int_operand")
> -   (const_string "1")
> -   (const_string "0")))
> -   (set_attr "mode" "")])
> -
>  (define_insn "ashrv2di3"
>[(set (match_operand:V2DI 0 "register_operand" "=v,v")
>   (ashiftrt:V2DI
> --- gcc/testsuite/gcc.target/i386/avx512bw-vpsraw-3.c.jj  2016-05-04 
> 17:01:52.332810541 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512bw-vpsraw-3.c 2016-05-04 
> 17:02:56.104966537 +0200
> @@ -0,0 +1,44 @@
> +/* { dg-do assemble { target { avx512bw && { avx512vl && { ! ia32 } } } } } 
> */
> +/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
> +
> +#include 
> +
> +void
> +f1 (__m128i x, int y)
> +{
> +  register __m128i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm_srai_epi16 (a, y);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f2 (__m128i x)
> +{
> +  register __m128i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm_srai_epi16 (a, 16);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f3 (__m256i x, int y)
> +{
> +  register __m256i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm256_srai_epi16 (a, y);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f4 (__m256i x)
> +{
> +  register __m256i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm256_srai_epi16 (a, 16);
> +  asm volatile ("" : "+v" (a));
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-vpsrad-3.c.jj  2016-05-04 
> 17:01:58.770725338 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512vl-vpsrad-3.c 2016-05-04 
> 17:00:16.0 +0200
> @@ -0,0 +1,44 @@
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-O2 -mavx512vl" } */
> +
> +#include 
> +
> +void
> +f1 (__m128i x, int y)
> +{
> +  register __m128i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm_srai_epi32 (a, y);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f2 (__m128i x)
> +{
> +  register __m128i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm_srai_epi32 (a, 16);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f3 (__m256i x, int y)
> +{
> +  register __m256i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm256_srai_epi32 (a, y);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f4 (__m256i x)
> +{
> +  register __m256i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm256_srai_epi32 (a, 16);
> +  asm volatile ("" : "+v" (a));
> +}
> 
>   Jakub


Simple bitop reassoc in match.pd (was: Canonicalize X u< X to UNORDERED_EXPR)

2016-05-06 Thread Marc Glisse

On Tue, 3 May 2016, Richard Biener wrote:


On Tue, May 3, 2016 at 3:26 PM, Marc Glisse  wrote:

On Tue, 3 May 2016, Richard Biener wrote:


On Tue, May 3, 2016 at 8:36 AM, Marc Glisse  wrote:


This removes the duplication. I also removed the case (A)&(A) which
is
handled by reassoc. And I need 2 NOP checks, for the case where @0 is a
constant (that couldn't happen before my patch because canonicalization
would put the constant as second operand).



Nicely spotted.  Not sure we want to delay (A)&(A) until re-assoc.  We
have
many patterns that reassoc would also catch, like (A + CST) + CST or (A +
B)- A,
albeit reassoc only handles the unsigned cases.



(A) seems simple enough for match.pd, I thought (A)&(A) was starting
to be a bit specialized. I could easily add it back (making it work with |
at the same time), but then I am not convinced A&(B) is the best output.
If A or A have several uses, then (A) or B&(A) seem preferable
(and we would still have a transformation for (A) so we wouldn't
lose in the case where B and C are constants). We may still end up having to
add some :s to the transformation I just touched.


Yeah, these are always tricky questions.  Note that re-assoc won't
handle the case with multi-use A or A.  The only reason to care for the single-use case is
when we change operations for the mixed operand cases.  For the all-&| case
there is only the (usual) consideration about SSA lifetime extension.

So maybe it makes sense to split out the all-&| cases.


Here they are. I did (X) and (X)&(X). The next one would be 
((X)), but at some point we have to defer to reassoc.


I didn't add the convert?+tree_nop_conversion_p to the existing transform 
I modified. I guess at some point we should make a pass and add them to 
all the transformations on bit operations...


For (X & Y) & Y, I believe that any conversion is fine. For the others, 
tree_nop_conversion_p is probably too strict (narrowing should be fine for 
all), but I was too lazy to look for tighter conditions.


(X ^ Y) ^ Y -> X should probably have (non_lvalue ...) on its output, but 
in a simple test it didn't seem to matter. Is non_lvalue still needed?



Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2016-05-06  Marc Glisse  

gcc/
* fold-const.c (fold_binary_loc) [(X ^ Y) & Y]: Remove and merge with...
* match.pd ((X & Y) ^ Y): ... this.
((X & Y) & Y, (X | Y) | Y, (X ^ Y) ^ Y, (X & Y) & (X & Z), (X | Y)
| (X | Z), (X ^ Y) ^ (X ^ Z)): New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/bit-assoc.c: New testcase.
* gcc.dg/tree-ssa/pr69270.c: Adjust.
* gcc.dg/tree-ssa/vrp59.c: Disable forwprop.

--
Marc Glisse

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 235933)
+++ gcc/fold-const.c(working copy)
@@ -10063,59 +10063,20 @@ fold_binary_loc (location_t loc,
}
   /* Fold !X & 1 as X == 0.  */
   if (TREE_CODE (arg0) == TRUTH_NOT_EXPR
  && integer_onep (arg1))
{
  tem = TREE_OPERAND (arg0, 0);
  return fold_build2_loc (loc, EQ_EXPR, type, tem,
  build_zero_cst (TREE_TYPE (tem)));
}
 
-  /* Fold (X ^ Y) & Y as ~X & Y.  */
-  if (TREE_CODE (arg0) == BIT_XOR_EXPR
- && operand_equal_p (TREE_OPERAND (arg0, 1), arg1, 0))
-   {
- tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
- return fold_build2_loc (loc, BIT_AND_EXPR, type,
- fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
- fold_convert_loc (loc, type, arg1));
-   }
-  /* Fold (X ^ Y) & X as ~Y & X.  */
-  if (TREE_CODE (arg0) == BIT_XOR_EXPR
- && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)
- && reorder_operands_p (TREE_OPERAND (arg0, 1), arg1))
-   {
- tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
- return fold_build2_loc (loc, BIT_AND_EXPR, type,
- fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
- fold_convert_loc (loc, type, arg1));
-   }
-  /* Fold X & (X ^ Y) as X & ~Y.  */
-  if (TREE_CODE (arg1) == BIT_XOR_EXPR
- && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
-   {
- tem = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
- return fold_build2_loc (loc, BIT_AND_EXPR, type,
- fold_convert_loc (loc, type, arg0),
- fold_build1_loc (loc, BIT_NOT_EXPR, type, tem));
-   }
-  /* Fold X & (Y ^ X) as ~Y & X.  */
-  if (TREE_CODE (arg1) == BIT_XOR_EXPR
- && operand_equal_p (arg0, TREE_OPERAND (arg1, 1), 0)
- && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
-   {
- tem = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));

Re: [PATCH] Improve vec extraction

2016-05-06 Thread Kirill Yukhin
On 04 May 21:47, Jakub Jelinek wrote:
> Hi!
> 
> While EVEX doesn't have vextracti128, we can use vextracti32x4;
> unfortunately without avx512dq we need to use full zmm input operand,
> but that shouldn't be a big deal when we hardcode 1 as immediate.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (*vec_extractv4sf_0, *sse4_1_extractps,
>   *vec_extractv4sf_mem, vec_extract_lo_v16hi, vec_extract_hi_v16hi,
>   vec_extract_lo_v32qi, vec_extract_hi_v32qi): Use v instead of x
>   in vex or maybe_vex alternatives, use maybe_evex instead of vex
>   in prefix.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -6613,9 +6613,9 @@ (define_expand "vec_set"
>  })
>  
>  (define_insn_and_split "*vec_extractv4sf_0"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=x,m,f,r")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=v,m,f,r")
>   (vec_select:SF
> -   (match_operand:V4SF 1 "nonimmediate_operand" "xm,x,m,m")
> +   (match_operand:V4SF 1 "nonimmediate_operand" "vm,v,m,m")
> (parallel [(const_int 0)])))]
>"TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
>"#"
> @@ -6624,9 +6624,9 @@ (define_insn_and_split "*vec_extractv4sf
>"operands[1] = gen_lowpart (SFmode, operands[1]);")
>  
>  (define_insn_and_split "*sse4_1_extractps"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,x,x")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,v,v")
>   (vec_select:SF
> -   (match_operand:V4SF 1 "register_operand" "Yr,*x,0,x")
> +   (match_operand:V4SF 1 "register_operand" "Yr,*v,0,v")
> (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
>"TARGET_SSE4_1"
>"@
> @@ -6665,7 +6665,7 @@ (define_insn_and_split "*sse4_1_extractp
> (set_attr "mode" "V4SF,V4SF,*,*")])
>  
>  (define_insn_and_split "*vec_extractv4sf_mem"
> -  [(set (match_operand:SF 0 "register_operand" "=x,*r,f")
> +  [(set (match_operand:SF 0 "register_operand" "=v,*r,f")
>   (vec_select:SF
> (match_operand:V4SF 1 "memory_operand" "o,o,o")
> (parallel [(match_operand 2 "const_0_to_3_operand" "n,n,n")])))]
> @@ -7239,9 +7239,9 @@ (define_insn "vec_extract_hi_v32hi"
> (set_attr "mode" "XI")])
>  
>  (define_insn_and_split "vec_extract_lo_v16hi"
> -  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
> +  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m")
>   (vec_select:V8HI
> -   (match_operand:V16HI 1 "nonimmediate_operand" "xm,x")
> +   (match_operand:V16HI 1 "nonimmediate_operand" "vm,v")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)
>(const_int 4) (const_int 5)
> @@ -7253,20 +7253,27 @@ (define_insn_and_split "vec_extract_lo_v
>"operands[1] = gen_lowpart (V8HImode, operands[1]);")
>  
>  (define_insn "vec_extract_hi_v16hi"
> -  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
> +  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m,v,m,v,m")
>   (vec_select:V8HI
> -   (match_operand:V16HI 1 "register_operand" "x,x")
> +   (match_operand:V16HI 1 "register_operand" "x,x,v,v,v,v")
> (parallel [(const_int 8) (const_int 9)
>(const_int 10) (const_int 11)
>(const_int 12) (const_int 13)
>(const_int 14) (const_int 15)])))]
>"TARGET_AVX"
> -  "vextract%~128\t{$0x1, %1, %0|%0, %1, 0x1}"
> +  "@
> +   vextract%~128\t{$0x1, %1, %0|%0, %1, 0x1}
> +   vextract%~128\t{$0x1, %1, %0|%0, %1, 0x1}
> +   vextracti32x4\t{$0x1, %1, %0|%0, %1, 0x1}
> +   vextracti32x4\t{$0x1, %1, %0|%0, %1, 0x1}
> +   vextracti32x4\t{$0x1, %g1, %0|%0, %g1, 0x1}
> +   vextracti32x4\t{$0x1, %g1, %0|%0, %g1, 0x1}"
>[(set_attr "type" "sselog")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> -   (set_attr "memory" "none,store")
> -   (set_attr "prefix" "vex")
> +   (set_attr "isa" "*,*,avx512dq,avx512dq,avx512f,avx512f")
> +   (set_attr "memory" "none,store,none,store,none,store")
> +   (set_attr "prefix" "vex,vex,evex,evex,evex,evex")
> (set_attr "mode" "OI")])
>  
>  (define_insn_and_split "vec_extract_lo_v64qi"
> @@ -7325,9 +7332,9 @@ (define_insn "vec_extract_hi_v64qi"
> (set_attr "mode" "XI")])
>  
>  (define_insn_and_split "vec_extract_lo_v32qi"
> -  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
> +  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=v,m")
>   (vec_select:V16QI
> -   (match_operand:V32QI 1 "nonimmediate_operand" "xm,x")
> +   (match_operand:V32QI 1 "nonimmediate_operand" "vm,v")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)
>(const_int 4) (const_int 5)
> @@ 

Re: [PATCH] Improve *pmaddwd

2016-05-06 Thread Kirill Yukhin
On 04 May 21:48, Jakub Jelinek wrote:
> Hi!
> 
> As the testcase shows, we unnecessarily disallow xmm16+, even when
> we can use them for -mavx512bw.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (*avx2_pmaddwd, *sse2_pmaddwd): Use
>   v instead of x in vex or maybe_vex alternatives, use
>   maybe_evex instead of vex in prefix.
> 
>   * gcc.target/i386/avx512bw-vpmaddwd-3.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -9803,19 +9817,19 @@ (define_expand "avx2_pmaddwd"
>"ix86_fixup_binary_operands_no_copy (MULT, V16HImode, operands);")
>  
>  (define_insn "*avx2_pmaddwd"
> -  [(set (match_operand:V8SI 0 "register_operand" "=x")
> +  [(set (match_operand:V8SI 0 "register_operand" "=x,v")
>   (plus:V8SI
> (mult:V8SI
>   (sign_extend:V8SI
> (vec_select:V8HI
> - (match_operand:V16HI 1 "nonimmediate_operand" "%x")
> + (match_operand:V16HI 1 "nonimmediate_operand" "%x,v")
>   (parallel [(const_int 0) (const_int 2)
>  (const_int 4) (const_int 6)
>  (const_int 8) (const_int 10)
>  (const_int 12) (const_int 14)])))
>   (sign_extend:V8SI
> (vec_select:V8HI
> - (match_operand:V16HI 2 "nonimmediate_operand" "xm")
> + (match_operand:V16HI 2 "nonimmediate_operand" "xm,vm")
>   (parallel [(const_int 0) (const_int 2)
>  (const_int 4) (const_int 6)
>  (const_int 8) (const_int 10)
> @@ -9836,7 +9850,8 @@ (define_insn "*avx2_pmaddwd"
>"TARGET_AVX2 && ix86_binary_operator_ok (MULT, V16HImode, operands)"
>"vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "sseiadd")
> -   (set_attr "prefix" "vex")
> +   (set_attr "isa" "*,avx512bw")
> +   (set_attr "prefix" "vex,evex")
> (set_attr "mode" "OI")])
>  
>  (define_expand "sse2_pmaddwd"
> @@ -9866,17 +9881,17 @@ (define_expand "sse2_pmaddwd"
>"ix86_fixup_binary_operands_no_copy (MULT, V8HImode, operands);")
>  
>  (define_insn "*sse2_pmaddwd"
> -  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=x,x,v")
>   (plus:V4SI
> (mult:V4SI
>   (sign_extend:V4SI
> (vec_select:V4HI
> - (match_operand:V8HI 1 "vector_operand" "%0,x")
> + (match_operand:V8HI 1 "vector_operand" "%0,x,v")
>   (parallel [(const_int 0) (const_int 2)
>  (const_int 4) (const_int 6)])))
>   (sign_extend:V4SI
> (vec_select:V4HI
> - (match_operand:V8HI 2 "vector_operand" "xBm,xm")
> + (match_operand:V8HI 2 "vector_operand" "xBm,xm,vm")
>   (parallel [(const_int 0) (const_int 2)
>  (const_int 4) (const_int 6)]
> (mult:V4SI
> @@ -9891,12 +9906,13 @@ (define_insn "*sse2_pmaddwd"
>"TARGET_SSE2 && ix86_binary_operator_ok (MULT, V8HImode, operands)"
>"@
> pmaddwd\t{%2, %0|%0, %2}
> +   vpmaddwd\t{%2, %1, %0|%0, %1, %2}
> vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx")
> +  [(set_attr "isa" "noavx,avx,avx512bw")
> (set_attr "type" "sseiadd")
> (set_attr "atom_unit" "simul")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix_data16" "1,*,*")
> +   (set_attr "prefix" "orig,vex,evex")
> (set_attr "mode" "TI")])
>  
>  (define_insn "avx512dq_mul3"
> --- gcc/testsuite/gcc.target/i386/avx512bw-vpmaddwd-3.c.jj2016-05-04 
> 16:37:21.196223424 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512bw-vpmaddwd-3.c   2016-05-04 
> 16:37:51.867819502 +0200
> @@ -0,0 +1,24 @@
> +/* { dg-do assemble { target { avx512bw && { avx512vl && { ! ia32 } } } } } 
> */
> +/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
> +
> +#include 
> +
> +void
> +f1 (__m128i x, __m128i y)
> +{
> +  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
> +  a = x; b = y;
> +  asm volatile ("" : "+v" (a), "+v" (b));
> +  a = _mm_madd_epi16 (a, b);
> +  asm volatile ("" : "+v" (a));
> +}
> +
> +void
> +f2 (__m256i x, __m256i y)
> +{
> +  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
> +  a = x; b = y;
> +  asm volatile ("" : "+v" (a), "+v" (b));
> +  a = _mm256_madd_epi16 (a, b);
> +  asm volatile ("" : "+v" (a));
> +}
> 
>   Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Yury Gribov

On 05/06/2016 02:04 PM, Martin Liška wrote:

Hello.

I started working on this patch a couple of months ago, basically after
a brief discussion with Jakub on IRC.

I'm sending the initial version which can successfully run instrumented
tramp3d, postgresql server and Inkscape. It catches the basic set of
examples which are added in following patch.

The implementation is quite straightforward and works in the following steps:

1) Every local variable stack slot is poisoned at the very beginning of a 
function (RTL emission)
2) In gimplifier, once we spot a DECL_EXPR, a variable is unpoisoned (by 
emitting ASAN_MARK builtin)
and the variable is marked as addressable
3) Similarly, BIND_EXPR is the place where we poison the variable (scope exit)
4) At the very end of the function, we clean up the poisoned memory
5) The builtins are expanded to call to libsanitizer run-time library 
(__asan_poison_stack_memory, __asan_unpoison_stack_memory)


Can we inline these?


6) As the use-after-scope stuff is already included in libsanitizer, no change 
is needed for the library


Note that upstream seems to use a different cmdline interface. They 
don't have a dedicated -fsanitize=use-after-scope and instead consider 
it to be a part of -fsanitize=address (disabled by default, enabled via 
-mllvm -asan-use-after-scope=1). I'd suggest to keep this interface (or 
at least discuss with them) and use GCC's --param.


FTR here's the upstream work on this: http://reviews.llvm.org/D19347


Example:

int
main (void)
{
   char *ptr;
   {
 char my_char[9];
 ptr = &my_char[0];
   }

   *(ptr+9) = 'c';
}

./a.out
=
==12811==ERROR: AddressSanitizer: stack-use-after-scope on address 
0x7ffec9bcff69 at pc 0x00400a73 bp 0x7ffec9bcfef0 sp 0x7ffec9bcfee8
WRITE of size 1 at 0x7ffec9bcff69 thread T0
 #0 0x400a72 in main (/tmp/a.out+0x400a72)
 #1 0x7f100824860f in __libc_start_main (/lib64/libc.so.6+0x2060f)
 #2 0x400868 in _start (/tmp/a.out+0x400868)

Address 0x7ffec9bcff69 is located in stack of thread T0 at offset 105 in frame
 #0 0x400945 in main (/tmp/a.out+0x400945)

   This frame has 2 object(s):
 [32, 40) 'ptr'
 [96, 105) 'my_char' <== Memory access at offset 105 overflows this variable
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism or swapcontext
   (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400a72) in main
Shadow bytes around the buggy address:
   0x100059371f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059371fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100059371fe0: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 f8[f8]f4 f4
   0x100059371ff0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   0x100059372030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
   Addressable:   00
   Partially addressable: 01 02 03 04 05 06 07
   Heap left redzone:   fa
   Heap right redzone:  fb
   Freed heap region:   fd
   Stack left redzone:  f1
   Stack mid redzone:   f2
   Stack right redzone: f3
   Stack partial redzone:   f4
   Stack after return:  f5
   Stack use after scope:   f8
   Global redzone:  f9
   Global init order:   f6
   Poisoned by user:f7
   Container overflow:  fc
   Array cookie:ac
   Intra object redzone:bb
   ASan internal:   fe
   Left alloca redzone: ca
   Right alloca redzone:cb
==12811==ABORTING

As mentioned, it's a request for comments as it still has a couple of limitations:
a) VLAs are not supported, which should make sense as we are unable to allocate
a stack slot for them


Note that we plan some work on VLA sanitization later this year 
(upstream ASan now sanitizes dynamic allocas and VLAs).



b) we can possibly strip some instrumentation in situations where a variable is
introduced in the very first BB (RTL poisoning is superfluous).
Similarly, for the very last BB of a function, we can strip the end-of-scope poisoning
(and RTL unpoisoning). I'll do that incrementally.
c) We require the -fstack-reuse=none option; maybe it is worth warning the user if
-fsanitize=use-after-scope is provided without it?


As a user, I'd prefer it to be automatically disabled when 
use-after-scope is on (unless it has been set explicitly in cmdline in 
which case we should probably issue error).



d) An instrumented binary is quite slow (~20x for tramp3d) as every function 

Re: [PATCH] Improve vec_concatv?sf*

2016-05-06 Thread Kirill Yukhin
On 04 May 21:44, Jakub Jelinek wrote:
> Hi!
> 
> Another pair of define_insns.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (*vec_concatv2sf_sse4_1, *vec_concatv4sf): Use
>   v instead of x in vex or maybe_vex alternatives, use
>   maybe_evex instead of vex in prefix.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -6415,12 +6415,12 @@ (define_insn "avx512f_vec_dup_1"
>  ;; unpcklps with register source since it is shorter.
>  (define_insn "*vec_concatv2sf_sse4_1"
>[(set (match_operand:V2SF 0 "register_operand"
> -   "=Yr,*x,x,Yr,*x,x,x,*y ,*y")
> +   "=Yr,*x,v,Yr,*x,v,v,*y ,*y")
>   (vec_concat:V2SF
> (match_operand:SF 1 "nonimmediate_operand"
> -   "  0, 0,x, 0,0, x,m, 0 , m")
> +   "  0, 0,v, 0,0, v,m, 0 , m")
> (match_operand:SF 2 "vector_move_operand"
> -   " Yr,*x,x, m,m, m,C,*ym, C")))]
> +   " Yr,*x,v, m,m, m,C,*ym, C")))]
>"TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>"@
> unpcklps\t{%2, %0|%0, %2}
> @@ -6437,7 +6437,7 @@ (define_insn "*vec_concatv2sf_sse4_1"
> (set_attr "prefix_data16" "*,*,*,1,1,*,*,*,*")
> (set_attr "prefix_extra" "*,*,*,1,1,1,*,*,*")
> (set_attr "length_immediate" "*,*,*,1,1,1,*,*,*")
> -   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
> +   (set_attr "prefix" 
> "orig,orig,maybe_evex,orig,orig,maybe_evex,maybe_vex,orig,orig")
> (set_attr "mode" "V4SF,V4SF,V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
>  
>  ;; ??? In theory we can match memory for the MMX alternative, but allowing
> @@ -6458,10 +6458,10 @@ (define_insn "*vec_concatv2sf_sse"
> (set_attr "mode" "V4SF,SF,DI,DI")])
>  
>  (define_insn "*vec_concatv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand"   "=x,x,x,x")
> +  [(set (match_operand:V4SF 0 "register_operand"   "=x,v,x,v")
>   (vec_concat:V4SF
> -   (match_operand:V2SF 1 "register_operand" " 0,x,0,x")
> -   (match_operand:V2SF 2 "nonimmediate_operand" " x,x,m,m")))]
> +   (match_operand:V2SF 1 "register_operand" " 0,v,0,v")
> +   (match_operand:V2SF 2 "nonimmediate_operand" " x,v,m,m")))]
>"TARGET_SSE"
>"@
> movlhps\t{%2, %0|%0, %2}
> @@ -6470,7 +6470,7 @@ (define_insn "*vec_concatv4sf"
> vmovhps\t{%2, %1, %0|%0, %1, %q2}"
>[(set_attr "isa" "noavx,avx,noavx,avx")
> (set_attr "type" "ssemov")
> -   (set_attr "prefix" "orig,vex,orig,vex")
> +   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex")
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
>  
>  (define_expand "vec_init"
> 
>   Jakub


Re: [PATCH] Improve vec_interleave*

2016-05-06 Thread Kirill Yukhin
On 04 May 21:41, Jakub Jelinek wrote:
> Hi!
> 
> Another 3 define_insns that can handle xmm16+ operands.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (vec_interleave_lowv4sf,
>   *vec_interleave_highv2df, *vec_interleave_lowv2df): Use
>   v instead of x in vex or maybe_vex alternatives, use
>   maybe_evex instead of vex in prefix.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -5987,11 +5987,11 @@ (define_expand "vec_interleave_lowv8sf"
>  })
>  
>  (define_insn "vec_interleave_lowv4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
> +  [(set (match_operand:V4SF 0 "register_operand" "=x,v")
>   (vec_select:V4SF
> (vec_concat:V8SF
> - (match_operand:V4SF 1 "register_operand" "0,x")
> - (match_operand:V4SF 2 "vector_operand" "xBm,xm"))
> + (match_operand:V4SF 1 "register_operand" "0,v")
> + (match_operand:V4SF 2 "vector_operand" "xBm,vm"))
> (parallel [(const_int 0) (const_int 4)
>(const_int 1) (const_int 5)])))]
>"TARGET_SSE"
> @@ -6000,7 +6000,7 @@ (define_insn "vec_interleave_lowv4sf"
> vunpcklps\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "isa" "noavx,avx")
> (set_attr "type" "sselog")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix" "orig,maybe_evex")
> (set_attr "mode" "V4SF")])
>  
>  ;; These are modeled with the same vec_concat as the others so that we
> @@ -7480,11 +7494,11 @@ (define_expand "vec_interleave_highv2df"
>  })
>  
>  (define_insn "*vec_interleave_highv2df"
> -  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,x,x,x,x,m")
> +  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,v,x,v,m")
>   (vec_select:V2DF
> (vec_concat:V4DF
> - (match_operand:V2DF 1 "nonimmediate_operand" " 0,x,o,o,o,x")
> - (match_operand:V2DF 2 "nonimmediate_operand" " x,x,1,0,x,0"))
> + (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,o,o,o,v")
> + (match_operand:V2DF 2 "nonimmediate_operand" " x,v,1,0,v,0"))
> (parallel [(const_int 1)
>(const_int 3)])))]
>"TARGET_SSE2 && ix86_vec_interleave_v2df_operator_ok (operands, 1)"
> @@ -7498,7 +7512,7 @@ (define_insn "*vec_interleave_highv2df"
>[(set_attr "isa" "noavx,avx,sse3,noavx,avx,*")
> (set_attr "type" "sselog,sselog,sselog,ssemov,ssemov,ssemov")
> (set_attr "prefix_data16" "*,*,*,1,*,1")
> -   (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
> +   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
>  
>  (define_expand "avx512f_movddup512"
> @@ -7639,11 +7653,11 @@ (define_expand "vec_interleave_lowv2df"
>  })
>  
>  (define_insn "*vec_interleave_lowv2df"
> -  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,x,x,x,x,o")
> +  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,v,x,v,o")
>   (vec_select:V2DF
> (vec_concat:V4DF
> - (match_operand:V2DF 1 "nonimmediate_operand" " 0,x,m,0,x,0")
> - (match_operand:V2DF 2 "nonimmediate_operand" " x,x,1,m,m,x"))
> + (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,m,0,v,0")
> + (match_operand:V2DF 2 "nonimmediate_operand" " x,v,1,m,m,v"))
> (parallel [(const_int 0)
>(const_int 2)])))]
>"TARGET_SSE2 && ix86_vec_interleave_v2df_operator_ok (operands, 0)"
> @@ -7657,7 +7671,7 @@ (define_insn "*vec_interleave_lowv2df"
>[(set_attr "isa" "noavx,avx,sse3,noavx,avx,*")
> (set_attr "type" "sselog,sselog,sselog,ssemov,ssemov,ssemov")
> (set_attr "prefix_data16" "*,*,*,1,*,1")
> -   (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
> +   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
>  
>  (define_split
> 
>   Jakub


Re: [PATCH] Improve other 13 define_insns

2016-05-06 Thread Kirill Yukhin
On 04 May 21:43, Jakub Jelinek wrote:
> Hi!
> 
> This patch tweaks more define_insns at once, again all the insns
> should be already in AVX512F or AVX512VL.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (sse_shufps_, sse_storehps, sse_loadhps,
>   sse_storelps, sse_movss, avx2_vec_dup, avx2_vec_dupv8sf_1,
>   sse2_shufpd_, sse2_storehpd, sse2_storelpd, sse2_loadhpd,
>   sse2_loadlpd, sse2_movsd): Use v instead of x in vex or maybe_vex
>   alternatives, use maybe_evex instead of vex in prefix.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -6219,11 +6219,11 @@ (define_insn "sse_shufps_v4sf_mask"
> (set_attr "mode" "V4SF")])
>  
>  (define_insn "sse_shufps_"
> -  [(set (match_operand:VI4F_128 0 "register_operand" "=x,x")
> +  [(set (match_operand:VI4F_128 0 "register_operand" "=x,v")
>   (vec_select:VI4F_128
> (vec_concat:
> - (match_operand:VI4F_128 1 "register_operand" "0,x")
> - (match_operand:VI4F_128 2 "vector_operand" "xBm,xm"))
> + (match_operand:VI4F_128 1 "register_operand" "0,v")
> + (match_operand:VI4F_128 2 "vector_operand" "xBm,vm"))
> (parallel [(match_operand 3 "const_0_to_3_operand")
>(match_operand 4 "const_0_to_3_operand")
>(match_operand 5 "const_4_to_7_operand")
> @@ -6250,13 +6250,13 @@ (define_insn "sse_shufps_"
>[(set_attr "isa" "noavx,avx")
> (set_attr "type" "sseshuf")
> (set_attr "length_immediate" "1")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix" "orig,maybe_evex")
> (set_attr "mode" "V4SF")])
>  
>  (define_insn "sse_storehps"
> -  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,x,x")
> +  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,v,v")
>   (vec_select:V2SF
> -   (match_operand:V4SF 1 "nonimmediate_operand" "x,x,o")
> +   (match_operand:V4SF 1 "nonimmediate_operand" "v,v,o")
> (parallel [(const_int 2) (const_int 3)])))]
>"TARGET_SSE"
>"@
> @@ -6288,12 +6288,12 @@ (define_expand "sse_loadhps_exp"
>  })
>  
>  (define_insn "sse_loadhps"
> -  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,o")
> +  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,o")
>   (vec_concat:V4SF
> (vec_select:V2SF
> - (match_operand:V4SF 1 "nonimmediate_operand" " 0,x,0,x,0")
> + (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")
>   (parallel [(const_int 0) (const_int 1)]))
> -   (match_operand:V2SF 2 "nonimmediate_operand"   " m,m,x,x,x")))]
> +   (match_operand:V2SF 2 "nonimmediate_operand"   " m,m,x,v,v")))]
>"TARGET_SSE"
>"@
> movhps\t{%2, %0|%0, %q2}
> @@ -6303,13 +6303,13 @@ (define_insn "sse_loadhps"
> %vmovlps\t{%2, %H0|%H0, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> (set_attr "type" "ssemov")
> -   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
> +   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V2SF,V2SF,V4SF,V4SF,V2SF")])
>  
>  (define_insn "sse_storelps"
> -  [(set (match_operand:V2SF 0 "nonimmediate_operand"   "=m,x,x")
> +  [(set (match_operand:V2SF 0 "nonimmediate_operand"   "=m,v,v")
>   (vec_select:V2SF
> -   (match_operand:V4SF 1 "nonimmediate_operand" " x,x,m")
> +   (match_operand:V4SF 1 "nonimmediate_operand" " v,v,m")
> (parallel [(const_int 0) (const_int 1)])))]
>"TARGET_SSE"
>"@
> @@ -6341,11 +6341,11 @@ (define_expand "sse_loadlps_exp"
>  })
>  
>  (define_insn "sse_loadlps"
> -  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,m")
> +  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,m")
>   (vec_concat:V4SF
> -   (match_operand:V2SF 2 "nonimmediate_operand"   " 0,x,m,m,x")
> +   (match_operand:V2SF 2 "nonimmediate_operand"   " 0,v,m,m,v")
> (vec_select:V2SF
> - (match_operand:V4SF 1 "nonimmediate_operand" " x,x,0,x,0")
> + (match_operand:V4SF 1 "nonimmediate_operand" " x,v,0,v,0")
>   (parallel [(const_int 2) (const_int 3)]]
>"TARGET_SSE"
>"@
> @@ -6357,14 +6357,14 @@ (define_insn "sse_loadlps"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
> (set_attr "length_immediate" "1,1,*,*,*")
> -   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
> +   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
>  
>  (define_insn "sse_movss"
> -  [(set (match_operand:V4SF 0 "register_operand"   "=x,x")
> +  [(set (match_operand:V4SF 0 "register_operand"   "=x,v")
>   (vec_merge:V4SF
> -   (match_operand:V4SF 2 "register_operand" " x,x")
> -   

Re: [PATCH] Improve sse_mov{hl,lh}ps

2016-05-06 Thread Kirill Yukhin
On 04 May 21:37, Jakub Jelinek wrote:
> Hi!
> 
> Another pair of define_insns where all the VEX insns have EVEX variant
> in AVX512VL.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (sse_movhlps, sse_movlhps): Use
>   v instead of x in vex or maybe_vex alternatives, use
>   maybe_evex instead of vex in prefix.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -5744,11 +5744,11 @@ (define_expand "sse_movhlps_exp"
>  })
>  
>  (define_insn "sse_movhlps"
> -  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,m")
> +  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,m")
>   (vec_select:V4SF
> (vec_concat:V8SF
> - (match_operand:V4SF 1 "nonimmediate_operand" " 0,x,0,x,0")
> - (match_operand:V4SF 2 "nonimmediate_operand" " x,x,o,o,x"))
> + (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")
> + (match_operand:V4SF 2 "nonimmediate_operand" " x,v,o,o,v"))
> (parallel [(const_int 6)
>(const_int 7)
>(const_int 2)
> @@ -5762,7 +5762,7 @@ (define_insn "sse_movhlps"
> %vmovhps\t{%2, %0|%q0, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> (set_attr "type" "ssemov")
> -   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
> +   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
>  
>  (define_expand "sse_movlhps_exp"
> @@ -5789,11 +5789,11 @@ (define_expand "sse_movlhps_exp"
>  })
>  
>  (define_insn "sse_movlhps"
> -  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,o")
> +  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,o")
>   (vec_select:V4SF
> (vec_concat:V8SF
> - (match_operand:V4SF 1 "nonimmediate_operand" " 0,x,0,x,0")
> - (match_operand:V4SF 2 "nonimmediate_operand" " x,x,m,m,x"))
> + (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")
> + (match_operand:V4SF 2 "nonimmediate_operand" " x,v,m,v,v"))
> (parallel [(const_int 0)
>(const_int 1)
>(const_int 4)
> @@ -5807,7 +5807,7 @@ (define_insn "sse_movlhps"
> %vmovlps\t{%2, %H0|%H0, %2}"
>[(set_attr "isa" "noavx,avx,noavx,avx,*")
> (set_attr "type" "ssemov")
> -   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
> +   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
> (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
>  
>  (define_insn "avx512f_unpckhps512"
> 
>   Jakub


Re: [PATCH] Improve *avx_cvtp?2??256_2

2016-05-06 Thread Kirill Yukhin
On 04 May 21:35, Jakub Jelinek wrote:
> Hi!
> 
> Not sure how to easily construct a testcase for this (these insns are
> usually used for vectorization, and then it really depends on register
> pressure).
> But in any case, looking at documentation it seems all the used insns are
> available (generally even for further patches, what I'm looking for is
> whether the insns are available already in AVX512F, or, if all the operands
> are 128-bit or 256-bit vectors, in AVX512VL, or if they need further ISA
> extensions; HARD_REGNO_MODE_OK should guarantee that the 128-bit and 256-bit
> vectors would not be assigned to xmm16+ unless -mavx512vl).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK for trunk.

--
Thanks, K
> 
> 2016-05-04  Jakub Jelinek  
> 
>   * config/i386/sse.md (*avx_cvtpd2dq256_2, *avx_cvtps2pd256_2): Use
>   v constraint instead of x.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md2016-05-04 15:16:44.180894303 +0200
> @@ -4735,9 +4735,9 @@ (define_expand "avx_cvtpd2dq256_2"
>"operands[2] = CONST0_RTX (V4SImode);")
>  
>  (define_insn "*avx_cvtpd2dq256_2"
> -  [(set (match_operand:V8SI 0 "register_operand" "=x")
> +  [(set (match_operand:V8SI 0 "register_operand" "=v")
>   (vec_concat:V8SI
> -   (unspec:V4SI [(match_operand:V4DF 1 "nonimmediate_operand" "xm")]
> +   (unspec:V4SI [(match_operand:V4DF 1 "nonimmediate_operand" "vm")]
>  UNSPEC_FIX_NOTRUNC)
> (match_operand:V4SI 2 "const0_operand")))]
>"TARGET_AVX"
> @@ -5050,10 +5050,10 @@ (define_insn "_cvtps2p
> (set_attr "mode" "")])
>  
>  (define_insn "*avx_cvtps2pd256_2"
> -  [(set (match_operand:V4DF 0 "register_operand" "=x")
> +  [(set (match_operand:V4DF 0 "register_operand" "=v")
>   (float_extend:V4DF
> (vec_select:V4SF
> - (match_operand:V8SF 1 "nonimmediate_operand" "xm")
> + (match_operand:V8SF 1 "nonimmediate_operand" "vm")
>   (parallel [(const_int 0) (const_int 1)
>  (const_int 2) (const_int 3)]]
>"TARGET_AVX"
> 
>   Jakub


Re: [PATCH] Fix PR70937

2016-05-06 Thread Richard Biener
On Fri, 6 May 2016, Richard Biener wrote:

> 
> The following patch fixes another case of missing DECL_EXPR in the FE.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Ok for trunk?

Dominique noticed a FAIL early which is fixed by adjusting the patch
to only handle TYPE_DECL TYPE_NAME like so:

Index: gcc/fortran/trans-decl.c
===
--- gcc/fortran/trans-decl.c(revision 235945)
+++ gcc/fortran/trans-decl.c(working copy)
@@ -3818,6 +3818,12 @@ gfc_trans_vla_type_sizes (gfc_symbol *sy
 }
 
   gfc_trans_vla_type_sizes_1 (type, body);
+  /* gfc_build_qualified_array may have built this type but left TYPE_NAME
+ pointing to the original type whose type sizes we need to expose to
+ the gimplifier unsharing.  */
+  if (TYPE_NAME (type)
+  && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL)
+gfc_add_expr_to_block (body, build1 (DECL_EXPR, type, TYPE_NAME (type)));
 }
 
 
I've re-started testing.

Ok with that change?

Thanks,
Richard.

> Thanks,
> Richard.
> 
> 2016-05-06  Richard Biener  
> 
>   PR fortran/70937
>   * trans-decl.c (gfc_trans_vla_type_sizes): Add a DECL_EXPR for
>   the TYPE_DECL as well.
> 
>   * gfortran.dg/pr70937.f90: New testcase.
> 
> Index: gcc/fortran/trans-decl.c
> ===
> *** gcc/fortran/trans-decl.c  (revision 235945)
> --- gcc/fortran/trans-decl.c  (working copy)
> *** gfc_trans_vla_type_sizes (gfc_symbol *sy
> *** 3818,3823 
> --- 3818,3828 
>   }
>   
> gfc_trans_vla_type_sizes_1 (type, body);
> +   /* gfc_build_qualified_array may have built this type but left TYPE_NAME
> +  pointing to the original type whose type sizes we need to expose to
> +  the gimplifier unsharing.  */
> +   if (TYPE_NAME (type))
> + gfc_add_expr_to_block (body, build1 (DECL_EXPR, type, TYPE_NAME (type)));
>   }
>   
>   
> Index: gcc/testsuite/gfortran.dg/pr70937.f90
> ===
> *** gcc/testsuite/gfortran.dg/pr70937.f90 (revision 0)
> --- gcc/testsuite/gfortran.dg/pr70937.f90 (working copy)
> ***
> *** 0 
> --- 1,10 
> + ! { dg-do compile }
> + ! { dg-options "-flto" }
> +   SUBROUTINE dbcsr_test_read_args(narg, args)
> + CHARACTER(len=*), DIMENSION(:), &
> +   INTENT(out) :: args
> + CHARACTER(len=80) :: line
> + DO
> +args(narg) = line
> + ENDDO
> +   END SUBROUTINE dbcsr_test_read_args
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


RE: [PATCH 3/4] Add support to run auto-vectorization tests for multiple effective targets

2016-05-06 Thread Matthew Fortune
Robert Suchanek  writes:
> I'm resending this patch as it has been rebased and updated.  I reverted
> a change to the check_effective_target_vect_call_lrint procedure because it
> does not use the cached result.

Conceptually I think this is a good idea and to the extent that I can
follow TCL code it looks OK. I can't approve this though so need a global
reviewer to comment.

Thanks,
Matthew

> 
> Regards,
> Robert
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org
> > [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Robert Suchanek
> > Sent: 10 August 2015 13:15
> > To: catherine_mo...@mentor.com; Matthew Fortune
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 3/4] Add support to run auto-vectorization tests for
> > multiple effective targets
> >
> > Hi,
> >
> > This patch allows to run auto-vectorization tests for more than one
> > effective target.  The initial proposal
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02289.html
> >
> > had some issues that have been addressed and should work as expected
> now.
> >
> > The idea was to add a wrapper procedure that would:
> > 1. Iterate over a list of EFFECTIVE_TARGETS e.g. mips_msa,
> mpaired_single.
> > 2. Add necessary compile time options for each effective target.
> > 3. Check if it's possible to compile and/or run on a target, and set
> >dg-do-what-default accordingly.
> > 4. Set the target index to tell check_effective_target_vect_* which
> target is
> >currently being processed.
> > 5. Invoke {gfortran-,g++-,}dg-runtest with the list of vector tests as
> normal.
> >
> > The above required that every vector feature e.g. vect_int that caches
> > the result is capable of tracking what target supports a feature.  The
> > result is saved to a list at an index controlled by the wrapper
> > (et-dg-runtest).  Ports not using this feature, set DEFAULT_VECTFLAGS
> > and the tests should run as they used to.
> >
> > The patch was additionally tested on x86_64-unknown-linux-gnu and
> > aarch64- linux-gnu.
> >
> > Regards,
> > Robert
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/vect/vect.exp: Add and set new global EFFECTIVE_TARGETS.
> Call
> > g++-dg-runtest via et-dg-runtest.
> > * gcc.dg/graphite/graphite.exp: Likewise, but for dg-runtest.
> > * gcc.dg/vect/vect.exp: Likewise.
> > * gfortran.dg/graphite/graphite.exp: Likewise, but for
> > gfortran-dg-runtest.
> > * gfortran.dg/vect/vect.exp: Likewise.
> > * lib/target-supports.exp (check_mpaired_single_hw_available): New.
> > (check_mips_loongson_hw_available): Likewise.
> > (check_effective_target_mpaired_single_runtime): Likewise.
> > (check_effective_target_mips_loongson_runtime): Likewise.
> > (add_options_for_mpaired_single): Likewise.
> > (check_effective_target_vect_int): Add global et_index.
> > Check and save the supported feature for a target selected by
> > the et_index target.  Break long lines where appropriate.  Call
> > et-is-effective-target for MIPS with an argument instead of
> > check_effective_target_* where appropriate.
> > (check_effective_target_vect_intfloat_cvt): Likewise.
> > (check_effective_target_vect_uintfloat_cvt): Likewise.
> > (check_effective_target_vect_floatint_cvt): Likewise.
> > (check_effective_target_vect_floatuint_cvt): Likewise.
> > (check_effective_target_vect_simd_clones): Likewise.
> > (check_effective_target_vect_shift): Likewise.
> > (check_effective_target_whole_vector_shift): Likewise.
> > (check_effective_target_vect_bswap): Likewise.
> > (check_effective_target_vect_shift_char): Likewise.
> > (check_effective_target_vect_long): Likewise.
> > (check_effective_target_vect_float): Likewise.
> > (check_effective_target_vect_double): Likewise.
> > (check_effective_target_vect_long_long): Likewise.
> > (check_effective_target_vect_no_int_max): Likewise.
> > (check_effective_target_vect_no_int_add): Likewise.
> > (check_effective_target_vect_no_bitwise): Likewise.
> > (check_effective_target_vect_widen_shift): Likewise.
> > (check_effective_target_vect_no_align): Likewise.
> > (check_effective_target_vect_hw_misalign): Likewise.
> > (check_effective_target_vect_element_align): Likewise.
> > (check_effective_target_vect_condition): Likewise.
> > (check_effective_target_vect_cond_mixed): Likewise.
> > (check_effective_target_vect_char_mult): Likewise.
> > (check_effective_target_vect_short_mult): Likewise.
> > (check_effective_target_vect_int_mult): Likewise.
> > (check_effective_target_vect_extract_even_odd): Likewise.
> > (check_effective_target_vect_interleave): Likewise.
> > (check_effective_target_vect_stridedN): Likewise.
> > (check_effective_target_vect_multiple_sizes): Likewise.
> > (check_effective_target_vect64): Likewise.
> > (check_effective_target_vect_call_copysignf): Likewise.
> > 

[PATCH] Fix PR70937

2016-05-06 Thread Richard Biener

The following patch fixes another case of missing DECL_EXPR in the FE.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for trunk?

Thanks,
Richard.

2016-05-06  Richard Biener  

PR fortran/70937
* trans-decl.c (gfc_trans_vla_type_sizes): Add a DECL_EXPR for
the TYPE_DECL as well.

* gfortran.dg/pr70937.f90: New testcase.

Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c(revision 235945)
--- gcc/fortran/trans-decl.c(working copy)
*** gfc_trans_vla_type_sizes (gfc_symbol *sy
*** 3818,3823 
--- 3818,3828 
  }
  
gfc_trans_vla_type_sizes_1 (type, body);
+   /* gfc_build_qualified_array may have built this type but left TYPE_NAME
+  pointing to the original type whose type sizes we need to expose to
+  the gimplifier unsharing.  */
+   if (TYPE_NAME (type))
+ gfc_add_expr_to_block (body, build1 (DECL_EXPR, type, TYPE_NAME (type)));
  }
  
  
Index: gcc/testsuite/gfortran.dg/pr70937.f90
===
*** gcc/testsuite/gfortran.dg/pr70937.f90   (revision 0)
--- gcc/testsuite/gfortran.dg/pr70937.f90   (working copy)
***
*** 0 
--- 1,10 
+ ! { dg-do compile }
+ ! { dg-options "-flto" }
+   SUBROUTINE dbcsr_test_read_args(narg, args)
+ CHARACTER(len=*), DIMENSION(:), &
+   INTENT(out) :: args
+ CHARACTER(len=80) :: line
+ DO
+args(narg) = line
+ ENDDO
+   END SUBROUTINE dbcsr_test_read_args


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
Hello.

One more issue I forgot to mention in the previous email:
e) As one can come up with source code that jumps to a label within
a block scope (use-after-scope-goto-1.c):

// { dg-do run }
// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }

int main(int argc, char **argv)
{
  int a = 123;

  if (argc == 0)
  {
int *ptr;
label:
  {
ptr = &a;
*ptr = 1;
return 0;
  }
  }
  else
goto label;

  return 0;
}

It's necessary to record all local variables in the gimplifier and possibly
emit unpoisoning code when a LABEL_EXPR is seen.  That results in the
following gimple output:

label:
  _20 = (unsigned long) &a;
  _21 = (unsigned long) 4;
  __builtin___asan_unpoison_stack_memory (_20, _21);
  _22 = (unsigned long) &ptr;
  _23 = (unsigned long) 8;
  __builtin___asan_unpoison_stack_memory (_22, _23);
  ptr = &a;
  ptr.0_10 = ptr;
  _24 = (unsigned long) ptr.0_10;
  _25 = _24 >> 3;
  _26 = _25 + 2147450880;
  _27 = (signed char *) _26;
  _28 = *_27;
  _29 = _28 != 0;
  _30 = _24 & 7;
  _31 = (signed char) _30;
  _32 = _31 + 3;
  _33 = _32 >= _28;
  _34 = _29 & _33;
  if (_34 != 0)
goto ;
  else
goto ;

I know that the solution is a big hammer, but it works.

Martin




[PATCH] Introduce tests for -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
Hi.

This adds test coverage for the new sanitizer option.

Martin
From 753bfb3edb12c9f3fd13f320e308556f63330c97 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 4 May 2016 12:57:05 +0200
Subject: [PATCH 2/2] Introduce tests for -fsanitize=use-after-scope

gcc/testsuite/ChangeLog:

2016-05-04  Martin Liska  

	* gcc.dg/asan/use-after-scope-1.c: New test.
	* gcc.dg/asan/use-after-scope-2.c: New test.
	* gcc.dg/asan/use-after-scope-3.c: New test.
	* gcc.dg/asan/use-after-scope-4.c: New test.
	* gcc.dg/asan/use-after-scope-goto-1.c: New test.
---
 gcc/testsuite/gcc.dg/asan/use-after-scope-1.c  | 19 +
 gcc/testsuite/gcc.dg/asan/use-after-scope-2.c  | 48 ++
 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c  | 21 ++
 gcc/testsuite/gcc.dg/asan/use-after-scope-4.c  | 17 
 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c | 22 ++
 5 files changed, 127 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-1.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-2.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-4.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c

diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-1.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-1.c
new file mode 100644
index 000..b4a4f52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-1.c
@@ -0,0 +1,19 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }
+// { dg-shouldfail "asan" }
+
+int
+main (void)
+{
+  char *ptr;
+  {
+char my_char[9];
+ptr = &my_char[0];
+  }
+
+  *(ptr+9) = 'c';
+}
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "WRITE of size 1 at.*" }
+// { dg-output ".*'my_char' <== Memory access at offset \[0-9\]* overflows this variable.*" }
diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-2.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-2.c
new file mode 100644
index 000..3f99fb7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-2.c
@@ -0,0 +1,48 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }
+// { dg-shouldfail "asan" }
+
+int *bar (int *x, int *y) { return y; }
+
+int foo (void)
+{
+  char *p;
+  {
+char a = 0;
+p = &a;
+  }
+
+  if (*p)
+return 1;
+  else
+return 0;
+}
+
+int
+main (void)
+{
+  char *ptr;
+  {
+char my_char[9];
+ptr = &my_char[0];
+  }
+
+  int a[16];
+  int *p, *q = a;
+  {
+int b[16];
+p = bar (a, b);
+  }
+  bar (a, q);
+  {
+int c[16];
+q = bar (a, c);
+  }
+  int v = *bar (a, q);
+  return v;
+}
+
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "READ of size 4 at.*" }
+// { dg-output ".*'c' <== Memory access at offset \[0-9\]* is inside this variable.*" }
diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-3.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-3.c
new file mode 100644
index 000..abd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-3.c
@@ -0,0 +1,21 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }
+// { dg-shouldfail "asan" }
+
+int
+main (void)
+{
+  char *ptr;
+  char *ptr2;
+  {
+char my_char[9];
+ptr = &my_char[0];
+__builtin_memcpy (&ptr2, &ptr, sizeof (ptr2));
+  }
+
+  *(ptr2+9) = 'c';
+}
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "WRITE of size 1 at.*" }
+// { dg-output ".*'my_char' <== Memory access at offset \[0-9\]* overflows this variable.*" }
diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-4.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-4.c
new file mode 100644
index 000..7254c9c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-4.c
@@ -0,0 +1,17 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }
+
+int
+__attribute__((no_sanitize_address))
+main (void)
+{
+  char *ptr;
+  char *ptr2;
+  {
+char my_char[9];
+ptr = &my_char[0];
+__builtin_memcpy (&ptr2, &ptr, sizeof (ptr2));
+  }
+
+  *(ptr2+9) = 'c';
+}
diff --git a/gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c b/gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c
new file mode 100644
index 000..7bb8ba4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c
@@ -0,0 +1,22 @@
+// { dg-do run }
+// { dg-additional-options "-fsanitize=use-after-scope -fstack-reuse=none" }
+
+int main(int argc, char **argv)
+{
+  int a = 123;
+
+  if (argc == 0)
+  {
+int *ptr;
+label:
+  {
+	ptr = &a;
+*ptr = 1;
+	return 0;
+  }
+  }
+  else
+goto label;
+
+  return 0;
+}
-- 
2.8.1



[SH][committed] Remove deprecated options

2016-05-06 Thread Oleg Endo
Hi,

The attached patch removes some deprecated SH options.
Tested on sh-elf with 'make all-gcc' and with 'make info dvi pdf'.

Committed as r235960.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.opt (madjust-unroll, minvalid-symbols, msoft-atomic,
mspace): Remove deprecated options.
* doc/invoke.texi (SH options): Remove -mspace.

diff --git a/gcc/config/sh/sh.opt b/gcc/config/sh/sh.opt
index f9b02c5..2a94c9b 100644
--- a/gcc/config/sh/sh.opt
+++ b/gcc/config/sh/sh.opt
@@ -181,10 +181,6 @@ maccumulate-outgoing-args
 Target Report Var(TARGET_ACCUMULATE_OUTGOING_ARGS) Init(1)
 Reserve space for outgoing arguments in the function prologue.
 
-madjust-unroll
-Target Ignore
-Does nothing.  Preserved for backward compatibility.
-
 mb
 Target Report RejectNegative InverseMask(LITTLE_ENDIAN)
 Generate code in big endian mode.
@@ -245,10 +241,6 @@ minline-ic_invalidate
 Target Report Var(TARGET_INLINE_IC_INVALIDATE)
 inline code to invalidate instruction cache entries after setting up nested function trampolines.
 
-minvalid-symbols
-Target Report Mask(INVALID_SYMBOLS) Condition(SUPPORT_ANY_SH5)
-Assume symbols might be invalid.
-
 misize
 Target Report RejectNegative Mask(DUMPISIZE)
 Annotate assembler instructions with estimated addresses.
@@ -279,10 +271,6 @@ mrenesas
 Target Mask(HITACHI)
 Follow Renesas (formerly Hitachi) / SuperH calling conventions.
 
-msoft-atomic
-Target Undocumented Alias(matomic-model=, soft-gusa, none)
-Deprecated.  Use -matomic= instead to select the atomic model.
-
 matomic-model=
 Target Report RejectNegative Joined Var(sh_atomic_model_str)
 Specify the model for atomic operations.
@@ -291,10 +279,6 @@ mtas
 Target Report RejectNegative Var(TARGET_ENABLE_TAS)
 Use tas.b instruction for __atomic_test_and_set.
 
-mspace
-Target RejectNegative Alias(Os)
-Deprecated.  Use -Os instead.
-
 multcost=
 Target RejectNegative Joined UInteger Var(sh_multcost) Init(-1)
 Cost to assume for a multiply insn.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 821f8fd..3d398a5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1049,7 +1049,7 @@ See RS/6000 and PowerPC Options.
 -mb  -ml  -mdalign  -mrelax @gol
 -mbigtable -mfmovd -mrenesas -mno-renesas -mnomacsave @gol
 -mieee -mno-ieee -mbitops  -misize  -minline-ic_invalidate -mpadstruct @gol
--mspace -mprefergot  -musermode -multcost=@var{number} -mdiv=@var{strategy} @gol
+-mprefergot -musermode -multcost=@var{number} -mdiv=@var{strategy} @gol
 -mdivsi3_libfunc=@var{name} -mfixed-range=@var{register-range} @gol
 -maccumulate-outgoing-args @gol
 -matomic-model=@var{atomic-model} @gol


[PATCH, RFC] Introduce -fsanitize=use-after-scope

2016-05-06 Thread Martin Liška
Hello.

I've started working on the patch a couple of months ago, basically after
a brief discussion with Jakub on IRC.

I'm sending the initial version, which can successfully run an instrumented
tramp3d, postgresql server and Inkscape.  It catches the basic set of
examples that are added in the following patch.

The implementation is quite straightforward and works in the following steps:

1) Every local variable stack slot is poisoned at the very beginning of a
function (RTL emission)
2) In the gimplifier, once we spot a DECL_EXPR, the variable is unpoisoned
(by emitting an ASAN_MARK builtin) and marked as addressable
3) Similarly, BIND_EXPR is the place where we poison the variable (scope exit)
4) At the very end of the function, we clean up the poisoned memory
5) The builtins are expanded to calls to the libsanitizer run-time library
(__asan_poison_stack_memory, __asan_unpoison_stack_memory)
6) As the use-after-scope support is already included in libsanitizer, no
change is needed for the library

Example:

int
main (void)
{
  char *ptr;
  {
char my_char[9];
ptr = &my_char[0];
  }

  *(ptr+9) = 'c';
}

./a.out 
=
==12811==ERROR: AddressSanitizer: stack-use-after-scope on address 
0x7ffec9bcff69 at pc 0x00400a73 bp 0x7ffec9bcfef0 sp 0x7ffec9bcfee8
WRITE of size 1 at 0x7ffec9bcff69 thread T0
#0 0x400a72 in main (/tmp/a.out+0x400a72)
#1 0x7f100824860f in __libc_start_main (/lib64/libc.so.6+0x2060f)
#2 0x400868 in _start (/tmp/a.out+0x400868)

Address 0x7ffec9bcff69 is located in stack of thread T0 at offset 105 in frame
#0 0x400945 in main (/tmp/a.out+0x400945)

  This frame has 2 object(s):
[32, 40) 'ptr'
[96, 105) 'my_char' <== Memory access at offset 105 overflows this variable
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism or swapcontext
  (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400a72) in main
Shadow bytes around the buggy address:
  0x100059371f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059371fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100059371fe0: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 f8[f8]f4 f4
  0x100059371ff0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100059372030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Heap right redzone:  fb
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack partial redzone:   f4
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==12811==ABORTING

As mentioned, it's a request for comment as it still has a couple of limitations:
a) VLAs are not supported, which makes sense as we are unable to allocate
a stack slot for them
b) we can possibly strip some instrumentation in situations where a variable is
introduced in the very first BB (the RTL poisoning is superfluous).
Similarly, for the very last BB of a function, we can strip end-of-scope
poisoning (and the RTL unpoisoning).  I'll do that incrementally.
c) We require the -fstack-reuse=none option; maybe it is worth warning the
user if -fsanitize=use-after-scope is provided without it?
d) An instrumented binary is quite slow (~20x for tramp3d) as every function
call produces many memory reads/writes.  I'm wondering whether we should
provide a faster alternative (like instrumenting just the variables that
have their address taken)?

The patch survives bootstrap and regression tests on x86_64-linux-gnu.

Thanks for the feedback.
Martin
From 242bcaf2faded33291d05a5c4c5306f849de Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 3 May 2016 15:35:22 +0200
Subject: [PATCH 1/2] Introduce -fsanitize=use-after-scope

gcc/ChangeLog:

2016-05-03  Martin Liska  

	* asan.c (enum asan_check_flags): Cut the enum from here.
	(asan_poison_stack_variables): New function.
	(asan_emit_stack_protection): Poison stack variables.
	(asan_expand_mark_ifn): New function.
	* asan.h (enum asan_mark_flags): Paste here the enum from source
	file.
	(asan_sanitize_stack_p): Move function 

Re: [PATCH] Fix memory leak in tree-inliner

2016-05-06 Thread Richard Biener
On Fri, May 6, 2016 at 12:10 PM, Martin Liška  wrote:
> Hi.
>
> I've spotted a couple of occurrences of the following memory leak seen by valgrind:
>
>   malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>   operator new(unsigned long) (new_op.cc:50)
>   remap_dependence_clique(copy_body_data*, unsigned short) (tree-inline.c:845)
>   remap_gimple_op_r(tree_node**, int*, void*) (tree-inline.c:954)
>   walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*, tree_node* 
> (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*)) (tree.c:11498)
>   walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*, tree_node* 
> (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*)) (tree.c:11815)
>   walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*, tree_node* 
> (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*)) (tree.c:11815)
>   walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*, tree_node* 
> (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, 
> hash_set >*)) (tree.c:11815)
>   copy_debug_stmt (tree-inline.c:2869)
>   copy_debug_stmts (tree-inline.c:2927)
>   copy_body(copy_body_data*, long, int, basic_block_def*, basic_block_def*, 
> basic_block_def*) (tree-inline.c:2961)
>   tree_function_versioning(tree_node*, tree_node*, vec va_gc, vl_embed>*, bool, bitmap_head*, bool, bitmap_head*, basic_block_def*) 
> (tree-inline.c:5907)
>   save_inline_function_body (ipa-inline-transform.c:485)
>   inline_transform(cgraph_node*) (ipa-inline-transform.c:541)
>
> The problem is that id->dependence_map is released before copy_debug_stmts is
> called.
>
> The patch bootstraps and survives regression tests on x86_64-linux-gnu.
> Ready for trunk?

Hmmm.  But this means debug stmt remapping calls
remap_dependence_clique which may end up bumping
cfun->last_clique and thus may change code generation.

So what debug stmts contain MEM_REFs?  If you put an assert
processing_debug_stmt == 0 in
remap_dependence_clique I'd like to see a testcase that triggers it.

Richard.

> Martin


Re: [PATCH] Fix coding style in tree-ssa-uninit.c

2016-05-06 Thread Richard Biener
On Fri, May 6, 2016 at 12:06 PM, Martin Liška  wrote:
> On 11/26/2015 10:04 PM, Bernd Schmidt wrote:
>> As I said previously, the one to just replace whitespace is ok for now. 
>> Please ping the other one when stage1 opens (I expect it'll need changes by 
>> then).
>>
>>
>> Bernd
>
> Hello.
>
> This part of the patch remains to be installed from the previous stage3.
> I've rebased the patch and re-run regression testing on an x86_64-linux-gnu system.
>
> Ready to be installed?
Ok.

Richard.

> Thanks,
> Martin


Re: [PATCH PR70935, Regression 6,7]

2016-05-06 Thread Richard Biener
On Thu, May 5, 2016 at 5:19 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is a simple patch which cures the problem with an illegal
> transformation of an endless loop.  The fix is simply to check that the
> guard edge destination is not the loop latch block.
>
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?

Ok for trunk and branch.

Thanks,
Richard.

> ChangeLog:
> 2016-05-05  Yuri Rumyantsev  
>
> PR debug/70935
> * tree-ssa-loop-unswitch.c (find_loop_guard): Reject guard edge with
> loop latch destination.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/torture/pr70935.c: New test.


Re: [patch] Coalesce in more cases

2016-05-06 Thread Richard Biener
On Thu, May 5, 2016 at 5:08 PM, Eric Botcazou  wrote:
> Hi,
>
> gimple_can_coalesce_p is rather picky about the conditions under which SSA
> names can be coalesced.  In particular, when it comes to the type, it's:
>
>   /* Now check the types.  If the types are the same, then we should
>  try to coalesce V1 and V2.  */
>   tree t1 = TREE_TYPE (name1);
>   tree t2 = TREE_TYPE (name2);
>   if (t1 == t2)
>
> or
>
>   /* If the types are not the same, check for a canonical type match.  This
>  (for example) allows coalescing when the types are fundamentally the
>  same, but just have different names.
>
>  Note pointer types with different address spaces may have the same
>  canonical type.  Those are rejected for coalescing by the
>  types_compatible_p check.  */
>   if (TYPE_CANONICAL (t1)
>   && TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2)
>   && types_compatible_p (t1, t2))
> goto check_modes;
>
> The test on TYPE_CANONICAL looks overkill to me.  It's needed in the non-
> optimized case (-fno-tree-coalesce-vars) as compute_samebase_partition_bases
> uses TYPE_CANONICAL to discriminate partitions, but it's not needed in the
> optimized case as compute_optimized_partition_bases uses the full information.
> For example, in Ada it prevents subtypes from being coalesced with types and
> in C++ it prevents different pointer types from being coalesced.  Hence the
> attached patch, which lifts the restriction in the optimized case.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Ok.

Richard.

>
> 2016-05-05  Eric Botcazou  
>
> * tree-ssa-coalesce.c (gimple_can_coalesce_p): In the optimized case,
> allow coalescing if the types are compatible.
>
> --
> Eric Botcazou


Re: Missing pointer dereference in tree-affine.c

2016-05-06 Thread Richard Biener
On Thu, May 5, 2016 at 4:19 PM, Richard Sandiford
 wrote:
> wide_int_constant_multiple_p used:
>
>   if (*mult_set && mult != 0)
> return false;
>
> to check whether we had previously seen a nonzero multiple, but "mult" is
> a pointer to the previous value rather than the previous value itself.
>
> Noticed by inspection while working on another patch, so I don't have a
> testcase.  I tried adding an assert for combinations that were wrongly
> rejected before but it didn't trigger during a bootstrap and regtest.
>
> Tested on x86_64-linux-gnu.  OK to install?

Ok.

Richard.

> Thanks,
> Richard
>
>
> gcc/
> * tree-affine.c (wide_int_constant_multiple_p): Add missing
> pointer dereference.
>
> diff --git a/gcc/tree-affine.c b/gcc/tree-affine.c
> index 32f2301..4884241 100644
> --- a/gcc/tree-affine.c
> +++ b/gcc/tree-affine.c
> @@ -769,7 +769,7 @@ wide_int_constant_multiple_p (const widest_int &val,
> const widest_int &div,
>
>if (val == 0)
>  {
> -  if (*mult_set && mult != 0)
> +  if (*mult_set && *mult != 0)
> return false;
>*mult_set = true;
>*mult = 0;
>


[SH][committed] Fix length of ic_invalidate_line_sh4a pattern

2016-05-06 Thread Oleg Endo
Hi,

The attached patch fixes the length of the ic_invalidate_line_sh4a
pattern.

Tested on sh-elf with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235957.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.md (ic_invalidate_line_sh4a): Fix insn length.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 62a03f3..b054c9e 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -5401,7 +5401,7 @@
 	 "	synco"		"\n"
 	 "	icbi	@%0";
 }
-  [(set_attr "length" "16")	;; FIXME: Why 16 and not 6?  Looks like typo.
+  [(set_attr "length" "6")
(set_attr "type" "cwb")])
 
 (define_expand "mov<mode>"


Re: [PATCH 3/4] Extract deferred-location handling from jit

2016-05-06 Thread Richard Biener
On Wed, May 4, 2016 at 10:49 PM, David Malcolm  wrote:
> In order to faithfully load RTL dumps that contain references to
> source locations, the RTL frontend needs to be able to parse file
> and line information and turn them into location_t values.
>
> Unfortunately, the libcpp API makes it rather fiddly to create
> location_t values from a sequence of arbitrary file/line pairs: the
> API assumes that the locations are created in ascending order as
> if we were parsing the source file, but as we read an RTL dump,
> the insns could be jumping forwards and backwards in lines and
> between files.  Also, if we want to support column numbers, the
> presence of a very high column number could exceed the bits available
> in a line_map_ordinary for storing it.
>
> The JIT has some code for handling this, in gcc/jit/jit-playback.[ch],
> (since the JIT supports source location information, and doesn't impose
> any ordering requirement on users of the API).
>
> This patch moves the relevant code from
>   gcc/jit/jit-playback.[ch]
> into a new pair of files:
>   gcc/deferred-locations.[ch]
>
> The idea is that a deferred_locations instances manages these
> "deferred locations"; they are created, and then all of the location_t
> values are created at once by calling
>   deferred_locations::add_to_line_table
> After this call, the actual location_t values can be read from out of
> deferred_location instances.
>
> There are some suboptimal parts of the code (some linear searches, and
> the use of gc), but it's mostly a move of existing code from out of the
> jit subdirectory and into "gcc" proper for reuse by the RTL frontend.
>
> This is likely to be useful for the gimple frontend as well.
>
> OK for trunk?

The LTO FE faces similar issues - see lto-streamer-in.c

Richard.

> gcc/ChangeLog:
> * Makefile.in (OBJS): Add deferred-locations.o.
> * deferred-locations.c: New file, adapted from parts of
> jit/jit-playback.c.
> * deferred-locations.h: New file, adapted from parts of
> jit/jit-playback.h.
>
> gcc/jit/ChangeLog:
> * jit-common.h: Include deferred-locations.h.
> (gcc::jit::playback::source_file): Remove forward decl.
> (gcc::jit::playback::source_line): Likewise.
> (gcc::jit::playback::location): Replace forward decl, with
> a typedef, aliasing deferred_location.
> * jit-playback.c (gcc::jit::playback::context::context): Remove
> create call on m_source_files.
> (line_comparator): Move to deferred-locations.c.
> (location_comparator): Likewise.
> (handle_locations): Move logic to deferred-locations.c, as
> deferred_locations::add_to_line_table.
> (get_recording_loc): New function.
> (gcc::jit::playback::context::add_error): Call get_recording_loc
> as a function, rather than as a method.
> (gcc::jit::playback::context::add_error_va): Likewise.
> (gcc::jit::playback::context::get_source_file): Update return type
> to reflect move of source_file to deferred-locations.h.
> Replace body with a call to m_deferred_locations.get_source_file.
> (gcc::jit::playback::source_file::source_file): Move to
> deferred-locations.h, losing the namespaces.
> (gcc::jit::playback::source_file::finalizer): Likewise.
> (gcc::jit::playback::source_file::get_source_line): Likewise.
> (gcc::jit::playback::source_line::source_line): Likewise.
> (gcc::jit::playback::source_line::finalizer): Likewise.
> (gcc::jit::playback::source_line::get_location): Likewise.
> (gcc::jit::playback::location::location): Likewise, renaming to
> deferred_location.
> * jit-playback.h: Include deferred-locations.h.
> (gcc::jit::playback::context::m_source_files): Replace field with
> m_deferred_locations.
> (gcc::jit::playback::source_file): Move to deferred-locations.h,
> losing the namespaces.
> (gcc::jit::playback::source_line): Likewise.
> (gcc::jit::playback::location): Likewise, renaming to
> deferred_location.  Eliminate get_recording_loc accessor and
> m_recording_loc field in favor of get_user_data and m_user_data
> respectively.
> ---
>  gcc/Makefile.in  |   1 +
>  gcc/deferred-locations.c | 240 
> +++
>  gcc/deferred-locations.h | 139 +++
>  gcc/jit/jit-common.h |   5 +-
>  gcc/jit/jit-playback.c   | 194 --
>  gcc/jit/jit-playback.h   |  73 +-
>  6 files changed, 402 insertions(+), 250 deletions(-)
>  create mode 100644 gcc/deferred-locations.c
>  create mode 100644 gcc/deferred-locations.h
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6c5adc0..c61f303 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1239,6 +1239,7 @@ OBJS = \
> dce.o \
>

Re: [PATCH 1/4] Make argv const char ** in read_md_files etc

2016-05-06 Thread Richard Biener
On Wed, May 4, 2016 at 10:49 PM, David Malcolm  wrote:
> This patch makes the argv param to read_md_files const, needed
> so that the RTL frontend can call it on a const char *.
>
> While we're at it, it similarly makes const the argv for all
> of the "main" functions of the various gen*.
>
> OK for trunk?

Ok.

Richard.

> gcc/ChangeLog:
> * genattr-common.c (main): Convert argv from
> char ** to const char **.
> * genattr.c (main): Likewise.
> * genattrtab.c (main): Likewise.
> * genautomata.c (initiate_automaton_gen): Likewise.
> (main): Likewise.
> * gencodes.c (main): Likewise.
> * genconditions.c (main): Likewise.
> * genconfig.c (main): Likewise.
> * genconstants.c (main): Likewise.
> * genemit.c (main): Likewise.
> * genenums.c (main): Likewise.
> * genextract.c (main): Likewise.
> * genflags.c (main): Likewise.
> * genmddeps.c (main): Likewise.
> * genopinit.c (main): Likewise.
> * genoutput.c (main): Likewise.
> * genpeep.c (main): Likewise.
> * genpreds.c (main): Likewise.
> * genrecog.c (main): Likewise.
> * gensupport.c (init_rtx_reader_args_cb): Likewise.
> (init_rtx_reader_args): Likewise.
> * gensupport.h (init_rtx_reader_args_cb): Likewise.
> (init_rtx_reader_args): Likewise.
> * gentarget-def.c (main): Likewise.
> * read-md.c (read_md_files): Likewise.
> * read-md.h (read_md_files): Likewise.
> ---
>  gcc/genattr-common.c | 2 +-
>  gcc/genattr.c| 2 +-
>  gcc/genattrtab.c | 2 +-
>  gcc/genautomata.c| 4 ++--
>  gcc/gencodes.c   | 2 +-
>  gcc/genconditions.c  | 2 +-
>  gcc/genconfig.c  | 2 +-
>  gcc/genconstants.c   | 2 +-
>  gcc/genemit.c| 2 +-
>  gcc/genenums.c   | 2 +-
>  gcc/genextract.c | 2 +-
>  gcc/genflags.c   | 2 +-
>  gcc/genmddeps.c  | 2 +-
>  gcc/genopinit.c  | 2 +-
>  gcc/genoutput.c  | 4 ++--
>  gcc/genpeep.c| 4 ++--
>  gcc/genpreds.c   | 2 +-
>  gcc/genrecog.c   | 2 +-
>  gcc/gensupport.c | 4 ++--
>  gcc/gensupport.h | 5 +++--
>  gcc/gentarget-def.c  | 2 +-
>  gcc/read-md.c| 2 +-
>  gcc/read-md.h| 2 +-
>  23 files changed, 29 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/genattr-common.c b/gcc/genattr-common.c
> index e073faf..a11fbf7 100644
> --- a/gcc/genattr-common.c
> +++ b/gcc/genattr-common.c
> @@ -61,7 +61,7 @@ gen_attr (md_rtx_info *info)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>bool have_delay = false;
>bool have_sched = false;
> diff --git a/gcc/genattr.c b/gcc/genattr.c
> index c6db37f..656a9a7 100644
> --- a/gcc/genattr.c
> +++ b/gcc/genattr.c
> @@ -138,7 +138,7 @@ find_tune_attr (rtx exp)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>bool have_annul_true = false;
>bool have_annul_false = false;
> diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
> index c956527..d39d4a7 100644
> --- a/gcc/genattrtab.c
> +++ b/gcc/genattrtab.c
> @@ -5197,7 +5197,7 @@ handle_arg (const char *arg)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>struct attr_desc *attr;
>struct insn_def *id;
> diff --git a/gcc/genautomata.c b/gcc/genautomata.c
> index e3a6c59..dcde604 100644
> --- a/gcc/genautomata.c
> +++ b/gcc/genautomata.c
> @@ -9300,7 +9300,7 @@ parse_automata_opt (const char *str)
>  /* The following is top level function to initialize the work of
> pipeline hazards description translator.  */
>  static void
> -initiate_automaton_gen (char **argv)
> +initiate_automaton_gen (const char **argv)
>  {
>const char *base_name;
>
> @@ -9592,7 +9592,7 @@ write_automata (void)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>progname = "genautomata";
>
> diff --git a/gcc/gencodes.c b/gcc/gencodes.c
> index e0dd32a..3b0fc5c 100644
> --- a/gcc/gencodes.c
> +++ b/gcc/gencodes.c
> @@ -47,7 +47,7 @@ gen_insn (md_rtx_info *info)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>progname = "gencodes";
>
> diff --git a/gcc/genconditions.c b/gcc/genconditions.c
> index 8abf1c2..e4f45b0 100644
> --- a/gcc/genconditions.c
> +++ b/gcc/genconditions.c
> @@ -212,7 +212,7 @@ write_writer (void)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>progname = "genconditions";
>
> diff --git a/gcc/genconfig.c b/gcc/genconfig.c
> index b6ca35a..815e30d 100644
> --- a/gcc/genconfig.c
> +++ b/gcc/genconfig.c
> @@ -269,7 +269,7 @@ gen_peephole2 (md_rtx_info *info)
>  }
>
>  int
> -main (int argc, char **argv)
> +main (int argc, const char **argv)
>  {
>progname = "genconfig";
>
> diff --git a/gcc/genconstants.c b/gcc/genconstants.c
> index b96bc50..c10e3e3 100644
> --- 

Re: [PATCH 2/4] Move name_to_pass_map into class pass_manager

2016-05-06 Thread Richard Biener
On Wed, May 4, 2016 at 10:49 PM, David Malcolm  wrote:
> The RTL frontend needs to be able to lookup passes by name.
>
> passes.c has global state name_to_pass_map (albeit static, scoped
> to passes.c), for use by enable_disable_pass.
>
> Move it to be a field of class pass_manager, and add
> a get_pass_by_name method.
>
> OK for trunk?

Ok.

> gcc/ChangeLog:
> * pass_manager.h (pass_manager::register_pass_name): New method.
> (pass_manager::get_pass_by_name): New method.
> (pass_manager::create_pass_tab): New method.
> (pass_manager::m_name_to_pass_map): New field.
> * passes.c (name_to_pass_map): Delete global in favor of field
> "m_name_to_pass_map" of pass_manager.
> (register_pass_name): Rename from a function to...
> (pass_manager::register_pass_name): ...this method, updating
> for renaming of global "name_to_pass_map" to field
> "m_name_to_pass_map".
> (create_pass_tab): Rename from a function to...
> (pass_manager::create_pass_tab): ...this method, updating
> for renaming of global "name_to_pass_map" to field.
> (get_pass_by_name): Rename from a function to...
> (pass_manager::get_pass_by_name): ...this method.
> (enable_disable_pass): Convert use of get_pass_by_name to
> a method call, locating the pass_manager singleton.
> ---
>  gcc/pass_manager.h |  6 ++
>  gcc/passes.c   | 34 +++---
>  2 files changed, 21 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
> index 4f89d31..464e25f 100644
> --- a/gcc/pass_manager.h
> +++ b/gcc/pass_manager.h
> @@ -78,6 +78,10 @@ public:
>opt_pass *get_pass_peephole2 () const { return pass_peephole2_1; }
>opt_pass *get_pass_profile () const { return pass_profile_1; }
>
> +  void register_pass_name (opt_pass *pass, const char *name);
> +
> +  opt_pass *get_pass_by_name (const char *name);
> +
>  public:
>/* The root of the compilation pass tree, once constructed.  */
>opt_pass *all_passes;
> @@ -95,9 +99,11 @@ public:
>  private:
>void set_pass_for_id (int id, opt_pass *pass);
>void register_dump_files (opt_pass *pass);
> +  void create_pass_tab () const;
>
>  private:
>context *m_ctxt;
> +  hash_map<nofree_string_hash, opt_pass *> *m_name_to_pass_map;
>
>/* References to all of the individual passes.
>   These fields are generated via macro expansion.
> diff --git a/gcc/passes.c b/gcc/passes.c
> index 2b70846..0565cfa 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -66,8 +66,6 @@ using namespace gcc;
> The variable current_pass is also used for statistics and plugins.  */
>  opt_pass *current_pass;
>
> -static void register_pass_name (opt_pass *, const char *);
> -
>  /* Most passes are single-instance (within their context) and thus don't
> need to implement cloning, but passes that support multiple instances
> *must* provide their own implementation of the clone method.
> @@ -844,21 +842,19 @@ pass_manager::register_dump_files (opt_pass *pass)
>while (pass);
>  }
>
> -static hash_map<nofree_string_hash, opt_pass *> *name_to_pass_map;
> -
>  /* Register PASS with NAME.  */
>
> -static void
> -register_pass_name (opt_pass *pass, const char *name)
> +void
> +pass_manager::register_pass_name (opt_pass *pass, const char *name)
>  {
> -  if (!name_to_pass_map)
> -    name_to_pass_map = new hash_map<nofree_string_hash, opt_pass *> (256);
> +  if (!m_name_to_pass_map)
> +    m_name_to_pass_map = new hash_map<nofree_string_hash, opt_pass *> (256);
>
> -  if (name_to_pass_map->get (name))
> +  if (m_name_to_pass_map->get (name))
>  return; /* Ignore plugin passes.  */
>
> -  const char *unique_name = xstrdup (name);
> -  name_to_pass_map->put (unique_name, pass);
> +  const char *unique_name = xstrdup (name);
> +  m_name_to_pass_map->put (unique_name, pass);
>  }
>
>  /* Map from pass id to canonicalized pass name.  */
> @@ -882,14 +878,14 @@ passes_pass_traverse (const char *const &name, opt_pass
> *const &pass, void *)
>  /* The function traverses NAME_TO_PASS_MAP and creates a pass info
> table for dumping purpose.  */
>
> -static void
> -create_pass_tab (void)
> +void
> +pass_manager::create_pass_tab (void) const
>  {
>if (!flag_dump_passes)
>  return;
>
> -  pass_tab.safe_grow_cleared (g->get_passes ()->passes_by_id_size + 1);
> -  name_to_pass_map->traverse <void *, passes_pass_traverse> (NULL);
> +  pass_tab.safe_grow_cleared (passes_by_id_size + 1);
> +  m_name_to_pass_map->traverse <void *, passes_pass_traverse> (NULL);
>  }
>
>  static bool override_gate_status (opt_pass *, tree, bool);
> @@ -960,10 +956,10 @@ pass_manager::dump_passes () const
>
>  /* Returns the pass with NAME.  */
>
> -static opt_pass *
> -get_pass_by_name (const char *name)
> +opt_pass *
> +pass_manager::get_pass_by_name (const char *name)
>  {
> -  opt_pass **p = name_to_pass_map->get (name);
> +  opt_pass **p = m_name_to_pass_map->get (name);
>

Re: [PATCH] tail merge ICE

2016-05-06 Thread Richard Biener
On Wed, May 4, 2016 at 7:25 PM, Nathan Sidwell  wrote:
> This patch fixes an ICE Thomas observed in tree-ssa-tail-merge.c:
>
> On 05/03/16 06:34, Thomas Schwinge wrote:
>
>> I'm also seeing the following regression for C and C++,
>> libgomp.oacc-c-c++-common/loop-auto-1.c with -O2:
>>
>> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:
>> In function 'vector_1._omp_fn.0':
>>
>> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:104:9:
>> internal compiler error: Segmentation fault
>>  #pragma acc parallel num_workers (32) vector_length(32)
>> copy(ary[0:size]) firstprivate (size)
>>  ^
>>
>> #4  0x00f73d46 in internal_error
>> (gmsgid=gmsgid@entry=0x105be63 "%s")
>> at [...]/source-gcc/gcc/diagnostic.c:1270
>> #5  0x009fccb0 in crash_signal (signo=<optimized out>)
>> at [...]/source-gcc/gcc/toplev.c:333
>> #6  <signal handler called>
>> #7  0x00beaf2e in same_succ_flush_bb (bb=<optimized out>,
>> bb=<optimized out>)
>> at [...]/source-gcc/gcc/hash-table.h:919
>> #8  0x00bec499 in same_succ_flush_bbs (bbs=<optimized out>)
>> at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:823
>
>
> What's happening is we're trying to delete an object from a hash table, and
> asserting that we did indeed find the object.  The hash's equality function
> compares gimple sequences and ends up calling gimple_call_same_target_p.
> That returns false if the call is IFN_UNIQUE, and so the deletion fails to
> find anything.  IFN_UNIQUE function calls should not compare equal, but they
> should compare eq (in the lispy sense).
>
> The local fix is to augment the hash compare function with a check for
> pointer equality.  That way deleting items from the table works and
> comparing different sequences functions as before.
>
> The more general fix is to augment gimple_call_same_target_p so that unique
> fns are eq but not equal.  A cursory look at the other users of that
> function did not indicate this currently causes a problem, but IMHO it is
> odd for a value to not compare the same as itself -- though IEEE NaNs do
> that :)
>
> I placed the pointer equality comparison in gimple_call_same_target_p after
> the check for unique_fn_p, as I suspect it is rare for it to be called
> with the same gimple call object for both parameters.  Although
> pointer equality would be applicable to all cases, in most instances it's
> going to be false.
>
> Of course, the gimple_call_same_target_p change fixes the problem on its
> own, but the local change to same_succ::equal seems beneficial on its own
> merits.
>
> ok?

Ok.

Richard.

> nathan
> --
> Nathan Sidwell


[SH][committed] Remove some workaround combine patterns

2016-05-06 Thread Oleg Endo
Hi,

The attached patch removes some workaround combine patterns.  As far as
I remember this issue has been addressed by some match.pd patterns.  In
any case, CSiBE code size shows no difference and the SH specific
testcases mentioned in the code pass without the patterns.

Tested on sh-elf with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235956.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.md (*cmpeqsi_t): Remove combine insn pattern and similar
corresponding combine split pattern.diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index b054c9e..f606e29 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -909,22 +909,6 @@
   FAIL;
 })
 
-;; FIXME: For some reason, on SH4A and SH2A combine fails to simplify this
-;; pattern by itself.  What this actually does is:
-;;	x == 0: (1 >> 0-0) & 1 = 1
-;;	x != 0: (1 >> 0-x) & 1 = 0
-;; Without this the test pr51244-8.c fails on SH2A and SH4A.
-(define_insn_and_split "*cmpeqsi_t"
-  [(set (reg:SI T_REG)
-	(and:SI (lshiftrt:SI
-		  (const_int 1)
-		  (neg:SI (match_operand:SI 0 "arith_reg_operand" "r")))
-		(const_int 1)))]
-  "TARGET_SH1"
-  "#"
-  "&& 1"
-  [(set (reg:SI T_REG) (eq:SI (match_dup 0) (const_int 0)))])
-
 (define_insn "cmpgtsi_t"
   [(set (reg:SI T_REG)
 	(gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
@@ -1229,29 +1213,6 @@
 			   (label_ref (match_dup 2))
 			   (pc)))])
 
-;; FIXME: Similar to the *cmpeqsi_t pattern above, for some reason, on SH4A
-;; and SH2A combine fails to simplify this pattern by itself.
-;; What this actually does is:
-;;	x == 0: (1 >> 0-0) & 1 = 1
-;;	x != 0: (1 >> 0-x) & 1 = 0
-;; Without this the test pr51244-8.c fails on SH2A and SH4A.
-(define_split
-  [(set (pc)
-	(if_then_else
-	  (eq (and:SI (lshiftrt:SI
-			(const_int 1)
-			(neg:SI (match_operand:SI 0 "arith_reg_operand" "")))
-		  (const_int 1))
-	  (const_int 0))
-	  (label_ref (match_operand 2))
-	  (pc)))
-   (clobber (reg:SI T_REG))]
-  "TARGET_SH1"
-  [(set (reg:SI T_REG) (eq:SI (match_dup 0) (const_int 0)))
-   (set (pc) (if_then_else (eq (reg:SI T_REG) (const_int 0))
-			   (label_ref (match_dup 2))
-			   (pc)))])
-
 ;; FIXME: These don't seem to have any effect on the generated cbranch code
 ;;	  anymore, but only on some register allocation choices.
 (define_split


[SH][committed] Fix PR 58219

2016-05-06 Thread Oleg Endo
Hi,

The attached patch fixes PR 58219.

Tested on sh-elf with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235954.

gcc/ChangeLog:
PR target/58219
* config/sh/predicates.md (long_displacement_mem_operand): New.
* config/sh/sh.md (movsi_i): Allow for SH2A, disallow for any FPU.
Add movi20, movi20s alternatives.  Adjust length attribute for
alternatives.
(movsi_ie): Allow for any FPU.  Adjust length attribute for
alternatives.
(movsi_i_lowpart): Add movi20, movi20s alternatives.  Adjust length
attribute for alternatives.
(*mov<mode>): Use long_displacement_mem_operand for length attribute.
(*movdi_i, movdf_k, movdf_i4, movsf_i, movsf_ie, movsf_ie_ra): Adjust
length attribute for alternatives.

gcc/testsuite/ChangeLog:
PR target/58219
* gcc.target/sh/pr58219.c: New tests.diff --git a/gcc/config/sh/predicates.md b/gcc/config/sh/predicates.md
index b582637..4de90af 100644
--- a/gcc/config/sh/predicates.md
+++ b/gcc/config/sh/predicates.md
@@ -230,6 +230,12 @@
(match_test "sh_disp_addr_displacement (op)
 		<= sh_max_mov_insn_displacement (GET_MODE (op), false)")))
 
+;; Returns true if OP is a displacement address that does not fit into
+;; a 16 bit (non-SH2A) memory load / store insn.
+(define_predicate "long_displacement_mem_operand"
+  (and (match_operand 0 "displacement_mem_operand")
+   (not (match_operand 0 "short_displacement_mem_operand"
+
 ;; Returns true if OP is a post-increment addressing mode memory reference.
 (define_predicate "post_inc_mem"
   (and (match_code "mem")
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index e704e2a..39270ce 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -5181,20 +5142,23 @@
 ;; t/r must come after r/r, lest reload will try to reload stuff like
 ;; (set (subreg:SI (mem:QI (plus:SI (reg:SI SP_REG) (const_int 12)) 0) 0)
 ;; (made from (set (subreg:SI (reg:QI ###) 0) ) into T.
+;; Notice that although this pattern allows movi20 and movi20s on non-SH2A,
+;; those alternatives will not be taken, as they will be converted into
+;; PC-relative loads.
 (define_insn "movsi_i"
   [(set (match_operand:SI 0 "general_movdst_operand"
-	"=r,r,r,r,r,r,m,<,<,x,l,x,l,r")
+			"=r,r,  r,  r,  r, r,r,r,m,<,<,x,l,x,l,r")
 	(match_operand:SI 1 "general_movsrc_operand"
-	 "Q,r,I08,mr,x,l,r,x,l,r,r,>,>,i"))]
-  "TARGET_SH1
-   && ! TARGET_SH2E
-   && ! TARGET_SH2A
+			" Q,r,I08,I20,I28,mr,x,l,r,x,l,r,r,>,>,i"))]
+  "TARGET_SH1 && !TARGET_FPU_ANY
&& (register_operand (operands[0], SImode)
|| register_operand (operands[1], SImode))"
   "@
 	mov.l	%1,%0
 	mov	%1,%0
 	mov	%1,%0
+	movi20	%1,%0
+	movi20s	%1,%0
 	mov.l	%1,%0
 	sts	%1,%0
 	sts	%1,%0
@@ -5206,9 +5170,27 @@
 	lds.l	%1,%0
 	lds.l	%1,%0
 	fake	%1,%0"
-  [(set_attr "type" "pcload_si,move,movi8,load_si,mac_gp,prget,store,mac_mem,
-		 pstore,gp_mac,prset,mem_mac,pload,pcload_si")
-   (set_attr "length" "*,*,*,*,*,*,*,*,*,*,*,*,*,*")])
+  [(set_attr "type" "pcload_si,move,movi8,move,move,load_si,mac_gp,prget,store,
+		 mac_mem,pstore,gp_mac,prset,mem_mac,pload,pcload_si")
+   (set_attr_alternative "length"
+ [(const_int 2)
+  (const_int 2)
+  (const_int 2)
+  (const_int 4)
+  (const_int 4)
+  (if_then_else (match_operand 1 "long_displacement_mem_operand")
+		(const_int 4) (const_int 2))
+  (const_int 2)
+  (const_int 2)
+  (if_then_else (match_operand 0 "long_displacement_mem_operand")
+		(const_int 4) (const_int 2))
+  (const_int 2)
+  (const_int 2)
+  (const_int 2)
+  (const_int 2)
+  (const_int 2)
+  (const_int 2)
+  (const_int 2)])])
 
 ;; t/r must come after r/r, lest reload will try to reload stuff like
 ;; (subreg:SI (reg:SF FR14_REG) 0) into T (compiling stdlib/strtod.c -m3e -O2)
@@ -5216,12 +5198,15 @@
 ;; will require a reload.
 ;; ??? We can't include f/f because we need the proper FPSCR setting when
 ;; TARGET_FMOVD is in effect, and mode switching is done before reload.
+;; Notice that although this pattern allows movi20 and movi20s on non-SH2A,
+;; those alternatives will not be taken, as they will be converted into
+;; PC-relative loads.
 (define_insn "movsi_ie"
   [(set (match_operand:SI 0 "general_movdst_operand"
-	"=r,r,r,r,r,r,r,r,mr,<,<,x,l,x,l,y,<,r,y,r,*f,y,*f,y")
+	"=r,r,  r,  r,  r, r,r,r,mr,<,<,x,l,x,l,y,<,r,y,r,*f, y,*f,y")
 	(match_operand:SI 1 "general_movsrc_operand"
-	 "Q,r,I08,I20,I28,mr,x,l,r,x,l,r,r,>,>,>,y,i,r,y,y,*f,*f,y"))]
-  "(TARGET_SH2E || TARGET_SH2A)
+	" Q,r,I08,I20,I28,mr,x,l, r,x,l,r,r,>,>,>,y,i,r,y, y,*f,*f,y"))]
+  "TARGET_SH1 && TARGET_FPU_ANY
&& ((register_operand (operands[0], SImode)
 	&& !fpscr_operand (operands[0], SImode))
|| (register_operand (operands[1], SImode)
@@ -5261,14 +5246,12 @@
   (const_int 2)
   (const_int 4)
 

[PATCH] Fix PR70960

2016-05-06 Thread Richard Biener

The following fixes $subject.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-05-06  Richard Biener  

PR tree-optimization/70960
* tree-if-conv.c (ifcvt_walk_pattern_tree): Handle non-SSA ops.

* gfortran.fortran-torture/compile/pr70960.f90: New testcase.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 235945)
--- gcc/tree-if-conv.c  (working copy)
*** ifcvt_walk_pattern_tree (tree var, vec

[PATCH] Fix memory leak in tree-inliner

2016-05-06 Thread Martin Liška
Hi.

I've spotted a couple of occurrences of the following memory leak, seen by valgrind:

  malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
  operator new(unsigned long) (new_op.cc:50)
  remap_dependence_clique(copy_body_data*, unsigned short) (tree-inline.c:845)
  remap_gimple_op_r(tree_node**, int*, void*) (tree-inline.c:954)
  walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*, tree_node* (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*)) (tree.c:11498)
  walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*, tree_node* (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*)) (tree.c:11815)
  walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*, tree_node* (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*)) (tree.c:11815)
  walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*, tree_node* (*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*, hash_set<tree_node*, default_hash_traits<tree_node*> >*)) (tree.c:11815)
  copy_debug_stmt (tree-inline.c:2869)
  copy_debug_stmts (tree-inline.c:2927)
  copy_body(copy_body_data*, long, int, basic_block_def*, basic_block_def*, basic_block_def*) (tree-inline.c:2961)
  tree_function_versioning(tree_node*, tree_node*, vec<ipa_replace_map*, va_gc>*, bool, bitmap_head*, bool, bitmap_head*, basic_block_def*) (tree-inline.c:5907)
  save_inline_function_body (ipa-inline-transform.c:485)
  inline_transform(cgraph_node*) (ipa-inline-transform.c:541)

The problem is that id->dependence_map is released before copy_debug_stmts
is called.

The patch bootstraps and survives regression tests on x86_64-linux-gnu.
Ready for trunk?

Martin
From c67fe154a24b939edad28a823c2ecb16f6e056c1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 6 Jan 2016 11:41:31 +0100
Subject: [PATCH] Properly release memory in copy_body

gcc/ChangeLog:

2016-01-06  Martin Liska  

	* tree-inline.c (copy_cfg_body): Remove eh_map deletion.
	(copy_body): Release it here.
---
 gcc/tree-inline.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 19f202e..92169e6 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2807,17 +2807,6 @@ copy_cfg_body (copy_body_data * id, gcov_type count, int frequency_scale,
   entry_block_map->aux = NULL;
   exit_block_map->aux = NULL;
 
-  if (id->eh_map)
-{
-  delete id->eh_map;
-  id->eh_map = NULL;
-}
-  if (id->dependence_map)
-{
-  delete id->dependence_map;
-  id->dependence_map = NULL;
-}
-
   return new_fndecl;
 }
 
@@ -2963,6 +2952,17 @@ copy_body (copy_body_data *id, gcov_type count, int frequency_scale,
 			new_entry);
   copy_debug_stmts (id);
 
+  if (id->eh_map)
+{
+  delete id->eh_map;
+  id->eh_map = NULL;
+}
+  if (id->dependence_map)
+{
+  delete id->dependence_map;
+  id->dependence_map = NULL;
+}
+
   return body;
 }
 
-- 
2.8.1



[PATCH] Fix coding style in tree-ssa-uninit.c

2016-05-06 Thread Martin Liška
On 11/26/2015 10:04 PM, Bernd Schmidt wrote:
> As I said previously, the one to just replace whitespace is ok for now. 
> Please ping the other one when stage1 opens (I expect it'll need changes by 
> then).
> 
> 
> Bernd

Hello.

This part of the patch remains to be installed from the previous stage3.
I've rebased the patch and rerun regression tests on an x86_64-linux-gnu system.

Ready to be installed?
Thanks,
Martin
From 8344ef65208e86493c0af82e41c031543717af45 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 2 May 2016 14:49:14 +0200
Subject: [PATCH] Manual changes to GCC coding style in tree-ssa-uninit.c

gcc/ChangeLog:

2016-05-02  Martin Liska  

	* tree-ssa-uninit.c: Apply manual changes
	to the GNU coding style.
	(prune_uninit_phi_opnds): Rename from
	prune_uninit_phi_opnds_in_unrealizable_paths.
---
 gcc/tree-ssa-uninit.c | 401 ++
 1 file changed, 179 insertions(+), 222 deletions(-)

diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index ea3ceb8..941d575 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -35,16 +35,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 
 /* This implements the pass that does predicate aware warning on uses of
-   possibly uninitialized variables. The pass first collects the set of
-   possibly uninitialized SSA names. For each such name, it walks through
-   all its immediate uses. For each immediate use, it rebuilds the condition
-   expression (the predicate) that guards the use. The predicate is then
+   possibly uninitialized variables.  The pass first collects the set of
+   possibly uninitialized SSA names.  For each such name, it walks through
+   all its immediate uses.  For each immediate use, it rebuilds the condition
+   expression (the predicate) that guards the use.  The predicate is then
examined to see if the variable is always defined under that same condition.
This is done either by pruning the unrealizable paths that lead to the
default definitions or by checking if the predicate set that guards the
defining paths is a superset of the use predicate.  */
 
-
 /* Pointer set of potentially undefined ssa names, i.e.,
ssa names that are defined by phi with operands that
are not defined or potentially undefined.  */
@@ -56,7 +55,7 @@ static hash_set *possibly_undefined_names = 0;
 #define MASK_EMPTY(mask) (mask == 0)
 
 /* Returns the first bit position (starting from LSB)
-   in mask that is non zero. Returns -1 if the mask is empty.  */
+   in mask that is non zero.  Returns -1 if the mask is empty.  */
 static int
 get_mask_first_set_bit (unsigned mask)
 {
@@ -80,13 +79,12 @@ has_undefined_value_p (tree t)
 	  && possibly_undefined_names->contains (t)));
 }
 
-
-
 /* Like has_undefined_value_p, but don't return true if TREE_NO_WARNING
is set on SSA_NAME_VAR.  */
 
 static inline bool
-uninit_undefined_value_p (tree t) {
+uninit_undefined_value_p (tree t)
+{
   if (!has_undefined_value_p (t))
 return false;
   if (SSA_NAME_VAR (t) && TREE_NO_WARNING (SSA_NAME_VAR (t)))
@@ -112,7 +110,7 @@ uninit_undefined_value_p (tree t) {
 /* Emit a warning for EXPR based on variable VAR at the point in the
program T, an SSA_NAME, is used being uninitialized.  The exact
warning text is in MSGID and DATA is the gimple stmt with info about
-   the location in source code. When DATA is a GIMPLE_PHI, PHIARG_IDX
+   the location in source code.  When DATA is a GIMPLE_PHI, PHIARG_IDX
gives which argument of the phi node to take the location from.  WC
is the warning code.  */
 
@@ -149,8 +147,7 @@ warn_uninit (enum opt_code wc, tree t, tree expr, tree var,
   else
 location = DECL_SOURCE_LOCATION (var);
   location = linemap_resolve_location (line_table, location,
-   LRK_SPELLING_LOCATION,
-   NULL);
+   LRK_SPELLING_LOCATION, NULL);
   cfun_loc = DECL_SOURCE_LOCATION (cfun->decl);
   xloc = expand_location (location);
   floc = expand_location (cfun_loc);
@@ -161,10 +158,8 @@ warn_uninit (enum opt_code wc, tree t, tree expr, tree var,
   if (location == DECL_SOURCE_LOCATION (var))
 	return;
   if (xloc.file != floc.file
-	  || linemap_location_before_p (line_table,
-	location, cfun_loc)
-	  || linemap_location_before_p (line_table,
-	cfun->function_end_locus,
+	  || linemap_location_before_p (line_table, location, cfun_loc)
+	  || linemap_location_before_p (line_table, cfun->function_end_locus,
 	location))
 	inform (DECL_SOURCE_LOCATION (var), "%qD was declared here", var);
 }
@@ -178,8 +173,8 @@ warn_uninitialized_vars (bool warn_possibly_uninitialized)
 
   FOR_EACH_BB_FN (bb, cfun)
 {
-  bool always_executed = dominated_by_p (CDI_POST_DOMINATORS,
-	 single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb);
+  basic_block succ = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  bool always_executed = dominated_by_p 

[gomp4.5] Allow more than 64 clauses in gfc_match_omp_clauses

2016-05-06 Thread Jakub Jelinek
Hi!

With 32 OpenMP clauses and 27 further OpenACC ones, I can't add 10 further
clauses I need for OpenMP.

So, this patch uses C++ classes to implement a framework where the code
can mostly stay as it was, yet the mask is now up to 128 bits, while being
not really more expensive at -O2 than passing two separate uint64_t masks
and knowing which of the two masks each bit is in.

The changes that are needed is that one has to use
omp_mask () | OMP_CLAUSE_XXX | OMP_CLAUSE_YYY | OMP_CLAUSE_ZZZ
or
omp_mask (OMP_CLAUSE_XXX) | OMP_CLAUSE_YYY | OMP_CLAUSE_ZZZ
instead of just
OMP_CLAUSE_XXX | OMP_CLAUSE_YYY | OMP_CLAUSE_ZZZ
when creating the bit sets, and when removing some bit (i.e. an AND NOT
operation) one needs to use & ~(omp_mask (OMP_CLAUSE_VVV))
instead of & ~OMP_CLAUSE_VVV.  Testing for bits is the same as before.
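As a rough illustration, the two-word mask with these operators can be sketched as below.  The clause names, the 128-bit split, and the exact operator set are simplified assumptions for illustration, not the committed openmp.c code:

```cpp
#include <stdint.h>
#include <assert.h>

/* Illustrative clause bits only; the real patch enumerates all clauses
   across enum omp_mask1 and enum omp_mask2.  */
enum omp_mask1 { OMP_CLAUSE_PRIVATE, OMP_CLAUSE_SHARED, OMP_CLAUSE_REDUCTION };

/* Result type of ~mask, so clearing a bit reads: m & ~(omp_mask (BIT)).  */
struct omp_inv_mask { uint64_t lo, hi; };

struct omp_mask
{
  uint64_t lo, hi;                      /* room for up to 128 clause bits */
  omp_mask () : lo (0), hi (0) {}
  omp_mask (omp_mask1 b) : lo (uint64_t (1) << b), hi (0) {}
  /* Build a set: omp_mask () | OMP_CLAUSE_XXX | OMP_CLAUSE_YYY.  */
  omp_mask operator| (omp_mask1 b) const
  { omp_mask m = *this; m.lo |= uint64_t (1) << b; return m; }
  /* Test a bit, same syntax as with a plain integer mask.  */
  bool operator& (omp_mask1 b) const { return (lo >> b) & 1; }
  /* Clear bits: m & ~(omp_mask (OMP_CLAUSE_VVV)).  */
  omp_mask operator& (omp_inv_mask im) const
  { omp_mask m; m.lo = lo & im.lo; m.hi = hi & im.hi; return m; }
  omp_inv_mask operator~ () const
  { omp_inv_mask im; im.lo = ~lo; im.hi = ~hi; return im; }
};
```

With this shape, `omp_mask () | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_SHARED` builds a set and `& ~(omp_mask (OMP_CLAUSE_SHARED))` removes a bit, matching the usage changes described above.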

Regtested on x86_64-linux and i686-linux, committed to gomp-4_5-branch.

2016-05-06  Jakub Jelinek  

* openmp.c (enum omp_mask1, enum omp_mask2): New enums.
Change all OMP_CLAUSE_* defines into enum values, and change their
values from ((uint64_t) 1 << bit) to just bit.
(omp_mask, omp_inv_mask): New classes.  Add ctors and operators.
(gfc_match_omp_clauses): Change mask argument from uint64_t to
const omp_mask.  Assert OMP_MASK1_LAST and OMP_MASK2_LAST are
at most 64.
(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_DATA_CLAUSES,
OACC_LOOP_CLAUSES, OACC_HOST_DATA_CLAUSES, OACC_DECLARE_CLAUSES,
OACC_UPDATE_CLAUSES, OACC_ENTER_DATA_CLAUSES, OACC_EXIT_DATA_CLAUSES,
OACC_WAIT_CLAUSES, OACC_ROUTINE_CLAUSES, OMP_PARALLEL_CLAUSES,
OMP_DECLARE_SIMD_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES,
OMP_SIMD_CLAUSES, OMP_TASK_CLAUSES, OMP_TARGET_CLAUSES,
OMP_TARGET_DATA_CLAUSES, OMP_TARGET_UPDATE_CLAUSES,
OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES): Replace first or only
OMP_CLAUSE_* value in bitset with omp_mask (OMP_CLAUSE_*).
(OMP_SINGLE_CLAUSES): Define.
(match_omp): Change mask argument from unsigned int to
const omp_mask.
(gfc_match_omp_distribute_parallel_do_simd, gfc_match_omp_do_simd,
gfc_match_omp_parallel_do_simd,
gfc_match_omp_target_teams_distribute_parallel_do_simd,
gfc_match_omp_teams_distribute_parallel_do_simd): Use
& ~(omp_mask (OMP_CLAUSE_*)) instead of & ~OMP_CLAUSE_*.
(gfc_match_omp_single): Use OMP_SINGLE_CLAUSES.
(gfc_match_omp_cancel, gfc_match_omp_end_single): Use
omp_mask (OMP_CLAUSE_*) instead of OMP_CLAUSE_*.

--- gcc/fortran/openmp.c.jj 2016-05-05 16:11:17.0 +0200
+++ gcc/fortran/openmp.c	2016-05-06 11:23:47.126442317 +0200
@@ -539,67 +539,173 @@ cleanup:
   return MATCH_ERROR;
 }
 
-#define OMP_CLAUSE_PRIVATE ((uint64_t) 1 << 0)
-#define OMP_CLAUSE_FIRSTPRIVATE((uint64_t) 1 << 1)
-#define OMP_CLAUSE_LASTPRIVATE ((uint64_t) 1 << 2)
-#define OMP_CLAUSE_COPYPRIVATE ((uint64_t) 1 << 3)
-#define OMP_CLAUSE_SHARED  ((uint64_t) 1 << 4)
-#define OMP_CLAUSE_COPYIN  ((uint64_t) 1 << 5)
-#define OMP_CLAUSE_REDUCTION   ((uint64_t) 1 << 6)
-#define OMP_CLAUSE_IF  ((uint64_t) 1 << 7)
-#define OMP_CLAUSE_NUM_THREADS ((uint64_t) 1 << 8)
-#define OMP_CLAUSE_SCHEDULE((uint64_t) 1 << 9)
-#define OMP_CLAUSE_DEFAULT ((uint64_t) 1 << 10)
-#define OMP_CLAUSE_ORDERED ((uint64_t) 1 << 11)
-#define OMP_CLAUSE_COLLAPSE((uint64_t) 1 << 12)
-#define OMP_CLAUSE_UNTIED  ((uint64_t) 1 << 13)
-#define OMP_CLAUSE_FINAL   ((uint64_t) 1 << 14)
-#define OMP_CLAUSE_MERGEABLE   ((uint64_t) 1 << 15)
-#define OMP_CLAUSE_ALIGNED ((uint64_t) 1 << 16)
-#define OMP_CLAUSE_DEPEND  ((uint64_t) 1 << 17)
-#define OMP_CLAUSE_INBRANCH((uint64_t) 1 << 18)
-#define OMP_CLAUSE_LINEAR  ((uint64_t) 1 << 19)
-#define OMP_CLAUSE_NOTINBRANCH ((uint64_t) 1 << 20)
-#define OMP_CLAUSE_PROC_BIND   ((uint64_t) 1 << 21)
-#define OMP_CLAUSE_SAFELEN ((uint64_t) 1 << 22)
-#define OMP_CLAUSE_SIMDLEN ((uint64_t) 1 << 23)
-#define OMP_CLAUSE_UNIFORM ((uint64_t) 1 << 24)
-#define OMP_CLAUSE_DEVICE  ((uint64_t) 1 << 25)
-#define OMP_CLAUSE_MAP ((uint64_t) 1 << 26)
-#define OMP_CLAUSE_TO  ((uint64_t) 1 << 27)
-#define OMP_CLAUSE_FROM((uint64_t) 1 << 28)
-#define OMP_CLAUSE_NUM_TEAMS   ((uint64_t) 1 << 29)
-#define OMP_CLAUSE_THREAD_LIMIT((uint64_t) 1 << 30)
-#define OMP_CLAUSE_DIST_SCHEDULE   ((uint64_t) 1 << 31)
-
-/* OpenACC 2.0 clauses. */
-#define OMP_CLAUSE_ASYNC   ((uint64_t) 1 << 32)
-#define OMP_CLAUSE_NUM_GANGS   ((uint64_t) 1 << 33)
-#define OMP_CLAUSE_NUM_WORKERS ((uint64_t) 1 << 34)
-#define OMP_CLAUSE_VECTOR_LENGTH 

[SH][committed] Add some more missing div0s cases

2016-05-06 Thread Oleg Endo
Hi,

The attached patch addresses some of the remaining things as mentioned
in the PR.
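For readers unfamiliar with the instruction, here is a rough C model of the div0s idiom the new patterns build on (the function names are illustrative, not from the patch):

```cpp
#include <stdint.h>
#include <assert.h>

/* Illustrative model: SH "div0s Rm,Rn" sets the T bit to the XOR of the
   two operands' sign bits.  */
static unsigned
div0s_t_bit (uint32_t rm, uint32_t rn)
{
  return (rm >> 31) ^ (rn >> 31);
}

/* The new patterns move a tested bit into bit 31 (via shifts or sign
   extension) so that a single div0s compares bit POS of A and B,
   avoiding a large tst constant.  */
static unsigned
bits_differ (uint32_t a, uint32_t b, unsigned pos)
{
  return div0s_t_bit (a << (31 - pos), b << (31 - pos));
}
```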

Tested on sh-elf with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235952.

Cheers,
Oleg

gcc/ChangeLog:
PR target/52933
* config/sh/sh.md (*cmp_div0s_7, *cmp_div0s_8): Add div0s variants.
* config/sh/sh.c (sh_rtx_costs): Add another div0s case.

gcc/testsuite/ChangeLog:
PR target/52933
* gcc.target/sh/pr52933-1.c (test_31, test_32, test_33, test_34,
test_35, test_36, test_37, test_38, test_39, test_40): New sub-tests.
Adjust expected instruction counts.
* gcc.target/sh/pr52933-2.c: Adjust expected instruction counts.diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index ebdb523..809f679 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -3209,6 +3209,15 @@ sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	  *total = 1; //COSTS_N_INSNS (1);
 	  return true;
 	}
+
+  /* div0s variant.  */
+  if (GET_CODE (XEXP (x, 0)) == XOR
+	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == XOR
+	  && CONST_INT_P (XEXP (XEXP (x, 0), 1)))
+	{
+	  *total = 1;
+	  return true;
+	}
   return false;
 
 /* The cost of a sign or zero extend depends on whether the source is a
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 0ab76b5..e704e2a 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -1103,6 +1103,97 @@
 	(lshiftrt:SI (xor:SI (match_dup 0) (match_dup 1)) (const_int 31)))
(set (reg:SI T_REG) (xor:SI (reg:SI T_REG) (const_int 1)))])
 
+;; In some cases, it might be shorter to get a tested bit into bit 31 and
+;; use div0s.  Otherwise it's usually better to just leave the xor and tst
+;; sequence.  The only thing we can try to do here is avoiding the large
+;; tst constant.
+(define_insn_and_split "*cmp_div0s_7"
+  [(set (reg:SI T_REG)
+	(zero_extract:SI (xor:SI (match_operand:SI 0 "arith_reg_operand")
+ (match_operand:SI 1 "arith_reg_operand"))
+			 (const_int 1)
+			 (match_operand 2 "const_int_operand")))]
+  "TARGET_SH1 && can_create_pseudo_p ()
+   && (INTVAL (operands[2]) == 7 || INTVAL (operands[2]) == 15
+   || INTVAL (operands[2]) == 23 || INTVAL (operands[2]) == 29
+   || INTVAL (operands[2]) == 30 || INTVAL (operands[2]) == 31)"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  const int bitpos = INTVAL (operands[2]);
+
+  rtx op0 = gen_reg_rtx (SImode);
+  rtx op1 = gen_reg_rtx (SImode);
+
+  if (bitpos == 23 || bitpos == 30 || bitpos == 29)
+{
+  emit_insn (gen_ashlsi3 (op0, operands[0], GEN_INT (31 - bitpos)));
+  emit_insn (gen_ashlsi3 (op1, operands[1], GEN_INT (31 - bitpos)));
+}
+  else if (bitpos == 15)
+{
+  emit_insn (gen_extendhisi2 (op0, gen_lowpart (HImode, operands[0])));
+  emit_insn (gen_extendhisi2 (op1, gen_lowpart (HImode, operands[1])));
+}
+  else if (bitpos == 7)
+{
+  emit_insn (gen_extendqisi2 (op0, gen_lowpart (QImode, operands[0])));
+  emit_insn (gen_extendqisi2 (op1, gen_lowpart (QImode, operands[1])));
+}
+  else if (bitpos == 31)
+{
+  op0 = operands[0];
+  op1 = operands[1];
+}
+  else
+gcc_unreachable ();
+
+  emit_insn (gen_cmp_div0s (op0, op1));
+  DONE;
+})
+
+;; For bits 0..7 using a xor and tst #imm,r0 sequence seems to be better.
+;; Thus allow the following patterns only for higher bit positions where
+;; it's more likely to save the large tst constant.
+(define_insn_and_split "*cmp_div0s_8"
+  [(set (reg:SI T_REG)
+	(eq:SI (zero_extract:SI (match_operand:SI 0 "arith_reg_operand")
+(const_int 1)
+(match_operand 2 "const_int_operand"))
+	   (zero_extract:SI (match_operand:SI 1 "arith_reg_operand")
+(const_int 1)
+(match_dup 2]
+  "TARGET_SH1 && can_create_pseudo_p ()
+   && (INTVAL (operands[2]) == 15
+   || INTVAL (operands[2]) == 23 || INTVAL (operands[2]) == 29
+   || INTVAL (operands[2]) == 30 || INTVAL (operands[2]) == 31)"
+  "#"
+  "&& 1"
+  [(set (reg:SI T_REG)
+	(zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+			 (const_int 1) (match_dup 2)))
+   (set (reg:SI T_REG) (xor:SI (reg:SI T_REG) (const_int 1)))])
+
+(define_insn_and_split "*cmp_div0s_9"
+  [(set (reg:SI T_REG)
+	(zero_extract:SI (xor:SI (xor:SI (match_operand:SI 0 "arith_reg_operand")
+	 (match_operand:SI 1 "arith_reg_operand"))
+ (match_operand 2 "const_int_operand"))
+			 (const_int 1)
+			 (match_operand 3 "const_int_operand")))]
+  "TARGET_SH1 && can_create_pseudo_p ()
+   && (INTVAL (operands[2]) & 0x) == (1U << INTVAL (operands[3]))
+   && (INTVAL (operands[3]) == 15
+   || INTVAL (operands[3]) == 23 || INTVAL (operands[3]) == 29
+   || INTVAL (operands[3]) == 30 || INTVAL (operands[3]) == 31)"
+  "#"
+  "&& 1"
+  [(set (reg:SI T_REG)
+	(zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+			 (const_int 1) (match_dup 3)))
+   (set (reg:SI T_REG) (xor:SI (reg:SI 

Re: Fix for PR68159 in Libiberty Demangler (6)

2016-05-06 Thread Jakub Jelinek
On Fri, May 06, 2016 at 05:01:14PM +0800, Marcel Böhme wrote:
> The patch that is attached now is bootstrapped and regression tested on 
> x86_64-pc-linux-gnu.
> 
> > 
> > This file is used not just in the various tools like binutils or gdb, but
> > also in libstdc++, where it used e.g. in the std::terminate handler,
> > which I think can't just xmalloc_failed, that function can be called already
> > in out of memory situation, where heap allocation is not possible.
> 
> Earlier, I was working on libiberty/cplus-dem.c where xmalloc was explicitly 
> available. So, I assumed it would be in libiberty/cp-demangle.c as well.
> 
> > Why INT_MAX?
> > I'd have thought that the allocation size is computed in size_t and
> > thus it should be SIZE_MAX, (~(size_t) 0) or similar?
> 
> In two separate patches (the first in cplus-dem.c and the second in 
> cp-demangle.c) it was decided that we should import limits.h and otherwise 
> define INT_MAX, then check against INT_MAX.
> However, I removed the overflow check since it is not clear what the 
> behaviour should be when the integer actually overflows. Apparently, it can’t 
> abort. Still, this remains an unresolved security concern if inputs can 
> actually be generated that result in overflow.

If you just want an array, restricting the size including the sizeof
to fit into int makes no sense, you want to guard it against overflows
during the multiplication.
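A minimal sketch of such a multiplication guard (illustrative only, not the demangler's actual code):

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/* Illustrative helper: fail cleanly when n * size would wrap around,
   instead of comparing the (possibly already wrapped) product against
   an arbitrary limit such as INT_MAX.  */
static void *
checked_array_alloc (size_t n, size_t size)
{
  if (size != 0 && n > SIZE_MAX / size)
    return NULL;                /* the multiplication would overflow */
  return malloc (n * size);
}
```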

Anyway, perhaps I'm misremembering, if there is a mode that really can't
fail due to allocation failures or not, we need to deal with that.
Ian or Jason, can all the demangle users allocate heap memory or not?
And, if __cxa_demangle can fail, there is some allocation_failure stuff
in the file.

> @@ -4125,26 +4111,20 @@ cplus_demangle_print_callback (int options,
>struct d_print_info dpi;
>  
>d_print_init (&dpi, callback, opaque, dc);
> +  
> +  dpi.copy_templates = (struct d_print_template *)
> +  malloc (dpi.num_copy_templates * sizeof (*dpi.copy_templates));

The indentation is still wrong.  Either malloc would need to be below (struct
or it should be
  dpi.copy_templates
= (struct d_print_template *) malloc (...)
But much more importantly, you don't handle the allocation failure in
anyway, so if malloc fails, you'll just segfault.

> +  dpi.saved_scopes = (struct d_saved_scope *) 
> +  malloc (dpi.num_saved_scopes * sizeof (*dpi.saved_scopes));

See above.
>  
> +  free(dpi.copy_templates);
> +  free(dpi.saved_scopes);

Formatting, missing space before (.

Jakub


[SH][committed] Add another rotcr variant

2016-05-06 Thread Oleg Endo
Hi,

The attached patch adds another combine pattern variant for the SH
rotcr instruction.
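As a sanity check, the instruction pair this pattern emits can be modelled in C.  This is a hedged sketch; t_bit stands for the T flag, which the sett instruction forces to 1:

```cpp
#include <stdint.h>
#include <assert.h>

/* Illustrative model of the new combine pattern: "rotcr" rotates right
   through the T bit, so with T forced to 1 by "sett" the result of
   rotating A is exactly (A >> 1) | 0x80000000 -- the expression the
   pattern matches.  */
static uint32_t
rotcr_model (uint32_t a, unsigned t_bit)
{
  return (a >> 1) | ((uint32_t) t_bit << 31);
}
```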

Tested on sh-elf with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235950.

Cheers,
Oleg

gcc/ChangeLog:
PR target/54089
* config/sh/sh.md (*rotcr): Add another variant.

gcc/testsuite/ChangeLog:
PR target/54089
* gcc.target/sh/pr54089-1.c (test_24): Add new sub-test.diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 2a8fbc8..0ab76b5 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -3359,6 +3359,22 @@
   DONE;
 })
 
+(define_insn_and_split "*rotcr"
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(ior:SI (lshiftrt:SI (match_operand:SI 1 "arith_reg_operand")
+			 (const_int 1))
+		(const_int -2147483648))) ;; 0x8000
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  emit_insn (gen_sett ());
+  emit_insn (gen_rotcr (operands[0], operands[1], get_t_reg_rtx ()));
+  DONE;
+})
+
 ;; rotcr combine patterns for rotating in the negated T_REG value.
 (define_insn_and_split "*rotcr_neg_t"
   [(set (match_operand:SI 0 "arith_reg_dest")
diff --git a/gcc/testsuite/gcc.target/sh/pr54089-1.c b/gcc/testsuite/gcc.target/sh/pr54089-1.c
index 64f79eb..8b6a729 100644
--- a/gcc/testsuite/gcc.target/sh/pr54089-1.c
+++ b/gcc/testsuite/gcc.target/sh/pr54089-1.c
@@ -1,7 +1,8 @@
 /* Check that the rotcr instruction is generated.  */
 /* { dg-do compile }  */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "rotcr" 24 } } */
+/* { dg-final { scan-assembler-times "rotcr" 25 } } */
+/* { dg-final { scan-assembler-times "sett" 1 } } */
 /* { dg-final { scan-assembler-times "shll\t" 1 } } */
 /* { dg-final { scan-assembler-not "and\t#1" } }  */
 /* { dg-final { scan-assembler-not "cmp/pl" } }  */
@@ -173,3 +174,9 @@ test_23 (unsigned int a, int b, int c)
   bool r = b != c;
   return ((a >> 31) | (r << 31));
 }
+
+unsigned int
+test_24 (unsigned int a)
+{
+  return (a >> 1) | (1 << 31);
+}


Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue

2016-05-06 Thread Dominik Vogt
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 21f21c9..4d48afd 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
...
> @@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct 
> stack_vars_data *data)
>  
>/* If there were any, allocate space.  */
>if (large_size > 0)
> - large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
> -large_align, true);
> + {
> +   large_allocsize = GEN_INT (large_size);
> +   get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
...

See below.

> @@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct 
> stack_vars_data *data)
> /* Large alignment is only processed in the last pass.  */
> if (pred)
>   continue;
> +
> +   if (large_allocsize && ! large_allocation_done)
> + {
> +   /* Allocate space in the virtual stack vars area in the prologue.
> +*/
> +   HOST_WIDE_INT loffset;
> +
> +   loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
> +  PREFERRED_STACK_BOUNDARY);

1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
2) Is this the right place for rounding up, or should 
   it be done above, maybe in get_dynamic_stack_size?

Not sure whether this is the right 

> +   large_base = get_dynamic_stack_base (loffset, large_align);
> +   large_allocation_done = true;
> + }
> gcc_assert (large_base != NULL);
>  
> large_alloc += alignb - 1;

> diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c 
> b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> new file mode 100644
> index 000..e06a16c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> @@ -0,0 +1,14 @@
> +/* Verify that run time aligned local variables are allocated in the prologue
> +   in one pass together with normal local variables.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O0" } */
> +
> +extern void bar (void *, void *, void *);
> +void foo (void)
> +{
> +  int i;
> +  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
> +  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
> +  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
> +}
> +/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { 
> s390*-*-* } } } } */

I've no idea how to test this on other targets, or how to express
the test in a target independent way.  The scan-assembler-times
does not work on x86_64.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-05-06 Thread Bin.Cheng
On Fri, May 6, 2016 at 10:40 AM, Bin.Cheng  wrote:
> On Tue, May 3, 2016 at 11:08 AM, Richard Biener
>  wrote:
>> On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng  wrote:
>>> On Mon, May 2, 2016 at 10:00 AM, Richard Biener
>>>  wrote:
 On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng  wrote:
> On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
>  wrote:
>> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng  wrote:
>>> Hi,
>>> Tree if-conversion sometimes cannot convert conditional array reference 
>>> into unconditional one.  Root cause is GCC conservatively assumes newly 
>>> introduced array reference could be out of array bound and thus 
>>> trapping.  This patch improves the situation by proving the converted 
>>> unconditional array reference is within array bound using loop niter 
>>> information.  To be specific, it checks every index of array reference 
>>> to see if it's within bound in ifcvt_memrefs_wont_trap.  This patch 
>>> also factors out base_object_writable checking if the base object is 
>>> writable or not.
>>> Bootstrap and test on x86_64 and aarch64, is it OK?
>>
>> I think you miss to handle the case optimally where the only
>> non-ARRAY_REF idx is the dereference of the
>> base-pointer for, say, p->a[i].  In this case we can use
>> base_master_dr to see if p is unconditionally dereferenced
> Yes, will pick up this case.
>
>> in the loop.  You also fail to handle the case where we have
>> MEM_REF[].a[i] that is, you see a decl base.
> I am having difficulty in creating this case for ifcvt, any advices?  
> Thanks.

 Sth like

 float a[128];
 float foo (int n, int i)
 {
   return (*((float(*)[n])a))[i];
 }

 should do the trick (w/o the component-ref).  Any other type-punning
 would do it, too.

>> I suppose for_each_index should be fixed for this particular case (to
>> return true), same for TARGET_MEM_REF TMR_BASE.
>>
>> +  /* The case of nonconstant bounds could be handled, but it would be
>> + complicated.  */
>> +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
>> +  || !high || TREE_CODE (high) != INTEGER_CST)
>> +return false;
>> +
>>
>> handling of a non-zero but constant low bound is important - otherwise
>> all this is a no-op for Fortran.  It
>> shouldn't be too difficult to handle after all.  In fact I think your
>> code does handle it correctly already.
>>
>> +  if (!init || TREE_CODE (init) != INTEGER_CST
>> +  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop 
>> (step))
>> +return false;
>>
>> step == 0 should be easy to handle as well, no?  The index will simply
>> always be 'init' ...
>>
>> +  /* In case the relevant bound of the array does not fit in type, or
>> + it does, but bound + step (in type) still belongs into the range 
>> of the
>> + array, the index may wrap and still stay within the range of the 
>> array
>> + (consider e.g. if the array is indexed by the full range of
>> + unsigned char).
>> +
>> + To make things simpler, we require both bounds to fit into type, 
>> although
>> + there are cases where this would not be strictly necessary.  */
>> +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
>> +return false;
>> +
>> +  low = fold_convert (type, low);
>>
>> please use wide_int for all of this.
> Now I use wi:fits_to_tree_p instead of int_fits_type_p. But I am not
> sure what's the meaning by "handle "low = fold_convert (type, low);"
> related code in wide_int".   Do you mean to use tree_int_cst_compare
> instead of tree_int_cst_compare in the following code?

 I don't think you need any kind of fits-to-type check here.  You'd simply
 use to_widest () when operating on / comparing with high/low.
>>> But what would happen if low/high and init/step are different in type
>>> sign-ness?  Anything special I need to do before using wi::ltu_p or
>>> wi::lts_p directly?
>>
>> You want to use to_widest (min) which extends according to sign to
>> an "infinite" precision signed integer.  So you can then use the new
>> operator< overloads as well.
>>
> Hi,
> Here is the updated patch.  It includes below changes according to
> review comments:
>
> 1) It uses widest_int for all INTEGER_CST tree computations, which
> simplifies the patch a lot.
> 2) It covers array with non-zero low bound, which is important for Fortran.
> 3) It picks up a boundary case so that ifc-11.c/vect-23.c/vect-24.c
> can be handled.
> 4) It also checks within bound array reference inside a structure like
> p->a[i] by 

Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-05-06 Thread Bin.Cheng
On Tue, May 3, 2016 at 11:08 AM, Richard Biener
 wrote:
> On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng  wrote:
>> On Mon, May 2, 2016 at 10:00 AM, Richard Biener
>>  wrote:
>>> On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng  wrote:
 On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
  wrote:
> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng  wrote:
>> Hi,
>> Tree if-conversion sometimes cannot convert conditional array reference 
>> into unconditional one.  Root cause is GCC conservatively assumes newly 
>> introduced array reference could be out of array bound and thus 
>> trapping.  This patch improves the situation by proving the converted 
>> unconditional array reference is within array bound using loop niter 
>> information.  To be specific, it checks every index of array reference 
>> to see if it's within bound in ifcvt_memrefs_wont_trap.  This patch also 
>> factors out base_object_writable checking if the base object is writable 
>> or not.
>> Bootstrap and test on x86_64 and aarch64, is it OK?
>
> I think you miss to handle the case optimally where the only
> non-ARRAY_REF idx is the dereference of the
> base-pointer for, say, p->a[i].  In this case we can use
> base_master_dr to see if p is unconditionally dereferenced
 Yes, will pick up this case.

> in the loop.  You also fail to handle the case where we have
> MEM_REF[].a[i] that is, you see a decl base.
 I am having difficulty in creating this case for ifcvt, any advices?  
 Thanks.
>>>
>>> Sth like
>>>
>>> float a[128];
>>> float foo (int n, int i)
>>> {
>>>   return (*((float(*)[n])a))[i];
>>> }
>>>
>>> should do the trick (w/o the component-ref).  Any other type-punning
>>> would do it, too.
>>>
> I suppose for_each_index should be fixed for this particular case (to
> return true), same for TARGET_MEM_REF TMR_BASE.
>
> +  /* The case of nonconstant bounds could be handled, but it would be
> + complicated.  */
> +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
> +  || !high || TREE_CODE (high) != INTEGER_CST)
> +return false;
> +
>
> handling of a non-zero but constant low bound is important - otherwise
> all this is a no-op for Fortran.  It
> shouldn't be too difficult to handle after all.  In fact I think your
> code does handle it correctly already.
>
> +  if (!init || TREE_CODE (init) != INTEGER_CST
> +  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop 
> (step))
> +return false;
>
> step == 0 should be easy to handle as well, no?  The index will simply
> always be 'init' ...
>
> +  /* In case the relevant bound of the array does not fit in type, or
> + it does, but bound + step (in type) still belongs into the range of 
> the
> + array, the index may wrap and still stay within the range of the 
> array
> + (consider e.g. if the array is indexed by the full range of
> + unsigned char).
> +
> + To make things simpler, we require both bounds to fit into type, 
> although
> + there are cases where this would not be strictly necessary.  */
> +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
> +return false;
> +
> +  low = fold_convert (type, low);
>
> please use wide_int for all of this.
 Now I use wi:fits_to_tree_p instead of int_fits_type_p. But I am not
 sure what's the meaning by "handle "low = fold_convert (type, low);"
 related code in wide_int".   Do you mean to use tree_int_cst_compare
 instead of tree_int_cst_compare in the following code?
>>>
>>> I don't think you need any kind of fits-to-type check here.  You'd simply
>>> use to_widest () when operating on / comparing with high/low.
>> But what would happen if low/high and init/step are different in type
>> sign-ness?  Anything special I need to do before using wi::ltu_p or
>> wi::lts_p directly?
>
> You want to use to_widest (min) which extends according to sign to
> an "infinite" precision signed integer.  So you can then use the new
> operator< overloads as well.
>
Hi,
Here is the updated patch.  It includes below changes according to
review comments:

1) It uses widest_int for all INTEGER_CST tree computations, which
simplifies the patch a lot.
2) It covers array with non-zero low bound, which is important for Fortran.
3) It picks up a boundary case so that ifc-11.c/vect-23.c/vect-24.c
can be handled.
4) It also checks within bound array reference inside a structure like
p->a[i] by using base_master_dr in tree-if-conv.c so that ifc-12.c can
be handled.

It leaves two other review comments not addressed:
1) It doesn't handle array reference whose idx is a wrapping SCEV.
Because 

Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue

2016-05-06 Thread Dominik Vogt
Updated version of the patch described below.  Apart from fixing a
bug and adding a test, the new logic is now used always, for all
targets.  The discussion of the original patch starts here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html

The new patch has been bootstrapped and regression tested on s390,
s390x and x86_64, but please check the questions/comments in the
follow up message.

On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
> The attached patch fixes a warning during Linux kernel compilation
> on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> variables with constant size causing cfun->calls_alloca to be set
> (even if alloca is not used at all).  The patched code places
> constant size runtime aligned variables in the "virtual stack
> vars" area instead of creating a "virtual stack dynamic" area.
> 
> This behaviour is activated by defining
> 
>   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> 
> in the backend; otherwise the old logic is used.
> 
> The kernel uses runtime alignment for the page structure (aligned
> to 16 bytes), and apart from triggereing the alloca warning
> (-mwarn-dynamicstack), the current Gcc also generates inefficient
> code like
> 
>   aghi %r15,-160  # prologue: create stack frame
>   lgr %r11,%r15   # prologue: generate frame pointer
>   aghi %r15,-32   # space for dynamic stack
> 
> which could be simplified to
> 
>   aghi %r15,-192
> 
> (if later optimization passes are able to get rid of the frame
> pointer).  Is there a specific reason why the patched behaviour
> shouldn't be used for all platforms?
> 
> --
> 
> As the placement of runtime aligned stack variables with constant
> size is done completely in the middleend, I don't see a way to fix
> this in the backend.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* cfgexpand.c (expand_stack_vars): Implement dynamic stack space
allocation in the prologue.
* explow.c (get_dynamic_stack_base): New function to return an address
expression for the dynamic stack base.
(get_dynamic_stack_size): New function to do the required dynamic stack
space size calculations.
(allocate_dynamic_stack_space): Use new functions.
(align_dynamic_address): Move some code from
allocate_dynamic_stack_space to new function.
* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

* gcc.target/s390/warn-dynamicstack-1.c: New test.
* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
* gcc.dg/stack-layout-dynamic-1.c: New test.
From e76a7e02f7862681d1b5344e64aca1b0a62cdc2c Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c|  20 +-
 gcc/explow.c   | 232 ++---
 gcc/explow.h   |   9 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c  |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c   |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c  |  17 ++
 6 files changed, 212 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 21f21c9..4d48afd 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1052,7 +1052,9 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_allocsize = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 
   /* If there were any, allocate space.  */
   if (large_size > 0)
-	large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-		   large_align, true);
+	{
+	  large_allocsize = GEN_INT (large_size);
+	  get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+	}
 }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	  /* Large alignment is only processed in the last pass.  */
 	  
