Re: [PATCH] Fix PR69771, bogus CONST_INT during shift expansion

2016-02-15 Thread Jakub Jelinek
On Mon, Feb 15, 2016 at 08:43:22PM +0100, Richard Biener wrote:
> On February 15, 2016 7:15:35 PM GMT+01:00, Jakub Jelinek  
> wrote:
> >On Mon, Feb 15, 2016 at 06:58:45PM +0100, Richard Biener wrote:
> >> We could also force_reg those at expansion or apply
> >SHIFT_COUNT_TRUNCATED to those invalid constants there.
> >
> >Sure, but for force_reg we'd still need the gen_int_mode anyway.
> >As for SHIFT_COUNT_TRUNCATED, it should have been applied already from
> >the caller - expand_shift_1.
> 
> But then no out of bound values should remain.
> Until we get 256bit ints where your workaround wouldn't work either?

Of course it would work, because in that case mode would be OImode, not
QImode, and thus the code would ensure the shift count is a valid OImode
constant.

Anyway, the patch I posted was broken for vector shifts;
the last argument to gen_int_mode should have been GET_MODE_INNER (mode).
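
To make the role of gen_int_mode concrete: RTL CONST_INTs are kept
sign-extended from the precision of the mode they are used in, so an
out-of-range count has to be re-canonicalized for the narrower mode.  The
following is only a stand-alone sketch of that truncation.  The helper
truncate_to_width is a made-up name, not a GCC function, and it assumes a
width between 1 and 64 bits.

```c
#include <stdint.h>

/* Hypothetical model of canonicalizing a constant for a mode that is
   WIDTH bits wide: keep the low WIDTH bits of VALUE and sign-extend
   the result.  Assumes 1 <= WIDTH <= 64.  */
int64_t truncate_to_width(int64_t value, unsigned width)
{
  if (width >= 64)
    return value;

  uint64_t mask = (UINT64_C(1) << width) - 1;
  uint64_t low = (uint64_t) value & mask;
  uint64_t sign_bit = UINT64_C(1) << (width - 1);

  /* If the sign bit of the narrow value is set, subtract 2^WIDTH.  */
  if (low & sign_bit)
    return (int64_t) low - (int64_t) mask - 1;
  return (int64_t) low;
}
```

With an 8-bit width (a QImode-sized count), 255 becomes -1 and 200 becomes
-56, while small in-range values such as 5 are unchanged.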

Here is a variant of that patch with force_reg; it seems to work on
aarch64 and x86_64.

2016-02-16  Jakub Jelinek  

PR rtl-optimization/69764
PR rtl-optimization/69771
* optabs.c (expand_binop): Ensure for shift optabs invalid CONST_INT
op1 is valid for GET_MODE_INNER (mode) and force it into a reg.

--- gcc/optabs.c.jj 2016-02-15 22:22:46.161674598 +0100
+++ gcc/optabs.c 2016-02-16 08:20:01.206889067 +0100
@@ -1125,6 +1125,16 @@ expand_binop (machine_mode mode, optab b
           op1 = negate_rtx (mode, op1);
           binoptab = add_optab;
         }
+      /* For shifts, constant invalid op1 might be expanded from different
+         mode than MODE.  As those are invalid, force them to a register
+         to avoid further problems during expansion.  */
+      else if (CONST_INT_P (op1)
+               && shift_optab_p (binoptab)
+               && UINTVAL (op1) >= GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
+        {
+          op1 = gen_int_mode (INTVAL (op1), GET_MODE_INNER (mode));
+          op1 = force_reg (GET_MODE_INNER (mode), op1);
+        }
 
   /* Record where to delete back to if we backtrack.  */
   last = get_last_insn ();


Jakub


RFC: [Patch, PR Bug 60818] - ICE in validate_condition_mode on powerpc*-linux-gnu*

2016-02-15 Thread Rohit Arul Raj D
Hello All,

This is related to the following bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60818

Test case:
unsigned int ou;
int jv(void)
{
  unsigned int rg;
  return rg < ou;
}

Command line options used: '-O1' (fails for -O1 and above).

Target: e500v2 (I was able to reproduce with e500mc, e5500 targets as well).

Error message:
0x885f068 validate_condition_mode(rtx_code, machine_mode)
../../src_gcc/gcc/config/rs6000/rs6000.c:16389
0x89521d0 branch_comparison_operator(rtx_def*, machine_mode)
../../src_gcc/gcc/config/rs6000/predicates.md:1171
0x895229f scc_comparison_operator(rtx_def*, machine_mode)
../../src_gcc/gcc/config/rs6000/predicates.md:1221
0x8965eba recog_6
../../src_gcc/gcc/config/rs6000/rs6000.md:13910
0x8984e01 recog_20
../../src_gcc/gcc/config/rs6000/rs6000.md:343
0x89c6a93 recog(rtx_def*, rtx_def*, int*)
../../src_gcc/gcc/config/rs6000/sync.md:128
0x8a2fd4f recog_for_combine
../../src_gcc/gcc/combine.c:10888
0x8a1af67 try_combine
../../src_gcc/gcc/combine.c:3500
0x8a15ed6 combine_instructions
../../src_gcc/gcc/combine.c:1509
0x8a37e71 rest_of_handle_combine
../../src_gcc/gcc/combine.c:14204
0x8a37f02 execute
../../src_gcc/gcc/combine.c:14247

Analysis so far:

a) The test case passes with the '-mno-isel' option.
   The bug description has 2 test cases (comment #4) and both of them pass
   with this option.

b) The test case fails at this predicate: (as can be seen from the backtrace 
above) 

;; Return 1 if OP is a comparison operation that is valid for an SCC insn --
;; it must be a positive comparison.
(define_predicate "scc_comparison_operator"
  (and (match_operand 0 "branch_comparison_operator")
   (match_code "eq,lt,gt,ltu,gtu,unordered")))

(define_predicate "branch_comparison_operator"
   (and (match_operand 0 "comparison_operator")
        (and (match_test "GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC")
             (match_test "validate_condition_mode (GET_CODE (op),
                                                   GET_MODE (XEXP (op, 0))),
                          1"))))

Corresponding content of "op" which causes the ICE:
(gdb) p debug_rtx (op)
(gtu:SI (reg:CC 166)  -- (operator and mode don't match)
(const_int 0 [0]))
$37 = void
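
The mismatch shown above can be modelled outside the compiler.  This is a
simplified sketch under the assumption that the relevant part of
validate_condition_mode amounts to "signed orderings must not use CCUNS and
unsigned orderings must use CCUNS".  The enum and function names below are
invented for the illustration, and the real function has further checks
(for example for floating-point CC modes) that are not modelled here.

```c
#include <stdbool.h>

/* Invented stand-ins for the RTL comparison codes and CC modes.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_GT, CMP_LE, CMP_GE,
                CMP_LTU, CMP_GTU, CMP_LEU, CMP_GEU };
enum cc_mode { MODE_CC, MODE_CCUNS };

static bool is_signed_ordering(enum cmp_code code)
{
  return code == CMP_LT || code == CMP_GT
         || code == CMP_LE || code == CMP_GE;
}

static bool is_unsigned_ordering(enum cmp_code code)
{
  return code == CMP_LTU || code == CMP_GTU
         || code == CMP_LEU || code == CMP_GEU;
}

/* Return true if CODE may be applied to a condition register of MODE.
   EQ and NE are accepted in either mode in this model.  */
bool condition_mode_is_consistent(enum cmp_code code, enum cc_mode mode)
{
  if (is_signed_ordering(code) && mode == MODE_CCUNS)
    return false;
  if (is_unsigned_ordering(code) && mode != MODE_CCUNS)
    return false;
  return true;
}
```

In this model (gtu, CC), the combination in the dump above, is rejected,
while (gtu, CCUNS) and (ne, CC) are accepted.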

My initial fix was to have signed_scc_comparison_operator and
unsigned_scc_comparison_operator, but that led to an ICE in another stage of
the combiner pass. So I thought it would be better to fix this at the stage
where the condition mode is being wrongly set.

This is the sequence of steps that the combiner pass does which eventually 
leads to the ICE:

i) Initial instruction pattern before entering the combiner pass:
(insn 11 10 16 2 (set (reg:SI 165 [ D.2339+-3 ])
(if_then_else:SI (gtu (reg:CCUNS 166)
(const_int 0 [0]))
(reg:SI 168)
(reg:SI 167))) test.c:7 317 {isel_unsigned_si}
 (expr_list:REG_DEAD (reg:SI 168)
(expr_list:REG_DEAD (reg:SI 167)
(expr_list:REG_DEAD (reg:CCUNS 166)
(expr_list:REG_EQUAL (gtu:SI (reg:CCUNS 166)
(const_int 0 [0]))
(nil))

ii) The combiner pass converts "gtu" to "ne" and changes the mode to "CC" (from
CCUNS).
set (reg/i:SI 3 3)
(if_then_else:SI (ne (reg:CC 166)   -> operator and mode changed
(const_int 0 [0]))
(reg:SI 168)
(const_int 0 [0])))

(gdb) p debug_rtx (other_insn)
(insn 11 10 16 2 (set (reg:SI 165 [ D.2339+-3 ])
(if_then_else:SI (ne (reg:CC 166)
(const_int 0 [0]))
(reg:SI 168)
(reg:SI 167))) test.c:7 317 {isel_unsigned_si}
 (expr_list:REG_DEAD (reg:SI 168)
(expr_list:REG_DEAD (reg:SI 167)
(expr_list:REG_DEAD (reg:CC 166)
(expr_list:REG_EQUAL (gtu:SI (reg:CC 166)
(const_int 0 [0]))
(nil))

iii) Once the instruction match fails, combine retries with the pattern
stored in the REG_EQUAL note substituted as the SET_SRC. But while doing so, it
doesn't update the condition mode to match the operator, which leads to the ICE.

File:combine.c (function: combine_instructions)
  /* Try this insn with each REG_EQUAL note it links back to.  */
  FOR_EACH_LOG_LINK (links, insn)
{
  rtx set, note;
  rtx_insn *temp = links->insn;

...
{
  /* Temporarily replace the set's source with the contents
     of the REG_EQUAL note.  The insn will
     be deleted or recognized by try_combine.  */
  rtx orig = SET_SRC (set);
  SET_SRC (set) = note;
  i2mod = temp;
  i2mod_old_rhs = copy_rtx (orig);
  i2mod_new_rhs = copy_rtx (note);
  next = try_combine (insn, i2mod, NULL, NULL,
                      &new_direct_jump_p, last_combined_insn); --> fails here.

I have added the 

Re: [PATCH] Fix PR 31531: A microoptimization of isnegative of signed integer

2016-02-15 Thread Hurugalawadi, Naveen
Hi,

>> I'm also failing to see why you can't enhance the existing

Please find attached the patch that enhances the existing pattern.
Please review the patch and let me know if any further modifications
are required.
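
For readers without the PR at hand, the testcase in the attached patch relies
on a shift of the complemented value being the same test as a signed
comparison.  A small stand-alone sketch, assuming a 32-bit unsigned operand
(function names are illustrative only, and the cast to int32_t relies on the
wrapping conversion that GCC documents for out-of-range values):

```c
#include <stdint.h>

/* Same shape as the testcase: the shift extracts the top bit of ~x.  */
int isnegative_shift(uint32_t x)
{
  return ((~x) >> 31) ? 0 : 1;
}

/* The equivalent sign test on the value reinterpreted as signed.  */
int isnegative_compare(uint32_t x)
{
  return (int32_t) x < 0;
}
```

Both functions return 1 exactly when bit 31 of x is set.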

Thanks,
Naveen
diff --git a/gcc/match.pd b/gcc/match.pd
index 6c8ebd5..bd47a91 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1871,10 +1871,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for cmp (simple_comparison)
  scmp (swapped_simple_comparison)
  (simplify
-  (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
+  (cmp (convert?@3 (bit_not@2 @0)) CONSTANT_CLASS_P@1)
   (if (single_use (@2)
-   && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
-   (scmp @0 (bit_not @1)
+   && ((TREE_CODE (@1) == INTEGER_CST && TREE_TYPE (@3) == TREE_TYPE (@2))
+|| (TREE_CODE (@1) == VECTOR_CST
+		&& (VECTOR_TYPE_P (TREE_TYPE (@3))
+		== VECTOR_TYPE_P (TREE_TYPE (@2)))
+		&& (TYPE_VECTOR_SUBPARTS (TREE_TYPE (@3))
+		== TYPE_VECTOR_SUBPARTS (TREE_TYPE (@2)))
+		&& (TYPE_MODE (TREE_TYPE (TREE_TYPE (@3)))
+		== TYPE_MODE (TREE_TYPE (TREE_TYPE (@2)))
+   (scmp @0 (bit_not @1))
+  (if (TYPE_PRECISION (TREE_TYPE (@3)) == TYPE_PRECISION (TREE_TYPE (@2))
+   && (TREE_CODE (@1) == INTEGER_CST))
+   (with { tree newtype = TREE_TYPE (@1); }
+(scmp (convert:newtype @0) (bit_not @1)))
 
 (for cmp (simple_comparison)
  /* Fold (double)float1 CMP (double)float2 into float1 CMP float2.  */
diff --git a/gcc/testsuite/gcc.dg/pr31531.c b/gcc/testsuite/gcc.dg/pr31531.c
new file mode 100644
index 000..cf9dd82
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr31531.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+/* { dg-require-effective-target int32 } */
+
+int isnegative_optimized_4 (unsigned int X)
+{
+  int result;
+  if ((~X) >> 31)
+result = 0;
+  else
+result = 1;
+  return result;
+}
+
+/* { dg-final { scan-tree-dump-times "signed int X.0" 1 "gimple" } } */


Re: [Patch, regex, libstdc++/69794] Unify special character parsing

2016-02-15 Thread Tim Shen
On Mon, Feb 15, 2016 at 4:26 AM, Jonathan Wakely  wrote:
> Those new members change the size of the type, so are an ABI change.
>
> Couldn't they be static members?

Ah, right. Since they are only used once, I now use them inline.


-- 
Regards,
Tim Shen
commit 4db39da3091a33e4125ffd8b55da37859277d0d2
Author: Tim Shen 
Date:   Sat Feb 13 10:55:38 2016 -0800

PR libstdc++/69794
* include/bits/regex_scanner.h: Add different special character
sets for grep and egrep regex.
* include/bits/regex_scanner.tcc: Use _M_spec_char more uniformly.
* testsuite/28_regex/regression.cc: Add new testcase.

diff --git a/libstdc++-v3/include/bits/regex_scanner.h b/libstdc++-v3/include/bits/regex_scanner.h
index bff7366..37dea84 100644
--- a/libstdc++-v3/include/bits/regex_scanner.h
+++ b/libstdc++-v3/include/bits/regex_scanner.h
@@ -95,11 +95,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  : _M_awk_escape_tbl),
 _M_spec_char(_M_is_ecma()
 ? _M_ecma_spec_char
-: _M_is_basic()
+: _M_flags & regex_constants::basic
 ? _M_basic_spec_char
-: _M_extended_spec_char),
+: _M_flags & regex_constants::extended
+? _M_extended_spec_char
+: _M_flags & regex_constants::grep
+?  ".[\\*^$\n"
+: _M_flags & regex_constants::egrep
+? ".[\\()*+?{|^$\n"
+: _M_flags & regex_constants::awk
+? _M_extended_spec_char
+: nullptr),
 _M_at_bracket_start(false)
-{ }
+{ __glibcxx_assert(_M_spec_char); }
 
   protected:
 const char*
@@ -137,6 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return _M_flags & regex_constants::awk; }
 
   protected:
+// TODO: Make them static in the next abi change.
 const std::pair<char, _TokenT> _M_token_tbl[9] =
   {
{'^', _S_token_line_begin},
diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc b/libstdc++-v3/include/bits/regex_scanner.tcc
index 920cb14..fedba09 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -97,9 +97,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_scan_normal()
 {
   auto __c = *_M_current++;
-  const char* __pos;
 
-  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')) == nullptr)
+  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, ' ')) == nullptr)
{
  _M_token = _S_token_ord_char;
  _M_value.assign(1, __c);
@@ -177,12 +176,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_state = _S_state_in_brace;
  _M_token = _S_token_interval_begin;
}
-  else if (((__pos = std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')))
- != nullptr
-   && *__pos != '\0'
-   && __c != ']'
-   && __c != '}')
-  || (_M_is_grep() && __c == '\n'))
+  else if (__c != ']' && __c != '}')
{
  auto __it = _M_token_tbl;
  auto __narrowc = _M_ctype.narrow(__c, '\0');
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc b/libstdc++-v3/testsuite/28_regex/regression.cc
index f95bef9..c9a3402 100644
--- a/libstdc++-v3/testsuite/28_regex/regression.cc
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -33,10 +33,26 @@ test01()
   regex re("((.)", regex_constants::basic);
 }
 
+void
+test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  std::string re_str
+{
+  "/abcd" "\n"
+  "/aecf" "\n"
+  "/ghci"
+};
+  auto rx = std::regex(re_str, std::regex_constants::grep | std::regex_constants::icase);
+  VERIFY(std::regex_search("/abcd", rx));
+}
+
 int
 main()
 {
   test01();
+  test02();
   return 0;
 }
 


Re: [PATCH] 69759 - document __builtin_alloca and __builtin_alloca_with_align

2016-02-15 Thread Martin Sebor

On 02/15/2016 04:18 PM, Joseph Myers wrote:

> The description here is self-contradictory; __BIGGEST_ALIGNMENT__ bytes is
> often different from the greatest fundamental alignment (fundamental
> alignments and max_align_t only consider standard C types,
> __BIGGEST_ALIGNMENT__ can allow for e.g. vector type extensions).


Thank you for reviewing the patch.  You're right that I conflated
fundamental alignment with the strictest alignment.  I've adjusted
the description to make a distinction between __BIGGEST_ALIGNMENT__
and _Alignof(max_align_t) since they, as you point out, need not be
the same (for example, on i386 Linux with GLIBC I see that the former
is 16 while the latter is 8, which is correct because GLIBC's malloc
returns 8-byte aligned pointers).  I've also fixed a few typos and
made additional adjustments to reflect the fixes in my patch for
bug 69780 that I anticipate committing later this week.

That said, I think it's worth pointing out that max_align_t has
nothing to do with standard C types.  The intent of the type is
to expose a type with the strictest alignment supported by
an implementation for an object of any type and with any storage
duration, not the alignment of the most strictly aligned basic
(or fundamental) type.
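
To compare the two quantities on a given target, something like the
following can be used.  It is only a sketch: the function names are made up
for the example, it assumes a C11 compiler that provides the GCC predefined
macro __BIGGEST_ALIGNMENT__ and the __builtin_alloca_with_align built-in, and
the 32-byte request is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Strictest alignment the compiler uses for any type on the target.  */
size_t biggest_alignment(void)
{
  return __BIGGEST_ALIGNMENT__;
}

/* Alignment of the C11 max_align_t type.  */
size_t max_align_t_alignment(void)
{
  return _Alignof(max_align_t);
}

/* Allocate SIZE bytes (SIZE must be positive) on the stack with a
   32-byte alignment request and report whether the returned address
   honours it.  Note that the second argument is given in bits.  */
int alloca_block_is_aligned(size_t size)
{
  void *p = __builtin_alloca_with_align(size, 32 * 8);
  return (uintptr_t) p % 32 == 0;
}
```

Printing the first two values on a few targets shows whether they coincide
there; the third function checks the documented alignment guarantee.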

Martin
PR c/69759 - __builtin_alloca and __builtin_alloca_with_align undocumented

gcc/ChangeLog:
2016-02-15  Martin Sebor  

	PR c/69759
	* doc/extend.texi (Other Builtins): Document __builtin_alloca and
	__builtin_alloca_with_align.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 233367)
+++ gcc/doc/extend.texi	(working copy)
@@ -10144,6 +10144,8 @@ in the Cilk Plus language manual which c
 @node Other Builtins
 @section Other Built-in Functions Provided by GCC
 @cindex built-in functions
+@findex __builtin_alloca
+@findex __builtin_alloca_with_align
 @findex __builtin_call_with_static_chain
 @findex __builtin_fpclassify
 @findex __builtin_isfinite
@@ -10690,6 +10692,93 @@ In the same fashion, GCC provides @code{
 @code{__builtin_} prefixed.  The @code{isinf} and @code{isnan}
 built-in functions appear both with and without the @code{__builtin_} prefix.
 
+@deftypefn {Built-in Function} void* __builtin_alloca (size_t size)
+The @code{__builtin_alloca} function must be called at block scope.
+The function allocates an object @var{size} bytes large on the stack of
+the calling function.  The object is aligned at the greatest supported
+alignment boundary for the target.  The greatest supported alignment
+is the value of the @code{__BIGGEST_ALIGNMENT__} macro.  Portable C11
+and C++11 (or later) programs can obtain a similar value by evaluating
+the @code{_Alignof (max_align_t)} and @code{alignof (std::max_align_t)}
+expressions, respectively, which yield an alignment suitable for any
+standard type in each language.  @code{__builtin_alloca} returns a pointer
+to the first byte of the allocated object.  The lifetime of the allocated
+object ends just before the calling function returns to its caller.  This
+is so even when @code{__builtin_alloca} is called within a nested
+block.
+
+For example, the following function allocates eight objects of @code{n}
+bytes each on the stack, storing a pointer to each in consecutive elements
+of the array @code{a}.  It then passes the array to function @code{g()}
+which can safely use the storage pointed to by each of the array elements.
+
+@smallexample
+void f (unsigned n)
+@{
+  void *a [8];
+  for (int i = 0; i != 8; ++i)
+    a [i] = __builtin_alloca (n);
+
+  g (a, n);   // safe
+@}
+@end smallexample
+
+Since the @code{__builtin_alloca} function doesn't validate its argument
+it is the responsibility of its caller to make sure the argument doesn't
+cause it to exceed the stack size limit.
+The @code{__builtin_alloca} function is provided to make it possible to
+allocate arrays with a runtime bound on the stack.  Since C99 variable
+length arrays offer similar functionality under a portable, more convenient,
+and safer interface they are recommended instead, in both C99 and C++
+programs where GCC provides them as an extension.
+
+@end deftypefn
+
+@deftypefn {Built-in Function} void* __builtin_alloca_with_align (size_t size, size_t align)
+The @code{__builtin_alloca_with_align} function must be called at block
+scope.  The function allocates an object @var{size} bytes large on
+the stack of the calling function.  The allocated object is aligned on
+the boundary specified by the argument @var{align} whose unit is given
+in bits (not bytes).  @var{size} must be positive and not exceed the stack
+size limit.  @var{align} must be a constant integer expression that
+evaluates to a power of 2 greater than or equal to @code{__CHAR_BIT__}.
+Invocations with other values are rejected with an error.  The function
+returns a pointer to the first byte of the allocated object.  The lifetime
+of the allocated object ends at the end 

Re: Fix PR69752, insn with REG_INC being removed as equiv_init insn

2016-02-15 Thread Jeff Law

On 02/15/2016 05:38 AM, Bernd Schmidt wrote:
> On 02/12/2016 08:43 AM, Jeff Law wrote:
>> On 02/11/2016 06:28 PM, Bernd Schmidt wrote:
>>>
>>> PR rtl-optimization/69752
>>> * ira.c (update_equiv_regs): When looking for more than a single SET,
>>> also take other side effects into account.
>>
>> OK for the trunk.
>
> Branches too? The problem obviously exists everywhere.

Anywhere you deem appropriate.

jeff


Re: [PING][PATCH, PR67709 ] Don't call call_cgraph_insertion_hooks in simd_clone_create

2016-02-15 Thread Jan Hubicka
> On 08/02/16 13:54, Jakub Jelinek wrote:
> >On Mon, Feb 08, 2016 at 01:46:44PM +0100, Tom de Vries wrote:
> >>[ The pass before pass_omp_simd_clone is pass_dispatcher_calls. It has a
> >>function create_target_clone, similar to simd_clone_create, with a
> >>node.definition and !node.definition part. The !node.definition part does not call
> >>'symtab->call_cgraph_insertion_hooks (new_node)'. ]
> >
> >I'll defer to Honza or Richi if it is ok not to call cgraph insertion hooks
> >at this point (and since when they can be avoided), or what else should be
> >done.
> >
> >The patch could be ok even for 6.0, not just stage1, if they are ok with it
> >(or propose some other change).
> >
> 
> Ping (Given that Jakub suggested this or an alternative patch might
> be included in 6.0 stage4).
> 
> Original submission at
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00543.html .

OK, so the ICE is that you introduce a new definition but you set
in_other_partition and do not really introduce a gimple body for it?
In that case we should indeed not call creation hooks, because these are used
only when a new function is introduced to a given partition.

Patch looks OK to me thus.
Honza.
> 
> Thanks,
> - Tom
> 
> >>Don't call call_cgraph_insertion_hooks in simd_clone_create
> >>
> >>2016-02-08  Tom de Vries  
> >>
> >>PR lto/67709
> >>* omp-low.c (simd_clone_create): Remove call to
> >>symtab->call_cgraph_insertion_hooks.
> >>
> >>* testsuite/libgomp.fortran/declare-simd-4.f90: New test.
> >
> > Jakub
> >


Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread David Edelsohn
On Mon, Feb 15, 2016 at 4:24 PM, Alan Modra  wrote:
> On Mon, Feb 15, 2016 at 06:42:35AM -0800, David Edelsohn wrote:
>> Is there still an issue with the constraints used for movdi_internal64?
>
> Yes and no.  No because we shouldn't be attempting DI moves between vsx
> regs and gprs.  Yes because we ought to allow DImode in vsx regs, but
> fixing that is likely not trivial.
>
> Do we want to backport the PR68973 fixes to gcc-5 and gcc-4.9?  We are
> exposed to the reload_vsx_from_gprsf bug there, I think, but TFmode
> won't be IEEE.

Backporting to 5 and 4.9 branches is okay with me.

Thanks, David


Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread Alan Modra
On Mon, Feb 15, 2016 at 06:42:35AM -0800, David Edelsohn wrote:
> Is there still an issue with the constraints used for movdi_internal64?

Yes and no.  No because we shouldn't be attempting DI moves between vsx
regs and gprs.  Yes because we ought to allow DImode in vsx regs, but
fixing that is likely not trivial.

Do we want to backport the PR68973 fixes to gcc-5 and gcc-4.9?  We are
exposed to the reload_vsx_from_gprsf bug there, I think, but TFmode
won't be IEEE.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] 69759 - document __builtin_alloca and __builtin_alloca_with_align

2016-02-15 Thread Joseph Myers
The description here is self-contradictory; __BIGGEST_ALIGNMENT__ bytes is 
often different from the greatest fundamental alignment (fundamental 
alignments and max_align_t only consider standard C types, 
__BIGGEST_ALIGNMENT__ can allow for e.g. vector type extensions).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] [PATCH] Add __array_size keyword

2016-02-15 Thread Joseph Myers
On Sat, 13 Feb 2016, Stuart Brady wrote:

> > Critical issues to define and cover thoroughly in tests include the
> > rules for when operands of sizeof are evaluated, as adapted
> > appropriately for this keyword, and for when it returns various kinds
> > of constants.
> 
> So in other words, adapting all of the sizeof tests would be appropriate,
> and sizeof tests for non-array types would change from expected passes to
> expected failures?

It's not very clear what's a sizeof test, but all that are testing sizeof 
(as opposed to incidentally using it) and applicable to this keyword, yes.

> > Is the rule for your keyword that the operand is evaluated, and the
> > result not an integer constant, iff the operand is an array with a
> > variable number of elements (as opposed to an array with a constant
> > number of elements that themselves are variable-sized, for example)?
> 
> If I've understood correctly, then yes:
> 
>#include <stdio.h>
>void foo(int i) {
>  int a[i], b[__array_size(a)];
>  printf("%zi, %zi\n", __array_size(a), __array_size(b));
>};
>int main() { foo(42); }

That test doesn't relate to my question, which is about when arguments are 
evaluated and when results are or are not integer constant expressions.

For whether arguments are evaluated, you need __array_size with arguments 
that have side effects, and then test whether those side effects occurred.  
For whether results are integer constant expressions, you can test e.g. 
whether __array_size (a) - __array_size (a) is accepted in a context 
requiring a pointer (whether it acts as a valid null pointer constant).
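
Since __array_size is only a proposal, the same two properties can be
demonstrated with sizeof, whose rules the keyword is meant to follow.  This
is a sketch with invented function names.  It counts side effects to see
whether an operand was evaluated, and uses an enumerator, which requires an
integer constant expression, to check constness.

```c
#include <stddef.h>

static int calls;

/* Side effect used to detect whether an operand was evaluated.  */
static int next_index(void)
{
  return ++calls;
}

/* Operand whose type is not a VLA: it is not evaluated, and the result
   is an integer constant expression.  Returns the number of calls made
   to next_index (plus the constant 0).  */
int evaluations_for_fixed_array(void)
{
  int fixed[4];
  calls = 0;
  (void) sizeof (fixed[next_index()]);
  /* An enumerator must be an integer constant expression.  */
  enum { difference = sizeof (fixed) - sizeof (fixed) };
  return calls + difference;
}

/* Operand that is a variable length array type: it is evaluated.  */
int evaluations_for_vla_type(void)
{
  calls = 0;
  size_t size = sizeof (int[next_index()]);
  (void) size;
  return calls;
}
```

The first function returns 0 and the second returns 1.  Tests for the new
keyword would make the analogous checks with __array_size in place of
sizeof.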

> > C11 6.5.3.4#2 (sizeof) would need testing, 
> 
> Does this section differ from the September 7th draft in any way?

I don't know.

> > Presumably this keyword can be applied to an array at function prototype
> > scope whose size is explicitly or implicitly [*], though nothing useful
> > can be done with the results, as with [*]? (Cf. gcc.dg/vla-5.c.)
> 
> I'm not sure I quite understand the meaning of an implicit [*].  Does that
> just mean __array_size(foo) with an int foo[*] as another parameter?

Implicit [*] is e.g.

void f (int a, int (*b)[a], int (*c)[__array_size (*b)]);

where the VLA *b is at function prototype scope and so gets treated as [*]
- and then __array_size (*b) effectively means "an indeterminate value of
type size_t" (but since that value only ever gets used in ways that end up
with it being discarded, possibly through another implicit conversion to
[*] as here, manipulating such indeterminate values is never a problem).
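
As a small illustration of why the bound at prototype scope is never needed:
a prototype written with an explicit [*] and one written with a parameter as
the bound declare compatible types, and only the definition uses the actual
value.  The function names here are made up for the sketch, and it assumes a
compiler with VLA support.

```c
/* Explicit [*]: the bound is unspecified at prototype scope.  */
void fill_row(int n, int (*row)[*]);

/* Implicit [*]: the bound is written as n but treated the same way,
   so this redeclaration is compatible with the one above.  */
void fill_row(int n, int (*row)[n]);

/* Only in the definition does the bound have a usable value.  */
void fill_row(int n, int (*row)[n])
{
  for (int i = 0; i < n; i++)
    (*row)[i] = i * i;
}

/* Fill a VLA of N elements (N >= 1) and return its last element.  */
int last_square(int n)
{
  int row[n];
  fill_row(n, &row);
  return row[n - 1];
}
```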

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [6 Regression] Usage of uninitialized pointer io/list_read.c (

2016-02-15 Thread Janne Blomqvist
On Mon, Feb 15, 2016 at 11:45 PM, Jerry DeLisle  wrote:
> The title of the PR should be "Mishandling of namelist comments" or
> "Interpreting '!' as a comment in non-namelist reads".
>
> The attached patch fixes the regression by reverting the previous attempt at
> namelist comments that used only CASE_SEPARATOR to enable comments in
> namelists.  The approach now is to test specifically for '!' in each of the
> various read functions. If in namelist mode, the respective case falls through
> to the handling of separators, which eats the line when a '!' is found.
> Otherwise, the read is determined to be bad and an error is issued.
>
> Since the reporter of this PR noticed something screwy with the 'new' pointer
> in push_char4, I took a close look at the code and deleted it.  I also heavily
> instrumented and tested this mechanism to grow the buffer and deleted the use
> of memset which was commented to not be needed. The 'new' was not being
> initialized and I think was a leftover from a previous edit and just missed.
>
> I added two new test cases in the patch. These test all the new error
> conditions. Also, read_bang4.f90 uses a large kind=4 string to exercise the
> buffer mechanism. Verification is through making sure what we read in matches
> what we wrote out to the test scratch file.
>
> Regression tested on x86_64-Linux.  OK for trunk? Any thoughts on backporting
> to 5 since it fixes a potentially bad pointer problem in push_char4?

Ok for both trunk and 5.


-- 
Janne Blomqvist


Re: [C++ PATCH] Fix regression due to reshape_init being called multiple times (PR c++/69658)

2016-02-15 Thread Jason Merrill

OK, thanks.

Jason


[6 Regression] Usage of uninitialized pointer io/list_read.c (

2016-02-15 Thread Jerry DeLisle
The title of the PR should be "Mishandling of namelist comments" or
"Interpreting '!' as a comment in non-namelist reads".

The attached patch fixes the regression by reverting the previous attempt at
namelist comments that used only CASE_SEPARATOR to enable comments in namelists.
The approach now is to test specifically for '!' in each of the various read
functions. If in namelist mode, the respective case falls through to the handling
of separators, which eats the line when a '!' is found.  Otherwise, the read is
determined to be bad and an error is issued.
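
In outline, the separator test now depends on the read mode.  The following
is a stand-alone model only, not the libgfortran macro.  The function name
matches the ChangeLog entry, but the signature and the exact separator set
are assumptions made for the illustration.

```c
#include <stdbool.h>
#include <stdio.h>   /* for EOF */

/* Model of the separator test: '!' counts as a separator (starting a
   comment) only when reading a namelist.  */
bool is_separator(int c, bool namelist_mode)
{
  switch (c)
    {
    case ' ': case ',': case '/': case '\n':
    case '\t': case '\r': case ';': case EOF:
      return true;
    case '!':
      return namelist_mode;
    default:
      return false;
    }
}
```

In list-directed (non-namelist) reads the '!' is therefore not consumed as a
separator and the numeric read functions can report it as bad input, which
is what the new test cases check via iostat.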

Since the reporter of this PR noticed something screwy with the 'new' pointer in
push_char4, I took a close look at the code and deleted it.  I also heavily
instrumented and tested this mechanism to grow the buffer and deleted the use of
memset which was commented to not be needed. The 'new' was not being initialized
and I think was a leftover from a previous edit and just missed.

I added two new test cases in the patch. These test all the new error
conditions. Also, read_bang4.f90 uses a large kind=4 string to exercise the
buffer mechanism. Verification is through making sure what we read in matches
what we wrote out to the test scratch file.

Regression tested on x86_64-Linux.  OK for trunk? Any thoughts on backporting
to 5 since it fixes a potentially bad pointer problem in push_char4?

Regards,

Jerry

2016-02-15  Jerry DeLisle  

PR libgfortran/69651
* io/list_read.c: Entire file trailing spaces removed.
(CASE_SEPARATORS): Remove '!'.
(is_separator): Add namelist mode as condition with '!'.
(push_char): Remove un-needed memset. (push_char4): Likewise and remove
'new' pointer. (eat_separator): Remove un-needed use of notify_std.
(read_logical): If '!' bang encountered when not in namelist mode goto
bad_logical to give an error. (read_integer): Likewise reject '!'.
(read_character): Remove condition testing c = '!' which is now inside
the is_separator macro. (parse_real): Reject '!' unless in namelist mode.
(read_complex): Reject '!' unless in namelist mode. (read_real): Likewise
reject '!'.
diff --git a/gcc/testsuite/gfortran.dg/read_bang.f90 b/gcc/testsuite/gfortran.dg/read_bang.f90
new file mode 100644
index ..7806ca77
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/read_bang.f90
@@ -0,0 +1,38 @@
+! { dg-do run }
+! PR69651 Usage of unitialized pointer io/list_read.c 
+! Note: The uninitialized pointer was not the cause of the problem
+!   observed with this test case. The problem was mishandling '!'
+!   See also test case read_bang4.f90.
+program test
+  implicit none
+  integer :: i, j, ios
+  real ::  r, s
+  complex :: c, d
+  character(20) :: str1, str2
+  
+  i = -5
+  j = -6
+  r = -3.14
+  s = -2.71
+  c = (-1.1,-2.2)
+  d = (-3.3,-4.4)
+  str1 = "candy"
+  str2 = "peppermint"
+  open(15, status='scratch')
+  write(15,*) "10  1!2"
+  write(15,*) "  23.5! 34.5"
+  write(15,*) "  (67.50,69.25)  (51.25,87.75)!"
+  write(15,*) "  'abcdefgh!' '  !klmnopq!'"
+  rewind(15)
+  read(15,*,iostat=ios) i, j
+  if (ios.ne.5010) call abort
+  read(15,*,iostat=ios) r, s
+  if (ios.ne.5010) call abort
+  read(15,*,iostat=ios) c, d
+  if (ios.ne.5010) call abort
+  read(15,*,iostat=ios) str1, str2
+  if (ios.ne.0) call abort
+  if (str1.ne."abcdefgh!") print *, str1
+  if (str2.ne."  !klmnopq!") print *, str2
+  close(15)
+end program
diff --git a/gcc/testsuite/gfortran.dg/read_bang4.f90 b/gcc/testsuite/gfortran.dg/read_bang4.f90
new file mode 100644
index ..78101fcb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/read_bang4.f90
@@ -0,0 +1,47 @@
+! { dg-do run }
+! PR69651 Usage of unitialized pointer io/list_read.c 
+! Note: The uninitialized pointer was not the cause of the problem
+!   observed with this test case. This tests the case with UTF-8
+!   files. The large string test the realloc use in push_char4 of
+!   list_read.c
+program test
+  implicit none
+  integer :: i, j, k, ios
+  integer, parameter :: big = 600
+  real ::  r, s
+  complex :: c, d
+  character(kind=4,len=big) :: str1, str2, str3
+
+  do i=1,big, 10
+do j = 0, 9
+  k = i + j
+  str2(k:k) = char(65+j)
+end do
+  end do
+  i = -5
+  j = -6
+  r = -3.14
+  s = -2.71
+  c = (-1.1,-2.2)
+  d = (-3.3,-4.4)
+  str3 = str2
+  open(15, status='scratch', encoding="utf-8")
+  write(15,*) "10  1!2"
+  write(15,*) "  23.5! 34.5"
+  write(15,*) "  (67.50,69.25)  (51.25,87.75)!"
+  write(15,*) "  'abcdefgh!'", " ", str2
+  rewind(15)
+  str1 = 4_"candy"
+  str2 = 4_"peppermint"
+  read(15,*,iostat=ios) i, j
+  if (ios.ne.5010) call abort
+  read(15,*,iostat=ios) r, s
+  if (ios.ne.5010) call abort
+  read(15,*,iostat=ios) c, d
+  if (ios.ne.5010) call abort
+  read(15,*,iostat=ios) str1, str2
+  if (ios.ne.0) call abort
+  if (str1.ne.4_"abcdefgh!") call abort
+  if (str2.ne.str3) call abort
+  close(15)
+end program
diff 

Re: [PATCH] Fix reassoc ICE (PR tree-optimization/69802)

2016-02-15 Thread Jakub Jelinek
On Mon, Feb 15, 2016 at 10:27:16PM +0100, Michael Matz wrote:
> On Mon, 15 Feb 2016, Jakub Jelinek wrote:
> 
> > +  /* If op is default def SSA_NAME, there is no place to insert the
> > + new comparison.  Give up, unless we can use OP itself as the
> > + range test.  */
> > +  if (op && SSA_NAME_IS_DEFAULT_DEF (op))
> > +{
> > +  if (op == range->exp
> > + && ((TYPE_PRECISION (optype) == 1 && TYPE_UNSIGNED (optype))
> > + || TREE_CODE (optype) == BOOLEAN_TYPE)
> > + && (op == tem
> > + || (TREE_CODE (tem) == EQ_EXPR
> > + && TREE_OPERAND (tem, 0) == op
> > + && integer_onep (TREE_OPERAND (tem, 1))))
> > + && opcode != BIT_IOR_EXPR
> > + && (opcode != ERROR_MARK || oe->rank != BIT_IOR_EXPR))
> 
> Perhaps just give up always, instead of this complicated (and hence 
> fragile) hackery?  Are you 100% sure you caught everything, are there 

It is IMO not that fragile.  The op == range->exp check is a quite obvious
condition; it could even have been an assert instead.
The second condition is what is used e.g. in init_range_test, i.e.
bool/unsigned :1 only.  The third, op == tem, is obviously a good
transformation to tem = op, so the patch just makes sure we don't ICE,
say, in gsi_for_stmt (stmt); that is what we get for bool and is covered
in the testcase.  The EQ_EXPR case is what we get for unsigned : 1 instead.
Perhaps it could also be op != 0 instead of just op == 1.
And lastly, the BIT_IOR_EXPR tests are to make sure we don't hit
  if (opcode == BIT_IOR_EXPR
      || (opcode == ERROR_MARK && oe->rank == BIT_IOR_EXPR))
    tem = invert_truthvalue_loc (loc, tem);
a few lines later.  The reason to perform the check earlier is just
to avoid printing in the dump file that we are changing something if we
give up instead.

Jakub


Re: [PATCH] Fix reassoc ICE (PR tree-optimization/69802)

2016-02-15 Thread Michael Matz
Hi,

On Mon, 15 Feb 2016, Jakub Jelinek wrote:

> +  /* If op is default def SSA_NAME, there is no place to insert the
> + new comparison.  Give up, unless we can use OP itself as the
> + range test.  */
> +  if (op && SSA_NAME_IS_DEFAULT_DEF (op))
> +{
> +  if (op == range->exp
> +   && ((TYPE_PRECISION (optype) == 1 && TYPE_UNSIGNED (optype))
> +   || TREE_CODE (optype) == BOOLEAN_TYPE)
> +   && (op == tem
> +   || (TREE_CODE (tem) == EQ_EXPR
> +   && TREE_OPERAND (tem, 0) == op
> +   && integer_onep (TREE_OPERAND (tem, 1))))
> +   && opcode != BIT_IOR_EXPR
> +   && (opcode != ERROR_MARK || oe->rank != BIT_IOR_EXPR))

Perhaps just give up always, instead of this complicated (and hence 
fragile) hackery?  Are you 100% sure you caught everything?  Are there 
testcases for each part of the condition (I am missing at least one proving 
that making the condition true is correct)?


Ciao,
Michael.


Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-02-15 Thread Evandro Menezes

On 02/15/16 04:53, James Greenhalgh wrote:

On Thu, Jan 21, 2016 at 04:55:40PM -0600, Evandro Menezes wrote:

Got it.

Let me try this again:

Add support for the FCCMP insn types

2016-01-21  Evandro Menezes  

gcc/
 * config/aarch64/aarch64.md (fccmp): Change insn type.
 (fccmpe): Likewise.
 * config/aarch64/thunderx.md (thunderx_fcmp): Add
"fccmp{s,d}" types.
 * config/arm/cortex-a53.md (cortex_a53_fpalu): Likewise.
 * config/arm/cortex-a57.md (cortex_a57_fp_cmp): Likewise.
 * config/arm/xgene1.md (xgene1_fcmp): Likewise.
 * config/arm/exynos-m1.md (exynos_m1_fp_ccmp): New insn
reservation.
 * config/arm/types.md (fccmps): Add new insn type.
 (fccmpd): Likewise.



This is OK. Sorry to have left it waiting so long.

Thanks,
James



 From 14874dec3257c7b59aed4b7c610305f76bbbcf33 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 4 Jan 2016 18:44:30 -0600
Subject: [PATCH] Add support for the FCCMP insn types

2016-01-21  Evandro Menezes  

gcc/
* config/aarch64/aarch64.md (fccmp): Change insn type.
(fccmpe): Likewise.
* config/aarch64/thunderx.md (thunderx_fcmp): Add "fccmp{s,d}" types.
* config/arm/cortex-a53.md (cortex_a53_fpalu): Likewise.
* config/arm/cortex-a57.md (cortex_a57_fp_cmp): Likewise.
* config/arm/xgene1.md (xgene1_fcmp): Likewise.
* config/arm/exynos-m1.md (exynos_m1_fp_ccmp): New insn reservation.
* config/arm/types.md (fccmps): Add new insn type.
(fccmpd): Likewise.
---
  gcc/config/aarch64/aarch64.md  | 4 ++--
  gcc/config/aarch64/thunderx.md | 2 +-
  gcc/config/arm/cortex-a53.md   | 4 ++--
  gcc/config/arm/cortex-a57.md   | 2 +-
  gcc/config/arm/exynos-m1.md| 5 +
  gcc/config/arm/types.md| 3 +++
  gcc/config/arm/xgene1.md   | 2 +-
  7 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2f543aa..032b342 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -301,7 +301,7 @@
  (match_operand 5 "immediate_operand")))]
"TARGET_FLOAT"
"fccmp\\t%2, %3, %k5, %m4"
-  [(set_attr "type" "fcmp")]
+  [(set_attr "type" "fccmp")]
  )
  
  (define_insn "fccmpe"

@@ -316,7 +316,7 @@
  (match_operand 5 "immediate_operand")))]
"TARGET_FLOAT"
"fccmpe\\t%2, %3, %k5, %m4"
-  [(set_attr "type" "fcmp")]
+  [(set_attr "type" "fccmp")]
  )
  
  ;; Expansion of signed mod by a power of 2 using CSNEG.

diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
index 922df39..058713a 100644
--- a/gcc/config/aarch64/thunderx.md
+++ b/gcc/config/aarch64/thunderx.md
@@ -156,7 +156,7 @@
  
  (define_insn_reservation "thunderx_fcmp" 3

(and (eq_attr "tune" "thunderx")
-   (eq_attr "type" "fcmps,fcmpd"))
+   (eq_attr "type" "fcmps,fcmpd,fccmps,fccmpd"))
"thunderx_pipe1")
  
  (define_insn_reservation "thunderx_fmul" 6

diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index c1eeedb..fc60bc2 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -508,8 +508,8 @@
  (define_insn_reservation "cortex_a53_fpalu" 5
(and (eq_attr "tune" "cortexa53")
(eq_attr "type" "ffariths, fadds, ffarithd, faddd, fmov,
-   f_cvt, fcmps, fcmpd, fcsel, f_rints, f_rintd,
-   f_minmaxs, f_minmaxd"))
+   f_cvt, fcmps, fcmpd, fccmps, fccmpd, fcsel,
+   f_rints, f_rintd, f_minmaxs, f_minmaxd"))
"cortex_a53_slot_any,cortex_a53_fp_alu")
  
  (define_insn_reservation "cortex_a53_fconst" 3

diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index 0d28951..f4c112c 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -716,7 +716,7 @@
  
  (define_insn_reservation "cortex_a57_fp_cmp" 7

(and (eq_attr "tune" "cortexa57")
-   (eq_attr "type" "fcmps,fcmpd"))
+   (eq_attr "type" "fcmps,fcmpd,fccmps,fccmpd"))
"ca57_cx2")
  
  (define_insn_reservation "cortex_a57_fp_arith" 4

diff --git a/gcc/config/arm/exynos-m1.md b/gcc/config/arm/exynos-m1.md
index 0448073..973c8a9 100644
--- a/gcc/config/arm/exynos-m1.md
+++ b/gcc/config/arm/exynos-m1.md
@@ -823,6 +823,11 @@
 (eq_attr "type" "fcmps, fcmpd"))
"em1_nmisc")
  
+(define_insn_reservation "exynos_m1_fp_ccmp" 7

+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "fccmps, fccmpd"))
+  "em1_st, em1_nmisc")
+
  (define_insn_reservation "exynos_m1_fp_sel" 4
(and (eq_attr "tune" "exynosm1")
 (eq_attr "type" "fcsel"))
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 321ff89..25f79b4 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -70,6 +70,7 @@
  ; f_rint[d,s]

Re: [PATCH] Fix ICE in sync_resolve_size (PR c++/69797)

2016-02-15 Thread Marek Polacek
On Mon, Feb 15, 2016 at 09:56:04PM +0100, Jakub Jelinek wrote:
> Hi!
> 
> In C++, if there are no parameters, params can be a non-NULL but still
> empty vector.  Fixed by properly testing for an empty vector.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2016-02-15  Jakub Jelinek  
> 
>   PR c++/69797
>   * c-common.c (sync_resolve_size): Diagnose too few arguments
>   even when params is non-NULL empty vector.
> 
>   * c-c++-common/pr69797.c: New test.

Ok, thanks.  Seems there are no other spots to fix than this one.

Marek


C++ PATCH for c++/69753 (DR141 broke member template lookup)

2016-02-15 Thread Jason Merrill
When we stopped finding function templates with unqualified lookup due 
to the DR141 fix, that exposed bugs in our lookup within the object 
expression scope; an object-expression of the current instantiation does 
not make the expression dependent.  This patch fixes this issue 
specifically for implicit "this->", which is the case in question in 
this PR; addressing this issue more generally will take more work.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 9a196a7306eb2c0b6eed5e81dd2d5a65331347bb
Author: Jason Merrill 
Date:   Thu Feb 11 11:51:03 2016 -0500

	PR c++/69753

	* search.c (any_dependent_bases_p): Split out...
	* name-lookup.c (do_class_using_decl): ...from here.
	* call.c (build_new_method_call_1): Don't complain about missing object
	if there are dependent bases.  Tweak error.
	* tree.c (non_static_member_function_p): Remove.
	* pt.c (type_dependent_expression_p): A member template of a
	dependent type is dependent.
	* cp-tree.h: Adjust.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index cb71176..db40654 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8160,7 +8160,7 @@ build_new_method_call_1 (tree instance, tree fns, vec **args,
 
   if (permerror (input_location,
 		 "cannot call constructor %<%T::%D%> directly",
-		 basetype, name))
+		 BINFO_TYPE (access_binfo), name))
 	inform (input_location, "for a function-style cast, remove the "
 		"redundant %<::%D%>", name);
   call = build_functional_cast (basetype, build_tree_list_vec (user_args),
@@ -8377,6 +8377,9 @@ build_new_method_call_1 (tree instance, tree fns, vec **args,
 		 we know we really need it.  */
 		  cand->first_arg = instance;
 		}
+	  else if (any_dependent_bases_p ())
+		/* We can't tell until instantiation time whether we can use
+		   *this as the implicit object argument.  */;
 	  else
 		{
 		  if (complain & tf_error)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3b91089..b7d7bc6 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6252,6 +6252,7 @@ extern tree adjust_result_of_qualified_name_lookup
 extern tree copied_binfo			(tree, tree);
 extern tree original_binfo			(tree, tree);
 extern int shared_member_p			(tree);
+extern bool any_dependent_bases_p (tree = current_nonlambda_class_type ());
 
 /* The representation of a deferred access check.  */
 
@@ -6542,7 +6543,6 @@ extern tree get_first_fn			(tree);
 extern tree ovl_cons(tree, tree);
 extern tree build_overload			(tree, tree);
 extern tree ovl_scope(tree);
-extern bool non_static_member_function_p(tree);
 extern const char *cxx_printable_name		(tree, int);
 extern const char *cxx_printable_name_translate	(tree, int);
 extern tree build_exception_variant		(tree, tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 8d6e75a..b5961e5 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -,8 +,6 @@ do_class_using_decl (tree scope, tree name)
   /* True if any of the bases of CURRENT_CLASS_TYPE are dependent.  */
   bool bases_dependent_p;
   tree binfo;
-  tree base_binfo;
-  int i;
 
   if (name == error_mark_node)
 return NULL_TREE;
@@ -3371,16 +3369,7 @@ do_class_using_decl (tree scope, tree name)
 		  || (IDENTIFIER_TYPENAME_P (name)
 			  && dependent_type_p (TREE_TYPE (name))));
 
-  bases_dependent_p = false;
-  if (processing_template_decl)
-for (binfo = TYPE_BINFO (current_class_type), i = 0;
-	 BINFO_BASE_ITERATE (binfo, i, base_binfo);
-	 i++)
-  if (dependent_type_p (TREE_TYPE (base_binfo)))
-	{
-	  bases_dependent_p = true;
-	  break;
-	}
+  bases_dependent_p = any_dependent_bases_p (current_class_type);
 
   decl = NULL_TREE;
 
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a55dc10..52e60b9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22904,9 +22904,16 @@ type_dependent_expression_p (tree expression)
   && DECL_TEMPLATE_INFO (expression))
 return any_dependent_template_arguments_p (DECL_TI_ARGS (expression));
 
-  if (TREE_CODE (expression) == TEMPLATE_DECL
-  && !DECL_TEMPLATE_TEMPLATE_PARM_P (expression))
-return false;
+  if (TREE_CODE (expression) == TEMPLATE_DECL)
+{
+  if (DECL_CLASS_SCOPE_P (expression)
+	  && dependent_type_p (DECL_CONTEXT (expression)))
+	/* A template's own parameters don't make it dependent, since those can
+	   be deduced, but the enclosing class does.  */
+	return true;
+  if (!DECL_TEMPLATE_TEMPLATE_PARM_P (expression))
+	return false;
+}
 
   if (TREE_CODE (expression) == STMT_EXPR)
 expression = stmt_expr_value_expr (expression);
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index 7924611..49f3bc5 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -2842,3 +2842,21 @@ original_binfo (tree binfo, tree here)
   return result;
 }
 
+/* True iff TYPE has any dependent bases (and therefore we can't say
+   definitively that another class is not a base of an instantiation of
+  

[PATCH] Fix up vectorization of multiplication with bool cast to integer (PR tree-optimization/69820)

2016-02-15 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because we first
create a pattern stmt for _5 = (int) _3; where _3 is bool,
but then recognize the following multiply as a widening multiply and ignore
the previous pattern stmt there; thus instead of expanding the
cast as cond ? 1 : 0 we actually end up expanding it as cond (i.e.
cond ? -1 : 0).  In the widen mult pattern recognizer it would perhaps be
possible to handle that case, unless both arguments are cast from
bool, but there are lots of other places which call type_conversion_p and
most of them would need to either give up in those cases or add special
handling for it.  So, it seems like the easiest fix, at least for GCC 6, is
to punt in type_conversion_p on casts from bool/unsigned :1 (or should
I instead check STMT_VINFO_IN_PATTERN_P on the cast stmt and give up
if true?).
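
A scalar model of the difference may help.  The sketch below is standalone
C, not vectorizer code, and the function names are made up for the example.
It assumes, as is typical for vector comparisons, that a true mask element
is all ones (-1).

```c
/* Intended semantics: a bool converts to 0 or 1 before the multiply.  */
int
mul_bool_as_01 (int a, int cond)
{
  return a * (cond ? 1 : 0);
}

/* Miscompiled form: the comparison mask is used directly, so a true
   condition contributes -1 instead of 1.  */
int
mul_bool_as_mask (int a, int cond)
{
  return a * (cond ? -1 : 0);
}
```

For a true condition the two functions return values of opposite sign,
which is the wrong-code symptom described above.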

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-15  Jakub Jelinek  

PR tree-optimization/69820
* tree-vect-patterns.c (type_conversion_p): Return false if
*orig_type is unsigned single precision or boolean.
(vect_recog_dot_prod_pattern, vect_recog_widen_mult_pattern):
Formatting fix.

* gcc.dg/vect/pr69820.c: New test.

--- gcc/tree-vect-patterns.c.jj 2016-01-12 14:14:56.0 +0100
+++ gcc/tree-vect-patterns.c    2016-02-15 18:52:52.695249972 +0100
@@ -184,6 +184,13 @@ type_conversion_p (tree name, gimple *us
   || ((TYPE_UNSIGNED (type) != TYPE_UNSIGNED (*orig_type)) && check_sign))
 return false;
 
+  /* Conversion from bool or unsigned single bit precision bitfields
+ should have been recognized by vect_recog_bool_pattern, callers
+ of this function are generally unprepared to handle those.  */
+  if ((TYPE_PRECISION (*orig_type) == 1 && TYPE_UNSIGNED (*orig_type))
+  || TREE_CODE (*orig_type) == BOOLEAN_TYPE)
+return false;
+
   if (TYPE_PRECISION (type) >= (TYPE_PRECISION (*orig_type) * 2))
 *promotion = true;
   else
@@ -334,8 +341,8 @@ vect_recog_dot_prod_pattern (vec

[PATCH] Fix ICE in vcond expansion with -mavx512f -mno-avx512bw (PR target/69820)

2016-02-15 Thread Jakub Jelinek
Hi!

We ICE on the following testcase, because the vcondv32hiv32hi pattern
really needs avx512bw, but it is enabled for avx512f.
As the VI_512 iterator is only used in vcond* patterns which need the
avx512bw ISA for the V64QI and V32HI modes, I've changed that iterator.
Or do you prefer to keep that iterator as is (so it will be unused)
and add another one with these conditions?  If so, what should it be called?

Bootstrapped/regtested on x86_64-linux and i686-linux.

2016-02-15  Jakub Jelinek  

PR target/69820
* config/i386/sse.md (VI_512): Only include V64QImode and V32HImode
if TARGET_AVX512BW.

* gcc.target/i386/pr69820.c: New test.

--- gcc/config/i386/sse.md.jj   2016-02-03 23:36:39.0 +0100
+++ gcc/config/i386/sse.md  2016-02-15 17:07:40.694352994 +0100
@@ -522,7 +522,10 @@ (define_mode_iterator VI_128 [V16QI V8HI
 (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI])
 
 ;; All 512bit vector integer modes
-(define_mode_iterator VI_512 [V64QI V32HI V16SI V8DI])
+(define_mode_iterator VI_512
+  [(V64QI "TARGET_AVX512BW")
+   (V32HI "TARGET_AVX512BW")
+   V16SI V8DI])
 
 ;; Various 128bit vector integer mode combinations
 (define_mode_iterator VI12_128 [V16QI V8HI])
--- gcc/testsuite/gcc.target/i386/pr69820.c.jj  2016-02-15 17:13:57.397220839 +0100
+++ gcc/testsuite/gcc.target/i386/pr69820.c 2016-02-15 17:13:28.0 +0100
@@ -0,0 +1,14 @@
+/* PR target/69820 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512f -mno-avx512bw" } */
+
+int a[100], b[100];
+short c[100];
+
+void
+foo ()
+{
+  int i;
+  for (i = 0; i < 100; ++i)
+b[i] = a[i] * (_Bool) c[i];
+}

Jakub


[PATCH] Fix ICE in sync_resolve_size (PR c++/69797)

2016-02-15 Thread Jakub Jelinek
Hi!

In C++, if there are no parameters, params can be a non-NULL but still
empty vector.  Fixed by properly testing for an empty vector.
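
The distinction can be sketched in standalone C as follows.  This is only
an illustration: struct arg_vec and args_missing_p are hypothetical names
and do not correspond to GCC's vec API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal vector; not GCC's vec<> API.  */
struct arg_vec
{
  size_t length;
  const int *data;
};

/* True when there are no arguments: either there is no vector at all,
   or the vector exists but holds zero elements.  Testing only for NULL
   would miss the second case.  */
bool
args_missing_p (const struct arg_vec *params)
{
  return params == NULL || params->length == 0;
}
```

A check of the form "params == NULL" alone treats the non-NULL empty
vector as if it had arguments, which is the situation described above.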

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-15  Jakub Jelinek  

PR c++/69797
* c-common.c (sync_resolve_size): Diagnose too few arguments
even when params is non-NULL empty vector.

* c-c++-common/pr69797.c: New test.

--- gcc/c-family/c-common.c.jj  2016-02-08 18:39:17.0 +0100
+++ gcc/c-family/c-common.c 2016-02-15 14:56:14.518790242 +0100
@@ -10675,7 +10675,7 @@ sync_resolve_size (tree function, vec

[PATCH] Fix reassoc ICE (PR tree-optimization/69802)

2016-02-15 Thread Jakub Jelinek
Hi!

The following patch fixes an ICE where one of the range tests
is SSA_NAME_DEF_STMT of a bool/_Bool or unsigned : 1 bitfield.
In that case, we don't know where to put the adjusted range test.
The patch for this uncommon case gives up, unless the range test
can be the SSA_NAME_DEF_STMT itself, and in that case makes sure we
DTRT.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-15  Jakub Jelinek  

PR tree-optimization/69802
* tree-ssa-reassoc.c (update_range_test): If op is
SSA_NAME_IS_DEFAULT_DEF, give up unless tem is a positive
op == 1 test of precision 1 integral op, otherwise handle
that case as op itself.  Fix up formatting.
(optimize_range_tests_to_bit_test, optimize_range_tests): Fix
up formatting.

* gcc.dg/pr69802.c: New test.

--- gcc/tree-ssa-reassoc.c.jj   2016-02-12 10:23:51.0 +0100
+++ gcc/tree-ssa-reassoc.c  2016-02-15 14:25:54.996572238 +0100
@@ -2046,19 +2046,41 @@ update_range_test (struct range_entry *r
 {
   operand_entry *oe = (*ops)[range->idx];
   tree op = oe->op;
-  gimple *stmt = op ? SSA_NAME_DEF_STMT (op) :
-last_stmt (BASIC_BLOCK_FOR_FN (cfun, oe->id));
+  gimple *stmt = op ? SSA_NAME_DEF_STMT (op)
+   : last_stmt (BASIC_BLOCK_FOR_FN (cfun, oe->id));
   location_t loc = gimple_location (stmt);
   tree optype = op ? TREE_TYPE (op) : boolean_type_node;
   tree tem = build_range_check (loc, optype, unshare_expr (exp),
in_p, low, high);
   enum warn_strict_overflow_code wc = WARN_STRICT_OVERFLOW_COMPARISON;
   gimple_stmt_iterator gsi;
-  unsigned int i;
+  unsigned int i, uid;
 
   if (tem == NULL_TREE)
 return false;
 
+  /* If op is default def SSA_NAME, there is no place to insert the
+ new comparison.  Give up, unless we can use OP itself as the
+ range test.  */
+  if (op && SSA_NAME_IS_DEFAULT_DEF (op))
+{
+  if (op == range->exp
+ && ((TYPE_PRECISION (optype) == 1 && TYPE_UNSIGNED (optype))
+ || TREE_CODE (optype) == BOOLEAN_TYPE)
+ && (op == tem
+ || (TREE_CODE (tem) == EQ_EXPR
+ && TREE_OPERAND (tem, 0) == op
+ && integer_onep (TREE_OPERAND (tem, 1))))
+ && opcode != BIT_IOR_EXPR
+ && (opcode != ERROR_MARK || oe->rank != BIT_IOR_EXPR))
+   {
+ stmt = NULL;
+ tem = op;
+   }
+  else
+   return false;
+}
+
   if (strict_overflow_p && issue_strict_overflow_warning (wc))
 warning_at (loc, OPT_Wstrict_overflow,
"assuming signed overflow does not occur "
@@ -2096,12 +2118,22 @@ update_range_test (struct range_entry *r
 tem = invert_truthvalue_loc (loc, tem);
 
   tem = fold_convert_loc (loc, optype, tem);
-  gsi = gsi_for_stmt (stmt);
-  unsigned int uid = gimple_uid (stmt);
+  if (stmt)
+{
+  gsi = gsi_for_stmt (stmt);
+  uid = gimple_uid (stmt);
+}
+  else
+{
+  gsi = gsi_none ();
+  uid = 0;
+}
+  if (stmt == NULL)
+gcc_checking_assert (tem == op);
   /* In rare cases range->exp can be equal to lhs of stmt.
  In that case we have to insert after the stmt rather then before
  it.  If stmt is a PHI, insert it at the start of the basic block.  */
-  if (op != range->exp)
+  else if (op != range->exp)
 {
   gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
   tem = force_gimple_operand_gsi (&gsi, tem, true, NULL_TREE, true,
@@ -2489,7 +2521,7 @@ optimize_range_tests_to_bit_test (enum t
  operand_entry *oe = (*ops)[ranges[i].idx];
  tree op = oe->op;
  gimple *stmt = op ? SSA_NAME_DEF_STMT (op)
-  : last_stmt (BASIC_BLOCK_FOR_FN (cfun, oe->id));
+   : last_stmt (BASIC_BLOCK_FOR_FN (cfun, oe->id));
  location_t loc = gimple_location (stmt);
  tree optype = op ? TREE_TYPE (op) : boolean_type_node;
 
@@ -2553,7 +2585,7 @@ optimize_range_tests_to_bit_test (enum t
  gcc_assert (TREE_CODE (exp) == SSA_NAME);
  gimple_set_visited (SSA_NAME_DEF_STMT (exp), true);
  gimple *g = gimple_build_assign (make_ssa_name (optype),
- BIT_IOR_EXPR, tem, exp);
+  BIT_IOR_EXPR, tem, exp);
  gimple_set_location (g, loc);
  gimple_seq_add_stmt_without_update (&seq, g);
  exp = gimple_assign_lhs (g);
@@ -2599,8 +2631,9 @@ optimize_range_tests (enum tree_code opc
   oe = (*ops)[i];
   ranges[i].idx = i;
   init_range_entry (ranges + i, oe->op,
-   oe->op ? NULL :
- last_stmt (BASIC_BLOCK_FOR_FN (cfun, oe->id)));
+   oe->op
+   ? NULL
+   : last_stmt (BASIC_BLOCK_FOR_FN (cfun, oe->id)));
   /* For | invert it now, we will invert it again before emitting
 the optimized expression.  

[COMMITTED] Fix a typo in comment

2016-02-15 Thread Bernd Edlinger
Hi,

I've committed the following as obvious in trunk r233428.

Thanks
Bernd.


--- ChangeLog   (revision 233427)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2016-02-15  Bernd Edlinger  
+
+   * alias.c (get_alias_set): Fix a typo in comment.
+
  2016-02-15  Richard Biener  

PR tree-optimization/69595
Index: alias.c
===
--- alias.c (revision 233427)
+++ alias.c (working copy)
@@ -827,7 +827,7 @@ get_alias_set (tree t)

/* We can not give up with -fno-strict-aliasing because we need to build
   proper type representation for possible functions which are build 
with
- -fstirct-aliasing.  */
+ -fstrict-aliasing.  */

/* return 0 if this or its type is an error.  */
if (t == error_mark_node



Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2016-02-15 Thread Bill Schmidt
Wow, that's pretty bad; obviously a pasto.  Thanks for pointing it out!
I'm really surprised this has survived this long, but that may be a
comment on how much lvxl is used.  I'll get this fixed asap.

Thanks,
Bill

On Tue, 2016-02-09 at 18:25 +0100, Ulrich Weigand wrote:
> Hi Bill,
> 
> > 2014-02-20  Bill Schmidt  
> > 
> > * config/rs6000/altivec.md (altivec_lvxl): Rename as
> > *altivec_lvxl__internal and use VM2 iterator instead of
> > V4SI.
> > (altivec_lvxl_): New define_expand incorporating
> > -maltivec=be semantics where needed.
> 
> I just noticed that this:
> 
> > -(define_insn "altivec_lvxl"
> > +(define_expand "altivec_lvxl_"
> >[(parallel
> > -[(set (match_operand:V4SI 0 "register_operand" "=v")
> > - (match_operand:V4SI 1 "memory_operand" "Z"))
> > +[(set (match_operand:VM2 0 "register_operand" "=v")
> > + (match_operand:VM2 1 "memory_operand" "Z"))
> >   (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> >"TARGET_ALTIVEC"
> > -  "lvxl %0,%y1"
> > +{
> > +  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
> > +{
> > +  altivec_expand_lvx_be (operands[0], operands[1], mode, 
> > UNSPEC_SET_VSCR);
> > +  DONE;
> > +}
> > +})
> > +
> > +(define_insn "*altivec_lvxl__internal"
> > +  [(parallel
> > +[(set (match_operand:VM2 0 "register_operand" "=v")
> > + (match_operand:VM2 1 "memory_operand" "Z"))
> > + (unspec [(const_int 0)] UNSPEC_SET_VSCR)])]
> > +  "TARGET_ALTIVEC"
> > +  "lvx %0,%y1"
> >[(set_attr "type" "vecload")])
> 
> now causes vec_ldl to emit the lvx instead of the lvxl instruction.
> I assume this was not actually intended?
> 
> Bye,
> Ulrich
> 




Re: [PATCH] Fix PR69771, bogus CONST_INT during shift expansion

2016-02-15 Thread Richard Biener
On February 15, 2016 7:15:35 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Mon, Feb 15, 2016 at 06:58:45PM +0100, Richard Biener wrote:
>> We could also force_reg those at expansion or apply
>SHIFT_COUNT_TRUNCATED to those invalid constants there.
>
>Sure, but for force_reg we'd still need the gen_int_mode anyway.
>As for SHIFT_COUNT_TRUNCATED, it should have been applied already from
>the
>caller - expand_shift_1.

But then no out of bound values should remain.  Until we get 256bit ints where 
your workaround wouldn't work either?
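
For reference, a minimal standalone sketch of the truncation under
discussion.  It assumes that gen_int_mode truncates the constant to the
precision of the mode and sign-extends the result; trunc_to_width is a
made-up helper for this example, not a GCC function.

```c
#include <stdint.h>

/* Truncate VALUE to WIDTH bits and sign-extend the result, as a model
   of reducing a constant to a narrower integer mode.  WIDTH must be in
   the range 1..64.  */
int64_t
trunc_to_width (int64_t value, unsigned width)
{
  if (width >= 64)
    return value;
  uint64_t mask = (UINT64_C (1) << width) - 1;
  uint64_t low = (uint64_t) value & mask;
  uint64_t sign = UINT64_C (1) << (width - 1);
  /* (low ^ sign) - sign sign-extends a WIDTH-bit value; the final cast
     relies on the usual two's complement conversion.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

Under this model an out-of-range count such as 255 reduced to 8 bits
becomes -1, and 256 becomes 0, while in-range counts are unchanged.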

Richard.

>> >2016-02-15  Jakub Jelinek  
>> >
>> >PR rtl-optimization/69764
>> >PR rtl-optimization/69771
>> >* optabs.c (expand_binop): Ensure for shift optabs invalid
>CONST_INT
>> >op1 is valid for mode.
>> >
>> >--- gcc/optabs.c.jj 2016-02-12 17:49:25.0 +0100
>> >+++ gcc/optabs.c    2016-02-15 16:15:53.983673792 +0100
>> >@@ -1125,6 +1125,12 @@ expand_binop (machine_mode mode, optab b
>> >   op1 = negate_rtx (mode, op1);
>> >   binoptab = add_optab;
>> > }
>> >+  /* For shifts, constant invalid op1 might be expanded from
>different
>> >+ mode than MODE.  */
>> >+  else if (CONST_INT_P (op1)
>> >+  && shift_optab_p (binoptab)
>> >+  && UINTVAL (op1) >= GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
>> >+op1 = gen_int_mode (INTVAL (op1), mode);
>> > 
>> >   /* Record where to delete back to if we backtrack.  */
>> >   last = get_last_insn ();
>
>   Jakub




C++ PATCH for c++/68990 (ICE with constexpr value-initialization)

2016-02-15 Thread Jason Merrill
Here, my assertion that a CONSTRUCTOR should be empty when we start to 
give it an initial value was forgetting about the case of classes with 
non-user-defined constructors, where value-initialization first 
zero-initializes, then calls the synthesized constructor.


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit c87b2db1a8bff1394c5e607f8d470f5eed20193c
Author: Jason Merrill 
Date:   Wed Feb 10 21:17:52 2016 -0500

	PR c++/68990

	* constexpr.c (verify_ctor_sanity): Remove CONSTRUCTOR_NELTS check.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 85fc64e..11037fb 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2202,7 +2202,8 @@ verify_ctor_sanity (const constexpr_ctx *ctx, tree type)
   gcc_assert (ctx->ctor);
   gcc_assert (same_type_ignoring_top_level_qualifiers_p
 	  (type, TREE_TYPE (ctx->ctor)));
-  gcc_assert (CONSTRUCTOR_NELTS (ctx->ctor) == 0);
+  /* We used to check that ctx->ctor was empty, but that isn't the case when
+ the object is zero-initialized before calling the constructor.  */
   if (ctx->object)
 gcc_assert (same_type_ignoring_top_level_qualifiers_p
 		(type, TREE_TYPE (ctx->object)));
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-value5.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-value5.C
new file mode 100644
index 0000000..8c67174
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-value5.C
@@ -0,0 +1,18 @@
+// PR c++/68990
+// { dg-do compile { target c++11 } }
+
+class ptr;
+template  struct A { typedef ptr _Type[_Nm]; };
+template  struct B { typename A<_Nm>::_Type _M_elems; };
+template  class FixedVector : B {
+public:
+  typedef B<1> base;
+  constexpr FixedVector() : base(), size_() {}
+  char size_;
+};
+class ptr {
+public:
+  constexpr ptr() : px_(){};
+  int px_;
+};
+FixedVector<1> a;


Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Dmitry Vyukov
On Mon, Feb 15, 2016 at 8:22 PM, Mike Stump  wrote:
> On Feb 15, 2016, at 3:29 AM, Bernd Edlinger  wrote:
>> And independently of that I am looking at using llvm's test.h framework 
>> instead
>> of gcc's test_barrier.h for gcc-7 soon.
>
> Here’s to hoping that we don’t back slide on:
>
>   https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00436.html
>
> Did they ever adopt a reliable scheme to test?

Yes, they did.
Btw, the pthread_barrier solution did not work on macOS, as it does not
implement pthread_barrier and libpthread.so.0 does not exist. That was
replaced with a spin loop with usleep(100). That caused spurious "as if
synchronized via sleep" messages that broke tests. That was replaced
with a busy loop. That caused timeouts on weak test bots. That was
replaced with a loop with sched_yield. That caused tsan trace overflows
and "failed to restore stack trace" error messages. And that was
replaced with special support in the tsan runtime, which seems to work in
all contexts so far...


Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Mike Stump
On Feb 15, 2016, at 3:29 AM, Bernd Edlinger  wrote:
> And independently of that I am looking at using llvm's test.h framework 
> instead
> of gcc's test_barrier.h for gcc-7 soon.

Here’s to hoping that we don’t back slide on:

  https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00436.html

Did they ever adopt a reliable scheme to test?

[gomp-nvptx 5/5] libgomp plugin: manage soft-stack storage

2016-02-15 Thread Alexander Monakov
This patch implements the libgomp plugin part of the transition to
host-allocated soft stacks.  For now only a simple scheme with
allocation/deallocation per launch is implemented; a followup change is
planned to cache and reuse allocations when appropriate.

The call to cuLaunchKernel is changed to pass kernel entry function arguments
in a way that allows the driver to check for mismatch (but only when the
cumulative size of passed arguments is different).
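
As a rough illustration of the per-launch sizing, the storage is one
fixed-size stack per warp, i.e. stack size times (teams * threads) in the
launch geometry used by the patch below.  The following standalone C
sketch computes that product; the overflow check and the name
stacks_total_bytes are assumptions added for the example and are not part
of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for NUM contiguous stacks of SIZE bytes each, or 0 if
   NUM is not positive or the product would overflow size_t.  */
size_t
stacks_total_bytes (size_t size, int num)
{
  if (num <= 0)
    return 0;
  if (size != 0 && (size_t) num > SIZE_MAX / size)
    return 0;
  return size * (size_t) num;
}
```

With 128 KiB stacks, 4 teams and 8 warps per team, this yields 4 MiB
per launch.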

* plugin/plugin-nvptx.c (nvptx_stacks_size): New.
(nvptx_stacks_alloc): New.
(nvptx_stacks_free): New.
(GOMP_OFFLOAD_run): Allocate soft-stacks storage from the host using
the above new functions.  Use kernel launch interface that allows
checking for mismatched total size of entry function arguments.
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index cb6a3ac..adf57b1 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1892,6 +1892,37 @@ nvptx_adjust_launch_bounds (struct targ_fn_descriptor *fn,
 *teams_p = max_blocks;
 }
 
+/* Return the size of per-warp stacks (see gcc -msoft-stack) to use for OpenMP
+   target regions.  */
+
+static size_t
+nvptx_stacks_size ()
+{
+  return 128 * 1024;
+}
+
+/* Return contiguous storage for NUM stacks, each SIZE bytes.  */
+
+static void *
+nvptx_stacks_alloc (size_t size, int num)
+{
+  CUdeviceptr stacks;
+  CUresult r = cuMemAlloc (&stacks, size * num);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuMemAlloc error: %s", cuda_error (r));
+  return (void *) stacks;
+}
+
+/* Release storage previously allocated by nvptx_stacks_alloc.  */
+
+static void
+nvptx_stacks_free (void *p, int num)
+{
+  CUresult r = cuMemFree ((CUdeviceptr) p);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuMemFree error: %s", cuda_error (r));
+}
+
 void
 GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 {
@@ -1899,7 +1930,6 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   CUresult r;
   struct ptx_device *ptx_dev = ptx_devices[ord];
   const char *maybe_abort_msg = "(perhaps abort was called)";
-  void *fn_args = &tgt_vars;
   int teams = 0, threads = 0;
 
   if (!args)
@@ -1922,10 +1952,19 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 }
   nvptx_adjust_launch_bounds (tgt_fn, ptx_dev, &teams, &threads);
 
+  size_t stack_size = nvptx_stacks_size ();
+  void *stacks = nvptx_stacks_alloc (stack_size, teams * threads);
+  void *fn_args[] = {tgt_vars, stacks, (void *) stack_size};
+  size_t fn_args_size = sizeof fn_args;
+  void *config[] = {
+CU_LAUNCH_PARAM_BUFFER_POINTER, fn_args,
+CU_LAUNCH_PARAM_BUFFER_SIZE, &fn_args_size,
+CU_LAUNCH_PARAM_END
+  };
   r = cuLaunchKernel (function,
  teams, 1, 1,
  32, threads, 1,
- 0, ptx_dev->null_stream->stream, &fn_args, 0);
+ 0, ptx_dev->null_stream->stream, NULL, config);
   if (r != CUDA_SUCCESS)
 GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
 
@@ -1935,6 +1974,7 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   maybe_abort_msg);
   else if (r != CUDA_SUCCESS)
 GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
+  nvptx_stacks_free (stacks, teams * threads);
 }
 
 void


Re: [PATCH] Fix PR69291, RTL if-conversion bug

2016-02-15 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 10 Feb 2016, Bernd Schmidt wrote:
>
>> On 02/10/2016 02:50 PM, Richard Biener wrote:
>> > On Wed, 10 Feb 2016, Bernd Schmidt wrote:
>> > 
>> > > On 02/10/2016 02:35 PM, Richard Biener wrote:
>> > > 
>> > > > Index: gcc/ifcvt.c
>> > > > ===
>> > > > --- gcc/ifcvt.c (revision 233262)
>> > > > +++ gcc/ifcvt.c (working copy)
>> > > > @@ -1274,7 +1274,8 @@ noce_try_store_flag_constants (struct no
>> > > >  && CONST_INT_P (XEXP (a, 1))
>> > > >  && CONST_INT_P (XEXP (b, 1))
>> > > >  && rtx_equal_p (XEXP (a, 0), XEXP (b, 0))
>> > > > -  && noce_operand_ok (XEXP (a, 0))
>> > > > +  && (REG_P (XEXP (a, 0))
>> > > > + || ! reg_mentioned_p (if_info->x, XEXP (a, 0)))
>> > > 
>> > > I guess that would also work. Could maybe use a brief comment.
>> > 
>> > Ok.  I'm testing that.  I wonder if we need to use reg_overlap_mentioned_p
>> > here (hard-reg pairs?) or if reg_mentioned_p is safe.
>> 
>> Let's go with reg_overlap_mentioned_p. I kind of forgot about that once I
>> thought of possible issues with emitting a move :-(
>
> Ok, the following is in testing now.
>
> Ok?
>
> Thanks,
> Richard.
>
> 2016-02-10  Richard Biener  
>
>   PR rtl-optimization/69291
>   * ifcvt.c (noce_try_store_flag_constants): Do not allow
>   subexpressions affected by changing the result.
>
> Index: gcc/ifcvt.c
> ===
> --- gcc/ifcvt.c   (revision 233262)
> +++ gcc/ifcvt.c   (working copy)
> @@ -1274,7 +1274,10 @@ noce_try_store_flag_constants (struct no
>&& CONST_INT_P (XEXP (a, 1))
>&& CONST_INT_P (XEXP (b, 1))
>&& rtx_equal_p (XEXP (a, 0), XEXP (b, 0))
> -  && noce_operand_ok (XEXP (a, 0))
> +  /* Allow expressions that are not using the result or plain
> + registers where we handle overlap below.  */
> +  && (REG_P (XEXP (a, 0))
> +   || ! reg_overlap_mentioned_p (if_info->x, XEXP (a, 0)))
>&& if_info->branch_cost >= 2)

Sorry if this has already been covered, but shouldn't we be adding
to the noce_operand_ok check rather than replacing it?  I think we
still want to check side_effects_p and may_trap_p.

Thanks,
Richard


[gomp-nvptx 4/5] libgomp: remove __nvptx_stacks setup code

2016-02-15 Thread Alexander Monakov
This patch implements the NVPTX libgomp part of the transition to
host-allocated soft stacks.  The wrapper around gomp_nvptx_main previously
responsible for that is no longer needed.

This mostly reverts commit b408f1293e29a009ba70a3fda7b800277e1f310a.

* config/nvptx/team.c: (gomp_nvptx_main_1): Rename back to...
(gomp_nvptx_main): ...this; delete the wrapper.

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index bc8c4e6..b9f9f9f 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -34,9 +34,12 @@ struct gomp_thread *nvptx_thrs __attribute__((shared));
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
-static void __attribute__((noinline))
-gomp_nvptx_main_1 (void (*fn) (void *), void *fn_data, int ntids, int tid)
+void
+gomp_nvptx_main (void (*fn) (void *), void *fn_data)
 {
+  int tid, ntids;
+  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
+  asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
   if (tid == 0)
 {
   gomp_global_icv.nthreads_var = ntids;
@@ -69,30 +72,6 @@ gomp_nvptx_main_1 (void (*fn) (void *), void *fn_data, int ntids, int tid)
 }
 }
 
-void
-gomp_nvptx_main (void (*fn) (void *), void *fn_data)
-{
-  int tid, ntids;
-  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
-  asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids));
-  char *stacks = 0;
-  int *__nvptx_uni;
-  asm ("cvta.shared.u64 %0, __nvptx_uni;" : "=r" (__nvptx_uni));
-  __nvptx_uni[tid] = 0;
-  if (tid == 0)
-{
-  size_t stacksize = 131072;
-  stacks = gomp_malloc (stacksize * ntids);
-  char **__nvptx_stacks = 0;
-  asm ("cvta.shared.u64 %0, __nvptx_stacks;" : "=r" (__nvptx_stacks));
-  for (int i = 0; i < ntids; i++)
-   __nvptx_stacks[i] = stacks + stacksize * (i + 1);
-}
-  asm ("bar.sync 0;");
-  gomp_nvptx_main_1 (fn, fn_data, ntids, tid);
-  free (stacks);
-}
-
 /* This function is a pthread_create entry point.  This contains the idle
loop in which a thread waits to be called up to become part of a team.  */
 


[gomp-nvptx 3/5] nvptx backend: set up stacks in entry code

2016-02-15 Thread Alexander Monakov
This patch implements the NVPTX backend part of the transition to
host-allocated soft stacks.  The compiler-emitted kernel entry code now
accepts a pointer to stack storage and per-warp stack size, and initializes
__nvptx_stacks based on that (as well as trivially initializing __nvptx_uni).

The rewritten part of write_omp_entry now uses macro-expanded assembly
snippets to avoid highly repetitive dynamic code accounting for 32/64-bit
differences.

* config/nvptx/nvptx.c (write_omp_entry): Expand entry code to
initialize __nvptx_uni and __nvptx_stacks (based on pointer to storage
allocated by the libgomp plugin).

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index efd0f8e..81dd9a2 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -979,8 +979,10 @@ nvptx_init_unisimt_predicate (FILE *file)
 /* Emit kernel NAME for function ORIG outlined for an OpenMP 'target' region:
 
extern void gomp_nvptx_main (void (*fn)(void*), void *fnarg);
-   void __attribute__((kernel)) NAME(void *arg)
+   void __attribute__((kernel)) NAME (void *arg, char *stack, size_t stacksize)
{
+ __nvptx_stacks[tid.y] = stack + stacksize * (ctaid.x * ntid.y + tid.y + 1);
+ __nvptx_uni[tid.y] = 0;
  gomp_nvptx_main (ORIG, arg);
}
ORIG itself should not be emitted as a PTX .entry function.  */
@@ -1000,21 +1002,44 @@ write_omp_entry (std::stringstream &s, const char *name, const char *orig)
   s << ".extern .func gomp_nvptx_main";
   s << "(.param" << sfx << " %in_ar1, .param" << sfx << " %in_ar2);\n";
 }
-  s << ".visible .entry " << name << "(.param" << sfx << " %in_ar1)\n";
-  s << "{\n";
-  s << "\t.reg" << sfx << " %ar1;\n";
-  s << "\t.reg" << sfx << " %r1;\n";
-  s << "\tld.param" << sfx << " %ar1, [%in_ar1];\n";
-  s << "\tmov" << sfx << " %r1, " << orig << ";\n";
-  s << "\t{\n";
-  s << "\t\t.param" << sfx << " %out_arg0;\n";
-  s << "\t\t.param" << sfx << " %out_arg1;\n";
-  s << "\t\tst.param" << sfx << " [%out_arg0], %r1;\n";
-  s << "\t\tst.param" << sfx << " [%out_arg1], %ar1;\n";
-  s << "\t\tcall.uni gomp_nvptx_main, (%out_arg0, %out_arg1);\n";
-  s << "\t}\n";
-  s << "\tret;\n";
-  s << "}\n";
+#define ENTRY_TEMPLATE(PS, PS_BYTES, MAD_PS_32) "\
+ (.param.u" PS " %arg, .param.u" PS " %stack, .param.u" PS " %sz)\n\
+{\n\
+   .reg.u32 %r<3>;\n\
+   .reg.u" PS " %R<4>;\n\
+   mov.u32 %r0, %tid.y;\n\
+   mov.u32 %r1, %ntid.y;\n\
+   mov.u32 %r2, %ctaid.x;\n\
+   cvt.u" PS ".u32 %R1, %r0;\n\
+   " MAD_PS_32 " %R1, %r1, %r2, %R1;\n\
+   mov.u" PS " %R0, __nvptx_stacks;\n\
+   " MAD_PS_32 " %R0, %r0, " PS_BYTES ", %R0;\n\
+   ld.param.u" PS " %R2, [%stack];\n\
+   ld.param.u" PS " %R3, [%sz];\n\
+   add.u" PS " %R2, %R2, %R3;\n\
+   mad.lo.u" PS " %R2, %R1, %R3, %R2;\n\
+   st.shared.u" PS " [%R0], %R2;\n\
+   mov.u" PS " %R0, __nvptx_uni;\n\
+   " MAD_PS_32 " %R0, %r0, 4, %R0;\n\
+   mov.u32 %r0, 0;\n\
+   st.shared.u32 [%R0], %r0;\n\
+   mov.u" PS " %R0, \0;\n\
+   ld.param.u" PS " %R1, [%arg];\n\
+   {\n\
+   .param.u" PS " %P<2>;\n\
+   st.param.u" PS " [%P0], %R0;\n\
+   st.param.u" PS " [%P1], %R1;\n\
+   call.uni gomp_nvptx_main, (%P0, %P1);\n\
+   }\n\
+   ret.uni;\n\
+}\n"
+  static const char template64[] = ENTRY_TEMPLATE ("64", "8", "mad.wide.u32");
+  static const char template32[] = ENTRY_TEMPLATE ("32", "4", "mad.lo.u32  ");
+#undef ENTRY_TEMPLATE
+  const char *template_1 = TARGET_ABI64 ? template64 : template32;
+  const char *template_2 = template_1 + strlen (template64) + 1;
+  s << ".visible .entry " << name << template_1 << orig << template_2;
+  need_softstack_decl = need_unisimt_decl = true;
 }
 
 /* Implement ASM_DECLARE_FUNCTION_NAME.  Writes the start of a ptx


[gomp-nvptx 1/5] libgomp plugin: correct types

2016-02-15 Thread Alexander Monakov
Handling of the arguments array wrongly assumed that 'long' would match the size
of 'void *'.  As that would break on MinGW, use 'intptr_t'.  Use 'int' for
'teams' and 'threads', as that's what cuLaunchKernel accepts.
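
A minimal sketch of the type issue described above, not the plugin's actual
code: on LLP64 targets such as 64-bit Windows, 'long' is 32 bits wide while
'void *' is 64 bits, so an integer carried in a pointer should be recovered
through 'intptr_t'.  The helper names sketch_arg_to_integer and
sketch_clamp_to_int are hypothetical and exist only for illustration; like the
patch, the clamp only bounds the value from above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: recover an integer that was passed in a 'void *'.
   intptr_t is wide enough for any object pointer value; 'long' is not
   guaranteed to be.  */
static intptr_t
sketch_arg_to_integer (void *arg)
{
  return (intptr_t) arg;
}

/* Hypothetical helper: narrow VAL to 'int', saturating at INT_MAX, which
   mirrors the 'val > INT_MAX ? INT_MAX : val' line in the patch.  */
static int
sketch_clamp_to_int (intptr_t val)
{
  return val > INT_MAX ? INT_MAX : (int) val;
}
```

With these helpers, a value decoded from the argument array can be assigned to
an 'int' such as 'teams' or 'threads' without silent truncation of large
positive values.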

* plugin/plugin-nvptx.c (nvptx_adjust_launch_bounds): Adjust types.
(GOMP_OFFLOAD_run): Ditto.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 39575d9..79fd253 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1874,7 +1875,7 @@ GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void *stream)
 static void
 nvptx_adjust_launch_bounds (struct targ_fn_descriptor *fn,
struct ptx_device *ptx_dev,
-   long *teams_p, long *threads_p)
+   int *teams_p, int *threads_p)
 {
   int max_warps_block = fn->max_threads_per_block / 32;
   /* Maximum 32 warps per block is an implementation limit in NVPTX backend
@@ -1903,19 +1904,20 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   struct ptx_device *ptx_dev = ptx_devices[ord];
   const char *maybe_abort_msg = "(perhaps abort was called)";
   void *fn_args = &tgt_vars;
-  long teams = 0, threads = 0;
+  int teams = 0, threads = 0;
 
   if (!args)
 GOMP_PLUGIN_fatal ("No target arguments provided");
   while (*args)
 {
-  long id = (long) *args++, val;
+  intptr_t id = (intptr_t) *args++, val;
   if (id & GOMP_TARGET_ARG_SUBSEQUENT_PARAM)
-   val = (long) *args++;
+   val = (intptr_t) *args++;
   else
 val = id >> GOMP_TARGET_ARG_VALUE_SHIFT;
   if ((id & GOMP_TARGET_ARG_DEVICE_MASK) != GOMP_TARGET_ARG_DEVICE_ALL)
continue;
+  val = val > INT_MAX ? INT_MAX : val;
   id &= GOMP_TARGET_ARG_ID_MASK;
   if (id == GOMP_TARGET_ARG_NUM_TEAMS)
teams = val;


[gomp-nvptx 2/5] Revert "nvptx plugin: bump heap size to 1GB"

2016-02-15 Thread Alexander Monakov
This reverts commit 7d36b841341cde96f6cf89c5232916062da3fe4c.

The change was not well motivated: soft stacks would not fit in the default
8 MB heap only with multiple teams.  With the transition to host-allocated
soft stacks, libgomp uses the device heap only for relatively small
allocations.

Revert
2015-12-09  Alexander Monakov  

* plugin/plugin-nvptx.c (nvptx_open_device): Adjust heap size.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 79fd253..cb6a3ac 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -694,10 +694,6 @@ nvptx_open_device (int n)
 
   init_streams_for_device (ptx_dev, async_engines);
 
-  r = cuCtxSetLimit (CU_LIMIT_MALLOC_HEAP_SIZE, 1<<30);
-  if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal ("cuCtxSetLimit error: %s", cuda_error (r));
-
   return ptx_dev;
 }
 


[gomp-nvptx 0/5] Reorganize soft-stack setup

2016-02-15 Thread Alexander Monakov
I've committed the following 5-patch series to amonakov/gomp-nvptx git branch.
The first two patches are unrelated fixes to previously landed code.  Patches
3-5 reorganize the way initial soft-stack setup is done.

Previously, soft stacks were allocated by libgomp/config/nvptx/team.c in
a function that wrapped gomp_nvptx_main.  However:

  - the default device heap is only 8 MB, which is not enough for multiple
teams with 128 KiB per-warp stacks; libgomp plugin would need to increase
heap size;
  - device heap persists between launches, so it's possible to leak
soft-stack allocations if a team exits without cleaning up;
  - device malloc is rather slow, so I'd like to eliminate or reuse device
allocations as much as possible; it's easier to arrange reuse of soft
stack storage from the host side;
  - there's a chicken-and-egg problem with setting up soft stacks from C code.

So the above motivates a transition to a scheme where libgomp core is
oblivious to soft stack setup, and instead the storage is allocated from the
libgomp plugin (via cuMemAlloc) and passed to the compiler-emitted entry
function as the 2nd (base pointer) and 3rd (per-warp size) arguments.  This
obviously addresses bullets 1-2 above, bullet 4 is addressed since the entry
code is emitted in assembly from the backend, and bullet 3 is left to a
followup change: cuMemAlloc is roughly as slow on the host as malloc is slow
on the device, but we should be able to reuse allocations on the host.
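
As a rough illustration of the layout described above, the following sketch
computes the initial soft-stack pointer for one warp using the formula from
the entry-code comment in patch 3/5, and works out how many 128 KiB stacks fit
in the default 8 MB device heap.  The function names are hypothetical, and
this is host-side C for illustration only; the real setup is emitted as PTX by
the backend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: initial stack pointer for the warp identified by
   (CTAID_X, TID_Y) in a launch with NTID_Y warps per team.  Stacks grow
   downwards, so each warp starts at the top of its STACKSIZE-byte slot:
   stack + stacksize * (ctaid.x * ntid.y + tid.y + 1).  */
static char *
sketch_soft_stack_top (char *stack, size_t stacksize,
                       unsigned ctaid_x, unsigned ntid_y, unsigned tid_y)
{
  size_t warp = (size_t) ctaid_x * ntid_y + tid_y;
  return stack + stacksize * (warp + 1);
}

/* Hypothetical helper: how many per-warp stacks of STACK_BYTES fit in a
   heap of HEAP_BYTES.  */
static size_t
sketch_warp_stacks_in_heap (size_t heap_bytes, size_t stack_bytes)
{
  assert (stack_bytes != 0);
  return heap_bytes / stack_bytes;
}
```

With an 8 MB heap and 131072-byte stacks the second helper yields 64, that is,
two teams at the backend's limit of 32 warps per block, and that is before any
other device allocation is counted.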

This changes the binary interface between libgomp plugin (GOMP_OFFLOAD_run)
and compiler-emitted kernel entry functions for OpenMP target regions.  For
now, I am free to do that on the branch without worries, but if a similar
change is required in the future after a release, libgomp plugin should be
able to detect which arguments the entry expects.  Assuming the argument list
is only appended to, libgomp plugin only needs to know the argument count.  So
a possible solution is to invent a tagging mechanism when the change needs to
be made, and provide the default 3 arguments to untagged entries.  Old libgomp
plugins unaware of the change should be able to detect failure to provide
sufficient arguments to entries emitted from new compiler from the failure of
cuLaunchKernel.

Alexander Monakov (5):
  libgomp plugin: correct types
  Revert "nvptx plugin: bump heap size to 1GB"
  nvptx backend: set up stacks in entry code
  libgomp: remove __nvptx_stacks setup code
  libgomp plugin: manage soft-stack storage

 gcc/ChangeLog.gomp-nvptx  |  6 +
 gcc/config/nvptx/nvptx.c  | 57 ++
 libgomp/ChangeLog.gomp-nvptx  | 26 +++
 libgomp/config/nvptx/team.c   | 31 ---
 libgomp/plugin/plugin-nvptx.c | 58 +++
 5 files changed, 126 insertions(+), 52 deletions(-)



Re: [PATCH] Fix PR69771, bogus CONST_INT during shift expansion

2016-02-15 Thread Jakub Jelinek
On Mon, Feb 15, 2016 at 06:58:45PM +0100, Richard Biener wrote:
> We could also force_reg those at expansion or apply SHIFT_COUNT_TRUNCATED to 
> those invalid constants there.

Sure, but for force_reg we'd still need the gen_int_mode anyway.
As for SHIFT_COUNT_TRUNCATED, it should have been applied already from the
caller - expand_shift_1.
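
To illustrate why gen_int_mode is still needed: a CONST_INT is expected to
hold its value sign-extended from the width of the mode it is used in, and
gen_int_mode truncates and sign-extends a value accordingly.  The following is
a standalone sketch of that truncation, not GCC code; the name
sketch_trunc_for_width and the plain bit-width parameter are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the canonicalization gen_int_mode performs:
   keep the low WIDTH bits of VALUE and sign-extend them to 64 bits.  */
static int64_t
sketch_trunc_for_width (int64_t value, unsigned width)
{
  assert (width >= 1 && width <= 64);
  if (width == 64)
    return value;

  uint64_t mask = ((uint64_t) 1 << width) - 1;
  uint64_t sign = (uint64_t) 1 << (width - 1);
  uint64_t low = (uint64_t) value & mask;

  /* Flipping the sign bit and subtracting it again sign-extends LOW.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

For an 8-bit mode, for example, a shift count of 200 becomes -56 and a count
of 256 becomes 0, whereas a count of 5 is unchanged.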

> >2016-02-15  Jakub Jelinek  
> >
> > PR rtl-optimization/69764
> > PR rtl-optimization/69771
> > * optabs.c (expand_binop): Ensure for shift optabs invalid CONST_INT
> > op1 is valid for mode.
> >
> >--- gcc/optabs.c.jj  2016-02-12 17:49:25.0 +0100
> >+++ gcc/optabs.c 2016-02-15 16:15:53.983673792 +0100
> >@@ -1125,6 +1125,12 @@ expand_binop (machine_mode mode, optab b
> >   op1 = negate_rtx (mode, op1);
> >   binoptab = add_optab;
> > }
> >+  /* For shifts, constant invalid op1 might be expanded from different
> >+ mode than MODE.  */
> >+  else if (CONST_INT_P (op1)
> >+   && shift_optab_p (binoptab)
> >+   && UINTVAL (op1) >= GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
> >+op1 = gen_int_mode (INTVAL (op1), mode);
> > 
> >   /* Record where to delete back to if we backtrack.  */
> >   last = get_last_insn ();

Jakub


Re: [PATCH] Fix PR69771, bogus CONST_INT during shift expansion

2016-02-15 Thread Richard Biener
On February 15, 2016 4:34:38 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Sat, Feb 13, 2016 at 07:50:25AM +, James Greenhalgh wrote:
>> On Fri, Feb 12, 2016 at 05:34:21PM +0100, Jakub Jelinek wrote:
>> > On Fri, Feb 12, 2016 at 03:20:07PM +0100, Bernd Schmidt wrote:
>> > > >>-  mode1 = GET_MODE (xop1) != VOIDmode ? GET_MODE (xop1) :
>mode;
>> > > >>+  mode1 = GET_MODE (xop1) != VOIDmode ? GET_MODE (xop1) :
>mode1;
>> > > >>if (xmode1 != VOIDmode && xmode1 != mode1)
>> > > >>  {
>> > > 
>> > > Here, things aren't so clear, and the fact that the mode1
>calculation now
>> > > differs from the mode0 one may be overlooked by someone in the
>future.
>> > > 
>> > > Rather than use codes like "mode variable is VOIDmode", I'd
>prefer to use
>> > > booleans with descriptive names, like "op1_may_need_conversion".
>> > 
>> > So do you prefer e.g. following?  Bootstrapped/regtested on
>x86_64-linux and
>> > i686-linux.
>> > 
>> > 2016-02-12  Jakub Jelinek  
>> > 
>> >PR rtl-optimization/69764
>> >PR rtl-optimization/69771
>> >* optabs.c (expand_binop_directly): For shift_optab_p, force
>> >convert_modes with VOIDmode if xop1 has VOIDmode.
>> > 
>> >* c-c++-common/pr69764.c: New test.
>> >* gcc.dg/torture/pr69771.c: New testcase.
>> > 
>> 
>> These two new tests are failing for me on AArch64 as so:
>
>As I said earlier, I wanted to fix it in expand_binop_directly because
>the
>higher levels still GEN_INT the various shift counters and then call
>expand_binop_directly.  But, as can be seen on aarch64/arm/m68k, there
>are
>cases that need op1 to be valid for mode already in expand_binop, so in
>addition to the already committed fix I think we need to handle it
>at the expand_binop level too.
>As we don't have a single spot with convert_modes like
>expand_binop_directly, I think the best is to do there a change
>for the uncommon and invalid cases only, like (seems to fix the ICE
>both on aarch64 and m68k):

We could also force_reg those at expansion or apply SHIFT_COUNT_TRUNCATED to 
those invalid constants there.

Richard.

>2016-02-15  Jakub Jelinek  
>
>   PR rtl-optimization/69764
>   PR rtl-optimization/69771
>   * optabs.c (expand_binop): Ensure for shift optabs invalid CONST_INT
>   op1 is valid for mode.
>
>--- gcc/optabs.c.jj2016-02-12 17:49:25.0 +0100
>+++ gcc/optabs.c   2016-02-15 16:15:53.983673792 +0100
>@@ -1125,6 +1125,12 @@ expand_binop (machine_mode mode, optab b
>   op1 = negate_rtx (mode, op1);
>   binoptab = add_optab;
> }
>+  /* For shifts, constant invalid op1 might be expanded from different
>+ mode than MODE.  */
>+  else if (CONST_INT_P (op1)
>+ && shift_optab_p (binoptab)
>+ && UINTVAL (op1) >= GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
>+op1 = gen_int_mode (INTVAL (op1), mode);
> 
>   /* Record where to delete back to if we backtrack.  */
>   last = get_last_insn ();
>
>
>   Jakub




Re: add check for aarch64 in check_effective_target_section_anchors()

2016-02-15 Thread Prathamesh Kulkarni
On 15 February 2016 at 19:24, James Greenhalgh  wrote:
> On Thu, Feb 11, 2016 at 11:03:23PM +0530, Prathamesh Kulkarni wrote:
>> Hi,
>> aarch64 supports section anchors but it appears
>> check_effective_target_section_anchors() doesn't contain entry for it.
>> This patch adds an entry for aarch64.
>> OK for trunk ?
>
> OK. I presume you tested this, and the testcases this enables PASS without
> issue?
Yes, the unsupported test-cases for section anchors pass.
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/233425-target-supports/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
Tested with aarch64-none-linux-gnu, aarch64-none-elf, and aarch64_be-none-elf.
Committed as r233426.

Thanks,
Prathamesh
>
> Thanks,
> James
>
>> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
>> index 645981a..66fb1ea 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -5467,7 +5467,8 @@ proc check_effective_target_section_anchors { } {
>>  } else {
>>  set et_section_anchors_saved 0
>>  if { [istarget powerpc*-*-*]
>> -   || [istarget arm*-*-*] } {
>> +   || [istarget arm*-*-*]
>> +   || [istarget aarch64*-*-*] } {
>> set et_section_anchors_saved 1
>>  }
>>  }
>
>


Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning

2016-02-15 Thread Evandro Menezes

On 02/15/16 04:50, James Greenhalgh wrote:

On Mon, Feb 08, 2016 at 10:57:10AM +, James Greenhalgh wrote:

On Mon, Feb 01, 2016 at 02:00:01PM +, James Greenhalgh wrote:

On Mon, Jan 25, 2016 at 11:20:46AM +, James Greenhalgh wrote:

On Mon, Jan 11, 2016 at 12:04:43PM +, James Greenhalgh wrote:

Hi,

I've seen a couple of large performance issues caused by expanding
the high-precision reciprocal square root for Cortex-A57, so I'd like
to turn it off by default.

This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
Spec2000, good (~5.5%) for gromacs from Spec2006, and very good (>10%) for
some private microbenchmark kernels which stress the divide/sqrt/multiply
units. It therefore seems to me to be the correct choice to make across
a number of workloads.

Bootstrapped and tested on aarch64-none-linux-gnu with no issues.

OK?

*Ping*

*pingx2*

*ping^3*

*ping^4*

Thanks,
James


---
2015-12-11  James Greenhalgh  

* config/aarch64/aarch64.c (cortexa57_tunings): Remove
AARCH64_EXTRA_TUNE_RECIP_SQRT.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1d5d898..999c9fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
0,  /* max_case_values.  */
0,  /* cache_line_size.  */
tune_params::AUTOPREFETCHER_WEAK,   /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
-   | AARCH64_EXTRA_TUNE_RECIP_SQRT)/* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags.  */
  };
  
  static const struct tune_params cortexa72_tunings =




James,

There seem to be SPEC CPU2000fp validation issues on A57 when this flag
is present too.  I evaluated the algorithm with a huge random set
of values, and it always delivered accuracy around 1ulp, which should be
enough for CPU2000fp (with x86-64), so I expected the benchmarks to pass.


My suspicion is that the Newton series on AArch64 is probably good only 
for SP.  Then, DP might require an extra round, probably exacerbating 
the performance penalty.
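
A minimal sketch of the reasoning behind that suspicion, assuming the usual
Newton-Raphson refinement x' = x * (1.5 - 0.5 * a * x * x) for 1/sqrt(a).
Each step roughly doubles the number of correct bits, so a fixed number of
steps that suffices for the 24-bit SP significand can fall short of the 53-bit
DP one.  The function names and the starting estimate are hypothetical; this
is not the code GCC emits.

```c
#include <assert.h>

/* Refine the estimate X0 of 1/sqrt(A) with STEPS Newton-Raphson steps.  */
static double
sketch_rsqrt_newton (double a, double x0, int steps)
{
  assert (a > 0.0 && steps >= 0);
  double x = x0;
  for (int i = 0; i < steps; i++)
    x = x * (1.5 - 0.5 * a * x * x);
  return x;
}

/* Absolute difference between GOT and WANT.  */
static double
sketch_abs_error (double got, double want)
{
  double d = got - want;
  return d < 0 ? -d : d;
}
```

Starting from 0.7 as an estimate of 1/sqrt(2), two steps leave an error of
roughly 2e-8, which is adequate for SP but not for DP; a third step brings the
error down to about the DP rounding level.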


I'd like to try to split this tuning option into one for SP and another 
for DP.  Thoughts?


Thank you,

--
Evandro Menezes



Re: Use plain -fopenacc to enable OpenACC kernels processing

2016-02-15 Thread Tom de Vries

On 10/02/16 15:40, Thomas Schwinge wrote:

Hi!

Will this patch be acceptable for GCC trunk in the current development
stage?  In its current incarnation, this patch depends on my
'Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid
offloading"' patch,
,
which Bernd suggested "has to be considered after gcc-6".  So, I'll have
to re-work this patch here, hence I'm first checking if it generally
meets approval?

On Fri, 5 Feb 2016 13:06:17 +0100, I wrote:

On Mon, 9 Nov 2015 18:39:19 +0100, Tom de Vries  wrote:

On 09/11/15 16:35, Tom de Vries wrote:

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.



Atm, the parallelization behaviour for the kernels region is controlled
by flag_tree_parallelize_loops, which is also used to control generic
auto-parallelization by autopar using omp. That is not ideal, and we may
want a separate flag (or param) to control the behaviour for oacc
kernels, f.i. -foacc-kernels-gang-parallelize=. I'm open to suggestions.


I suggest to use plain -fopenacc to enable OpenACC kernels processing
(which just makes sense, I hope) ;-) and have later processing stages
determine the actual parametrization (currently: number of gangs) (that
is, Nathan's recent "Default compute dimensions" patches).



Hi Thomas,

That makes a lot of sense.  Thanks for working on this.


The code changes are simple enough; OK for trunk?  (This patch depends on
my 'Un-parallelized OpenACC kernels constructs with nvptx offloading:
"avoid offloading"' pending review,
.)

Originally, I wanted to use:

 OMP_CLAUSE_NUM_GANGS_EXPR (clause) = build_int_cst (integer_type_node, n_threads == 0 ? -1 : n_threads);

... to store -1 "have the compiler decide" (instead of now 0 "have the
run-time decide", which might prevent some code optimizations, as I
understand it) for the n_threads == 0 case, but it seems that for an
offloaded OpenACC kernels region, gcc/omp-low.c:oacc_validate_dims is
called with the parameter "used" set to 0 instead of "gang", and then the
"Default anything left to 1 or a partitioned default" logic will default
dims["gang"] to oacc_min_dims["gang"] (that is, 1) instead of the
oacc_default_dims["gang"] (that is, 32).  Nathan, does that smell like a
bug (and could you look into that)?

diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 139e38c..e498e5b 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -2016,7 +2016,8 @@ transform_to_exit_first_loop (struct loop *loop,
  /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
 LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
 NEW_DATA is the variable that should be initialized from the argument
-   of LOOP_FN.  N_THREADS is the requested number of threads.  */
+   of LOOP_FN.  N_THREADS is the requested number of threads, which can be 0 if
+   that number is to be determined later.  */

  static void
  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
@@ -2049,6 +2050,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
basic_block paral_bb = single_pred (bb);
gsi = gsi_last_bb (paral_bb);

+  gcc_checking_assert (n_threads != 0);
t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
OMP_CLAUSE_NUM_THREADS_EXPR (t)
= build_int_cst (integer_type_node, n_threads);
@@ -2221,7 +2223,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
  }

  /* Generates code to execute the iterations of LOOP in N_THREADS
-   threads in parallel.
+   threads in parallel, which can be 0 if that number is to be determined
+   later.

 NITER describes number of iterations of LOOP.
 REDUCTION_LIST describes the reductions existent in the LOOP.  */
@@ -2318,6 +2321,7 @@ gen_parallel_loop (struct loop *loop,
else
m_p_thread=MIN_PER_THREAD;

+  gcc_checking_assert (n_threads != 0);
many_iterations_cond =
fold_build2 (GE_EXPR, boolean_type_node,
 nit, build_int_cst (type, m_p_thread * n_threads));
@@ -3177,7 +3181,7 @@ oacc_entry_exit_ok (struct loop *loop,
  static bool
  parallelize_loops (bool oacc_kernels_p)
  {
-  unsigned n_threads = flag_tree_parallelize_loops;
+  unsigned n_threads;
bool changed = false;
struct loop *loop;
struct loop *skip_loop = NULL;
@@ -3199,6 +3203,13 @@ parallelize_loops (bool oacc_kernels_p)
if (cfun->has_nonlocal_label)
  return false;

+  /* For OpenACC kernels, n_threads will be determined later; otherwise, it's
+ the argument to -ftree-parallelize-loops.  */
+  if (oacc_kernels_p)
+n_threads = 0;
+  else
+n_threads = flag_tree_parallelize_loops;
+

Re: [RFC] [PATCH] Add __array_size keyword

2016-02-15 Thread Stuart Brady
On Mon, Feb 15, 2016 at 03:05:36PM +0100, Marek Polacek wrote:
> On Sat, Feb 13, 2016 at 03:16:49AM +, Stuart Brady wrote:
> > I will look into submitting a PR for this properly soon, but will not
> > mind if someone wants to take this task upon themselves instead,
> > especially as we are into the release phase for GCC 6 and this may be
> > an issue worth fixing.  (Note that Debian's GCC 5.3.1 is also affected.)
> 
> Sure, I've just filed this:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69824

Thank you!

> > For a hypothetical change to the C standard itself, I think one might use
> > the name "_ArraySize", but for a non-standard extension this would not be
> > appropriate.  I think "__array_size" is fine, though, and is consistent
> > with "__alignof".
> 
> Please no CamelCase, that's ugly ;).  I also think that "__array_size" is 
> fine.

Okay.  I was wondering about __arraysize, __array_len, or some such
thing, but I suppose one just needs to pick something.  ARRAY_SIZE is
common enough as a macro name that I find __array_size to be immediately
obvious.  Certainly, the name is something that doesn't need to be decided
just now, although it's better to consider alternatives sooner rather than
later.
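
For reference, a sketch of the conventional macro idiom that the proposed
keyword's name echoes.  The macro name SKETCH_ARRAY_SIZE is chosen here for
illustration.  One known weakness of the idiom is that it also compiles when
handed a pointer, in which case it silently yields a wrong count.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; gives a meaningless result if A has
   decayed to a pointer, which the macro cannot detect.  */
#define SKETCH_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Example use: count the elements of a file-scope table.  */
static const int sketch_table[5] = { 1, 2, 3, 4, 5 };

static size_t
sketch_table_length (void)
{
  return SKETCH_ARRAY_SIZE (sketch_table);
}
```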

Hypothetically, I suppose _Array_Size, _Array_size or _Arraysize would
all be better than the CamelCase version.


> (I haven't had time so far to read the patch.)

Joseph's point that documentation and testcases are of much more value is a
good one, and reveals a problem with my patch.  It does not work in the
following case:

   vla-array-size.c:19:55: error: invalid use of flexible array member
int foo6(int a, int b[*][*], int c[static __array_size(*b)]);
  ^

sizeof() *does* work in this case -- my test is based on gcc.dg/vla-5.c
which uses sizeof() instead.  __array_size(*b) works fine in the function
body but not in the prototype, and also works fine with C99 variable
length arrays.

This message has been borrowed from c_incomplete_type_error(), along with
the test for TYPE_DOMAIN (type) && TYPE_MAX_VALUE (TYPE_DOMAIN (type)).

Unfortunately, I'm at a loss to say whether TYPE_MAX_VALUE should be
set in this instance, or whether I should be going about things
differently.  It seems interesting that TYPE_SIZE_UNIT (type) is set
but TYPE_MAX_VALUE (TYPE_DOMAIN (type)) is not.  Can anyone help?

Also, it's not obvious to me whether I need to consider the case of
TYPE_MIN_VALUE (TYPE_DOMAIN (type)) being nonzero.
-- 
Many thanks,
Stuart Brady


Re: [PATCH] Fix PR69771, bogus CONST_INT during shift expansion

2016-02-15 Thread Jakub Jelinek
On Sat, Feb 13, 2016 at 07:50:25AM +, James Greenhalgh wrote:
> On Fri, Feb 12, 2016 at 05:34:21PM +0100, Jakub Jelinek wrote:
> > On Fri, Feb 12, 2016 at 03:20:07PM +0100, Bernd Schmidt wrote:
> > > >>-  mode1 = GET_MODE (xop1) != VOIDmode ? GET_MODE (xop1) : mode;
> > > >>+  mode1 = GET_MODE (xop1) != VOIDmode ? GET_MODE (xop1) : mode1;
> > > >>if (xmode1 != VOIDmode && xmode1 != mode1)
> > > >>  {
> > > 
> > > Here, things aren't so clear, and the fact that the mode1 calculation now
> > > differs from the mode0 one may be overlooked by someone in the future.
> > > 
> > > Rather than use codes like "mode variable is VOIDmode", I'd prefer to use
> > > booleans with descriptive names, like "op1_may_need_conversion".
> > 
> > So do you prefer e.g. following?  Bootstrapped/regtested on x86_64-linux and
> > i686-linux.
> > 
> > 2016-02-12  Jakub Jelinek  
> > 
> > PR rtl-optimization/69764
> > PR rtl-optimization/69771
> > * optabs.c (expand_binop_directly): For shift_optab_p, force
> > convert_modes with VOIDmode if xop1 has VOIDmode.
> > 
> > * c-c++-common/pr69764.c: New test.
> > * gcc.dg/torture/pr69771.c: New testcase.
> > 
> 
> These two new tests are failing for me on AArch64 as so:

As I said earlier, I wanted to fix it in expand_binop_directly because the
higher levels still GEN_INT the various shift counters and then call
expand_binop_directly.  But, as can be seen on aarch64/arm/m68k, there are
cases that need op1 to be valid for mode already in expand_binop, so in
addition to the already committed fix I think we need to handle it
at the expand_binop level too.
As we don't have a single spot with convert_modes like
expand_binop_directly, I think the best is to do there a change
for the uncommon and invalid cases only, like (seems to fix the ICE
both on aarch64 and m68k):

2016-02-15  Jakub Jelinek  

PR rtl-optimization/69764
PR rtl-optimization/69771
* optabs.c (expand_binop): Ensure for shift optabs invalid CONST_INT
op1 is valid for mode.

--- gcc/optabs.c.jj 2016-02-12 17:49:25.0 +0100
+++ gcc/optabs.c2016-02-15 16:15:53.983673792 +0100
@@ -1125,6 +1125,12 @@ expand_binop (machine_mode mode, optab b
   op1 = negate_rtx (mode, op1);
   binoptab = add_optab;
 }
+  /* For shifts, constant invalid op1 might be expanded from different
+ mode than MODE.  */
+  else if (CONST_INT_P (op1)
+  && shift_optab_p (binoptab)
+  && UINTVAL (op1) >= GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
+op1 = gen_int_mode (INTVAL (op1), mode);
 
   /* Record where to delete back to if we backtrack.  */
   last = get_last_insn ();


Jakub


Re: [PATCH] Fix PR c++/66786 (ICE with nested lambdas in variable template)

2016-02-15 Thread Patrick Palka
On Mon, Feb 8, 2016 at 12:19 AM, Patrick Palka  wrote:
> Here, we are calling template_class_depth on a FIELD_DECL corresponding
> to a lambda that is used inside a variable template.  template_class_depth
> however does not see that this FIELD_DECL is used inside a variable
> template binding because its chain of DECL_CONTEXTs does not include the
> corresponding VAR_DECL.  So template_class_depth returns the wrong
> template nesting level which causes its callers to malfunction.  In
> particular we strip a template argument level in
> tsubst_copy [FIELD_DECL] when we shouldn't have.
>
> This patch makes template_class_depth look at a lambda type's
> LAMBDA_TYPE_EXTRA_SCOPE field instead of its TYPE_CONTEXT, so that it
> can iterate into an enclosing variable template, if applicable.
>
> Tested on x86_64-pc-linux-gnu, no new regressions.  Also tested against
> Boost.  Is this OK to commit?
>
> gcc/cp/ChangeLog:
>
> PR c++/66786
> * pt.c (template_class_depth): Given a lambda type, iterate
> into its LAMBDA_TYPE_EXTRA_SCOPE field instead of its
> TYPE_CONTEXT.  Given a VAR_DECL, iterate into its
> CP_DECL_CONTEXT.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/66786
> * g++.dg/cpp1y/var-templ48.C: New test.
> * g++.dg/cpp1y/var-templ49.C: New test.
> ---
>  gcc/cp/pt.c  | 12 
>  gcc/testsuite/g++.dg/cpp1y/var-templ48.C |  5 +
>  gcc/testsuite/g++.dg/cpp1y/var-templ49.C |  9 +
>  3 files changed, 22 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ48.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ49.C
>
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 4d405cf..5c344c1 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -369,16 +369,20 @@ template_class_depth (tree type)
>  {
>int depth;
>
> -  for (depth = 0;
> -   type && TREE_CODE (type) != NAMESPACE_DECL;
> -   type = (TREE_CODE (type) == FUNCTION_DECL)
> -? CP_DECL_CONTEXT (type) : CP_TYPE_CONTEXT (type))
> +  for (depth = 0; type && TREE_CODE (type) != NAMESPACE_DECL; )
>  {
>tree tinfo = get_template_info (type);
>
>if (tinfo && PRIMARY_TEMPLATE_P (TI_TEMPLATE (tinfo))
>   && uses_template_parms (INNERMOST_TEMPLATE_ARGS (TI_ARGS (tinfo
> ++depth;
> +
> +  if (VAR_OR_FUNCTION_DECL_P (type))
> +   type = CP_DECL_CONTEXT (type);
> +  else if (LAMBDA_TYPE_P (type))
> +   type = LAMBDA_TYPE_EXTRA_SCOPE (type);
> +  else
> +   type = CP_TYPE_CONTEXT (type);
>  }
>
>return depth;
> diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ48.C 
> b/gcc/testsuite/g++.dg/cpp1y/var-templ48.C
> new file mode 100644
> index 000..f0c7693
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ48.C
> @@ -0,0 +1,5 @@
> +// PR c++/66786
> +// { dg-do compile { target c++14 } }
> +
> +template  auto list = [](T... xs) { [=](auto f) { f(xs...); 
> }; };
> +int main() { list(0); }
> diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ49.C 
> b/gcc/testsuite/g++.dg/cpp1y/var-templ49.C
> new file mode 100644
> index 000..cd3f230
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ49.C
> @@ -0,0 +1,9 @@
> +// PR c++/66786
> +// { dg-do compile { target c++14 } }
> +
> +int f (int, bool);
> +
> +template 
> +auto list = [](auto... xs) { return [=](auto f, auto... ys) { return 
> f(xs..., ys...); }; };
> +
> +const int  = list(0, true)(f);
> --
> 2.7.1.257.g925a48d
>

Ping.
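
For readers following the template_class_depth change above, here is a
minimal sketch of the idea in standalone C++.  It is not GCC code: Node,
Kind and template_depth are hypothetical names, and the assumption that a
lambda's ordinary context skips the enclosing variable template is taken
from the description in the quoted message.

```cpp
#include <cassert>

// Toy stand-in for a GCC tree node; every name here is hypothetical.
enum class Kind { Namespace, VarDecl, FunctionDecl, ClassType, LambdaType };

struct Node {
  Kind kind;
  bool is_primary_template;  // models the get_template_info / PRIMARY_TEMPLATE_P test
  const Node *context;       // models CP_DECL_CONTEXT / CP_TYPE_CONTEXT
  const Node *extra_scope;   // models LAMBDA_TYPE_EXTRA_SCOPE (lambda types only)
};

// Walk outward from n and count the enclosing templates.  When
// use_extra_scope is true, a lambda type steps to its extra scope
// instead of its ordinary context, which is the shape of the patch.
int template_depth(const Node *n, bool use_extra_scope) {
  int depth = 0;
  while (n && n->kind != Kind::Namespace) {
    if (n->is_primary_template)
      ++depth;
    if (n->kind == Kind::LambdaType && use_extra_scope)
      n = n->extra_scope;
    else
      n = n->context;
  }
  return depth;
}
```

With a lambda whose context is the namespace but whose extra scope is a
variable template, the old walk reports depth 0 and the new walk reports
depth 1, which is the discrepancy the message describes.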


Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread David Edelsohn
On Mon, Feb 15, 2016 at 4:36 AM, Alan Modra  wrote:
> On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
>> > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
>> > > Another concern I had about this, besides using %L in asm output (what
>> > > forces TFmode to use just fprs?), is what happens when we're using
>> > > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
>> >
>> > Good point that it breaks if the default long double (TFmode) type is IEEE
>> > 128-bit floating point.  We would need to have two patterns, one that uses
>> > TFmode and one that uses IFmode.  I wrote the power8 direct move stuff 
>> > before
>> > going down the road of IEEE 128-bit floating point.
>>
>> Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
>> but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
>> then we probably shouldn't be using it.
>
> Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
> is relevant only up to and including expand.  Nothing prevents the
> backend from using IFmode.
>
>> Another option might be to use TDmode to allocate a scratch register pair.
>
> That won't work, at least if we want to extract the two component regs
> with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
> my original patch I just extracted the regs by using gen_rtx_REG but I
> changed that, based on your criticism of using gen_rtx_REG in
> reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
> operand regnos in other places.  That particular change is of course
> entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
> punning regs, instead duplicating insn patterns as done elsewhere in
> the vsx support.  I don't believe we will see subregs of vsx or fp
> regs after reload, but I'm quite willing to concede the point for a
> stage4 fix.
>
> Here's the revised patch.  To recap, the main bug fixes here are:
> - stop reload_vsx_from_gprsf splitter from emitting a move not
> handled by movdi_internal64
> - don't use TFmode, which cannot now be assumed to be IBM
> double-double.
> Secondary to that, not using or passing around TFmode means the %L
> restriction no longer matters, and constraints on the reload temp reg
> can be relaxed.
>
> Bootstrapped and regression tested powerpc64-linux biarch and
> powerpc64le-linux.  OK David?
>
> PR target/68973
> * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf
> rather than attempting to use movdi_internal64.  Remove op0_di.
> (p8_mtvsrd_df, p8_mtvsrd_sf): New.
> (p8_mtvsrd_1, p8_mtvsrd_2): Delete.
> (p8_mtvsrwz): New.
> (p8_mtvsrwz_1, p8_mtvsrwz_2): Delete.
> (p8_xxpermdi_): Take two DF inputs rather than one TF.
> (p8_fmrgow_): Likewise.
> (reload_vsx_from_gpr): Make clobber IF.  Adjust for above
> changes.
> (reload_fpr_from_gpr): Similarly. Use "d" for op0 constraint.
> * config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode.
>

Okay.

Is there still an issue with the constraints used for movdi_internal64?

Thanks, David


Re: [RFC] [PATCH] Add __array_size keyword

2016-02-15 Thread Marek Polacek
On Sat, Feb 13, 2016 at 03:16:49AM +, Stuart Brady wrote:
> As a brief aside, I do get an ICE with the following source, without any
> modifications of my own:
> 
>int bar() { return foo(); }
>void baz(int c[foo()]) { return; }
> 
> I will look into submitting a PR for this properly soon, but will not
> mind if someone wants to take this task upon themselves instead,
> especially as we are into the release phase for GCC 6 and this may be
> an issue worth fixing.  (Note that Debian's GCC 5.3.1 is also affected.)

Sure, I've just filed this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69824

> For a hypothetical change to the C standard itself, I think one might use
> the name "_ArraySize", but for a non-standard extension this would not be
> appropriate.  I think "__array_size" is fine, though, and is consistent
> with "__alignof".

Please no CamelCase, that's ugly ;).  I also think that "__array_size" is fine.
(I haven't had time so far to read the patch.)

Marek


Re: add check for aarch64 in check_effective_target_section_anchors()

2016-02-15 Thread James Greenhalgh
On Thu, Feb 11, 2016 at 11:03:23PM +0530, Prathamesh Kulkarni wrote:
> Hi,
> aarch64 supports section anchors but it appears
> check_effective_target_section_anchors() doesn't contain entry for it.
> This patch adds an entry for aarch64.
> OK for trunk ?

OK. I presume you tested this, and the testcases this enables PASS without
issue?

Thanks,
James

> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 645981a..66fb1ea 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -5467,7 +5467,8 @@ proc check_effective_target_section_anchors { } {
>  } else {
>  set et_section_anchors_saved 0
>  if { [istarget powerpc*-*-*]
> -   || [istarget arm*-*-*] } {
> +   || [istarget arm*-*-*] 
> +   || [istarget aarch64*-*-*] } {
> set et_section_anchors_saved 1
>  }
>  }




Re: [PATCH] Fix PR69595, bogus -Warray-bound warning

2016-02-15 Thread Richard Biener
On Mon, 15 Feb 2016, Richard Biener wrote:

> On Sun, 14 Feb 2016, Marc Glisse wrote:
> 
> > On Tue, 2 Feb 2016, Richard Biener wrote:
> > 
> > > *** gcc/match.pd  (revision 233067)
> > > --- gcc/match.pd  (working copy)
> > > *** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > *** 2094,2099 
> > > --- 2094,2117 
> > >   (bit_and:c (ordered @0 @0) (ordered:c@2 @0 @1))
> > >   @2)
> > > 
> > > + /* Simple range test simplifications.  */
> > > + /* A < B || A >= B -> true.  */
> > > + (for test1 (lt le ne)
> > > +  test2 (ge gt eq)
> > > +  (simplify
> > > +   (bit_ior:c (test1 @0 @1) (test2 @0 @1))
> > > +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > +|| VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
> > > +{ constant_boolean_node (true, type); })))
> > > + /* A < B && A >= B -> false.  */
> > > + (for test1 (lt lt lt le ne eq)
> > > +  test2 (ge gt eq gt eq gt)
> > 
> > The lack of symmetry between the || and && cases is surprising. Is there any
> > reason not to handle the pairs le/ge, le/ne and ge/ne for bit_ior?
> 
> Whoops, no.  I simply forgot those.  I'll bootstrap/test

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

> 2016-02-15  Richard Biener  
> 
>   PR tree-optimization/69595
>   * match.pd: Complete range test simplification to true.
> 
> Index: gcc/match.pd
> ===
> --- gcc/match.pd  (revision 233369)
> +++ gcc/match.pd  (working copy)
> @@ -2119,8 +2119,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  
>  /* Simple range test simplifications.  */
>  /* A < B || A >= B -> true.  */
> -(for test1 (lt le ne)
> - test2 (ge gt eq)
> +(for test1 (lt le le le ne ge)
> + test2 (ge gt ge ne eq ne)
>   (simplify
>(bit_ior:c (test1 @0 @1) (test2 @0 @1))
>(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
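
The comparison pairs in the match.pd hunks above can be sanity-checked
outside the compiler.  The sketch below is plain C++, not match.pd; Cmp,
eval, always_true_or and always_false_and are hypothetical helpers, and
the check only samples a small integer range, so it is an illustration
rather than a proof.  The last helper shows why the pattern is guarded
to integral types: with a NaN operand both comparisons are false.

```cpp
#include <cassert>
#include <cmath>

enum Cmp { LT, LE, GT, GE, EQ, NE };

// Evaluate one comparison code on two integers.
bool eval(Cmp op, int a, int b) {
  switch (op) {
    case LT: return a < b;
    case LE: return a <= b;
    case GT: return a > b;
    case GE: return a >= b;
    case EQ: return a == b;
    case NE: return a != b;
  }
  return false;
}

// True if (a t1 b) || (a t2 b) holds for every sampled pair.
bool always_true_or(Cmp t1, Cmp t2) {
  for (int a = -3; a <= 3; ++a)
    for (int b = -3; b <= 3; ++b)
      if (!(eval(t1, a, b) || eval(t2, a, b)))
        return false;
  return true;
}

// True if (a t1 b) && (a t2 b) fails for every sampled pair.
bool always_false_and(Cmp t1, Cmp t2) {
  for (int a = -3; a <= 3; ++a)
    for (int b = -3; b <= 3; ++b)
      if (eval(t1, a, b) && eval(t2, a, b))
        return false;
  return true;
}

// With a NaN operand, A < B || A >= B is false, so the "-> true"
// simplification would be wrong for floating point.
bool nan_breaks_or() {
  double a = std::nan(""), b = 1.0;
  return !((a < b) || (a >= b));
}
```

The six (test1, test2) pairs of the completed bit_ior list all pass
always_true_or, while a pair such as lt/gt does not, because both tests
fail when the operands are equal.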


Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread Ulrich Weigand
Alan Modra wrote:
> On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
> > Right.  It's a bit unfortunate that we can't just use IFmode 
> > unconditionally,
> > but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
> > then we probably shouldn't be using it.
> 
> Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
> is relevant only up to and including expand.  Nothing prevents the
> backend from using IFmode.

Hmm, OK.  That does make things simpler.

> > Another option might be to use TDmode to allocate a scratch register pair.
> 
> That won't work, at least if we want to extract the two component regs
> with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
> my original patch I just extracted the regs by using gen_rtx_REG but I
> changed that, based on your criticism of using gen_rtx_REG in
> reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
> operand regnos in other places.  That particular change is of course
> entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
> punning regs, instead duplicating insn patterns as done elsewhere in
> the vsx support.  I don't believe we will see subregs of vsx or fp
> regs after reload, but I'm quite willing to concede the point for a
> stage4 fix.

I was thinking here that in the special case of the *reload scratch
register*, which reload allocates for us, we will always get a full
register.  This is different from some other operand that may originate
from pre-existing RTX that may require a SUBREG even after reload.

But I certainly agree that your current patch looks like a good choice
for a stage4 bugfix change.  Further cleanup can always happen later.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Fix PR64748

2016-02-15 Thread Jakub Jelinek
On Tue, Feb 02, 2016 at 08:51:23AM -0600, James Norris wrote:
> --- a/gcc/cp/semantics.c
> +++ b/gcc/cp/semantics.c
> @@ -6683,6 +6683,14 @@ finish_omp_clauses (tree clauses, bool allow_fields, 
> bool declare_simd)
> error ("%qD appears both in data and map clauses", t);
> remove = true;
>   }
> +   else if (!processing_template_decl
> +&& OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> +&& OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR
> +&& !POINTER_TYPE_P (TREE_TYPE (t)))
> + {
> +   error ("%qD is not a pointer variable", t);
> +   remove = true;
> + }

Please move this a few lines up, before the first duplicate check, thus
above
  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
   && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER)
Also, testing it only for !processing_template_decl is undesirable, then you
can't diagnose obvious issues in non-instantiated templates.  Better use:

else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR
 && !type_dependent_expression_p (t)
 && !POINTER_TYPE_P (TREE_TYPE (t)))

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/goacc/deviceptr-1.C
> @@ -0,0 +1,28 @@
> +// { dg-do compile }
> +
> +template 
> +
> +void
> +func1 (P p)
> +{
> +

Please avoid the unnecessary empty lines above (both of them).

> +#pragma acc data deviceptr (p)   // { dg-error "is not a pointer" }
> +{ }
> +

And here too.  Perhaps use "  ;" instead of "{ }"?  And, more importantly,
by using a single template and instantiating it with both arguments, you are
not testing that you are not diagnosing it for the pointer case.

> +}
> +
> +void
> +func2 (void)
> +{
> +  int *p;
> +
> +  func1 (p);
> +}
> +
> +void
> +func3 (void)
> +{
> +  int p;
> +
> +  func1 (p);
> +}

Also, I don't like the uses of uninitialized vars.
So better

template 
void
func1 (P p)
{
#pragma acc data deviceptr (p)  // { dg-bogus "is not a pointer" }
  ;
}

void
func2 (int *p)
{
  func1 (p);
}

template 
void
func3 (P p)
{
#pragma acc data deviceptr (p)  // { dg-error "is not a pointer" }
  ;
}

void
func4 (int p)
{
  func3 (p);
}

template 
void
func5 (int *p, int q)
{
#pragma acc data deviceptr (p)  // { dg-bogus "is not a pointer" }
  ;
#pragma acc data deviceptr (q)  // { dg-error "is not a pointer" }
  ;
}

func5 added so to test that you diagnose even uninstantiated templates
if the vars/parameters are not type dependent.

Ok for trunk with those changes.

Jakub
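
The condition suggested in the review above boils down to "diagnose only
when the type is already known and is not a pointer".  A minimal sketch
of that decision, in standalone C++ with hypothetical names (ClauseVar
and diagnose_not_pointer are not GCC functions):

```cpp
#include <cassert>

// Hypothetical, simplified description of a variable named in a
// deviceptr clause.
struct ClauseVar {
  bool type_dependent;  // type still depends on a template parameter
  bool is_pointer;      // only meaningful when !type_dependent
};

// Same shape as the suggested check: stay silent while the type is
// dependent, and complain as soon as a known type is not a pointer.
bool diagnose_not_pointer(const ClauseVar &v) {
  return !v.type_dependent && !v.is_pointer;
}
```

Under this model the dependent parameter of func1 or func3 is not
diagnosed while the template is being parsed, the int parameter q of
func5 is diagnosed even without an instantiation, and func3's parameter
is diagnosed once it is instantiated with int.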


Re: [PATCH] Fix PR64748

2016-02-15 Thread James Norris

Hi,


Ping!

Thanks,
Jim


On 02/02/2016 08:51 AM, James Norris wrote:

Hi!

On 02/01/2016 02:03 PM, Jakub Jelinek wrote:

On Mon, Feb 01, 2016 at 01:41:50PM -0600, James Norris wrote:

The attached patch resolves c/PR64748. The patch
adds the use of parm's with the deviceptr clause.


 [snip snip]

--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10760,7 +10760,7 @@ c_parser_oacc_data_clause_deviceptr (c_parser
*parser, tree list)
   c_parser_omp_var_list_parens() should construct a list of
   locations to go along with the var list.  */

-  if (!VAR_P (v))
+  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))


Please don't write !(x == y) but x != y.


Fixed.




--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -30087,7 +30087,7 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser
*parser, tree list)
   c_parser_omp_var_list_parens should construct a list of
   locations to go along with the var list.  */

-  if (!VAR_P (v))
+  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))
  error_at (loc, "%qD is not a variable", v);
else if (TREE_TYPE (v) == error_mark_node)
  ;


For C++, all these diagnostics are premature: if processing_template_decl
you really often don't know what the type will be, not sure if you always
know at least if it is a VAR_DECL, PARM_DECL or something else.  I bet you
can easily ICE with the current POINTER_TYPE_P (TREE_TYPE (v)) check as
in templates the type can be NULL, or it could be some lang type and only
later on become POINTER_TYPE, etc.
For C++ the diagnostics need to be done during finish_omp_clauses or so, not
earlier.


The check has been moved to finish_omp_clauses (). I put the check at
the tail end of the checking, as I wasn't able to determine if there
was a checking precedence done by the if-else-if sequence.

Thanks for the review!

Jim


= ChangeLog entries...

 gcc/testsuite/

 PR c/64748
 * c-c++-common/goacc/deviceptr-1.c: Add tests.
 * g++.dg/goacc/deviceptr-1.c: New file.


 gcc/cp/

 PR c/64748
 * parser.c (cp_parser_oacc_data_clause_deviceptr): Remove checking.
 * semantics.c (finish_omp_clauses): Add deviceptr checking.


 gcc/c/

 PR c/64748
 * c-parser.c (c_parser_oacc_data_clause_deviceptr): Allow parms.







Re: lra-remat issues (PR68730)

2016-02-15 Thread Bernd Schmidt

On 02/04/2016 09:27 PM, Vladimir Makarov wrote:

After a few false starts, I came up with the patch below, which keeps
track of not just the candidate insn, but also an activation insn, and
chooses candidates only if they are both available and active. Besides
passing a new arg to create_cand, the changes in create_cands are
mostly cosmetic to make the function less confusing. This was
bootstrapped and tested on x86_64-linux. Ok?


The patch looks ok for me.  Thanks for working on the PR, Bernd.


I should get in the habit of asking "ok everywhere?" Can I put this on 
gcc-5-branch as well?



Bernd



Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Dmitry Vyukov
On Mon, Feb 15, 2016 at 1:44 PM, Bernd Edlinger
 wrote:
> On 15/02/16 13:05, Dmitry Vyukov wrote:
>> On Mon, Feb 15, 2016 at 12:29 PM, Bernd Edlinger
>>  wrote:
>>>
>>> No problem.  PR65400 was a GCC wrong code bug, so it makes no
>>> sense to have the same test in llvm's tree, thus we are free to fix it on
>>> our own, as we like.
>>>
>>> Here is a patch that puts each value on its own 8-byte aligned memory
>>> location.  From my experience with tsan tests, sharing shadow memory
>>> slots between v and q or o is the most likely explanation for the occasional
>>> inability to spot the race condition on v, thus the test case fails, because
>>> the return code is 0, and the expected output is not found.
>>>
>>>
>>> Boot-strapped/regression tested on x86_64-linux-gnu.
>>>
>>> OK for trunk?
>>
>>
>> I don't know whether it will fire or not, but 4-byte variables that
>> are 8-byte aligned can still be collocated with something else. Making
>> vars 8-byte should be safer.
>
> Yes, but as PR65400 is a wrong code bug, I would not like to change more
> than absolutely necessary to the test case, in order not to lose the ability
> to check the original regression.
>
> The test case does not have more than the 3 global values that you see.
> nm a.out | sort shows that these are now separate:
>
> 00601400 b barrier
> 00601420 b barrier_wait
> 00601428 B v
> 00601430 B o
> 00601438 B q
> 00601440 B _end


Looks good to me then.
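
The alignment argument in this exchange can be illustrated with a small
standalone sketch.  It assumes an 8-byte granularity for the cells being
discussed and a 4-byte int; cell_of, share_cell, Packed and the test_*
variables are hypothetical names, not part of the tsan testsuite.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

// Three globals, each forced onto its own 8-byte boundary.
alignas(8) int test_v;
alignas(8) int test_o;
alignas(8) int test_q;

// Index of the 8-byte cell containing p (assumed granularity).
std::uintptr_t cell_of(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) / 8;
}

bool share_cell(const void *a, const void *b) {
  return cell_of(a) == cell_of(b);
}

// A 4-byte object that is 8-byte aligned fills only half of its cell,
// so a neighbour can still land in the other half.
struct Packed {
  alignas(8) int a;
  int b;  // sits at offset 4, inside the same 8-byte cell as a
};
```

The three aligned globals never share a cell with each other, which is
what the nm output shows.  Packed shows the remaining concern: a 4-byte
object can still have an unrelated neighbour in the same cell, which is
why making the variables 8 bytes wide was suggested as the safer option.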


AW: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Bernd Edlinger
On 15/02/16 13:05, Dmitry Vyukov wrote:
> On Mon, Feb 15, 2016 at 12:29 PM, Bernd Edlinger
>  wrote:
>>
>> No problem.  PR65400 was a GCC wrong code bug, so it makes no
>> sense to have the same test in llvm's tree, thus we are free to fix it on
>> our own, as we like.
>>
>> Here is a patch that puts each value on its own 8-byte aligned memory
>> location.  From my experience with tsan tests, sharing shadow memory
>> slots between v and q or o is the most likely explanation for the occasional
>> inability to spot the race condition on v, thus the test case fails, because
>> the return code is 0, and the expected output is not found.
>>
>>
>> Boot-strapped/regression tested on x86_64-linux-gnu.
>>
>> OK for trunk?
>
>
> I don't know whether it will fire or not, but 4-byte variables that
> are 8-byte aligned can still be collocated with something else. Making
> vars 8-byte should be safer.

Yes, but as PR65400 is a wrong code bug, I would not like to change more
than absolutely necessary to the test case, in order not to lose the ability
to check the original regression.

The test case does not have more than the 3 global values that you see.
nm a.out | sort shows that these are now separate:

00601400 b barrier
00601420 b barrier_wait
00601428 B v
00601430 B o
00601438 B q
00601440 B _end



Regards,
Bernd.

Re: Fix PR69752, insn with REG_INC being removed as equiv_init insn

2016-02-15 Thread Bernd Schmidt

On 02/12/2016 08:43 AM, Jeff Law wrote:

On 02/11/2016 06:28 PM, Bernd Schmidt wrote:



PR rtl-optimization/69752
* ira.c (update_equiv_regs): When looking for more than a single SET,
also take other side effects into account.


OK for the trunk.


Branches too? The problem obviously exists everywhere.


Bernd



Re: [RS6000] reload_vsx_from_gprsf splitter

2016-02-15 Thread Alan Modra
On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
> > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
> > > Another concern I had about this, besides using %L in asm output (what
> > > forces TFmode to use just fprs?), is what happens when we're using
> > > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
> > 
> > Good point that it breaks if the default long double (TFmode) type is IEEE
> > 128-bit floating point.  We would need to have two patterns, one that uses
> > TFmode and one that uses IFmode.  I wrote the power8 direct move stuff 
> > before
> > going down the road of IEEE 128-bit floating point.
> 
> Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
> but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
> then we probably shouldn't be using it.

Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
is relevant only up to and including expand.  Nothing prevents the
backend from using IFmode.

> Another option might be to use TDmode to allocate a scratch register pair.

That won't work, at least if we want to extract the two component regs
with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
my original patch I just extracted the regs by using gen_rtx_REG but I
changed that, based on your criticism of using gen_rtx_REG in
reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
operand regnos in other places.  That particular change is of course
entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
punning regs, instead duplicating insn patterns as done elsewhere in
the vsx support.  I don't believe we will see subregs of vsx or fp
regs after reload, but I'm quite willing to concede the point for a
stage4 fix.

Here's the revised patch.  To recap, the main bug fixes here are:
- stop reload_vsx_from_gprsf splitter from emitting a move not
handled by movdi_internal64
- don't use TFmode, which cannot now be assumed to be IBM
double-double.
Secondary to that, not using or passing around TFmode means the %L
restriction no longer matters, and constraints on the reload temp reg
can be relaxed.

Bootstrapped and regression tested powerpc64-linux biarch and
powerpc64le-linux.  OK David?

PR target/68973
* config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf
rather than attempting to use movdi_internal64.  Remove op0_di.
(p8_mtvsrd_df, p8_mtvsrd_sf): New.
(p8_mtvsrd_1, p8_mtvsrd_2): Delete.
(p8_mtvsrwz): New.
(p8_mtvsrwz_1, p8_mtvsrwz_2): Delete.
(p8_xxpermdi_): Take two DF inputs rather than one TF.
(p8_fmrgow_): Likewise.
(reload_vsx_from_gpr): Make clobber IF.  Adjust for above
changes.
(reload_fpr_from_gpr): Similarly. Use "d" for op0 constraint.
* config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdbf873..ec356cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7488,41 +7488,31 @@
 ;; value, since it is allocated in reload and not all of the flow information
 ;; is setup for it.  We have two patterns to do the two moves between gprs and
 ;; fprs.  There isn't a dependancy between the two, but we could potentially
-;; schedule other instructions between the two instructions.  TFmode is
-;; currently limited to traditional FPR registers.  If/when this is changed, we
-;; will need to revist %L to make sure it works with VSX registers, or add an
-;; %x version of %L.
+;; schedule other instructions between the two instructions.
 
 (define_insn "p8_fmrgow_"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
-   (unspec:FMOVE64X [(match_operand:TF 1 "register_operand" "d")]
+   (unspec:FMOVE64X [
+   (match_operand:DF 1 "register_operand" "d")
+   (match_operand:DF 2 "register_operand" "d")]
 UNSPEC_P8V_FMRGOW))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "fmrgow %0,%1,%L1"
+  "fmrgow %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "p8_mtvsrwz_1"
-  [(set (match_operand:TF 0 "register_operand" "=d")
-   (unspec:TF [(match_operand:SI 1 "register_operand" "r")]
+(define_insn "p8_mtvsrwz"
+  [(set (match_operand:DF 0 "register_operand" "=d")
+   (unspec:DF [(match_operand:SI 1 "register_operand" "r")]
   UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
   [(set_attr "type" "mftgpr")])
 
-(define_insn "p8_mtvsrwz_2"
-  [(set (match_operand:TF 0 "register_operand" "+d")
-   (unspec:TF [(match_dup 0)
-   (match_operand:SI 1 "register_operand" "r")]
-  UNSPEC_P8V_MTVSRWZ))]
-  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrwz %L0,%1"
-  [(set_attr "type" "mftgpr")])
-
 (define_insn_and_split "reload_fpr_from_gpr"
-  [(set 

Re: [Patch, regex, libstdc++/69794] Unify special character parsing

2016-02-15 Thread Jonathan Wakely

On 13/02/16 11:13 -0800, Tim Shen wrote:

I did it wrong in r227289 - I ignored the "\n" special case in grep.
Turns out using code to handle special cases is error prone, so I
turned to use data (_M_grep_spec_char and _M_egrep_spec_char).


Those new members change the size of the type, so are an ABI change.

Couldn't they be static members?
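
A small sketch of the point, in standalone C++: non-static data members
grow the object, static ones do not.  The struct names and the
placeholder strings are hypothetical and are not the libstdc++ scanner.

```cpp
#include <cassert>

// Layout before the change (hypothetical).
struct ScannerBefore {
  int state;
};

// Adding per-object members changes sizeof, hence the ABI.
struct ScannerWithMembers {
  int state;
  const char *grep_spec;
  const char *egrep_spec;
};

// Static members live outside the object, so sizeof is unchanged.
struct ScannerWithStatics {
  int state;
  static const char *const grep_spec;
  static const char *const egrep_spec;
};

// Placeholder contents, for illustration only.
const char *const ScannerWithStatics::grep_spec = "placeholder-a";
const char *const ScannerWithStatics::egrep_spec = "placeholder-b";
```

Here sizeof (ScannerWithStatics) equals sizeof (ScannerBefore), while
sizeof (ScannerWithMembers) is larger.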



Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Dmitry Vyukov
On Mon, Feb 15, 2016 at 12:29 PM, Bernd Edlinger
 wrote:
> On 15/02/16 08:18, Dmitry Vyukov wrote:
>> llvm tsan tests contain test.h file (probably what's called
>> test_barrier.h in gcc), you can put the macro there. test.h should
>> already be included into all tests.
>
> Hmm.. as the person who introduced test_barrier.h (well before llvm had a
> test.h ;) I must say that, although gcc was first here, we will probably
> change that to match llvm's implementation for gcc-7.
>
> I would not like to add more differences here without a very good reason.
> I'd say, if Dmitry sees a reason to improve the error handling in test.h, that
> is a good thing, and should go into llvm's tree first.
>
> And independently of that I am looking at using llvm's test.h framework 
> instead
> of gcc's test_barrier.h for gcc-7 soon.
>
> On 15/02/16 11:56, Tom de Vries wrote:
>> On Mon, Feb 15, 2016 at 11:45 AM, Tom de Vries  
>> wrote:
>>>
>>> I've tried to be as clear as possible in the RFC submission that I'm not
>>> certain about the cause of the failure, and that the patch is proposing a
>>> fix that would make that guessed failure cause explicit.
>>>
>>>> Sure pthread_create can fail, as malloc and mmap.
>>>> But if that is the reason for the failure it would happen
>>>> just randomly, everywhere.
>>>>
>>>> Why do you think that only this test case shows the problem?

>>>
>>> As I explained in the RFC submission, my reasoning there was that the test
>>> is one of the very few test cases that tests the result of pthread_create
>>> and then returns 0, which causes the failure in combination with
>>> dg-shouldfail.
>>>
>>> But thinking about it some more, even if pthread_create would fail, causing
>>> the testcase to fail in execution, allowing the execution test to pass due
>>> to dg-shouldfail, presumably the dg-output test would still fail in that
>>> case, so my reasoning was not sound.
>>>
>>> So I suppose you're right, indeed the pthread_create fail hypothesis is not
>>> the most logical one.
>>>
>>> Still, the patch is an improvement irrespective of the PR that inspired it,
>>> and perhaps a lot more library calls should be checked for errors than just
>>> pthread_create.
>>>
>>>> I think Dmitry's comment may be right on the point.
>>>
>>>
>>> If someone proposes that as a patch for the testcase, great. I'm more than
>>> willing to test that in my setup to be able to claim 'bootstrapped and
>>> reg-tested on x86_64' in the submission.
>>>
>
> No problem.  PR65400 was a GCC wrong code bug, so it makes no
> sense to have the same test in llvm's tree, thus we are free to fix it on
> our own, as we like.
>
> Here is a patch that puts each value on its own 8-byte aligned memory
> location.  From my experience with tsan tests, sharing shadow memory
> slots between v and q or o is the most likely explanation for the occasional
> inability to spot the race condition on v, thus the test case fails, because
> the return code is 0, and the expected output is not found.
>
>
> Boot-strapped/regression tested on x86_64-linux-gnu.
>
> OK for trunk?


I don't know whether it will fire or not, but 4-byte variables that
are 8-byte aligned can still be collocated with something else. Making
vars 8-byte should be safer.


Re: [PATCH] s390: New mcount call sequence for z900+ CPUs in 31-bit mode.

2016-02-15 Thread Andreas Krebbel
On 01/21/2016 02:03 PM, Marcin Kościelnicki wrote:
> gcc/ChangeLog:
> 
>   * config/s390/s390.c (s390_function_profiler): Add a new sequence
>   for z900+ CPUs in 31-bit mode.

Applied. Thanks!

-Andreas-




Re: [PATCH] s390: New mcount call sequence for z900+ CPUs in 31-bit mode.

2016-02-15 Thread Marcin Kościelnicki

On 21/01/16 14:03, Marcin Kościelnicki wrote:

On TARGET_CPU_ZARCH && !TARGET_64BIT, we can use a similar lean mcount
call sequence to TARGET_64BIT.  The longer sequences are now used only
on deprecated g5/g6 CPUs.

Tested on s390-ibm-linux-gnu on RHEL 7.2.

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Add a new sequence
for z900+ CPUs in 31-bit mode.
---
This change was mentioned in the s390 split-stack thread.

  gcc/ChangeLog  | 5 +
  gcc/config/s390/s390.c | 7 +++
  2 files changed, 12 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 0e77409..94b9bd0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-01-21  Marcin Kościelnicki  
+
+   * config/s390/s390.c (s390_function_profiler): Add a new sequence
+   for z900+ CPUs in 31-bit mode.
+
  2016-01-21  Richard Biener  

* graphite-optimize-isl.c (get_schedule_map): Fix typo.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..eb26f18 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -11974,6 +11974,13 @@ s390_function_profiler (FILE *file, int labelno)
output_asm_insn ("brasl\t%0,%4", op);
output_asm_insn ("lg\t%0,%1", op);
  }
+  else if (TARGET_CPU_ZARCH)
+{
+  output_asm_insn ("st\t%0,%1", op);
+  output_asm_insn ("larl\t%2,%3", op);
+  output_asm_insn ("brasl\t%0,%4", op);
+  output_asm_insn ("l\t%0,%1", op);
+}
else if (!flag_pic)
  {
op[6] = gen_label_rtx ();




Ping?


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-02-15 Thread Kyrill Tkachov


On 04/02/16 08:58, Ramana Radhakrishnan wrote:

On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson  wrote:

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the out-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.

Ultimately, I think this is a problem in the arm backend.  It should
not have a PROMOTE_MODE macro that is changing the sign of char and
short local variables.  I also think that we should merge the
PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
prevent this from happening again.

I see four general problems with the current ARM PROMOTE_MODE definition.
1) Unsigned char is only faster for armv5 and earlier, before the sxtb
instruction was added.  It is a lose for armv6 and later.
2) Unsigned short was only faster for targets that don't support
unaligned accesses.  Support for these targets was removed a while
ago, and this PROMOTE_MODE hunk should have been removed at the same
time.  It was accidentally left behind.
3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
converted to a function, the PROMOTE_MODE code was copied without the
UNSIGNEDP changes.  Thus it is only an accident that
TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
changes to resolve the difference are safe.
4) There is a general principle that you should only change signedness
in PROMOTE_MODE if the hardware forces it, as otherwise this results
in extra conversion instructions that make code slower.  The mips64
hardware for instance requires that 32-bit values be sign-extended
regardless of type, and instructions may trap if this is not true.
However, it has a set of 32-bit instructions that operate on these
values, and hence no conversions are required.  There is no similar
case on ARM. Thus the conversions are unnecessary and unwise.  This
can be seen in the testcases where gcc emits both a zero-extend and a
sign-extend inside a loop, as the sign-extend is required for a
compare, and the zero-extend is required by PROMOTE_MODE.
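
To make the mismatch concrete, here is a minimal sketch in plain C. It is
not GCC internals: the helper names promote_zero_extend,
promote_sign_extend and plain_move_is_safe are hypothetical, and they only
model how a signed char would sit in a 32-bit register under the two
promotion rules described above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the ARM PROMOTE_MODE behaviour described above: a QImode
   value is held zero-extended in a 32-bit register, whatever the sign
   of its C type.  (Hypothetical helper, for illustration only.)  */
uint32_t promote_zero_extend (signed char c)
{
  return (uint32_t) (uint8_t) c;
}

/* Model of a promotion that keeps the sign of the type, as the text
   says the function-argument hook does for signed char.  */
uint32_t promote_sign_extend (signed char c)
{
  return (uint32_t) (int32_t) c;
}

/* Nonzero if a plain register-to-register move between the two
   representations would preserve the value, i.e. no conversion is
   needed.  */
int plain_move_is_safe (signed char c)
{
  return promote_zero_extend (c) == promote_sign_extend (c);
}

/* Small self-check: -1 differs between the two forms, 5 does not.  */
void promotion_demo (void)
{
  assert (!plain_move_is_safe (-1));
  assert (plain_move_is_safe (5));
}
```

Only negative values disagree, which fits the description of a copy
between a parameter and a local that needs a conversion the out-of-ssa
pass cannot easily emit.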

Given Kyrill's testing with the patch and the reasonably detailed
check of the effects of code generation changes - The arm.h hunk is ok
- I do think we should make this explicit in the documentation that
TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
better still maybe put in a checking assert for the same in the
mid-end but that could be the subject of a follow-up patch.

Ok to apply just the arm.h hunk as I think Kyrill has taken care of
the testsuite fallout separately.

Hi all,

I'd like to backport the arm.h hunk from this (r233130) to the GCC 5
branch. As the CSE patch from my series had some fallout on x86_64
due to a deficiency in the AVX patterns that is too invasive to fix
at this stage (and presumably backport), I'd like to just backport
this arm.h fix and adjust the tests to XFAIL the fallout that comes
with not applying the CSE patch. The attached patch does that.

The code quality fallout on code outside the testsuite is not
that great. The SPEC benchmarks are not affected by not applying
the CSE change, and only a single sequence in a popular embedded benchmark
shows some degradation for -mtune=cortex-a9 in the same way as the
wmul-1.c and wmul-2.c tests.

I think that's a fair tradeoff for fixing the wrong code bug on that branch.

Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?

Thanks,
Kyrill

2016-02-15  Kyrylo Tkachov  

PR target/65932
* gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
xfail the scan-assembler test.
* gcc.target/arm/wmul-2.c: Likewise.
* gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.





regards
Ramana





My change was tested with an arm bootstrap, make check, and SPEC
CPU2000 run.  The original poster verified that this gives a linux
kernel that boots correctly.

The PROMOTE_MODE change causes 3 testsuite testcases to fail.  These
are tests to verify that smulbb and/or smlabb are generated.
Eliminating the unnecessary sign conversions causes us to get better
code that doesn't include the smulbb and smlabb instructions.  I had
to modify the testcases 

Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Bernd Edlinger
On 15/02/16 08:18, Dmitry Vyukov wrote: 
> llvm tsan tests contain test.h file (probably what's called
> test_barrier.h in gcc), you can put the macro there. test.h should
> already be included into all tests.

Hmm.. as the person who introduced test_barrier.h (well before llvm had a
test.h ;) I must say that, although gcc was first here, we will probably
change that to match llvm's implementation for gcc-7.

I would not like to add more differences here without a very good reason.
I'd say, if Dmitry sees a reason to improve the error handling in test.h, that
is a good thing, and should go into llvm's tree first.

And independently of that I am looking at using llvm's test.h framework instead
of gcc's test_barrier.h for gcc-7 soon.

On 15/02/16 11:56, Tom de Vries wrote:
> On Mon, Feb 15, 2016 at 11:45 AM, Tom de Vries  wrote:
>>
>> I've tried to be as clear as possible in the RFC submission that I'm not
>> certain about the cause of the failure, and that the patch is proposing a
>> fix that would make that guessed failure cause explicit.
>>
>>> Sure pthread_create can fail, as malloc and mmap.
>>> But if that is the reason for the failure it would happen
>>> just randomly, everywhere.
>>>
>>> Why do you think that only this test case shows the problem?
>>>
>>
>> As I explained in the RFC submission, my reasoning there was that the test
>> is one of the very few test cases that tests the result of pthread_create
>> and then returns 0, which causes the failure in combination with
>> dg-shouldfail.
>>
>> But thinking about it some more, even if pthread_create would fail, causing
>> the testcase to fail in execution, allowing the execution test to pass due
>> to dg-shouldfail, presumably the dg-output test would still fail in that
>> case, so my reasoning was not sound.
>>
>> So I suppose you're right, indeed the pthread_create fail hypothesis is not
>> the most logical one.
>>
>> Still, the patch is an improvement irrespective of the PR that inspired it,
>> and perhaps a lot more library calls should be checked for errors than just
>> pthread_create.
>>
>>> I think Dmitry's comment may be right on the point.
>>
>>
>> If someone proposes that as a patch for the testcase, great. I'm more than
>> willing to test that in my setup to be able to claim 'bootstrapped and
>> reg-tested on x86_64' in the submission.
>>

No problem.  PR65400 was a GCC wrong code bug, so it makes no
sense to have the same test in llvm's tree, thus we are free to fix it on
our own, as we like.

Here is a patch that puts each value on its own 8-byte aligned memory
location.  From my experience with tsan tests, sharing shadow memory
slots between v and q or o is the most likely explanation for the occasional
inability to spot the race condition on v, thus the test case fails, because
the return code is 0, and the expected output is not found.
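
As a rough illustration of the reasoning above, the sketch below checks
whether two objects fall into the same 8-byte slot. The 8-byte granule is
taken from the description in this message; the helper names granule_of,
share_granule and aligned_vars_are_separate are hypothetical and are not
part of tsan or of the test.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed slot size: the message above talks about 8-byte slots.  */
#define GRANULE 8u

/* Index of the GRANULE-sized slot that contains P.  */
uintptr_t granule_of (const void *p)
{
  return (uintptr_t) p / GRANULE;
}

/* Nonzero if A and B fall into the same slot, so a tool that tracks
   accesses per slot could mix up their histories.  */
int share_granule (const void *a, const void *b)
{
  return granule_of (a) == granule_of (b);
}

/* Same declarations as in the patched test: each int starts its own
   slot.  */
int v __attribute__((aligned(8)));
int q __attribute__((aligned(8)));
int o __attribute__((aligned(8)));

/* Nonzero if v, q and o all live in distinct slots.  */
int aligned_vars_are_separate (void)
{
  assert ((uintptr_t) &v % GRANULE == 0);
  return !share_granule (&v, &q)
	 && !share_granule (&q, &o)
	 && !share_granule (&v, &o);
}
```

With plain `int v; int q; int o;` the linker is free to place two of the
4-byte objects in one 8-byte slot, which is the situation the patch
avoids.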


Boot-strapped/regression tested on x86_64-linux-gnu.

OK for trunk?


Thanks
Bernd.

2016-02-15  Bernd Edlinger

	* c-c++-common/tsan/pr65400-1.c (v, q, o): Make 8-byte aligned.

--- gcc/testsuite/c-c++-common/tsan/pr65400-1.c.jj	2015-03-19 08:53:38.0 +0100
+++ gcc/testsuite/c-c++-common/tsan/pr65400-1.c	2016-02-15 11:09:18.852320827 +0100
@@ -7,9 +7,9 @@
 #include "tsan_barrier.h"
 
 static pthread_barrier_t barrier;
-int v;
-int q;
-int o;
+int v __attribute__((aligned(8)));
+int q __attribute__((aligned(8)));
+int o __attribute__((aligned(8)));
 extern void baz4 (int *);
 
 __attribute__((noinline, noclone)) int


Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Dmitry Vyukov
On Mon, Feb 15, 2016 at 11:45 AM, Tom de Vries  wrote:
> On 15/02/16 10:07, Bernd Edlinger wrote:
>>
>> On 15/02/16 09:07, Tom de Vries wrote:

>>> On 15/02/16 08:24, Dmitry Vyukov wrote:
>>>>
>>>> If we are talking about pr 68580, then I would try:
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c2
>>>> first.
>>>
>>> As I tried to explain in the follow-up comment
>>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c3),
>>> since unfortunately I have no reliable way of reproducing the failure,
>>> there's no defined way to 'try' something.
>
>
>> But your proposed patch is also only guessing.
>
>
> I've tried to be as clear as possible in the RFC submission that I'm not
> certain about the cause of the failure, and that the patch is proposing a
> fix that would make that guessed failure cause explicit.
>
>> Sure pthread_create can fail, as malloc and mmap.
>> But if that is the reason for the failure it would happen
>> just randomly, everywhere.
>>
>> Why do you think that only this test case shows the problem?
>>
>
> As I explained in the RFC submission, my reasoning there was that the test
> is one of the very few test cases that tests the result of pthread_create
> and then returns 0, which causes the failure in combination with
> dg-shouldfail.
>
> But thinking about it some more, even if pthread_create would fail, causing
> the testcase to fail in execution, allowing the execution test to pass due
> to dg-shouldfail, presumably the dg-output test would still fail in that
> case, so my reasoning was not sound.
>
> So I suppose you're right, indeed the pthread_create fail hypothesis is not
> the most logical one.
>
> Still, the patch is an improvement irrespective of the PR that inspired it,
> and perhaps a lot more library calls should be checked for errors than just
> pthread_create.
>
>> I think Dmitry's comment may be right on the point.
>
>
> If someone proposes that as a patch for the testcase, great. I'm more than
> willing to test that in my setup to be able to claim 'bootstrapped and
> reg-tested on x86_64' in the submission.
>
> I'm just trying to point out that I cannot 'try' out that patch and come
> back with the confirmation that 'the patch fixes the failure', given the
> nature of the failure.

Yes, we can't directly test a fix. But s/int/long long/ is still the
right thing to do. We do it in other tests for similar reasons. We can
submit it and see if flakes remain or go away.


Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-02-15 Thread James Greenhalgh
On Thu, Jan 21, 2016 at 04:55:40PM -0600, Evandro Menezes wrote:
> 
> Got it.
> 
> Let me try this again:
> 
>Add support for the FCCMP insn types
> 
>2016-01-21  Evandro Menezes  
> 
>gcc/
> * config/aarch64/aarch64.md (fccmp): Change insn type.
> (fccmpe): Likewise.
> * config/aarch64/thunderx.md (thunderx_fcmp): Add
>"fccmp{s,d}" types.
> * config/arm/cortex-a53.md (cortex_a53_fpalu): Likewise.
> * config/arm/cortex-a57.md (cortex_a57_fp_cmp): Likewise.
> * config/arm/xgene1.md (xgene1_fcmp): Likewise.
> * config/arm/exynos-m1.md (exynos_m1_fp_ccmp): New insn
>reservation.
> * config/arm/types.md (fccmps): Add new insn type.
> (fccmpd): Likewise.
> 
> 

This is OK. Sorry to have left it waiting so long.

Thanks,
James


> From 14874dec3257c7b59aed4b7c610305f76bbbcf33 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 4 Jan 2016 18:44:30 -0600
> Subject: [PATCH] Add support for the FCCMP insn types
> 
> 2016-01-21  Evandro Menezes  
> 
> gcc/
>   * config/aarch64/aarch64.md (fccmp): Change insn type.
>   (fccmpe): Likewise.
>   * config/aarch64/thunderx.md (thunderx_fcmp): Add "fccmp{s,d}" types.
>   * config/arm/cortex-a53.md (cortex_a53_fpalu): Likewise.
>   * config/arm/cortex-a57.md (cortex_a57_fp_cmp): Likewise.
>   * config/arm/xgene1.md (xgene1_fcmp): Likewise.
>   * config/arm/exynos-m1.md (exynos_m1_fp_ccmp): New insn reservation.
>   * config/arm/types.md (fccmps): Add new insn type.
>   (fccmpd): Likewise.
> ---
>  gcc/config/aarch64/aarch64.md  | 4 ++--
>  gcc/config/aarch64/thunderx.md | 2 +-
>  gcc/config/arm/cortex-a53.md   | 4 ++--
>  gcc/config/arm/cortex-a57.md   | 2 +-
>  gcc/config/arm/exynos-m1.md| 5 +
>  gcc/config/arm/types.md| 3 +++
>  gcc/config/arm/xgene1.md   | 2 +-
>  7 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2f543aa..032b342 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -301,7 +301,7 @@
> (match_operand 5 "immediate_operand")))]
>"TARGET_FLOAT"
>"fccmp\\t%2, %3, %k5, %m4"
> -  [(set_attr "type" "fcmp")]
> +  [(set_attr "type" "fccmp")]
>  )
>  
>  (define_insn "fccmpe"
> @@ -316,7 +316,7 @@
> (match_operand 5 "immediate_operand")))]
>"TARGET_FLOAT"
>"fccmpe\\t%2, %3, %k5, %m4"
> -  [(set_attr "type" "fcmp")]
> +  [(set_attr "type" "fccmp")]
>  )
>  
>  ;; Expansion of signed mod by a power of 2 using CSNEG.
> diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
> index 922df39..058713a 100644
> --- a/gcc/config/aarch64/thunderx.md
> +++ b/gcc/config/aarch64/thunderx.md
> @@ -156,7 +156,7 @@
>  
>  (define_insn_reservation "thunderx_fcmp" 3
>(and (eq_attr "tune" "thunderx")
> -   (eq_attr "type" "fcmps,fcmpd"))
> +   (eq_attr "type" "fcmps,fcmpd,fccmps,fccmpd"))
>"thunderx_pipe1")
>  
>  (define_insn_reservation "thunderx_fmul" 6
> diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
> index c1eeedb..fc60bc2 100644
> --- a/gcc/config/arm/cortex-a53.md
> +++ b/gcc/config/arm/cortex-a53.md
> @@ -508,8 +508,8 @@
>  (define_insn_reservation "cortex_a53_fpalu" 5
>(and (eq_attr "tune" "cortexa53")
>   (eq_attr "type" "ffariths, fadds, ffarithd, faddd, fmov,
> - f_cvt, fcmps, fcmpd, fcsel, f_rints, f_rintd,
> - f_minmaxs, f_minmaxd"))
> + f_cvt, fcmps, fcmpd, fccmps, fccmpd, fcsel,
> + f_rints, f_rintd, f_minmaxs, f_minmaxd"))
>"cortex_a53_slot_any,cortex_a53_fp_alu")
>  
>  (define_insn_reservation "cortex_a53_fconst" 3
> diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
> index 0d28951..f4c112c 100644
> --- a/gcc/config/arm/cortex-a57.md
> +++ b/gcc/config/arm/cortex-a57.md
> @@ -716,7 +716,7 @@
>  
>  (define_insn_reservation "cortex_a57_fp_cmp" 7
>(and (eq_attr "tune" "cortexa57")
> -   (eq_attr "type" "fcmps,fcmpd"))
> +   (eq_attr "type" "fcmps,fcmpd,fccmps,fccmpd"))
>"ca57_cx2")
>  
>  (define_insn_reservation "cortex_a57_fp_arith" 4
> diff --git a/gcc/config/arm/exynos-m1.md b/gcc/config/arm/exynos-m1.md
> index 0448073..973c8a9 100644
> --- a/gcc/config/arm/exynos-m1.md
> +++ b/gcc/config/arm/exynos-m1.md
> @@ -823,6 +823,11 @@
> (eq_attr "type" "fcmps, fcmpd"))
>"em1_nmisc")
>  
> +(define_insn_reservation "exynos_m1_fp_ccmp" 7
> +  (and (eq_attr "tune" "exynosm1")
> +   (eq_attr "type" "fccmps, fccmpd"))
> +  "em1_st, em1_nmisc")
> +
>  (define_insn_reservation "exynos_m1_fp_sel" 4
>(and (eq_attr "tune" "exynosm1")
> (eq_attr "type" "fcsel"))
> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> index 321ff89..25f79b4 

Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning

2016-02-15 Thread James Greenhalgh
On Mon, Feb 08, 2016 at 10:57:10AM +, James Greenhalgh wrote:
> On Mon, Feb 01, 2016 at 02:00:01PM +, James Greenhalgh wrote:
> > On Mon, Jan 25, 2016 at 11:20:46AM +, James Greenhalgh wrote:
> > > On Mon, Jan 11, 2016 at 12:04:43PM +, James Greenhalgh wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I've seen a couple of large performance issues caused by expanding
> > > > the high-precision reciprocal square root for Cortex-A57, so I'd like
> > > > to turn it off by default.
> > > > 
> > > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
> > > > Spec2000, good (~5.5%) for gromacs from Spec2006, and very good (>10%)
> > > > for some private microbenchmark kernels which stress the
> > > > divide/sqrt/multiply units. It therefore seems to me to be the correct
> > > > choice to make across a number of workloads.
> > > > 
> > > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
> > > > 
> > > > OK?
> > > 
> > > *Ping*
> > 
> > *pingx2*
> 
> *ping^3*

*ping^4*

Thanks,
James

> > > > ---
> > > > 2015-12-11  James Greenhalgh  
> > > > 
> > > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove
> > > > AARCH64_EXTRA_TUNE_RECIP_SQRT.
> > > > 
> > > 
> > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > > index 1d5d898..999c9fc 100644
> > > > --- a/gcc/config/aarch64/aarch64.c
> > > > +++ b/gcc/config/aarch64/aarch64.c
> > > > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
> > > >0,   /* max_case_values.  */
> > > >0,   /* cache_line_size.  */
> > > >tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
> > > > -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
> > > > -   | AARCH64_EXTRA_TUNE_RECIP_SQRT)/* tune_flags.  */
> > > > +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags.  */
> > > >  };
> > > >  
> > > >  static const struct tune_params cortexa72_tunings =
> > > 
> > 
> 


Re: [Patch AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS

2016-02-15 Thread James Greenhalgh
On Mon, Feb 08, 2016 at 12:52:00PM +, James Greenhalgh wrote:
> On Tue, Jan 26, 2016 at 04:04:47PM +, James Greenhalgh wrote:
> > 
> > Hi,
> > 
> > In their forms using 16-bit lanes, the sqrdmlah and sqrdmlsh instructions
> > available when compiling with -march=armv8.1-a are only usable with
> > a register number in the range 0 to 15 for operand 3, as gas will point
> > out:
> > 
> >   Error: register number out of range 0 to 15 at
> > operand 3 -- `sqrdmlsh v2.4h,v4.4h,v23.h[5]'
> > 
> > This patch teaches GCC to avoid registers outside of this range when
> > appropriate, in the same fashion as we do for other instructions with
> > this limitation.
> > 
> > Tested on an internal testsuite targeting Neon intrinsics.
> > 
> > OK?
> 
> *ping*

*ping^2*

Thanks,
James

> > ---
> > 2016-01-25  James Greenhalgh  
> > 
> > * config/aarch64/aarch64.md
> > (aarch64_sqrdmlh_lane): Fix register
> > constraints for operand 3.
> > (aarch64_sqrdmlh_laneq): Likewise.
> > 
> 
> > diff --git a/gcc/config/aarch64/aarch64-simd.md 
> > b/gcc/config/aarch64/aarch64-simd.md
> > index e1f5682..0b46e78 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3240,7 +3240,7 @@
> >   [(match_operand:VDQHS 1 "register_operand" "0")
> >(match_operand:VDQHS 2 "register_operand" "w")
> >(vec_select:
> > -(match_operand: 3 "register_operand" "w")
> > +(match_operand: 3 "register_operand" "")
> >  (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> >   SQRDMLH_AS))]
> > "TARGET_SIMD_RDMA"
> > @@ -3258,7 +3258,7 @@
> >   [(match_operand:SD_HSI 1 "register_operand" "0")
> >(match_operand:SD_HSI 2 "register_operand" "w")
> >(vec_select:
> > -(match_operand: 3 "register_operand" "w")
> > +(match_operand: 3 "register_operand" "")
> >  (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> >   SQRDMLH_AS))]
> > "TARGET_SIMD_RDMA"
> > @@ -3278,7 +3278,7 @@
> >   [(match_operand:VDQHS 1 "register_operand" "0")
> >(match_operand:VDQHS 2 "register_operand" "w")
> >(vec_select:
> > -(match_operand: 3 "register_operand" "w")
> > +(match_operand: 3 "register_operand" "")
> >  (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> >   SQRDMLH_AS))]
> > "TARGET_SIMD_RDMA"
> > @@ -3296,7 +3296,7 @@
> >   [(match_operand:SD_HSI 1 "register_operand" "0")
> >(match_operand:SD_HSI 2 "register_operand" "w")
> >(vec_select:
> > -(match_operand: 3 "register_operand" "w")
> > +(match_operand: 3 "register_operand" "")
> >  (parallel [(match_operand:SI 4 "immediate_operand" "i")]))]
> >   SQRDMLH_AS))]
> > "TARGET_SIMD_RDMA"
> 
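
The restriction that gas reports in the quoted message can be modelled
with a small predicate. This is a sketch only: max_lane_regno and
lane_operand_reg_ok are hypothetical names, not GCC or binutils
functions, and only the 16-bit and 32-bit lane sizes these instructions
use are considered.

```c
#include <assert.h>
#include <stdbool.h>

/* Highest vector register number usable as the by-element operand.
   With 16-bit lanes only v0-v15 can be encoded; with 32-bit lanes the
   full v0-v31 range is available.  */
unsigned max_lane_regno (unsigned lane_bits)
{
  assert (lane_bits == 16 || lane_bits == 32);
  return lane_bits == 16 ? 15 : 31;
}

/* True if vector register REGNO may be used as operand 3 of a
   by-element instruction whose lanes are LANE_BITS wide.  */
bool lane_operand_reg_ok (unsigned regno, unsigned lane_bits)
{
  return regno <= max_lane_regno (lane_bits);
}
```

For the instruction in the error message, `v23.h[5]` corresponds to
lane_operand_reg_ok (23, 16), which is false. A tighter register
constraint on operand 3 keeps the register allocator from picking such a
register in the first place.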


Re: [Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.

2016-02-15 Thread James Greenhalgh
On Mon, Feb 08, 2016 at 10:56:29AM +, James Greenhalgh wrote:
> On Tue, Feb 02, 2016 at 10:29:29AM +, James Greenhalgh wrote:
> > On Wed, Jan 20, 2016 at 03:22:11PM +, James Greenhalgh wrote:
> > > 
> > > Hi,
> > > 
> > > In a number of cases where we try to create vectors we end up spilling
> > > to the stack and then filling. This is one example distilled from a
> > > couple of micro-benchmarks where the issue shows up. The reason for the
> > > extra cost in this case is the unnecessary use of the stack. The patch
> > > attempts to finesse this by using lane loads or vector inserts to
> > > produce the right results.
> > > 
> > > This patch is mostly Ramana's work, I've just cleaned it up a little.
> > > 
> > > This has been in a number of our trees lately, and we haven't seen any
> > > regressions. I've also bootstrapped and tested it, and run a set of
> > > benchmarks to show no regressions on Cortex-A57 or Cortex-A53.
> > > 
> > > The patch fixes some regressions caused by the more aggressive
> > > vectorization in GCC6, so I'd like to propose it to go in even though
> > > we are in Stage 4.
> > > 
> > > OK?
> > 
> > *Ping*
> 
> *ping^2*

*ping ^3*

Thanks,
James

> > > 2016-01-20  James Greenhalgh  
> > >   Ramana Radhakrishnan  
> > > 
> > >   * config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor,
> > >   always use lane loads to construct non-constant vectors.
> > > 
> > > gcc/testsuite/
> > > 
> > > 2016-01-20  James Greenhalgh  
> > >   Ramana Radhakrishnan  
> > > 
> > >   * gcc.target/aarch64/vector_initialization_nostack.c: New.
> > > 
> > 
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index 03bc1b9..3787b38 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -10985,28 +10985,37 @@ aarch64_simd_make_constant (rtx vals)
> > >  return NULL_RTX;
> > >  }
> > >  
> > > +/* Expand a vector initialisation sequence, such that TARGET is
> > > +   initialised to contain VALS.  */
> > > +
> > >  void
> > >  aarch64_expand_vector_init (rtx target, rtx vals)
> > >  {
> > >machine_mode mode = GET_MODE (target);
> > >machine_mode inner_mode = GET_MODE_INNER (mode);
> > > +  /* The number of vector elements.  */
> > >int n_elts = GET_MODE_NUNITS (mode);
> > > +  /* The number of vector elements which are not constant.  */
> > >int n_var = 0;
> > >rtx any_const = NULL_RTX;
> > > +  /* The first element of vals.  */
> > > +  rtx v0 = XVECEXP (vals, 0, 0);
> > >bool all_same = true;
> > >  
> > > +  /* Count the number of variable elements to initialise.  */
> > >for (int i = 0; i < n_elts; ++i)
> > >  {
> > >rtx x = XVECEXP (vals, 0, i);
> > > -  if (!CONST_INT_P (x) && !CONST_DOUBLE_P (x))
> > > +  if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
> > >   ++n_var;
> > >else
> > >   any_const = x;
> > >  
> > > -  if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
> > > - all_same = false;
> > > +  all_same &= rtx_equal_p (x, v0);
> > >  }
> > >  
> > > +  /* No variable elements, hand off to aarch64_simd_make_constant which 
> > > knows
> > > + how best to handle this.  */
> > >if (n_var == 0)
> > >  {
> > >rtx constant = aarch64_simd_make_constant (vals);
> > > @@ -11020,14 +11029,15 @@ aarch64_expand_vector_init (rtx target, rtx 
> > > vals)
> > >/* Splat a single non-constant element if we can.  */
> > >if (all_same)
> > >  {
> > > -  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
> > > +  rtx x = copy_to_mode_reg (inner_mode, v0);
> > >aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
> > >return;
> > >  }
> > >  
> > > -  /* Half the fields (or less) are non-constant.  Load constant then 
> > > overwrite
> > > - varying fields.  Hope that this is more efficient than using the 
> > > stack.  */
> > > -  if (n_var <= n_elts/2)
> > > +  /* Initialise a vector which is part-variable.  We want to first try
> > > + to build those lanes which are constant in the most efficient way we
> > > + can.  */
> > > +  if (n_var != n_elts)
> > >  {
> > >rtx copy = copy_rtx (vals);
> > >  
> > > @@ -11054,31 +11064,21 @@ aarch64_expand_vector_init (rtx target, rtx 
> > > vals)
> > > XVECEXP (copy, 0, i) = subst;
> > >   }
> > >aarch64_expand_vector_init (target, copy);
> > > +}
> > >  
> > > -  /* Insert variables.  */
> > > -  enum insn_code icode = optab_handler (vec_set_optab, mode);
> > > -  gcc_assert (icode != CODE_FOR_nothing);
> > > +  /* Insert the variable lanes directly.  */
> > >  
> > > -  for (int i = 0; i < n_elts; i++)
> > > - {
> > > -   rtx x = XVECEXP (vals, 0, i);
> > > -   if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
> > > - continue;
> > > -   x = 
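
The order of decisions in the quoted aarch64_expand_vector_init refactor
can be modelled outside GCC roughly as follows. This is a sketch under
stated assumptions: struct elt, enum init_strategy and classify_init are
hypothetical stand-ins for the rtx-based code, and the strategy names are
mine, not GCC's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one element of the initialiser.  */
struct elt
{
  bool is_const;	/* True for a compile-time constant.  */
  long id;		/* Constant value, or an id naming the variable.  */
};

enum init_strategy
{
  INIT_CONSTANT,		/* Every lane is constant.  */
  INIT_SPLAT,			/* One non-constant value in every lane.  */
  INIT_CONST_THEN_INSERT,	/* Build constant lanes, then insert the rest.  */
  INIT_INSERT_ALL		/* Every lane is variable and they differ.  */
};

static bool same_elt (struct elt a, struct elt b)
{
  return a.is_const == b.is_const && a.id == b.id;
}

/* Mirror the checks in the patch: count the variable lanes, note
   whether all lanes equal the first one, then pick a strategy.  */
enum init_strategy classify_init (const struct elt *vals, int n_elts)
{
  assert (n_elts > 0);
  int n_var = 0;
  bool all_same = true;

  for (int i = 0; i < n_elts; i++)
    {
      if (!vals[i].is_const)
	n_var++;
      all_same &= same_elt (vals[i], vals[0]);
    }

  if (n_var == 0)
    return INIT_CONSTANT;
  if (all_same)
    return INIT_SPLAT;
  if (n_var != n_elts)
    return INIT_CONST_THEN_INSERT;
  return INIT_INSERT_ALL;
}
```

The visible change in the patch is the third test: the old code only took
the "constants first" path when at most half the lanes were variable,
while the new condition takes it whenever at least one lane is constant.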

Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-02-15 Thread James Greenhalgh
On Mon, Feb 08, 2016 at 10:57:44AM +, James Greenhalgh wrote:
> On Mon, Feb 01, 2016 at 01:59:34PM +, James Greenhalgh wrote:
> > On Mon, Jan 25, 2016 at 11:21:25AM +, James Greenhalgh wrote:
> > > On Mon, Jan 11, 2016 at 11:53:39AM +, James Greenhalgh wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I'd like to switch the logic around in aarch64.c such that
> > > > -mlow-precision-recip-sqrt causes us to always emit the low-precision
> > > > software expansion for reciprocal square root. I have two reasons to do
> > > > this; first is consistency across -mcpu targets, second is enabling more
> > > > -mcpu targets to use the flag for peak tuning.
> > > > 
> > > > I don't much like that the precision we use for
> > > > -mlow-precision-recip-sqrt differs between cores (and possibly
> > > > compiler revisions). Yes, we're under -ffast-math but I take this flag
> > > > to mean the user explicitly wants the low-precision expansion, and we
> > > > should not diverge from that based on an internal decision as to what
> > > > is optimal for performance in the high-precision case. I'd prefer to
> > > > keep things as predictable as possible, and here that means always
> > > > emitting the low-precision expansion when asked.
> > > > 
> > > > Judging by the comments in the thread proposing the reciprocal square
> > > > root optimisation, this will benefit all cores currently supported by
> > > > GCC. To be clear, we would still not expand in the high-precision case
> > > > for any cores which do not explicitly ask for it. Currently that is
> > > > Cortex-A57 and xgene, though I will be proposing a patch to remove
> > > > Cortex-A57 from that list shortly.
> > > > 
> > > > Which gives my second motivation for this patch.
> > > > -mlow-precision-recip-sqrt is intended as a tuning flag for situations
> > > > where performance is more important than precision, but the current
> > > > logic requires setting an internal flag which also changes the
> > > > performance characteristics where high-precision is needed. This
> > > > conflates two decisions the target might want to make, and reduces the
> > > > applicability of an option targets might want to enable for
> > > > performance. In particular, I'd still like to see
> > > > -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> > > > sequence for floats under Cortex-A57.
> > > > 
> > > > Based on that reasoning, this patch makes the appropriate change to the
> > > > logic. I've checked with the current -mcpu values to ensure that
> > > > behaviour without -mlow-precision-recip-sqrt does not change, and that
> > > > behaviour with -mlow-precision-recip-sqrt is to emit the low precision
> > > > sequences.
> > > > 
> > > > I've also put this through bootstrap and test on aarch64-none-linux-gnu
> > > > with no issues.
> > > > 
> > > > OK?
> > > 
> > > *Ping*
> > 
> > *Pingx2*
> 
> *Ping^3*

*ping^4*

Thanks,
James

> > > > 2015-12-10  James Greenhalgh  
> > > > 
> > > > * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> > > > reciprocal sqrt for -mlow-precision-recip-sqrt.
> > > > 
> > > 
> > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > > index 9142ac0..1d5d898 100644
> > > > --- a/gcc/config/aarch64/aarch64.c
> > > > +++ b/gcc/config/aarch64/aarch64.c
> > > > @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
> > > >  {
> > > >return (!flag_trapping_math
> > > >   && flag_unsafe_math_optimizations
> > > > - && (aarch64_tune_params.extra_tuning_flags
> > > > - & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> > > > + && ((aarch64_tune_params.extra_tuning_flags
> > > > +  & AARCH64_EXTRA_TUNE_RECIP_SQRT)
> > > > + || flag_mrecip_low_precision_sqrt));
> > > >  }
> > > >  
> > > >  /* Function to decide when to use
> > > 
> > 
> 
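
For readers unfamiliar with the trade-off between the low-precision and
high-precision expansions discussed above, here is a sketch of the
textbook Newton-Raphson refinement for 1/sqrt(x). It is not the sequence
GCC emits; the functions rsqrt_step and rsqrt_refine and the idea of
passing the initial estimate as a parameter are assumptions made for
illustration.

```c
#include <assert.h>

/* One Newton-Raphson step for y ~= 1/sqrt(x):
   y' = y * (1.5 - 0.5 * x * y * y).  */
float rsqrt_step (float x, float y)
{
  return y * (1.5f - 0.5f * x * y * y);
}

/* Refine the initial estimate EST with STEPS Newton-Raphson
   iterations.  Fewer steps are cheaper but leave a larger error.  */
float rsqrt_refine (float x, float est, int steps)
{
  assert (x > 0.0f && steps >= 0);
  for (int i = 0; i < steps; i++)
    est = rsqrt_step (x, est);
  return est;
}
```

Starting from an estimate of 0.4 for 1/sqrt(4), one step gives about
0.472 and three steps give a value within 0.001 of the exact 0.5, so a
flag that asks for fewer refinement steps trades accuracy for speed.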


Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Tom de Vries

On 15/02/16 10:07, Bernd Edlinger wrote:
> On 15/02/16 09:07, Tom de Vries wrote:
>> On 15/02/16 08:24, Dmitry Vyukov wrote:
>>>
>>> If we are talking about pr 68580, then I would try:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c2
>>> first.
>>
>> As I tried to explain in the follow-up comment
>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c3),
>> since unfortunately I have no reliable way of reproducing the failure,
>> there's no defined way to 'try' something.
>
> But your proposed patch is also only guessing.

I've tried to be as clear as possible in the RFC submission that I'm not
certain about the cause of the failure, and that the patch is proposing
a fix that would make that guessed failure cause explicit.

> Sure pthread_create can fail, as malloc and mmap.
> But if that is the reason for the failure it would happen
> just randomly, everywhere.
>
> Why do you think that only this test case shows the problem?

As I explained in the RFC submission, my reasoning there was that the
test is one of the very few test cases that tests the result of
pthread_create and then returns 0, which causes the failure in
combination with dg-shouldfail.

But thinking about it some more, even if pthread_create would fail,
causing the testcase to fail in execution, allowing the execution test
to pass due to dg-shouldfail, presumably the dg-output test would still
fail in that case, so my reasoning was not sound.

So I suppose you're right, indeed the pthread_create fail hypothesis is
not the most logical one.

Still, the patch is an improvement irrespective of the PR that inspired
it, and perhaps a lot more library calls should be checked for errors
than just pthread_create.

> I think Dmitry's comment may be right on the point.

If someone proposes that as a patch for the testcase, great. I'm more
than willing to test that in my setup to be able to claim 'bootstrapped
and reg-tested on x86_64' in the submission.

I'm just trying to point out that I cannot 'try' out that patch and come
back with the confirmation that 'the patch fixes the failure', given the
nature of the failure.


Thanks,
- Tom
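
As an illustration of the kind of error check discussed in this thread,
here is a minimal sketch of a reporting helper. The name describe_failure
and its interface are hypothetical; it relies only on the fact that
pthread_create and similar pthread functions return an error number
directly rather than setting errno.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If RC is a nonzero error number, format "WHAT failed: <reason>" into
   BUF; otherwise leave BUF empty.  Returns RC unchanged so the caller
   can branch on it.  */
int describe_failure (int rc, const char *what, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  if (rc == 0)
    buf[0] = '\0';
  else
    snprintf (buf, len, "%s failed: %s", what, strerror (rc));
  return rc;
}
```

A test could wrap the call as
describe_failure (pthread_create (&t, NULL, fn, NULL), "pthread_create",
msg, sizeof msg) and print msg before bailing out, so that a failed thread
creation shows up in the log instead of looking like a missed race
report.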


Re: [PATCH] s390: Add -fsplit-stack support

2016-02-15 Thread Marcin Kościelnicki

On 15/02/16 11:21, Andreas Krebbel wrote:
> On 02/14/2016 05:01 PM, Marcin Kościelnicki wrote:
>> libgcc/ChangeLog:
>>
>> * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>> * config/s390/morestack.S: New file.
>> * config/s390/t-stack-s390: New file.
>> * generic-morestack.c (__splitstack_find): Add s390-specific code.
>>
>> gcc/ChangeLog:
>>
>> * common/config/s390/s390-common.c (s390_supports_split_stack):
>> New function.
>> (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>> * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>> * config/s390/s390.c (struct machine_function): New field
>> split_stack_varargs_pointer.
>> (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>> in s390_emit_prologue.
>> (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>> vararg pointer.
>> (morestack_ref): New global.
>> (SPLIT_STACK_AVAILABLE): New macro.
>> (s390_expand_split_stack_prologue): New function.
>> (s390_live_on_entry): New function.
>> (s390_va_start): Use split-stack vararg pointer if appropriate.
>> (s390_asm_file_end): Emit the split-stack note sections.
>> (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>> * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>> (UNSPECV_SPLIT_STACK_CALL): New unspec.
>> (UNSPECV_SPLIT_STACK_DATA): New unspec.
>> (split_stack_prologue): New expand.
>> (split_stack_space_check): New expand.
>> (split_stack_data): New insn.
>> (split_stack_call): New expand.
>> (split_stack_call_*): New insn.
>> (split_stack_cond_call): New expand.
>> (split_stack_cond_call_*): New insn.
>
> Applied. Thanks!
>
> -Andreas-


Thanks.  And how about that testcase I submitted, does that look OK?

Marcin Kościelnicki


[Ping][PATCH][GCC-5] Fix "#pragma GCC pop_options" warning.

2016-02-15 Thread Andre Vieira (lists)

On 18/01/16 11:04, Andre Vieira (lists) wrote:
> Hi there,
>
> Can we have the "#pragma GCC pop_options" fix backported to GCC-5?
>
> Patch found in https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01261.html
> and was committed in r228794.
>
> The same patch applies cleanly to gcc-5, which would otherwise not be
> able to use this pragma even though the support is there.
>
> Cheers,
> Andre

Ping.


Re: [PATCH] s390: Add -fsplit-stack support

2016-02-15 Thread Andreas Krebbel
On 02/14/2016 05:01 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
> 
>   * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>   * config/s390/morestack.S: New file.
>   * config/s390/t-stack-s390: New file.
>   * generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   New function.
>   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>   * config/s390/s390.c (struct machine_function): New field
>   split_stack_varargs_pointer.
>   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>   in s390_emit_prologue.
>   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>   vararg pointer.
>   (morestack_ref): New global.
>   (SPLIT_STACK_AVAILABLE): New macro.
>   (s390_expand_split_stack_prologue): New function.
>   (s390_live_on_entry): New function.
>   (s390_va_start): Use split-stack vararg pointer if appropriate.
>   (s390_asm_file_end): Emit the split-stack note sections.
>   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>   (UNSPECV_SPLIT_STACK_CALL): New unspec.
>   (UNSPECV_SPLIT_STACK_DATA): New unspec.
>   (split_stack_prologue): New expand.
>   (split_stack_space_check): New expand.
>   (split_stack_data): New insn.
>   (split_stack_call): New expand.
>   (split_stack_call_*): New insn.
>   (split_stack_cond_call): New expand.
>   (split_stack_cond_call_*): New insn.

Applied. Thanks!

-Andreas-



[PATCH] Improve PTA restrict handling for non-restrict pointers

2016-02-15 Thread Richard Biener

Currently we only disambiguate restrict based accesses against pointer
based accesses that end up using a default def.  The following removes
this restriction allowing disambiguation against any pointer based
access where PTA computed that the pointer cannot point to one of the
restrict tags we used to assign bases != 0 to.

This results in extra disambiguations (all non-restrict pointer IV
uses were not handled previously), see the added vectorizer testcase.
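
As a rough illustration of the check the patch adds to visit_loadstore, here
is a minimal sketch in C.  It assumes a points-to set small enough to fit in
a 64-bit mask; GCC uses its own bitmap type, and the names varset and
can_disambiguate are hypothetical, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a GCC bitmap: bit i set means "variable i".  */
typedef uint64_t varset;

/* Return true when an access through a pointer whose points-to set is
   POINTS_TO can be disambiguated against restrict-based accesses, i.e.
   when the pointer cannot point to any of the restrict tags in RVARS.  */
bool
can_disambiguate (varset rvars, varset points_to)
{
  return (rvars & points_to) == 0;
}
```

With rvars = 0x6, a pointer whose solution is 0x9 would qualify, while one
whose solution is 0x4 would be left alone.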

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for GCC 7.

Richard.

2016-02-12  Richard Biener  

PR tree-optimization/43434
* tree-ssa-structalias.c (struct vls_data): New.
(visit_loadstore): Handle all pointer-based accesses.
(compute_dependence_clique): Compute a bitmap of restrict tags
assigned bases and pass it to visit_loadstore.

* gcc.dg/vect/pr43434.c: New testcase.
* c-c++-common/goacc/kernels-alias-3.c: Adjust.
* c-c++-common/goacc/kernels-alias-4.c: Likewise.
* c-c++-common/goacc/kernels-alias-5.c: Likewise.
* c-c++-common/goacc/kernels-alias-6.c: Likewise.
* c-c++-common/goacc/kernels-alias-7.c: Likewise.
* c-c++-common/goacc/kernels-alias-8.c: Likewise.
* gcc.dg/gomp/pr68640.c: Likewise.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 233369)
--- gcc/tree-ssa-structalias.c  (working copy)
*** delete_points_to_sets (void)
*** 7162,7174 
obstack_free (_solutions_obstack, NULL);
  }
  
  /* Mark "other" loads and stores as belonging to CLIQUE and with
 base zero.  */
  
  static bool
! visit_loadstore (gimple *, tree base, tree ref, void *clique_)
  {
!   unsigned short clique = (uintptr_t)clique_;
if (TREE_CODE (base) == MEM_REF
|| TREE_CODE (base) == TARGET_MEM_REF)
  {
--- 7162,7181 
obstack_free (_solutions_obstack, NULL);
  }
  
+ struct vls_data
+ {
+   unsigned short clique;
+   bitmap rvars;
+ };
+ 
  /* Mark "other" loads and stores as belonging to CLIQUE and with
 base zero.  */
  
  static bool
! visit_loadstore (gimple *, tree base, tree ref, void *data)
  {
!   unsigned short clique = ((vls_data *) data)->clique;
!   bitmap rvars = ((vls_data *) data)->rvars;
if (TREE_CODE (base) == MEM_REF
|| TREE_CODE (base) == TARGET_MEM_REF)
  {
*** visit_loadstore (gimple *, tree base, tr
*** 7176,7188 
if (TREE_CODE (ptr) == SSA_NAME
  && ! SSA_NAME_IS_DEFAULT_DEF (ptr))
{
! /* ???  We need to make sure 'ptr' doesn't include any of
 the restrict tags we added bases for in its points-to set.  */
! return false;
}
  
-   /* For now let decls through.  */
- 
/* Do not overwrite existing cliques (that includes clique, base
   pairs we just set).  */
if (MR_DEPENDENCE_CLIQUE (base) == 0)
--- 7183,7199 
if (TREE_CODE (ptr) == SSA_NAME
  && ! SSA_NAME_IS_DEFAULT_DEF (ptr))
{
! /* We need to make sure 'ptr' doesn't include any of
 the restrict tags we added bases for in its points-to set.  */
! varinfo_t vi = lookup_vi_for_tree (ptr);
! if (! vi)
!   return false;
! 
! vi = get_varinfo (find (vi->id));
! if (bitmap_intersect_p (rvars, vi->solution))
!   return false;
}
  
/* Do not overwrite existing cliques (that includes clique, base
   pairs we just set).  */
if (MR_DEPENDENCE_CLIQUE (base) == 0)
*** compute_dependence_clique (void)
*** 7255,7260 
--- 7266,7272 
  {
unsigned short clique = 0;
unsigned short last_ruid = 0;
+   bitmap rvars = BITMAP_ALLOC (NULL);
for (unsigned i = 0; i < num_ssa_names; ++i)
  {
tree ptr = ssa_name (i);
*** compute_dependence_clique (void)
*** 7310,7347 
  /* Now look at possible dereferences of ptr.  */
  imm_use_iterator ui;
  gimple *use_stmt;
  FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr)
{
  /* ???  Calls and asms.  */
  if (!gimple_assign_single_p (use_stmt))
continue;
! maybe_set_dependence_info (gimple_assign_lhs (use_stmt), ptr,
!clique, restrict_var, last_ruid);
! maybe_set_dependence_info (gimple_assign_rhs1 (use_stmt), ptr,
!clique, restrict_var, last_ruid);
}
}
  }
  
!   if (clique == 0)
! return;
  
!   /* Assign the BASE id zero to all accesses not based on a restrict
!  pointer.  That way they get disabiguated against restrict
!  accesses but not against each other.  */
!   /* ???  For restricts derived from globals (thus not incoming
!  parameters) we can't restrict scoping properly thus 

[PING][PATCH, PR67709 ] Don't call call_cgraph_insertion_hooks in simd_clone_create

2016-02-15 Thread Tom de Vries

On 08/02/16 13:54, Jakub Jelinek wrote:

On Mon, Feb 08, 2016 at 01:46:44PM +0100, Tom de Vries wrote:

[ The pass before pass_omp_simd_clone is pass_dispatcher_calls. It has a
function create_target_clone, similar to simd_clone_create, with a
node.definition and !node.definition part. The !node.definition part does not call
'symtab->call_cgraph_insertion_hooks (new_node)'. ]


I'll defer to Honza or Richi if it is ok not to call cgraph insertion hooks
at this point (and since when they can be avoided), or what else should be
done.

The patch could be ok even for 6.0, not just stage1, if they are ok with it
(or propose some other change).



Ping (Given that Jakub suggested this or an alternative patch might be 
included in 6.0 stage4).


Original submission at 
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00543.html .


Thanks,
- Tom


Don't call call_cgraph_insertion_hooks in simd_clone_create

2016-02-08  Tom de Vries  

PR lto/67709
* omp-low.c (simd_clone_create): Remove call to
symtab->call_cgraph_insertion_hooks.

* testsuite/libgomp.fortran/declare-simd-4.f90: New test.


Jakub





[PATCH] Fix PR69783

2016-02-15 Thread Richard Biener

The following fixes PR69783, a missed optimization after my fix to
vectorizer runtime alias test merging.  While I still don't understand
the existing test guarding the merging (I've just fixed up things to
its assumptions based on the original posting), the following patch
adds two trivially correct cases applying before it which restores
the alias check reduction for the testcase (to 2 instead of 3 even).
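
The two added cases can be sketched in isolation as follows.  This is only
an approximation in C, assuming both segment lengths are compile-time
constants (the patch needs just the left one constant for the first case);
the function name merged_seg_len is hypothetical.

```c
/* DIFF is the distance from the start of the left segment to the start
   of the right segment; SEG_LEN_A1 and SEG_LEN_A2 are their lengths.  */
unsigned long long
merged_seg_len (unsigned long long seg_len_a1,
                unsigned long long seg_len_a2,
                unsigned long long diff)
{
  /* Left segment does not extend beyond the start of the right one:
     the merged length is the right length plus the distance.  */
  if (seg_len_a1 <= diff)
    return seg_len_a2 + diff;

  /* Otherwise take the maximum of the left length and the right length
     plus the distance.  */
  unsigned long long right_end = diff + seg_len_a2;
  return seg_len_a1 > right_end ? seg_len_a1 : right_end;
}
```

For example, lengths 4 and 8 at distance 16 merge to 24, while a left
segment of length 32 already covers the right one and stays at 32.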

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

If any other issue with the old test comes up (I still believe the
condition is bogus) I'll simply remove it.

Richard.

2016-02-15  Richard Biener  

PR tree-optimization/69783
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
Add trivially correct cases.

* gcc.dg/vect/pr69783.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 233369)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_prune_runtime_alias_test_list (loop
*** 3089,3094 
--- 3089,3118 
= tree_to_shwi (dr_a2->offset) - tree_to_shwi (dr_a1->offset);
  
  
+ bool do_remove = false;
+ 
+ /* If the left segment does not extend beyond the start of the
+right segment the new segment length is that of the right
+plus the segment distance.  */
+ if (tree_fits_uhwi_p (dr_a1->seg_len)
+ && compare_tree_int (dr_a1->seg_len, diff) <= 0)
+   {
+ dr_a1->seg_len = size_binop (PLUS_EXPR, dr_a2->seg_len,
+  size_int (diff));
+ do_remove = true;
+   }
+ /* Generally the new segment length is the maximum of the
+left segment size and the right segment size plus the distance.
+???  We can also build tree MAX_EXPR here but it's not clear this
+is profitable.  */
+ else if (tree_fits_uhwi_p (dr_a1->seg_len)
+  && tree_fits_uhwi_p (dr_a2->seg_len))
+   {
+ unsigned HOST_WIDE_INT seg_len_a1 = tree_to_uhwi (dr_a1->seg_len);
+ unsigned HOST_WIDE_INT seg_len_a2 = tree_to_uhwi (dr_a2->seg_len);
+ dr_a1->seg_len = size_int (MAX (seg_len_a1, diff + seg_len_a2));
+ do_remove = true;
+   }
  /* Now we check if the following condition is satisfied:
  
 DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
*** vect_prune_runtime_alias_test_list (loop
*** 3101,3139 
 one above:
  
 1: DIFF <= MIN_SEG_LEN_B
!2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
! 
!*/
  
! unsigned HOST_WIDE_INT min_seg_len_b
!   = (tree_fits_uhwi_p (dr_b1->seg_len)
!  ? tree_to_uhwi (dr_b1->seg_len)
!  : vect_factor);
! 
! if (diff <= min_seg_len_b
! || (tree_fits_uhwi_p (dr_a1->seg_len)
! && diff - tree_to_uhwi (dr_a1->seg_len) < min_seg_len_b))
{
  if (dump_enabled_p ())
{
  dump_printf_loc (MSG_NOTE, vect_location,
   "merging ranges for ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM,
!DR_REF (dr_a1->dr));
  dump_printf (MSG_NOTE,  ", ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM,
!DR_REF (dr_b1->dr));
  dump_printf (MSG_NOTE,  " and ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM,
!DR_REF (dr_a2->dr));
  dump_printf (MSG_NOTE,  ", ");
! dump_generic_expr (MSG_NOTE, TDF_SLIM,
!DR_REF (dr_b2->dr));
  dump_printf (MSG_NOTE, "\n");
}
- 
- dr_a1->seg_len = size_binop (PLUS_EXPR,
-  dr_a2->seg_len, size_int (diff));
  comp_alias_ddrs.ordered_remove (i--);
}
}
--- 3125,3163 
 one above:
  
 1: DIFF <= MIN_SEG_LEN_B
!2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B  */
! else
!   {
! unsigned HOST_WIDE_INT min_seg_len_b
!   = (tree_fits_uhwi_p (dr_b1->seg_len)
!  ? tree_to_uhwi (dr_b1->seg_len)
!  : vect_factor);
! 
! if (diff <= min_seg_len_b
! || (tree_fits_uhwi_p (dr_a1->seg_len)
! && diff - tree_to_uhwi (dr_a1->seg_len) < min_seg_len_b))
!   {
! dr_a1->seg_len = size_binop (PLUS_EXPR,
!  dr_a2->seg_len, size_int (diff));
! do_remove = true;
!   }
!   }
  
! if (do_remove)
{
  

Re: [PATCH PR69821] gcc: add option gno-record-debug-prefix-map

2016-02-15 Thread Hongxu Jia

On 02/15/2016 04:09 PM, Hongxu Jia wrote:

PR other/69821
   * common.opt (grecord-debug-prefix-map, gno-record-debug-prefix-map):
 New options.
   * dwarf2out.c (gen_producer_string): Use option to filter
 -fdebug-prefix-map.
   * doc/invoke.texi: Document -grecord-debug-prefix-map and
 -gno-record-debug-prefix-map.

Signed-off-by: Hongxu Jia 
---
  gcc/common.opt  |  8 
  gcc/doc/invoke.texi | 14 ++
  gcc/dwarf2out.c |  8 
  3 files changed, 30 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 2259f29..3aef05a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2464,6 +2464,14 @@ grecord-gcc-switches
  Common RejectNegative Var(dwarf_record_gcc_switches,1)
  Record gcc command line switches in DWARF DW_AT_producer.
  
+gno-record-debug-prefix-map
+Common RejectNegative Var(dwarf_record_debug_prefix_map,0) Init(1)
+Don't record -fdebug-prefix-map in gcc command line switches in DWARF DW_AT_producer.
+
+grecord-debug-prefix-map
+Common RejectNegative Var(dwarf_record_debug_prefix_map,1)
+Record -fdebug-prefix-map in gcc command line switches in DWARF DW_AT_producer.
+
  gno-split-dwarf
  Common Driver RejectNegative Var(dwarf_split_debug_info,0) Init(0)
  Don't generate debug information in separate .dwo files
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9d8ffc0..d18d24a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -337,6 +337,7 @@ Objective-C and Objective-C++ Dialects}.
  -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
  -fstack-usage  -ftest-coverage  -ftime-report -fvar-tracking @gol
  -fvar-tracking-assignments  -fvar-tracking-assignments-toggle @gol
+-grecord-debug-prefix-map-gstabs  -gno-record-debug-prefix-map @gol


s/-grecord-debug-prefix-map-gstabs/-grecord-debug-prefix-map/

Sorry for the typo


  -g  -g@var{level}  -gtoggle  -gcoff  -gdwarf-@var{version} @gol
  -ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
  -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
@@ -5220,6 +5221,19 @@ way of storing compiler options into the object file.  This is the default.
  Disallow appending command-line options to the DW_AT_producer attribute
  in DWARF debugging information.
  
+@item -grecord-debug-prefix-map-gstabs

+@opindex grecord-debug-prefix-map-gstabs
+While -grecord-gcc-switches and -fdebug-prefix-map used, keep
+-fdebug-prefix-map in command line options which is appended
+to the DW_AT_producer attribute in DWARF debugging information.
+This is the default.
+
+@item -gno-record-debug-prefix-map
+@opindex gno-record-debug-prefix-map
+While -grecord-gcc-switches and -fdebug-prefix-map used, remove
+-fdebug-prefix-map in command line options which is appended
+to the DW_AT_producer attribute in DWARF debugging information.
+
  @item -gstrict-dwarf
  @opindex gstrict-dwarf
  Disallow using extensions of later DWARF standard version than selected
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 13b2de7..19a149a 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19182,6 +19182,8 @@ gen_producer_string (void)
case OPT_SPECIAL_input_file:
case OPT_grecord_gcc_switches:
case OPT_gno_record_gcc_switches:
+  case OPT_grecord_debug_prefix_map:
+  case OPT_gno_record_debug_prefix_map:
case OPT__output_pch_:
case OPT_fdiagnostics_show_location_:
case OPT_fdiagnostics_show_option:
@@ -19214,6 +19216,12 @@ gen_producer_string (void)
  default:
break;
  }
+
+   /* Don't record -fdebug-prefix-map in gcc command line
+  switches in DWARF DW_AT_producer */
+   if (save_decoded_options[j].opt_index==OPT_fdebug_prefix_map_ &&
+   !dwarf_record_debug_prefix_map)
+ continue;
switches.safe_push (save_decoded_options[j].orig_option_with_args_text);
len += strlen (save_decoded_options[j].orig_option_with_args_text) + 1;
break;




Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Dmitry Vyukov
On Mon, Feb 15, 2016 at 10:07 AM, Bernd Edlinger
 wrote:
> On 15/02/16 09:07, Tom de Vries wrote:
>>>On 15/02/16 08:24, Dmitry Vyukov wrote:
>>>
>>>If we are talking about pr 68580, then I would try:
>>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c2
>>>first.
>>
>> As I tried to explain in the follow-up comment ( 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c3 ),
>> since unfortunately I have no reliable way of reproducing the failure, 
>> there's no defined way to 'try' something.
>
> But your proposed patch is also only guessing.

Yeah, that's what I thought.
Agree that general pthread_create failures should affect more tests.

We can do both I guess.

> Sure pthread_create can fail, as malloc and mmap.
> But if that is the reason for the failure it would happen
> just randomly, everywhere.
>
> Why do you think that only this test case shows the problem?
>
> I think Dmitry's comment may be right on the point.
>
> In pr65400-1.c we have
> int v; int q; int o;
>
> which are 4-byte aligned integers,
> and at least
> v and q share the same 8-byte tsan shadow memory slot.
>
> v and q are modified simultaneously, and each update
> can be lost.
>
> The barrier won't help here, as it only synchronizes
> accesses on v.
>
> So I think either we change int => long long
> or add __attribute__((aligned(8))) to the variable declarations,
> to make sure that each of them goes into a different memory
> slot.
>
>
> Regards,
> Bernd.
>
>


Re: [PING][PATCH] Don't mark offload symbols with force_output in ltrans

2016-02-15 Thread Richard Biener
On Mon, 15 Feb 2016, Tom de Vries wrote:

> [ was: [PING][PATCH] Mark symbols in offload tables with force_output in
> read_offload_tables ]
> 
> On 08/02/16 14:20, Tom de Vries wrote:
> > On 26/01/16 14:01, Ilya Verbin wrote:
> > > On Tue, Jan 26, 2016 at 13:21:57 +0100, Tom de Vries wrote:
> > > > On 25/01/16 14:27, Ilya Verbin wrote:
> > > > > On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> > > > > > > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> > > > > > > index 62e5454..cdaee41 100644
> > > > > > > --- a/gcc/lto-cgraph.c
> > > > > > > +++ b/gcc/lto-cgraph.c
> > > > > > > @@ -1911,6 +1911,11 @@ input_offload_tables (void)
> > > > > > > tree fn_decl
> > > > > > >   = lto_file_decl_data_get_fn_decl (file_data,
> > > > > > > decl_index);
> > > > > > > vec_safe_push (offload_funcs, fn_decl);
> > > > > > > +
> > > > > > > +  /* Prevent IPA from removing fn_decl as unreachable,
> > > > > > > since there
> > > > > > > + may be no refs from the parent function to child_fn in
> > > > > > > offload
> > > > > > > + LTO mode.  */
> > > > > > > +  cgraph_node::get (fn_decl)->mark_force_output ();
> > > > > > >   }
> > > > > > > else if (tag == LTO_symtab_variable)
> > > > > > >   {
> > > > > > > @@ -1918,6 +1923,10 @@ input_offload_tables (void)
> > > > > > > tree var_decl
> > > > > > >   = lto_file_decl_data_get_var_decl (file_data,
> > > > > > > decl_index);
> > > > > > > vec_safe_push (offload_vars, var_decl);
> > > > > > > +
> > > > > > > +  /* Prevent IPA from removing var_decl as unused, since
> > > > > > > there
> > > > > > > + may be no refs to var_decl in offload LTO mode.  */
> > > > > > > +  varpool_node::get (var_decl)->force_output = 1;
> > > > > > >   }
> > > > > 
> > > > > This doesn't work when there is more than one LTO partition, because
> > > > > only first
> > > > > partition contains full offload table to maintain correct order, but
> > > > > cgraph and
> > > > > varpool nodes aren't necessarily created for the first partition.
> > > > > To reproduce:
> > > > > 
> > > > > $ make check-target-libgomp RUNTESTFLAGS="c.exp=for-*
> > > > > --target_board=unix/-flto"
> > > > > FAIL: libgomp.c/for-3.c (internal compiler error)
> > > > > FAIL: libgomp.c/for-5.c (internal compiler error)
> > > > > FAIL: libgomp.c/for-6.c (internal compiler error)
> > > > > $ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-*
> > > > > --target_board=unix/-flto"
> > > > > FAIL: libgomp.c++/for-11.C (internal compiler error)
> > > > > FAIL: libgomp.c++/for-13.C (internal compiler error)
> > > > > FAIL: libgomp.c++/for-14.C (internal compiler error)
> > > > 
> > > > This works for me.
> > > > 
> > > > OK for trunk?
> > > > 
> > > > Thanks,
> > > > - Tom
> > > > 
> > > 
> > > > Check that cgraph/varpool_node exists before use in input_offload_tables
> > > > 
> > > > 2016-01-26  Tom de Vries  
> > > > 
> > > > * lto-cgraph.c (input_offload_tables): Check that
> > > > cgraph/varpool_node
> > > > exists before use.
> > > 
> > > In this case they will be not marked as force_output in other
> > > partitions (except
> > > the first one).
> > 
> > AFAIU, that's not the case.
> > 
> > If we're splitting up lto compilation over partitions, it means we're
> > first calling lto1 in WPA mode. We'll read in all offload tables, and
> > mark all symbols with force_output, and when writing out the partitions,
> > we'll write the offload symbols out with force_output set.
> > 
> > This updated patch only does the force_output marking for offload
> > symbols in WPA or LTO. It's not necessary in LTRANS mode.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > Build for nvidia accelerator and reg-tested libgomp with various lto
> > settings.
> > 
> > OK for trunk, stage4?
> > 
> 
> Ping. Original submission here:
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00549.html .

Ok.

Richard.

> Thanks,
> - Tom
> 
> > 0006-Don-t-mark-offload-symbols-with-force_output-in-ltrans.patch
> > 
> > 
> > Don't mark offload symbols with force_output in ltrans
> > 
> > 2016-02-08  Tom de Vries  
> > 
> > PR lto/69655
> > * lto-cgraph.c (input_offload_tables): Add and handle bool parameter
> > do_force_output.
> > * lto-streamer.h (input_offload_tables): Add and handle bool
> > parameter.
> > 
> > * lto.c (read_cgraph_and_symbols): Call input_offload_tables with
> > argument.
> > 
> > ---
> >   gcc/lto-cgraph.c   | 8 +---
> >   gcc/lto-streamer.h | 2 +-
> >   gcc/lto/lto.c  | 2 +-
> >   3 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> > index 0634779..95c446d 100644
> > --- a/gcc/lto-cgraph.c
> > +++ b/gcc/lto-cgraph.c
> > @@ -1885,7 +1885,7 @@ input_symtab (void)
> >  target code, and store them into OFFLOAD_FUNCS and OFFLOAD_VARS. 

Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Bernd Edlinger
On 15/02/16 09:07, Tom de Vries wrote:
>>On 15/02/16 08:24, Dmitry Vyukov wrote:
>>
>>If we are talking about pr 68580, then I would try:
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c2
>>first.
>
> As I tried to explain in the follow-up comment ( 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c3 ),
> since unfortunately I have no reliable way of reproducing the failure, 
> there's no defined way to 'try' something.

But your proposed patch is also only guessing.
Sure pthread_create can fail, as malloc and mmap.
But if that is the reason for the failure it would happen
just randomly, everywhere.

Why do you think that only this test case shows the problem?

I think Dmitry's comment may be right on the point.

In pr65400-1.c we have
int v; int q; int o;

which are 4-byte aligned integers,
and at least
v and q share the same 8-byte tsan shadow memory slot.

v and q are modified simultaneously, and each update
can be lost.

The barrier won't help here, as it only synchronizes
accesses on v.

So I think either we change int => long long
or add __attribute__((aligned(8))) to the variable declarations,
to make sure that each of them goes into a different memory
slot.
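
To make the second suggestion concrete, here is a small sketch in C.  It
assumes, as stated above, that one shadow slot covers 8 bytes of application
memory; the macro SHADOW_GRANULE and the helper same_shadow_slot are
hypothetical names used only for illustration, not part of the testcase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from the discussion: one shadow slot per 8 bytes.  */
#define SHADOW_GRANULE 8

/* Each variable is forced onto its own 8-byte boundary.  */
int v __attribute__ ((aligned (8)));
int q __attribute__ ((aligned (8)));
int o __attribute__ ((aligned (8)));

/* Return true if A and B fall into the same 8-byte granule.  */
bool
same_shadow_slot (const void *a, const void *b)
{
  return (uintptr_t) a / SHADOW_GRANULE == (uintptr_t) b / SHADOW_GRANULE;
}
```

With the attribute in place, same_shadow_slot (&v, &q) should be false for
every pair of the three variables, since distinct 8-byte aligned objects
cannot start in the same granule.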


Regards,
Bernd.




Re: [PATCH] Avoid bugs like PR68273 to trigger

2016-02-15 Thread Richard Biener
On Sun, 14 Feb 2016, Eric Botcazou wrote:

> > No, but if there is none left why would you want to "fix" SRA?
> 
> As expected, it seems that the make_ssa_name_fn kludge is not sufficient, so 
> I'm proposing to disable the PR65310 one-liner for selected targets, using 
> the function_arg_boundary hook, until after we have a clear way out of this
> mess.
> 
> Here's a summary of the situation:
> 
> targhooks.c:default_function_arg_boundary           N
> aarch64/aarch64.c:aarch64_function_arg_boundary     Y
> arm/arm.c:arm_function_arg_boundary                 N
> c6x/c6x.c:c6x_function_arg_boundary                 Y
> epiphany/epiphany.c:epiphany_function_arg_boundary  Y
> frv/frv.c:frv_function_arg_boundary                 N
> i386/i386.c:ix86_function_arg_boundary              N
> ia64/ia64.c:ia64_function_arg_boundary              Y
> iq2000/iq2000.c:iq2000_function_arg_boundary        Y
> m32c/m32c.c:m32c_function_arg_boundary              N
> mcore/mcore.c:mcore_function_arg_boundary           N
> mips/mips.c:mips_function_arg_boundary              Y
> msp430/msp430.c:msp430_function_arg_boundary        N
> nds32/nds32.c:nds32_function_arg_boundary           Y
> pa/pa.c:pa_function_arg_boundary                    N
> rl78/rl78.c:rl78_function_arg_boundary              N
> rs6000/rs6000.c:rs6000_function_arg_boundary        Y (aggr, AIX/ELFv2)
> rx/rx.c:rx_function_arg_boundary                    Y
> sparc/sparc.c:sparc_function_arg_boundary           Y (64-bit)
> tilegx/tilegx.c:tilegx_function_arg_boundary        Y
> tilepro/tilepro.c:tilepro_function_arg_boundary     Y
> xtensa/xtensa.c:xtensa_function_arg_boundary        Y
> 
> A 'Y' means that the return value of function_arg_boundary depends on the 
> alignment of the type it is directly passed (e.g. not on its main variant).
> 'aggr' means for aggregate types only, the other modifiers are ABIs.
> 
> So if we add a test based on function_arg_boundary, we'll effectively disable 
> the PR65310 one-liner in some cases for the following targets:
> aarch64, c6x, epiphany, ia64, iq2000, mips, nds32, rs6000 (aggr), rx, sparc,
> tilegx, tilepro, xtensa
> 
> If we additionally test STRICT_ALIGNMENT, the set of targets shrinks to:
> c6x, epiphany, ia64, iq2000, mips, nds32, sparc, tilegx, tilepro, xtensa
> 
> MIPS being in both sets, this will fix PR68273 in both cases.

I agree with Jakub on this (obviously), still a comment on the patch:

> Index: tree-sra.c
> ===
> --- tree-sra.c  (revision 28)
> +++ tree-sra.c  (working copy)
> @@ -1681,9 +1681,22 @@ build_ref_for_offset (location_t loc, tr
>misalign = (misalign + offset) & (align - 1);
>if (misalign != 0)
>  align = (misalign & -misalign);
> -  if (align != TYPE_ALIGN (exp_type))
> +
> +  /* Misaligning a type is generally OK (if it's naturally aligned).  */
> +  if (align < TYPE_ALIGN (exp_type))
>  exp_type = build_aligned_type (exp_type, align);

So you simply assume that exp_type is naturally aligned here.  I think
you should test align < TYPE_ALIGN (TYPE_MAIN_VARIANT (exp_type)) here, 
no?

> +  /* Overaligning it can be problematic because of calling conventions.  */
> +  else if (align > TYPE_ALIGN (exp_type))
> +{
> +  tree aligned_type = build_aligned_type (exp_type, align);
> +  if (targetm.calls.function_arg_boundary (TYPE_MODE (aligned_type),
> +  aligned_type)
> + == targetm.calls.function_arg_boundary (TYPE_MODE (exp_type),
> + exp_type))

And if you get enough supporters to apply this kind of workaround
I'd prefer it to be in build_aligned_type itself, basically
refusing to build over-aligned types.  And I'd rather make this
controlled by an internal flag that backends should consciously
set (aka a target hook).  The above is simply a bit too ugly IMHO
and looks incomplete(?): can't even the cumulative args machinery
end up with type-align specifics, or are you sure those can only
be triggered from function_arg_boundary?

Note the real issue is overaligned register types.  I've tried

Index: tree-ssa.c
===
--- tree-ssa.c  (revision 233369)
+++ tree-ssa.c  (working copy)
@@ -936,6 +936,22 @@ verify_ssa (bool check_modified_stmt, bo
  name, stmt, virtual_operand_p (name)))
goto err;
}
+
+ if (! SSA_NAME_VAR (name)
+ && TYPE_MAIN_VARIANT (TREE_TYPE (name)) != TREE_TYPE (name))
+   {
+ error ("type is not main variant");
+ goto err;
+   }
+

but that's not too happy (to verify the change below).  But verifying
we don't have over-aligned (or under-aligned) registers would be a good 
thing and I think all registers should 

Re: [PATCH] Fix PR69595, bogus -Warray-bound warning

2016-02-15 Thread Richard Biener
On Sun, 14 Feb 2016, Marc Glisse wrote:

> On Tue, 2 Feb 2016, Richard Biener wrote:
> 
> > *** gcc/match.pd(revision 233067)
> > --- gcc/match.pd(working copy)
> > *** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > *** 2094,2099 
> > --- 2094,2117 
> >   (bit_and:c (ordered @0 @0) (ordered:c@2 @0 @1))
> >   @2)
> > 
> > + /* Simple range test simplifications.  */
> > + /* A < B || A >= B -> true.  */
> > + (for test1 (lt le ne)
> > +  test2 (ge gt eq)
> > +  (simplify
> > +   (bit_ior:c (test1 @0 @1) (test2 @0 @1))
> > +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > +|| VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
> > +{ constant_boolean_node (true, type); })))
> > + /* A < B && A >= B -> false.  */
> > + (for test1 (lt lt lt le ne eq)
> > +  test2 (ge gt eq gt eq gt)
> 
> The lack of symmetry between the || and && cases is surprising. Is there any
> reason not to handle the pairs le/ge, le/ne and ge/ne for bit_ior?

Whoops, no.  I simply forgot those.  I'll bootstrap/test

2016-02-15  Richard Biener  

PR tree-optimization/69595
* match.pd: Complete range test simplification to true.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 233369)
+++ gcc/match.pd(working copy)
@@ -2119,8 +2119,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Simple range test simplifications.  */
 /* A < B || A >= B -> true.  */
-(for test1 (lt le ne)
- test2 (ge gt eq)
+(for test1 (lt le le le ne ge)
+ test2 (ge gt ge ne eq ne)
  (simplify
   (bit_ior:c (test1 @0 @1) (test2 @0 @1))
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
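
The operator pairs can be sanity-checked by brute force outside the
compiler.  The following C sketch mirrors the six bit_ior pairs from this
patch and the six bit_and pairs from the quoted original; the enum, the
tables and range_test_pairs_hold are illustrative names, not GCC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Comparison codes named after the match.pd operators.  */
enum cmp { LT, LE, GT, GE, EQ, NE };

static bool
eval_cmp (enum cmp op, int a, int b)
{
  switch (op)
    {
    case LT: return a < b;
    case LE: return a <= b;
    case GT: return a > b;
    case GE: return a >= b;
    case EQ: return a == b;
    case NE: return a != b;
    }
  return false;
}

/* (test1, test2) pairs: the || of each ior pair should always be true,
   the && of each and pair should always be false.  */
static const enum cmp ior_test1[] = { LT, LE, LE, LE, NE, GE };
static const enum cmp ior_test2[] = { GE, GT, GE, NE, EQ, NE };
static const enum cmp and_test1[] = { LT, LT, LT, LE, NE, EQ };
static const enum cmp and_test2[] = { GE, GT, EQ, GT, EQ, GT };

/* Check every pair for all integers A, B in [LO, HI].  */
bool
range_test_pairs_hold (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      for (size_t i = 0; i < 6; i++)
        {
          if (!(eval_cmp (ior_test1[i], a, b)
                || eval_cmp (ior_test2[i], a, b)))
            return false;
          if (eval_cmp (and_test1[i], a, b)
              && eval_cmp (and_test2[i], a, b))
            return false;
        }
  return true;
}
```

This only covers integer operands, matching the INTEGRAL_TYPE_P guard in
the pattern.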


Re: [PATCH] 69780 - [4.9/5/6 Regression] ICE on __builtin_alloca_with_align, with small alignment

2016-02-15 Thread Jakub Jelinek
On Sun, Feb 14, 2016 at 07:16:13PM -0700, Martin Sebor wrote:
> +case BUILT_IN_ALLOCA_WITH_ALIGN:
> +  {
> + /* Get the requested alignment (in bits) if it's a constant
> +integer expression.  */
> + HOST_WIDE_INT align =
> +   TREE_CODE (args [1]) == INTEGER_CST ? tree_to_uhwi (args [1]) : 0;

Formatting.  = needs to be on the next line.

> + /* Determine the exact power of 2 of the requested alignment.  */
> + int alignpow = align ? tree_log2 (args [1]) : 0;
> +
> + /* Reject invalid alignments.  */
> + if (alignpow < 3 || MAX_STACK_ALIGNMENT < align)

This looks wrong.  First, the hardcoding of 3: IMHO you should instead
check alignpow == -1 || align < BITS_PER_UNIT for the low boundary.
Second, MAX_STACK_ALIGNMENT certainly is not the upper bound of the
alignment; from what I understand of the code, the upper bound is whatever
lets the alignment in bits fit into unsigned int.  And perhaps if you aren't
going to use alignpow for anything, you can just check the low and high
boundaries and otherwise check (align & (align - 1)) == 0, or
integer_pow2p (args [1]).
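
A standalone sketch of the suggested check, in C.  It assumes 8 bits per
addressable unit and takes "fits into unsigned int" as the upper bound, as
described above; BITS_PER_UNIT_SKETCH and valid_alloca_align_bits are
hypothetical names, not GCC identifiers.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: 8 bits per unit, mirroring BITS_PER_UNIT on common targets.  */
#define BITS_PER_UNIT_SKETCH 8

/* Return true if ALIGN (in bits) is an acceptable alignment.  */
bool
valid_alloca_align_bits (unsigned long long align)
{
  if (align < BITS_PER_UNIT_SKETCH)   /* Too small; also rejects 0.  */
    return false;
  if (align > UINT_MAX)               /* Must fit into unsigned int.  */
    return false;
  return (align & (align - 1)) == 0;  /* Must be a power of two.  */
}
```

Under these assumptions 8 and 256 are accepted, while 0, 4 and 24 are
rejected.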

> +   {
> + error_at (EXPR_LOC_OR_LOC (args [1], input_location),

No space before [.

> +   "second argument to function %qE must be a constant "
> +   "integer power of 2 between %qi and %qwu bits",
> +   fndecl, CHAR_TYPE_SIZE,

And here you are printing a value unrelated to the one you've checked.

> +   (unsigned HOST_WIDE_INT)MAX_STACK_ALIGNMENT);
> + return false;
> +   }
> +  return true;
> +  }
> +

Jakub


[PATCH PR69821] gcc: add option gno-record-debug-prefix-map

2016-02-15 Thread Hongxu Jia
PR other/69821
  * common.opt (grecord-debug-prefix-map, gno-record-debug-prefix-map):
New options.
  * dwarf2out.c (gen_producer_string): Use option to filter
-fdebug-prefix-map.
  * doc/invoke.texi: Document -grecord-debug-prefix-map and
-gno-record-debug-prefix-map.

Signed-off-by: Hongxu Jia 
---
 gcc/common.opt  |  8 
 gcc/doc/invoke.texi | 14 ++
 gcc/dwarf2out.c |  8 
 3 files changed, 30 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 2259f29..3aef05a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2464,6 +2464,14 @@ grecord-gcc-switches
 Common RejectNegative Var(dwarf_record_gcc_switches,1)
 Record gcc command line switches in DWARF DW_AT_producer.
 
+gno-record-debug-prefix-map
+Common RejectNegative Var(dwarf_record_debug_prefix_map,0) Init(1)
+Don't record -fdebug-prefix-map in gcc command line switches in DWARF DW_AT_producer.
+
+grecord-debug-prefix-map
+Common RejectNegative Var(dwarf_record_debug_prefix_map,1)
+Record -fdebug-prefix-map in gcc command line switches in DWARF DW_AT_producer.
+
 gno-split-dwarf
 Common Driver RejectNegative Var(dwarf_split_debug_info,0) Init(0)
 Don't generate debug information in separate .dwo files
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9d8ffc0..d18d24a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -337,6 +337,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
 -fstack-usage  -ftest-coverage  -ftime-report -fvar-tracking @gol
 -fvar-tracking-assignments  -fvar-tracking-assignments-toggle @gol
+-grecord-debug-prefix-map-gstabs  -gno-record-debug-prefix-map @gol
 -g  -g@var{level}  -gtoggle  -gcoff  -gdwarf-@var{version} @gol
 -ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
 -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
@@ -5220,6 +5221,19 @@ way of storing compiler options into the object file.  This is the default.
 Disallow appending command-line options to the DW_AT_producer attribute
 in DWARF debugging information.
 
+@item -grecord-debug-prefix-map-gstabs
+@opindex grecord-debug-prefix-map-gstabs
+While -grecord-gcc-switches and -fdebug-prefix-map used, keep
+-fdebug-prefix-map in command line options which is appended
+to the DW_AT_producer attribute in DWARF debugging information.
+This is the default.
+
+@item -gno-record-debug-prefix-map
+@opindex gno-record-debug-prefix-map
+While -grecord-gcc-switches and -fdebug-prefix-map used, remove
+-fdebug-prefix-map in command line options which is appended
+to the DW_AT_producer attribute in DWARF debugging information.
+
 @item -gstrict-dwarf
 @opindex gstrict-dwarf
 Disallow using extensions of later DWARF standard version than selected
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 13b2de7..19a149a 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19182,6 +19182,8 @@ gen_producer_string (void)
   case OPT_SPECIAL_input_file:
   case OPT_grecord_gcc_switches:
   case OPT_gno_record_gcc_switches:
+  case OPT_grecord_debug_prefix_map:
+  case OPT_gno_record_debug_prefix_map:
   case OPT__output_pch_:
   case OPT_fdiagnostics_show_location_:
   case OPT_fdiagnostics_show_option:
@@ -19214,6 +19216,12 @@ gen_producer_string (void)
  default:
break;
  }
+
+   /* Don't record -fdebug-prefix-map in gcc command line
+  switches in DWARF DW_AT_producer */
+   if (save_decoded_options[j].opt_index==OPT_fdebug_prefix_map_ &&
+   !dwarf_record_debug_prefix_map)
+ continue;
switches.safe_push (save_decoded_options[j].orig_option_with_args_text);
len += strlen (save_decoded_options[j].orig_option_with_args_text) + 1;
break;
-- 
1.9.1



Re: [RFC, PR68580] Handle pthread_create error in tsan testsuite

2016-02-15 Thread Tom de Vries

On 15/02/16 08:24, Dmitry Vyukov wrote:

If we are talking about pr 68580, then I would try:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c2
first.


As I tried to explain in the follow-up comment ( 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68580#c3 ), since 
unfortunately I have no reliable way of reproducing the failure, there's 
no defined way to 'try' something.


Thanks,
- Tom