[patch, committed] Bug 79841 - Inconsistent diagnostics

2017-03-03 Thread Jerry DeLisle

Committed as obvious after regression testing.

Regards,

Jerry

Author: jvdelisle
Date: Sat Mar  4 03:13:34 2017
New Revision: 245891

URL: https://gcc.gnu.org/viewcvs?rev=245891&root=gcc&view=rev
Log:
2017-03-03  Jerry DeLisle  

PR fortran/79841
* openmp.c (check_symbol_not_pointer): Adjust diagnostic.

Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/openmp.c



New German PO file for 'gcc' (version 7.1-b20170226)

2017-03-03 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-7.1-b20170226.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [RFC] VEC_SELECT sanity checking in genrecog

2017-03-03 Thread Michael Meissner
On Fri, Mar 03, 2017 at 02:58:06PM -0600, Segher Boessenkool wrote:
> On Fri, Mar 03, 2017 at 03:43:03PM -0500, Michael Meissner wrote:
> > On Fri, Mar 03, 2017 at 02:19:57PM -0600, Segher Boessenkool wrote:
> > > On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> > > Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
> > > Mike, you know this code best, what do you think?
> > 
> > Bill Schmidt added these, but he is gone for the day.
> > 
> > No, these patterns need to use VSX_LE, but V1TI should be moved from VSX_LE to
> > VSX_LE_128.  VSX_LE_128 is used for things like IEEE 128-bit floating point and
> > 128-bit integers that can't use vec_select.  In fact there is this comment:
> > 
> > ;; Little endian word swapping for 128-bit types that are either scalars or the
> > ;; special V1TI container class, which it is not appropriate to use vec_select
> > ;; for the type.
> 
> Okay, but there are also these comments:
> 
> ;; Iterator for the 2 64-bit vector types
> (define_mode_iterator VSX_D [V2DF V2DI])
> 
> ;; Iterator for the 2 64-bit vector types + 128-bit types that are loaded with
> ;; lxvd2x to properly handle swapping words on little endian
> (define_mode_iterator VSX_LE [V2DF V2DI V1TI])
> 
> so that removing V1TI from VSX_LE makes it exactly equal to VSX_D.

Yep, VSX_LE would now be redundant.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [RFC] VEC_SELECT sanity checking in genrecog

2017-03-03 Thread Segher Boessenkool
On Fri, Mar 03, 2017 at 03:43:03PM -0500, Michael Meissner wrote:
> On Fri, Mar 03, 2017 at 02:19:57PM -0600, Segher Boessenkool wrote:
> > On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> > Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
> > Mike, you know this code best, what do you think?
> 
> Bill Schmidt added these, but he is gone for the day.
> 
> No, these patterns need to use VSX_LE, but V1TI should be moved from VSX_LE to
> VSX_LE_128.  VSX_LE_128 is used for things like IEEE 128-bit floating point and
> 128-bit integers that can't use vec_select.  In fact there is this comment:
> 
> ;; Little endian word swapping for 128-bit types that are either scalars or the
> ;; special V1TI container class, which it is not appropriate to use vec_select
> ;; for the type.

Okay, but there are also these comments:

;; Iterator for the 2 64-bit vector types
(define_mode_iterator VSX_D [V2DF V2DI])

;; Iterator for the 2 64-bit vector types + 128-bit types that are loaded with
;; lxvd2x to properly handle swapping words on little endian
(define_mode_iterator VSX_LE [V2DF V2DI V1TI])

so that removing V1TI from VSX_LE makes it exactly equal to VSX_D.


Segher


Re: [RFC] VEC_SELECT sanity checking in genrecog

2017-03-03 Thread Michael Meissner
On Fri, Mar 03, 2017 at 02:19:57PM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> > E.g. in vsx.md the thing is that
> > the pattern uses an iterator with 2 V2?? modes in it and then V1TI mode,
> > and uses exactly two elements in the parallel, which doesn't make any sense
> > for V1TI.
> > 
> > ../../gcc/config/rs6000/vsx.md:2063:1: vec_select parallel with 2 elements, expected 1
> > ../../gcc/config/rs6000/vsx.md:2112:1: vec_select parallel with 2 elements, expected 1
> > ../../gcc/config/rs6000/vsx.md:2161:1: vec_select parallel with 2 elements, expected 1
> 
> Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
> Mike, you know this code best, what do you think?

Bill Schmidt added these, but he is gone for the day.

No, these patterns need to use VSX_LE, but V1TI should be moved from VSX_LE to
VSX_LE_128.  VSX_LE_128 is used for things like IEEE 128-bit floating point and
128-bit integers that can't use vec_select.  In fact there is this comment:

;; Little endian word swapping for 128-bit types that are either scalars or the
;; special V1TI container class, which it is not appropriate to use vec_select
;; for the type.
(define_insn "*vsx_le_permute_"
  [(set (match_operand:VSX_LE_128 0 "nonimmediate_operand" "=,,Z")
(rotate:VSX_LE_128
 (match_operand:VSX_LE_128 1 "input_operand" ",Z,")
 (const_int 64)))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
  "@
   xxpermdi %x0,%x1,%x1,2
   lxvd2x %x0,%y1
   stxvd2x %x1,%y0"
  [(set_attr "length" "4")
   (set_attr "type" "vecperm,vecload,vecstore")])

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [RFC] VEC_SELECT sanity checking in genrecog

2017-03-03 Thread Segher Boessenkool
Hi,

On Fri, Mar 03, 2017 at 05:28:27PM +0100, Jakub Jelinek wrote:
> E.g. in vsx.md the thing is that
> the pattern uses an iterator with 2 V2?? modes in it and then V1TI mode,
> and uses exactly two elements in the parallel, which doesn't make any sense
> for V1TI.
> 
> ../../gcc/config/rs6000/vsx.md:2063:1: vec_select parallel with 2 elements, expected 1
> ../../gcc/config/rs6000/vsx.md:2112:1: vec_select parallel with 2 elements, expected 1
> ../../gcc/config/rs6000/vsx.md:2161:1: vec_select parallel with 2 elements, expected 1

Yeah, it looks like these patterns should use VSX_D instead of VSX_LE.
Mike, you know this code best, what do you think?


Segher


[committed] Small C OpenMP diagnostics fix (PR c/79837)

2017-03-03 Thread Jakub Jelinek
Hi!

In both cases, starting with the UDR additions, we are looking for any
identifier; both min and max are identifiers, so there is no need to mention
them explicitly.  They are just kind of like predefined UDRs.
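
(A minimal usage sketch, not part of the patch, of where such an identifier
appears; "min" below is parsed in the reduction clause exactly like a
user-defined reduction name.  Compile with -fopenmp; the function name is
made up for the example.)

#include <limits.h>

int
smallest (const int *a, int n)
{
  int m = INT_MAX;
#pragma omp parallel for reduction (min:m)
  for (int i = 0; i < n; i++)
    m = a[i] < m ? a[i] : m;
  return m;
}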

Committed to trunk.

2017-03-03  Jakub Jelinek  

PR c/79837
* c-parser.c (c_parser_omp_clause_reduction): Don't mention
%<min%> or %<max%> in the diagnostics, instead mention identifier.
(c_parser_omp_declare_reduction): Don't mention %<min%> in the
diagnostics.

--- gcc/c/c-parser.c.jj 2017-03-03 20:53:36.0 +0100
+++ gcc/c/c-parser.c	2017-03-03 21:13:35.058836286 +0100
@@ -12112,7 +12112,7 @@ c_parser_omp_clause_reduction (c_parser
default:
  c_parser_error (parser,
  "expected %<+%>, %<*%>, %<-%>, %<&%>, "
- "%<^%>, %<|%>, %<&&%>, %<||%>, % or %");
+ "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
  return list;
}
@@ -16960,7 +16960,7 @@ c_parser_omp_declare_reduction (c_parser
 default:
   c_parser_error (parser,
  "expected %<+%>, %<*%>, %<-%>, %<&%>, "
- "%<^%>, %<|%>, %<&&%>, %<||%>, % or identifier");
+ "%<^%>, %<|%>, %<&&%>, %<||%> or identifier");
   goto fail;
 }
 

Jakub


[committed] Fix 3 C FE diagnostic messages (PR c/79836)

2017-03-03 Thread Jakub Jelinek
Hi!

Missed % before > in 3 spots (strangely, all of them in the C FE).

The first hunk was committed as obvious; the second two fall under OpenMP
maintenance.  Committed to trunk.

2017-03-03  Jakub Jelinek  

PR c/79836
* c-parser.c (c_parser_generic_selection): Use %<_Generic%>
instead of %<_Generic>.
(c_parser_omp_ordered): Use %<depend%> instead of %<depend>.
(c_parser_omp_target_exit_data): Use %<release%> instead of
%<release>.

--- gcc/c/c-parser.c.jj 2017-02-28 16:24:05.0 +0100
+++ gcc/c/c-parser.c	2017-03-03 20:53:36.394531548 +0100
@@ -7452,7 +7452,7 @@ c_parser_generic_selection (c_parser *pa
  else
{
  error_at (assoc.type_location,
-   "%<_Generic> selector matches multiple associations");
+   "%<_Generic%> selector matches multiple associations");
  inform (matched_assoc.type_location,
  "other match is here");
}
@@ -15454,7 +15454,7 @@ c_parser_omp_ordered (c_parser *parser,
  if (context == pragma_stmt)
{
  error_at (loc,
-   "%<#pragma omp ordered%> with % clause may "
+   "%<#pragma omp ordered%> with % clause may "
"only be used in compound statements");
  c_parser_skip_to_pragma_eol (parser, false);
  return false;
@@ -16297,7 +16297,7 @@ c_parser_omp_target_exit_data (location_
map_seen |= 1;
error_at (OMP_CLAUSE_LOCATION (*pc),
  "%<#pragma omp target exit data%> with map-type other "
- "than %, % or % on %"
+ "than %, % or % on %"
  " clause");
*pc = OMP_CLAUSE_CHAIN (*pc);
continue;

Jakub


[PATCH] Add " && !(MEM_P (operands[0]) && MEM_P (operands[1]))" to condition of some sse.md instructions

2017-03-03 Thread Jakub Jelinek
On Fri, Mar 03, 2017 at 12:18:09PM +0100, Uros Bizjak wrote:
> Yes. Although expander takes care not to generate two memory
> references, combine can propagate memory to the other operand,
> creating semi-invalid RTX that is later resolved by RA.

Here is a patch which does that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-03  Jakub Jelinek  

* config/i386/sse.md (sse_storehps, sse_storelps,
avx__,
avx512f__,
avx512f__256): Require
in condition that at least one operand is not a MEM.

--- gcc/config/i386/sse.md.jj   2017-03-03 17:08:53.0 +0100
+++ gcc/config/i386/sse.md  2017-03-03 17:39:05.907871328 +0100
@@ -6637,7 +6637,7 @@ (define_insn "sse_storehps"
(vec_select:V2SF
  (match_operand:V4SF 1 "nonimmediate_operand" "v,v,o")
  (parallel [(const_int 2) (const_int 3)])))]
-  "TARGET_SSE"
+  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
%vmovhps\t{%1, %0|%q0, %1}
%vmovhlps\t{%1, %d0|%d0, %1}
@@ -6690,7 +6690,7 @@ (define_insn "sse_storelps"
(vec_select:V2SF
  (match_operand:V4SF 1 "nonimmediate_operand" " v,v,m")
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
+  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
%vmovlps\t{%1, %0|%q0, %1}
%vmovaps\t{%1, %0|%0, %1}
@@ -18661,7 +18661,7 @@ (define_insn_and_split "avx_ 1 "nonimmediate_operand" "xm,x")]
  UNSPEC_CAST))]
-  "TARGET_AVX"
+  "TARGET_AVX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
@@ -19575,7 +19575,7 @@ (define_insn_and_split "avx512f_ 1 "nonimmediate_operand" "xm,x")]
  UNSPEC_CAST))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
@@ -19592,7 +19592,7 @@ (define_insn_and_split "avx512f_ 1 "nonimmediate_operand" "xm,x")]
  UNSPEC_CAST))]
-  "TARGET_AVX512F"
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]


Jakub


[PATCH] Fix avx512 vpermq (PR target/79812)

2017-03-03 Thread Jakub Jelinek
Hi!

vpermq/vpermpd instructions for 512-bit vectors use bogus RTL, and if
we happen to run it through simplify-rtx.c, we ICE.
The problem is that for V8D[IF]mode VEC_SELECT we need to use a PARALLEL
with 8 elements, not 4.
The _vec_dup_1 change is unrelated to this, spotted
first by manual inspection and verified by the genrecog.c verify_pattern
patch; the broadcast wants to broadcast the first element, so it should be
a scalar vec_select in vec_duplicate, rather than a vector of the same size as
vec_select's operand.
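
(For illustration only, not part of the patch: a standalone C sketch of how
the expander decodes the 8-bit immediate into the four lane selectors it
passes via GEN_INT, mirroring the (mask >> 2*i) & 3 lines below.)

#include <stdio.h>

int
main (void)
{
  unsigned int mask = 0x1b;   /* example immediate: reverses the four 64-bit lanes */

  /* Each 2-bit field of the immediate picks one of the four source lanes.  */
  for (int i = 0; i < 4; i++)
    printf ("result lane %d <- source lane %u\n", i, (mask >> (2 * i)) & 3);

  return 0;
}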

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-03  Jakub Jelinek  

PR target/79812
* config/i386/sse.md (VI8F_256_512): Remove mode iterator.
(_perm): Rename to ...
(avx2_perm): ... this.  Use VI8F_256 iterator instead
of VI8F_256_512.
(_perm_mask): Rename to ...
(avx512vl_perm_mask): ... this.  Use VI8F_256 iterator instead
of VI8F_256_512.
(_perm_1): Rename to ...
(avx2_perm_1> 0) & 3),
- GEN_INT ((mask >> 2) & 3),
- GEN_INT ((mask >> 4) & 3),
- GEN_INT ((mask >> 6) & 3)));
+  emit_insn (gen_avx2_perm_1 (operands[0], operands[1],
+   GEN_INT ((mask >> 0) & 3),
+   GEN_INT ((mask >> 2) & 3),
+   GEN_INT ((mask >> 4) & 3),
+   GEN_INT ((mask >> 6) & 3)));
   DONE;
 })
 
-(define_expand "_perm_mask"
-  [(match_operand:VI8F_256_512 0 "register_operand")
-   (match_operand:VI8F_256_512 1 "nonimmediate_operand")
+(define_expand "avx512vl_perm_mask"
+  [(match_operand:VI8F_256 0 "register_operand")
+   (match_operand:VI8F_256 1 "nonimmediate_operand")
(match_operand:SI 2 "const_0_to_255_operand")
-   (match_operand:VI8F_256_512 3 "vector_move_operand")
+   (match_operand:VI8F_256 3 "vector_move_operand")
(match_operand: 4 "register_operand")]
-  "TARGET_AVX512F"
+  "TARGET_AVX512VL"
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen__perm_1_mask (operands[0], operands[1],
-  GEN_INT ((mask >> 0) & 3),
-  GEN_INT ((mask >> 2) & 3),
-  GEN_INT ((mask >> 4) & 3),
-  GEN_INT ((mask >> 6) & 3),
-  operands[3], operands[4]));
+ GEN_INT ((mask >> 0) & 3),
+ GEN_INT ((mask >> 2) & 3),
+ GEN_INT ((mask >> 4) & 3),
+ GEN_INT ((mask >> 6) & 3),
+ operands[3], operands[4]));
   DONE;
 })
 
-(define_insn "_perm_1"
-  [(set (match_operand:VI8F_256_512 0 "register_operand" "=v")
-   (vec_select:VI8F_256_512
- (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm")
+(define_insn "avx2_perm_1"
+  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
+   (vec_select:VI8F_256
+ (match_operand:VI8F_256 1 "nonimmediate_operand" "vm")
  (parallel [(match_operand 2 "const_0_to_3_operand")
 (match_operand 3 "const_0_to_3_operand")
 (match_operand 4 "const_0_to_3_operand")
@@ -17361,6 +17359,77 @@ 

[C++ PATCH] Fix ICE on empty statement expression (PR c++/79822)

2017-03-03 Thread Jakub Jelinek
Hi!

It seems callers of cxx_eval_constant_expression don't really like it if it
returns NULL, which is what cxx_eval_statement_list returns for an empty
STATEMENT_LIST.  In statement expressions we treat those the same
as (void) 0 or any other statement with void value.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-03-03  Jakub Jelinek  

PR c++/79822
* constexpr.c (cxx_eval_statement_list): Treat empty ({ }) like
({ (void) 0; }).

* g++.dg/cpp0x/constexpr-79822.C: New test.

--- gcc/cp/constexpr.c.jj   2017-02-22 23:03:12.0 +0100
+++ gcc/cp/constexpr.c  2017-03-03 12:42:02.391773591 +0100
@@ -3725,8 +3725,9 @@ cxx_eval_statement_list (const constexpr
 {
   tree_stmt_iterator i;
   tree local_target;
-  /* In a statement-expression we want to return the last value.  */
-  tree r = NULL_TREE;
+  /* In a statement-expression we want to return the last value.
+ For empty statement expression return void_node.  */
+  tree r = void_node;
   if (!jump_target)
 {
   local_target = NULL_TREE;
--- gcc/testsuite/g++.dg/cpp0x/constexpr-79822.C.jj	2017-03-03 12:49:41.595614494 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-79822.C	2017-03-03 12:50:51.753673495 +0100
@@ -0,0 +1,12 @@
+// PR c++/79822
+// { dg-do compile }
+// { dg-options "" }
+
+bool
+foo ()
+{
+  bool a = ({ }) && false; // { dg-error "could not convert" }
+  bool b = ({ ; }) && false;   // { dg-error "could not convert" }
+  bool c = ({ (void) 1; }) && false;   // { dg-error "could not convert" }
+  return a && b && c;
+}

Jakub


Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)

2017-03-03 Thread Bernd Schmidt

On 03/03/2017 02:33 PM, Marek Polacek wrote:


2017-03-03  Marek Polacek  

PR c/79758
* c-decl.c (store_parm_decls_oldstyle): Check if the element of
current_function_prototype_arg_types is error_mark_node.  Fix
formatting.  Use TREE_VALUE instead of TREE_TYPE.

* gcc.dg/noncompile/pr79758.c: New test.


Ok.


Bernd



[maintainer-scripts] gcc_release: replace MD5 hashing

2017-03-03 Thread Gerald Pfeifer
Looking how our snapshots are secured in terms of checksums, I
noticed we're still using MD5 and SHA-1 hashes.  

Which is unfortunate, given that MD5 has been considered weak for
what, a decade now?  And SHA-1 has been considered weak for years as
well, and has now been demonstrated broken for real.

So I went ahead and added SHA-256 as new primary hash, demoting
SHA-1 to secondary, and removing MD5.

I also updated the copy of this script on gcc.gnu.org and will
have a closer look at coming snapshots.

Gerald

2017-03-03  Gerald Pfeifer  

* gcc_release (snapshot_print): Replace md5 checksum by sha256.

Index: gcc_release
===
--- gcc_release (revision 245874)
+++ gcc_release (revision 245875)
@@ -356,7 +356,7 @@
 # Print description if snapshot exists.
 snapshot_print() {
   if [ -e ${RELEASE}/$1 ]; then
-hash=`openssl  md5  ${RELEASE}/$1 | sed -e 's#(.*)##' -e 's# *= *#=#'`
+hash=`openssl  sha256  ${RELEASE}/$1 | sed -e 's#(.*)##' -e 's# *= *#=#'`
 hash2=`openssl sha1 ${RELEASE}/$1 | sed -e 's#(.*)##' -e 's# *= *#=#'`
 
 printf " %-37s%s\n\n  %s\n  %s\n\n" "$1" "$2" "$hash" "$hash2" \


[PATCH], PR target/79038, Improve char/short -> _Float128 conversions

2017-03-03 Thread Michael Meissner
On January 31st I committed a patch that improves conversion of signed/unsigned
char/short values to 32-bit and 64-bit floating point on Power9, particularly
when the values are coming from memory.  This adds similar support to
_Float128/__float128 (i.e. IEEE 128-bit floating point).  I have tested this
patch via bootstrap and make check and there were no regressions.  Can I check
this patch into the trunk?

[gcc]
2017-03-03  Michael Meissner  

PR target/79038
* config/rs6000/rs6000.md (float2): Define
insns to convert from signed/unsigned char/short to IEEE 128-bit
floating point.
(floatuns2): Likewise.

[gcc/testsuite]
2017-02-24  Michael Meissner  

PR target/79038
* gcc.target/powerpc/pr79038-1.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 245815)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -14467,6 +14467,43 @@ (define_insn_and_split "float_si2_
 operands[2] = gen_reg_rtx (DImode);
 })
 
+(define_insn_and_split "float2"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v,v")
+   (float:IEEE128 (match_operand:QHI 1 "nonimmediate_operand" "v,r,Z")))
+   (clobber (match_scratch:DI 2 "=X,r,X"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx dest_di = gen_rtx_REG (DImode, REGNO (dest));
+
+  if (altivec_register_operand (src, mode))
+emit_insn (gen_extenddi2 (dest_di, src));
+  else if (int_reg_operand (src, mode))
+{
+  rtx ext_di = operands[2];
+  emit_insn (gen_extenddi2 (ext_di, src));
+  emit_move_insn (dest_di, ext_di);
+}
+  else if (MEM_P (src))
+{
+  rtx dest_qhi = gen_rtx_REG (mode, REGNO (dest));
+  emit_move_insn (dest_qhi, src);
+  emit_insn (gen_extenddi2 (dest_di, dest_qhi));
+}
+  else
+gcc_unreachable ();
+
+  emit_insn (gen_float_di2_hw (dest, dest_di));
+  DONE;
+}
+  [(set_attr "length" "8,12,12")
+   (set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 (define_insn "floatuns_di2_hw"
   [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
(unsigned_float:IEEE128
@@ -14493,6 +14530,38 @@ (define_insn_and_split "floatuns_s
 operands[2] = gen_reg_rtx (DImode);
 })
 
+(define_insn_and_split "floatuns2"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v,v,v")
+   (unsigned_float:IEEE128
+(match_operand:QHI 1 "nonimmediate_operand" "v,r,Z")))
+   (clobber (match_scratch:DI 2 "=X,r,X"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx dest_di = gen_rtx_REG (DImode, REGNO (dest));
+
+  if (altivec_register_operand (src, mode) || MEM_P (src))
+emit_insn (gen_zero_extenddi2 (dest_di, src));
+  else if (int_reg_operand (src, mode))
+{
+  rtx ext_di = operands[2];
+  emit_insn (gen_zero_extenddi2 (ext_di, src));
+  emit_move_insn (dest_di, ext_di);
+}
+  else
+gcc_unreachable ();
+
+  emit_insn (gen_floatuns_di2_hw (dest, dest_di));
+  DONE;
+}
+  [(set_attr "length" "8,12,8")
+   (set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 ;; IEEE 128-bit instructions with round to odd semantics
 (define_insn "*truncdf2_odd"
   [(set (match_operand:DF 0 "vsx_register_operand" "=v")
Index: gcc/testsuite/gcc.target/powerpc/pr79038-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr79038-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79038-1.c	(revision 0)
@@ -0,0 +1,39 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2 -mfloat128" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+TYPE
+return_convert_uchar_to_float128_mem (unsigned char *p_uc)
+{
+  return (TYPE) p_uc[10];
+}
+
+TYPE
+return_convert_schar_to_float128_mem (signed char *p_sc)
+{
+  return (TYPE) p_sc[10];
+}
+
+TYPE
+return_convert_ushort_to_float128_mem (unsigned short *p_us)
+{
+  return (TYPE) p_us[10];
+}
+
+TYPE
+return_convert_sshort_to_float128_mem (short *p_ss)
+{
+  return (TYPE) p_ss[10];
+}
+
+/* { dg-final { scan-assembler {\mlxsi[bh]zx\M}  } } */
+/* { dg-final { scan-assembler {\mvexts[bh]2d\M} } } */
+/* { dg-final { scan-assembler-not {\mextsb\M}   } } */
+/* { dg-final { scan-assembler-not {\ml[bh][az]\M}   } } */
+/* { dg-final { scan-assembler-not 

Re: [PATCH] [MSP430] PR78838: Do not add section name prefixes when section name is .lowtext

2017-03-03 Thread Jozef Lawrynowicz
ping


The MSP430 target supports the automatic placement of functions and
data in different memory regions when passing the
"-mdata-region=either" and "-mcode-region=either" options.
MSP430x devices support the large memory model, "-mlarge", which
enables 20 bit pointers, however the vector table across all MSP430
targets only accepts 16-bit pointers. To prevent the use of 20-bit
pointers in the vector table when the large memory model is used, the
MSP430 backend currently places functions specified with the interrupt
attribute in a section called ".lowtext". This section is placed at
the beginning of .text in the MSP430 linker scripts.

PR target/78838 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78838)
reports that a function with the interrupt attribute is placed in a
non-existent section ".either.lowtext" when "-mlarge",
"-mcode-region=either" and "-ffunction-sections" are passed. The
backend has correctly decided to place the function in .lowtext, but
has applied the .either prefix which is undesirable.

No additional .lower/.upper/.either prefixes should be applied to the
section name once it has been placed in .lowtext, the patch below
implements this.
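
(For context, a purely illustrative C fragment, not taken from the patch or
the PR: the region attributes are what normally drive the .lower/.upper/.either
section prefixes, while the interrupt attribute is what forces .lowtext
placement under -mlarge.  The identifiers are made up for the example.)

/* Data that may be placed in either memory region.  */
int big_table[256] __attribute__ ((either));

/* A function pinned to lower memory.  */
void __attribute__ ((lower)) fast_path (void) { }

/* Interrupt handlers go to .lowtext when -mlarge is in effect, so no
   region prefix should be added to their section name.  */
void __attribute__ ((interrupt (2))) isr_example (void) { }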

I've built and tested successfully with no regressions reported:
  Target is msp430-unknown-elf
  Host   is x86_64-unknown-linux-gnu

Testsuite variations:
msp430-sim/-mcpu=msp430x/-mlarge/-mdata-region=either/-mcode-region=either

The section .lowtext will only be utilised if the large memory model
is used, which is why I have only tested with this testsuite
variation.
I haven't run the g++ testsuite because msp430-elf doesn't build with
C++ support at the moment:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79242

I don't have write access to the GCC SVN repository, so if this patch
is satisfactory, I would appreciate if someone could commit it for me.
Thanks.

---

2017-02-XX  Jozef Lawrynowicz  

gcc/
PR target/78838
* config/msp430/msp430.c (gen_prefix): Return NULL when the section
name is .lowtext.
(msp430_section_is_lowtext): New function.

gcc/testsuite
PR target/78838
* gcc.target/msp430/interrupt_fn_placement.c: New test.

From: Jozef Lawrynowicz 
Date: Wed, 15 Feb 2017 13:03:40 +
Subject: [PATCH] [MSP430] PR78838: Do not add section name prefixes
 when the section name is .lowtext

---
 gcc/config/msp430/msp430.c   | 14 ++
 gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c |  9 +
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 6f63116..c9dffb1 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1808,6 +1808,15 @@ is_critical_func (tree decl = current_function_decl)
   return has_attr (ATTR_CRIT, decl);
 }

+static bool
+msp430_section_is_lowtext (tree decl = current_function_decl)
+{
+  if (decl == NULL_TREE)
+return false;
+  const char * dec_name = DECL_SECTION_NAME (decl);
+  return dec_name && strcmp (".lowtext", dec_name) == 0;
+}
+
 #undef  TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
 #define TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
msp430_allocate_stack_slots_for_args

@@ -2146,6 +2155,11 @@ gen_prefix (tree decl)
   if (has_attr ("section", decl))
 return NULL;

+  /* If the function has been put in the .lowtext section because it is an interrupt
+   * handler, and the large memory model is used, then do not add any prefixes. */
+  if (msp430_section_is_lowtext (decl))
+return NULL;
+
   /* If the object has __attribute__((lower)) then use the ".lower."
prefix.  */
   if (has_attr (ATTR_LOWER, decl))
 return lower_prefix;
diff --git a/gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c
b/gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c
new file mode 100644
index 000..d5d2113
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/interrupt_fn_placement.c
@@ -0,0 +1,9 @@
+/* { dg-do link } */
+/* { dg-options "-mlarge -mcode-region=either -ffunction-sections" } */
+
+void __attribute__((interrupt(2))) ir_1(void) {
+}
+
+int main(void) {
+  while(1);
+}
--
1.8.3.1


Re: [PATCH] [ARC] Handle complex PIC move patterns.

2017-03-03 Thread Andrew Burgess
* Claudiu Zissulescu  [2017-02-28 16:59:58 
+0100]:

> Hi,
> 
> fwprop step is placing in the REG_EQUIV notes constant pic unspecs
> expressions.  Then, loop may use these notes for optimizations
> rezulting in complex patterns that are not supported by the current
> implementation.
> 
> The patch adds handling of complex PIC addresses having MINUS or UNARY
> operations.
> 
> Ok to apply?
> Claudiu

There's a couple of spelling mistakes I've marked inline, but
otherwise, this seems sensible.

Thanks,
Andrew



> 
> gcc/
> 2017-02-28  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_legitimize_pic_address): Handle PIC
>   expressions with MINUS and UNARY ops.
> 
> gcc/testsuite
> 2017-02-28  Claudiu Zissulescu  
> 
>   * gcc.target/arc/pr9001090948.c: New file.
> ---
>  gcc/config/arc/arc.c| 49 
> -
>  gcc/testsuite/gcc.target/arc/pr9001090948.c | 25 +++
>  2 files changed, 73 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/pr9001090948.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 508a9a6..3b94d7c 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -5024,8 +5024,55 @@ arc_legitimize_pic_address (rtx orig, rtx oldx)
> /* Check that the unspec is one of the ones we generate?  */
> return orig;
>   }
> +   /* fwprop is placing in the REG_EQUIV notes constant pic
> +  unspecs expressions.  Then, loop may use these notes for
> +  optimizations rezulting in complex patterns that are not

s/rezulting/resulting/


> +  supported by the current implementation. The following
> +  two if-cases are simplifying the complex patters in

s/patters in/patterns to/

> +  simpler ones.  */
> +   else if (GET_CODE (addr) == MINUS)
> + {
> +   rtx op0 = XEXP (addr, 0);
> +   rtx op1 = XEXP (addr, 1);
> +   gcc_assert (oldx);
> +   gcc_assert (GET_CODE (op1) == UNSPEC);
> +
> +   emit_move_insn (oldx,
> +   gen_rtx_CONST (SImode,
> +  arc_legitimize_pic_address (op1,
> +  
> NULL_RTX)));
> +   emit_insn (gen_rtx_SET (oldx, gen_rtx_MINUS (SImode, op0, oldx)));
> +   return oldx;
> +
> + }
> +   else if (GET_CODE (addr) != PLUS)
> + {
> +   rtx tmp = XEXP (addr, 0);
> +   enum rtx_code code = GET_CODE (addr);
> +
> +   /* It only works for UNARY operations.  */
> +   gcc_assert (UNARY_P (addr));
> +   gcc_assert (GET_CODE (tmp) == UNSPEC);
> +   gcc_assert (oldx);
> +
> +   emit_move_insn
> + (oldx,
> +  gen_rtx_CONST (SImode,
> + arc_legitimize_pic_address (tmp,
> + NULL_RTX)));
> +
> +   emit_insn (gen_rtx_SET (oldx,
> +   gen_rtx_fmt_ee (code, SImode,
> +   oldx, const0_rtx)));
> +
> +   return oldx;
> + }
> else
> - gcc_assert (GET_CODE (addr) == PLUS);
> + {
> +   gcc_assert (GET_CODE (addr) == PLUS);
> +   if (GET_CODE (XEXP (addr, 0)) == UNSPEC)
> + return orig;
> + }
>   }
>  
>if (GET_CODE (addr) == PLUS)
> diff --git a/gcc/testsuite/gcc.target/arc/pr9001090948.c 
> b/gcc/testsuite/gcc.target/arc/pr9001090948.c
> new file mode 100644
> index 000..103f4ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/pr9001090948.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "ARC600 doesn't support pic" { arc6xx } } */
> +/* { dg-options "-Os -fPIC" } */
> +#include 
> +#include 
> +
> +char *
> +strip_trail (const char str[], size_t n)
> +{
> +  static char buf[1025];
> +  int j;
> +
> +  strncpy (buf, str, n);
> +  buf[n] = '\0';
> +
> +  for (j = strlen (buf) - 1; j >= 0; j--)
> +{
> +  if (buf[j] != ' ')
> +break;
> +
> +  buf[j] = '\0';
> +}
> +
> +  return buf;
> +}
> -- 
> 1.9.1
> 


Re: Fix (work around) LRA infinite loop, PR78911

2017-03-03 Thread Vladimir Makarov



On 03/03/2017 08:36 AM, Bernd Schmidt wrote:


Reload is designed in a way to avoid cycles and to process all reloads 
for an insn in order of ascending class so as to avoid this kind of 
issue. With LRA I'm really not sure how to fix this properly, but the 
following patch seems to cure the PR at least, by recognizing when 
we're about to spill a reload reg whose assigned class contains only 
a single register.


I had conversations with several people who worked on different RAs and 
came to the conclusion that looping is a pretty general problem in iterative 
classical RAs.  Unfortunately, it is not discussed in the literature; ad-hoc 
approaches are used instead (which is probably why it is not covered in the 
scientific literature).


In the first stages of LRA development, looping was a very frequent bug, and 
ad-hoc approaches were also used to fix it.  The major way is to 
predict that a given decision can result in repeating patterns and to 
avoid such a decision (e.g. lra_constraints.c::process_alt_operands 
contains such code).


Reasons for looping can be of different natures, and right now I have no 
systematic solution in mind to avoid all of them.

Bootstrapped and tested on x86_64-linux, ok?


Yes.  Thank you for working on this, Bernd.  The issue mentioned in the 
FIXME part of the comment can be addressed later if we have cases for 
which your fix is not enough.




C++ PATCH to fix ICE with NSDMI and this pointer (PR c++/79796)

2017-03-03 Thread Marek Polacek
We weren't replacing a PLACEHOLDER_EXPR in the following testcase, leading to a
crash in the gimplifier.  The fix seems to be to use replace_placeholders; the
question is where.  These three functions have it:
cp_gimplify_init_expr
store_init_value
build_new_1
But we call none of them here, so I tried adding the call to build_over_call,
right after we create the MODIFY_EXPR with the NSDMI, and that seemed to work
out.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-03-03  Marek Polacek  

PR c++/79796 - ICE with NSDMI and this pointer
* call.c (build_over_call): Handle NSDMI with a 'this' by calling
replace_placeholders.

* g++.dg/cpp0x/nsdmi13.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index dc629b96..b821224 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -8047,6 +8047,9 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
{
  arg = cp_build_indirect_ref (arg, RO_NULL, complain);
  val = build2 (MODIFY_EXPR, TREE_TYPE (to), to, arg);
+ if (cxx_dialect >= cxx14)
+   /* Handle NSDMI that refer to the object being initialized.  */
+   replace_placeholders (arg, to);
}
   else
{
diff --git gcc/testsuite/g++.dg/cpp0x/nsdmi13.C 
gcc/testsuite/g++.dg/cpp0x/nsdmi13.C
index e69de29..2751da3 100644
--- gcc/testsuite/g++.dg/cpp0x/nsdmi13.C
+++ gcc/testsuite/g++.dg/cpp0x/nsdmi13.C
@@ -0,0 +1,13 @@
+// PR c++/79796
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  A* p = this;
+};
+
+void foo()
+{
+  A a;
+  a = A({});
+}

Marek


[RFC] VEC_SELECT sanity checking in genrecog

2017-03-03 Thread Jakub Jelinek
Hi!

When working on PR79812, which was caused by bugs in x86 define_insns
(they used vec_select with a parallel as the last operand with an incorrect
number of operands), I wrote the following sanity checking.

The thing is that e.g. simplify-rtx.c already has such assertions:
  if (!VECTOR_MODE_P (mode))
{
  gcc_assert (VECTOR_MODE_P (GET_MODE (trueop0)));
  gcc_assert (mode == GET_MODE_INNER (GET_MODE (trueop0)));
  gcc_assert (GET_CODE (trueop1) == PARALLEL);
  gcc_assert (XVECLEN (trueop1, 0) == 1);
  gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0)));
...
  else
{
  gcc_assert (VECTOR_MODE_P (GET_MODE (trueop0)));
  gcc_assert (GET_MODE_INNER (mode)
  == GET_MODE_INNER (GET_MODE (trueop0)));
  gcc_assert (GET_CODE (trueop1) == PARALLEL);

  if (GET_CODE (trueop0) == CONST_VECTOR)
{
  int elt_size = GET_MODE_UNIT_SIZE (mode);
  unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
  rtvec v = rtvec_alloc (n_elts);
  unsigned int i;

  gcc_assert (XVECLEN (trueop1, 0) == (int) n_elts);
...
and documentation says:
@findex vec_select
@item (vec_select:@var{m} @var{vec1} @var{selection})
This describes an operation that selects parts of a vector.  @var{vec1} is
the source vector, and @var{selection} is a @code{parallel} that contains a
@code{const_int} for each of the subparts of the result vector, giving the
number of the source subpart that should be stored into it.
The result mode @var{m} is either the submode for a single element of
@var{vec1} (if only one subpart is selected), or another vector mode
with that element submode (if multiple subparts are selected).
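
(To make that rule concrete: a standalone C sketch, illustrative only and not
GCC code, of the vec_select semantics just quoted and of the consistency the
new check enforces -- one index per result element, each index in range of the
source vector.  Function and variable names are made up.)

#include <assert.h>
#include <stdio.h>

static void
emulate_vec_select (const long long *src, int src_nelts,
                    long long *dst, const int *sel, int dst_nelts)
{
  for (int i = 0; i < dst_nelts; i++)
    {
      /* Roughly the property the genrecog verification checks statically.  */
      assert (sel[i] >= 0 && sel[i] < src_nelts);
      dst[i] = src[sel[i]];
    }
}

int
main (void)
{
  long long v2di[2] = { 10, 20 };   /* a V2DI-like source vector */
  int sel[2] = { 1, 0 };            /* a two-element "parallel" */
  long long out[2];

  emulate_vec_select (v2di, 2, out, sel, 2);
  printf ("%lld %lld\n", out[0], out[1]);

  /* A V1TI-like source has a single element, so a two-element parallel,
     as in the flagged vsx.md patterns, cannot be satisfied.  */
  return 0;
}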

Unfortunately, even after fixing 2 x86 bugs with that, I'm seeing tons
of issues on other targets (many others are ok though), listed below.  So, is
the verification the right thing, and are all of these md bugs that should be
fixed (and then the verification added)?  If not, I'd be afraid that if people
are unlucky and the combiner constant-propagates something, or some other pass
does, we can ICE on those in simplify-rtx.c.  E.g. in vsx.md the thing is that
the pattern uses an iterator with 2 V2?? modes in it and then V1TI mode,
and uses exactly two elements in the parallel, which doesn't make any sense
for V1TI.

../../gcc/config/rs6000/vsx.md:2063:1: vec_select parallel with 2 elements, expected 1
../../gcc/config/rs6000/vsx.md:2112:1: vec_select parallel with 2 elements, expected 1
../../gcc/config/rs6000/vsx.md:2161:1: vec_select parallel with 2 elements, expected 1

../../gcc/config/aarch64/aarch64-simd.md:79:1: DImode of first vec_select operand is not a vector mode
../../gcc/config/aarch64/aarch64-simd.md:79:1: DFmode of first vec_select operand is not a vector mode
../../gcc/config/aarch64/aarch64-simd.md:588:1: DImode of first vec_select operand is not a vector mode
../../gcc/config/aarch64/aarch64-simd.md:588:1: DFmode of first vec_select operand is not a vector mode
../../gcc/config/aarch64/aarch64-simd.md:3192:1: DFmode of first vec_select operand is not a vector mode

../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select HImode and its operand QImode
../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select SImode and its operand QImode
../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select QImode and its operand HImode
../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select SImode and its operand HImode
../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select QImode and its operand SImode
../../gcc/config/arm/neon.md:1338:1: element mode mismatch between vec_select HImode and its operand SImode
../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select HImode and its operand QImode
../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select SImode and its operand QImode
../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select QImode and its operand HImode
../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select SImode and its operand HImode
../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select QImode and its operand SImode
../../gcc/config/arm/neon.md:1353:1: element mode mismatch between vec_select HImode and its operand SImode
../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select HImode and its operand QImode
../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select SImode and its operand QImode
../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select QImode and its operand HImode
../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select SImode and its operand HImode
../../gcc/config/arm/neon.md:1407:1: element mode mismatch between vec_select QImode and 

[PATCH][AArch64] Fix bootstrap due to wide_int .elt (0) uninit warning

2017-03-03 Thread Kyrill Tkachov

Hi all,

This patch fixes the aarch64 bootstrap failure I've encountered.
Richi suggested on IRC that we should be using .ulow () on the wide_int instead
of accessing elt (0), as the latter doesn't play well with the uninit analysis.

Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to trunk in the interest of fixing the build

2017-03-03  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_float_const_representable_p):
Use wide_int::ulow () instead of .elt (0).
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ab1bdc0..714bb79 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12496,7 +12496,7 @@ aarch64_float_const_representable_p (rtx x)
 
   /* If the low part of the mantissa has bits set we cannot represent
  the value.  */
-  if (w.elt (0) != 0)
+  if (w.ulow () != 0)
 return false;
   /* We have rejected the lower HOST_WIDE_INT, so update our
  understanding of how many bits lie in the mantissa and


[PATCH, i386]: Fix a couple of split insn constraints

2017-03-03 Thread Uros Bizjak
Hello!

While looking in this area, I've noticed that there are constraints
that allow wide immediate operands that result in unsupported
immediate->memory moves on 64bit targets. Since these immediates have
to be limited to 32 bits, we have to use the C constraint, which allows only
the all-bits-zero 0.0 FP value.

2017-03-03  Uros Bizjak  

* config/i386/i386.md (*pushtf): Change *roF constraint to *roC.
(*pushxf): Limit oF constraint to 32bit targets and add oC
constraint for 64bit targets.
(pushxf splitter): Use PUSH_ROUNDING to calculate stack adjustment.
(*pushdf): Change rmF constraint to rmC.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and committed.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 245874)
+++ config/i386/i386.md (working copy)
@@ -2992,7 +2992,7 @@
 
 (define_insn "*pushtf"
   [(set (match_operand:TF 0 "push_operand" "=<,<")
-   (match_operand:TF 1 "general_no_elim_operand" "v,*roF"))]
+   (match_operand:TF 1 "general_no_elim_operand" "v,*roC"))]
   "TARGET_64BIT || TARGET_SSE"
 {
   /* This insn should be already split before reg-stack.  */
@@ -3046,17 +3046,18 @@
(symbol_ref "true")))])
 
 (define_insn "*pushxf"
-  [(set (match_operand:XF 0 "push_operand" "=<,<,<,<")
-   (match_operand:XF 1 "general_no_elim_operand" "f,r,*r,oF"))]
+  [(set (match_operand:XF 0 "push_operand" "=<,<,<,<,<")
+   (match_operand:XF 1 "general_no_elim_operand" "f,r,*r,oF,oC"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */
   gcc_unreachable ();
 }
-  [(set_attr "type" "multi")
-   (set_attr "unit" "i387,*,*,*")
+  [(set_attr "isa" "*,*,*,nox64,x64")
+   (set_attr "type" "multi")
+   (set_attr "unit" "i387,*,*,*,*")
(set (attr "mode")
-   (cond [(eq_attr "alternative" "1,2,3")
+   (cond [(eq_attr "alternative" "1,2,3,4")
 (if_then_else (match_test "TARGET_64BIT")
   (const_string "DI")
   (const_string "SI"))
@@ -3075,7 +3076,7 @@
   [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
(set (match_dup 0) (match_dup 1))]
 {
-  operands[2] = GEN_INT (-GET_MODE_SIZE (XFmode));
+  operands[2] = GEN_INT (-PUSH_ROUNDING (GET_MODE_SIZE (XFmode)));
   /* Preserve memory attributes. */
   operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
 })
@@ -3082,7 +3083,7 @@
 
 (define_insn "*pushdf"
   [(set (match_operand:DF 0 "push_operand" "=<,<,<,<,<,<")
-   (match_operand:DF 1 "general_no_elim_operand" "f,r,*r,oF,rmF,x"))]
+   (match_operand:DF 1 "general_no_elim_operand" "f,r,*r,oF,rmC,x"))]
   ""
 {
   /* This insn should be already split before reg-stack.  */


Re: [PATCH] PR libstdc++/79789 fix non-reserved names in headers

2017-03-03 Thread David Edelsohn
This patch caused a new regression on AIX.

- David

FAIL: 17_intro/names.cc (test for excess errors)
Excess errors:
/tmp/GCC/gcc/include-fixed/sys/types.h:600: error: expected
unqualified-id before '[' token
/tmp/GCC/gcc/include-fixed/sys/types.h:600: error: expected ')' before '[' token
/tmp/GCC/gcc/include-fixed/sys/types.h:668: error: expected
unqualified-id before '[' token
/tmp/GCC/gcc/include-fixed/sys/types.h:668: error: expected ')' before '[' token
/tmp/GCC/gcc/include-fixed/stdlib.h:332: error: 'parameter' declared
as function returning a function
/tmp/GCC/gcc/include-fixed/stdlib.h:332: error: expected
primary-expression before 'void'
/tmp/GCC/gcc/include-fixed/math.h:1458: error: expected unqualified-id
before ',' token
/tmp/GCC/gcc/include-fixed/math.h:1458: error: expected ')' before ',' token

AIX types.h:600 is

typedef struct { int r[1]; } *  physadr_t;

stdlib.h:332 is

extern int  at_quick_exit(void (*f)(void));

math.h:1458 is

 struct dbl_hypot {
 double x, y;
 };


Re: [PR79793] Fixes interrupt stack alignment

2017-03-03 Thread Uros Bizjak
On Fri, Mar 3, 2017 at 4:20 PM, Koval, Julia  wrote:
> Hi,
> This patch fixes PR79731. Ok for trunk?
>
> gcc/
> * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Fix 
> boundary for interrupt

...: Set incoming stack boundary to 128 for 64-bit targets.

> gcc/testsuite/
> * gcc.target/i386/interrupt-12.c: Fix test.
> * gcc.target/i386/interrupt-13.c: Ditto.
> * gcc.target/i386/interrupt-14.c: Ditto.
> * gcc.target/i386/interrupt-15.c: Ditto.

...: Update scan-assembler-times directives.

Please also add "PR target/79731" to the top of ChangeLog entry.

OK with the above ChangeLog changes.

Thanks,
Uros.


[PR79793] Fixes interrupt stack alignment

2017-03-03 Thread Koval, Julia
Hi,
This patch fixes PR79731. Ok for trunk?

gcc/
* config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Fix 
boundary for interrupt
gcc/testsuite/
* gcc.target/i386/interrupt-12.c: Fix test.
* gcc.target/i386/interrupt-13.c: Ditto. 
* gcc.target/i386/interrupt-14.c: Ditto. 
* gcc.target/i386/interrupt-15.c: Ditto.

Julia


patch_interrupt_3.3
Description: patch_interrupt_3.3


[PATCH] libbacktrace: Fix compile warning building testcase

2017-03-03 Thread Sam Thursfield

Hello,
I hit this compile warning running `make check` on AIX.  Since -Werror is 
enabled by default here, it causes a build failure.


Tested on powerpc-ibm-aix7.2.0.0

This is my first patch submission for GCC, so please let me know if I need 
to do anything differently.

Thanks
Sam
From a0d73b85afd593aafd50e454d9a86acd9f0bc489 Mon Sep 17 00:00:00 2001
From: Sam Thursfield 
Date: Fri, 3 Mar 2017 08:38:42 -0600
Subject: [PATCH] libbacktrace: Fix compile warning building testcase

This warning occurred when the BACKTRACE_SUPPORTED flag is undefined.

../../../gcc/libbacktrace/btest.c:624:1: error: 'test5' defined but not used [-Werror=unused-function]
 test5 (void)
 ^
cc1: all warnings being treated as errors

libbacktrace/:

2017-03-03  Sam Thursfield  

   * btest.c (test5): Fix compile warning when BACKTRACE_SUPPORTED isn't
   defined.
---
 libbacktrace/btest.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c
index e28a3d8..bed547a 100644
--- a/libbacktrace/btest.c
+++ b/libbacktrace/btest.c
@@ -616,6 +616,7 @@ f33 (int f1line, int f2line)
   return failures;
 }
 
+#if BACKTRACE_SUPPORTED
 #if BACKTRACE_SUPPORTS_DATA
 
 int global = 1;
@@ -687,6 +688,7 @@ test5 (void)
 }
 
 #endif /* BACKTRACE_SUPPORTS_DATA  */
+#endif /* BACKTRACE_SUPPORTED */
 
 static void
 error_callback_create (void *data ATTRIBUTE_UNUSED, const char *msg,
-- 
2.2.2



Fix IRA issue, PR79728

2017-03-03 Thread Bernd Schmidt
This is an ICE where setup_pressure_classes fails if xmm0 is a global 
reg. Instead of GENERAL/FLOAT/SSE/MMX_REGS, it computes only 
SSE_FIRST_REG as the third register class. The problem is that the costs 
for moving between SSE_FIRST_REG and SSE_REGS are inflated because we 
think we have no available registers in SSE_FIRST_REG (since the only 
register in that class is global), and that somewhat confuses the algorithm.


The following fixes it by tweaking contains_regs_of_mode. Out of 
caution, I've retained the old meaning for code in reload which uses this.
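
(An illustrative fragment, not the actual test case from the patch, of what
"xmm0 is a global reg" looks like at the C level; the file-scope register
variable pins the sole member of SSE_FIRST_REG so it is never allocatable.
Identifiers are made up for the example; compile for x86 with SSE enabled.)

register double pinned asm ("xmm0");

double
use_pinned (double a, double b)
{
  return a + b + pinned;
}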


Bootstrapped and tested on x86_64-linux. Ok?


Bernd
	PR rtl-optimization/79728
	* regs.h (struct target_regs): New field
	x_contains_allocatable_reg_of_mode.
	(contains_allocatable_reg_of_mode): New macro.
	* reginfo.c (init_reg_sets_1): Initialize it, and change
	contains_reg_of_mode so it includes global regs as well.
	* reload.c (push_reload): Use contains_allocatable_reg_of_mode
	rather than contains_reg_of_mode.

	PR rtl-optimization/79728
	* gcc.target/i386/sse-globalreg.c: New test.

Index: gcc/reginfo.c
===
--- gcc/reginfo.c	(revision 245685)
+++ gcc/reginfo.c	(working copy)
@@ -468,19 +468,29 @@ init_reg_sets_1 (void)
   memset (contains_reg_of_mode, 0, sizeof (contains_reg_of_mode));
   for (m = 0; m < (unsigned int) MAX_MACHINE_MODE; m++)
 {
-  HARD_REG_SET ok_regs;
+  HARD_REG_SET ok_regs, ok_regs2;
   CLEAR_HARD_REG_SET (ok_regs);
+  CLEAR_HARD_REG_SET (ok_regs2);
   for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
-	if (!fixed_regs [j] && HARD_REGNO_MODE_OK (j, (machine_mode) m))
-	  SET_HARD_REG_BIT (ok_regs, j);
+	if (!TEST_HARD_REG_BIT (fixed_nonglobal_reg_set, j)
+	&& HARD_REGNO_MODE_OK (j, (machine_mode) m))
+	  {
+	SET_HARD_REG_BIT (ok_regs, j);
+	if (!fixed_regs[j])
+	  SET_HARD_REG_BIT (ok_regs2, j);
+	  }
 
   for (i = 0; i < N_REG_CLASSES; i++)
 	if ((targetm.class_max_nregs ((reg_class_t) i, (machine_mode) m)
 	 <= reg_class_size[i])
 	&& hard_reg_set_intersect_p (ok_regs, reg_class_contents[i]))
 	  {
-	 contains_reg_of_mode [i][m] = 1;
-	 have_regs_of_mode [m] = 1;
+	 contains_reg_of_mode[i][m] = 1;
+	 if (hard_reg_set_intersect_p (ok_regs2, reg_class_contents[i]))
+	   {
+		 have_regs_of_mode[m] = 1;
+		 contains_allocatable_reg_of_mode[i][m] = 1;
+	   }
 	  }
  }
 }
Index: gcc/regs.h
===
--- gcc/regs.h	(revision 245685)
+++ gcc/regs.h	(working copy)
@@ -218,6 +218,10 @@ struct target_regs {
   /* 1 if the corresponding class contains a register of the given mode.  */
   char x_contains_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE];
 
+  /* 1 if the corresponding class contains a register of the given mode
+ which is not global and can therefore be allocated.  */
+  char x_contains_allocatable_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE];
+
   /* Record for each mode whether we can move a register directly to or
  from an object of that mode in memory.  If we can't, we won't try
  to use that mode directly when accessing a field of that mode.  */
@@ -243,6 +247,8 @@ extern struct target_regs *this_target_r
   (this_target_regs->x_have_regs_of_mode)
 #define contains_reg_of_mode \
   (this_target_regs->x_contains_reg_of_mode)
+#define contains_allocatable_reg_of_mode \
+  (this_target_regs->x_contains_allocatable_reg_of_mode)
 #define direct_load \
   (this_target_regs->x_direct_load)
 #define direct_store \
Index: gcc/reload.c
===
--- gcc/reload.c	(revision 245685)
+++ gcc/reload.c	(working copy)
@@ -1055,7 +1055,7 @@ push_reload (rtx in, rtx out, rtx *inloc
 #ifdef CANNOT_CHANGE_MODE_CLASS
   && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
 #endif
-  && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
+  && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (in))]
   && (CONSTANT_P (SUBREG_REG (in))
 	  || GET_CODE (SUBREG_REG (in)) == PLUS
 	  || strict_low
@@ -1164,7 +1164,7 @@ push_reload (rtx in, rtx out, rtx *inloc
 #ifdef CANNOT_CHANGE_MODE_CLASS
   && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
 #endif
-  && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
+  && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (out))]
   && (CONSTANT_P (SUBREG_REG (out))
 	  || strict_low
 	  || (((REG_P (SUBREG_REG (out))
Index: gcc/testsuite/gcc.target/i386/sse-globalreg.c
===
--- gcc/testsuite/gcc.target/i386/sse-globalreg.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse-globalreg.c	(working copy)
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -w" } */
+/* { dg-require-effective-target sse2 } */
+
+register 

Fix (work around) LRA infinite loop, PR78911

2017-03-03 Thread Bernd Schmidt
In this PR, we have an endless cycle in LRA, generating ever more 
instructions. The relevant portions of the dump seem to be these:


** Local #9: **
  Creating newreg=130 from oldreg=128, assigning class CREG to r130
   18: 
{r117:DI=unspec/v[[r129:SI*0x8+r123:SI],r117:DI,r122:SI,r130:SI,0x8005] 
62;[r129:SI*0x8+r123:SI]=unspec/v[0] 62;flags:CCZ=unspec/v[0] 62;}


** Assignment #7: **
 Assigning to 130 (cl=CREG, orig=114, freq=1148, tfirst=130, 
tfreq=1148)...
  2nd iter for reload pseudo assignments:
 Reload r130 assignment failure
  Spill  r87(hr=5, freq=4296)
  Spill  r123(hr=4, freq=1148)
  Spill reload  r129(hr=2, freq=1148)

** Local #11: **

   Spilling non-eliminable hard regs: 6
  Creating newreg=131, assigning class GENERAL_REGS to address r131
   Change to class INDEX_REGS for r131

** Assignment #8: **

	 Assigning to 131 (cl=INDEX_REGS, orig=131, freq=1148, tfirst=131, 
tfreq=1148)...
	 Trying 0: spill 117(freq=1722)	 Now best 0(cost=574, bad_spills=0, 
insn_pseudos=1)


 Trying 1: spill 117(freq=1722)
	 Trying 2: spill 130(freq=1148)	 Now best 2(cost=0, bad_spills=0, 
insn_pseudos=1)


 Trying 3: spill 122(freq=1148)
 Trying 4: spill 129(freq=1148)

** Local #12: **
  Creating newreg=133 from oldreg=130, assigning class CREG to r133

What seems to happen is that in the "Assignment #8" phase, we're 
considering one new reload reg only, and we end up spilling r130, which 
only allows a register from a single class. Then we reload r130 again 
and make r133, and the cycle starts.


Reload is designed in a way to avoid cycles and to process all reloads 
for an insn in order of ascending class so as to avoid this kind of 
issue. With LRA I'm really not sure how to fix this properly, but the 
following patch seems to cure the PR at least, by recognizing when we're 
about to spill a reload reg whose assigned class contains only 
a single register.


Bootstrapped and tested on x86_64-linux, ok?


Bernd
	PR rtl-optimization/78911
	* lra-assigns.c (must_not_spill_p): New function.
	(spill_for): Use it.

	PR rtl-optimization/78911
	* gcc.target/i386/pr78911-1.c: New test.
	* gcc.target/i386/pr78911-2.c: New test.

Index: gcc/lra-assigns.c
===
--- gcc/lra-assigns.c	(revision 245685)
+++ gcc/lra-assigns.c	(working copy)
@@ -889,6 +889,30 @@ assign_temporarily (int regno, int hard_
   live_pseudos_reg_renumber[regno] = hard_regno;
 }
 
+/* Return true iff there is a reason why pseudo SPILL_REGNO should not
+   be spilled.  */
+static bool
+must_not_spill_p (unsigned spill_regno)
+{
+  if ((pic_offset_table_rtx != NULL
+   && spill_regno == REGNO (pic_offset_table_rtx))
+  || ((int) spill_regno >= lra_constraint_new_regno_start
+	  && ! bitmap_bit_p (_inheritance_pseudos, spill_regno)
+	  && ! bitmap_bit_p (_split_regs, spill_regno)
+	  && ! bitmap_bit_p (_subreg_reload_pseudos, spill_regno)
+	  && ! bitmap_bit_p (_optional_reload_pseudos, spill_regno)))
+return true;
+  /* A reload pseudo that requires a singleton register class should
+ not be spilled.
+ FIXME: this mitigates the issue on certain i386 patterns, but
+ does not solve the general case where existing reloads fully
+ cover a limited register class.  */
+  if (!bitmap_bit_p (_reload_pseudos, spill_regno)
+  && reg_class_size [reg_preferred_class (spill_regno)] == 1)
+return true;
+  return false;
+}
+
 /* Array used for sorting reload pseudos for subsequent allocation
after spilling some pseudo.	*/
 static int *sorted_reload_pseudos;
@@ -960,13 +984,7 @@ spill_for (int regno, bitmap spilled_pse
   /* Spill pseudos.	 */
   static_p = false;
   EXECUTE_IF_SET_IN_BITMAP (_pseudos_bitmap, 0, spill_regno, bi)
-	if ((pic_offset_table_rtx != NULL
-	 && spill_regno == REGNO (pic_offset_table_rtx))
-	|| ((int) spill_regno >= lra_constraint_new_regno_start
-		&& ! bitmap_bit_p (_inheritance_pseudos, spill_regno)
-		&& ! bitmap_bit_p (_split_regs, spill_regno)
-		&& ! bitmap_bit_p (_subreg_reload_pseudos, spill_regno)
-		&& ! bitmap_bit_p (_optional_reload_pseudos, spill_regno)))
+	if (must_not_spill_p (spill_regno))
 	  goto fail;
 	else if (non_spilled_static_chain_regno_p (spill_regno))
 	  static_p = true;
Index: gcc/testsuite/gcc.target/i386/pr78911-1.c
===
--- gcc/testsuite/gcc.target/i386/pr78911-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr78911-1.c	(working copy)
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/78911 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-strict-aliasing -fno-omit-frame-pointer" } */
+/* { dg-additional-options "-fPIC" { target fpic } } */
+/* { dg-additional-options "-march=pentium-m" { target ia32 } } */
+
+int a, b, d, e;
+long long *c;
+
+static 

Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)

2017-03-03 Thread Marek Polacek
On Thu, Mar 02, 2017 at 08:47:57PM +0100, Bernd Schmidt wrote:
> On 03/02/2017 06:35 PM, Marek Polacek wrote:
> > While at it, I fixed wrong formatting in the nearby code.  Also use 
> > NULL_TREE
> > instead of 0 where appropriate.  I really dislike those zeros-as-trees; one
> > day I'll just go and turn them into NULL_TREEs.
> 
> I sympathize, but it makes it harder to see which parts of the patch are
> actual changes. Generally it's best to separate functional and formatting
> changes.
 
True, but I thought I'd do it anyway while I'm changing the very same if;
formatting fixes along with actual fixes are not that rare.

> > @@ -8996,7 +8999,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
> > c_arg_info *arg_info)
> >  declared for the arg.  ISO C says we take the unqualified
> >  type for parameters declared with qualified type.  */
> >   if (TREE_TYPE (parm) != error_mark_node
> > - && TREE_TYPE (type) != error_mark_node
> > + && TREE_TYPE (TREE_VALUE (type)) != error_mark_node
> >   && ((TYPE_ATOMIC (DECL_ARG_TYPE (parm))
> >!= TYPE_ATOMIC (TREE_VALUE (type)))
> >   || !comptypes (TYPE_MAIN_VARIANT (DECL_ARG_TYPE (parm)),
> 
> Isn't this most likely intended to be TREE_VALUE (type) != error_mark_node?

Yea, that's what I had in mind but goofed.

> > @@ -9017,7 +9020,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
> > c_arg_info *arg_info)
> >   if (targetm.calls.promote_prototypes (TREE_TYPE 
> > (current_function_decl))
> >   && INTEGRAL_TYPE_P (TREE_TYPE (parm))
> >   && TYPE_PRECISION (TREE_TYPE (parm))
> > - < TYPE_PRECISION (integer_type_node))
> > +< TYPE_PRECISION (integer_type_node))
> 
> Should add the necessary parens while fixing formatting.

Dunno if they're necessary here, but I added them.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-03-03  Marek Polacek  

PR c/79758
* c-decl.c (store_parm_decls_oldstyle): Check if the element of
current_function_prototype_arg_types is error_mark_node.  Fix
formatting.  Use TREE_VALUE instead of TREE_TYPE.

* gcc.dg/noncompile/pr79758.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 32edacc..f46ca11 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -8965,12 +8965,15 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
c_arg_info *arg_info)
   tree type;
   for (parm = DECL_ARGUMENTS (fndecl),
 type = current_function_prototype_arg_types;
-  parm || (type && TREE_VALUE (type) != error_mark_node
-   && (TYPE_MAIN_VARIANT (TREE_VALUE (type)) != 
void_type_node));
+  parm || (type != NULL_TREE
+   && TREE_VALUE (type) != error_mark_node
+   && TYPE_MAIN_VARIANT (TREE_VALUE (type)) != void_type_node);
   parm = DECL_CHAIN (parm), type = TREE_CHAIN (type))
{
- if (parm == 0 || type == 0
- || TYPE_MAIN_VARIANT (TREE_VALUE (type)) == void_type_node)
+ if (parm == NULL_TREE
+ || type == NULL_TREE
+ || (TREE_VALUE (type) != error_mark_node
+ && TYPE_MAIN_VARIANT (TREE_VALUE (type)) == void_type_node))
{
  if (current_function_prototype_built_in)
warning_at (DECL_SOURCE_LOCATION (fndecl),
@@ -8996,7 +8999,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
c_arg_info *arg_info)
 declared for the arg.  ISO C says we take the unqualified
 type for parameters declared with qualified type.  */
  if (TREE_TYPE (parm) != error_mark_node
- && TREE_TYPE (type) != error_mark_node
+ && TREE_VALUE (type) != error_mark_node
  && ((TYPE_ATOMIC (DECL_ARG_TYPE (parm))
   != TYPE_ATOMIC (TREE_VALUE (type)))
  || !comptypes (TYPE_MAIN_VARIANT (DECL_ARG_TYPE (parm)),
@@ -9016,8 +9019,8 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
c_arg_info *arg_info)
 
  if (targetm.calls.promote_prototypes (TREE_TYPE 
(current_function_decl))
  && INTEGRAL_TYPE_P (TREE_TYPE (parm))
- && TYPE_PRECISION (TREE_TYPE (parm))
- < TYPE_PRECISION (integer_type_node))
+ && (TYPE_PRECISION (TREE_TYPE (parm))
+ < TYPE_PRECISION (integer_type_node)))
DECL_ARG_TYPE (parm)
  = c_type_promotes_to (TREE_TYPE (parm));
 
diff --git gcc/testsuite/gcc.dg/noncompile/pr79758.c 
gcc/testsuite/gcc.dg/noncompile/pr79758.c
index e69de29..aeaf7c7 100644
--- gcc/testsuite/gcc.dg/noncompile/pr79758.c
+++ gcc/testsuite/gcc.dg/noncompile/pr79758.c
@@ -0,0 +1,6 @@
+/* PR c/79758 */
+/* { dg-do compile } */
+
+void fn1 (int[a]) { }; /* { dg-error "undeclared here|parameter name omitted" 
} */
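
For context, store_parm_decls_oldstyle deals with old-style (K&R) definitions
matched against an earlier prototype; a minimal well-formed sketch of that
shape (illustrative only, not the PR's invalid testcase) is:

int f (int a, int b);

int
f (a, b)
     int a;
     int b;
{
  return a + b;
}

The ICE in the PR comes from an invalid variant of this, where an element of
current_function_prototype_arg_types ends up being error_mark_node, which is
exactly what the hunk above now checks for.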

RE: [PATCH,testsuite] MIPS: Fix register mode checking for n64 in pr68273.c.

2017-03-03 Thread Toma Tabacu
> From: Matthew Fortune
> >
> > gcc/testsuite/
> >
> > * gcc.target/mips/pr68273.c (dg-final): Match SImode registers only for
> > ilp32 targets and match DImode registers for lp64 targets.
> 
> OK, thanks.
> 
> Matthew

Committed as r245874.

Thanks,
Toma


Re: [OBVIOUS] [PATCH] Remove unused variable.

2017-03-03 Thread Martin Liška
The missing patch.
>From d539ba1a94ab85445f54c1742ff8e924e915f06a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 Mar 2017 13:50:42 +0100
Subject: [PATCH] Remove unused variable.

gcc/ChangeLog:

2017-03-03  Martin Liska  

	* tree-ssa-loop-prefetch.c (pass_loop_prefetch::execute):
	Remove unused variable.
---
 gcc/tree-ssa-loop-prefetch.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index a1372074e9f..339eeafcfee 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -2035,7 +2035,6 @@ pass_loop_prefetch::execute (function *fun)
   if (number_of_loops (fun) <= 1)
 return 0;
 
-  bool warned_p = false;
   if ((PREFETCH_BLOCK & (PREFETCH_BLOCK - 1)) != 0)
 {
   static bool warned = false;
-- 
2.11.1



Re: [PATCH] Fix IFN_ATOMIC* handling with -fnon-call-exceptions (PR middle-end/79805)

2017-03-03 Thread Richard Biener
On Fri, 3 Mar 2017, Jakub Jelinek wrote:

> On Fri, Mar 03, 2017 at 09:21:55AM +0100, Richard Biener wrote:
> > > --- gcc/gimple-fold.c.jj  2017-02-07 16:40:45.0 +0100
> > > +++ gcc/gimple-fold.c 2017-03-02 16:04:51.304850077 +0100
> > > @@ -3533,6 +3533,8 @@ fold_builtin_atomic_compare_exchange (gi
> > >tree itype = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (parmt)));
> > >tree ctype = build_complex_type (itype);
> > >tree expected = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
> > > +  bool may_throw = false;
> > > +  edge e = NULL;
> > >gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (expected)),
> > >  expected);
> > >gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > > @@ -3558,19 +3560,43 @@ fold_builtin_atomic_compare_exchange (gi
> > >gimple_set_vdef (g, gimple_vdef (stmt));
> > >gimple_set_vuse (g, gimple_vuse (stmt));
> > >SSA_NAME_DEF_STMT (gimple_vdef (g)) = g;
> > > -  if (gimple_call_lhs (stmt))
> > > +  tree oldlhs = gimple_call_lhs (stmt);
> > > +  if (flag_non_call_exceptions && stmt_ends_bb_p (stmt))
> > 
> > I think a more appropriate check is stmt_can_throw_internal (stmt)
> 
> Ok.  I suppose without "flag_non_call_exceptions && ".

Yeah.

> > >  {
> > > -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > > +  may_throw = true;
> > 
> > and 'throws' rather than 'may_throw'.
> 
> Ack.
> 
> > > +  gimple_call_set_lhs (stmt, NULL_TREE);
> > > +  gsi_replace (gsi, g, true);
> > > +  e = find_fallthru_edge (gsi_bb (*gsi)->succs);
> > > +}
> > > +  gimple_call_set_nothrow (as_a  (g), flag_non_call_exceptions 
> > > == 0);
> > 
> > it should copy nothrow state from the source(s) of the folding if you
> > consider flag_non_call_exceptions == true in this fn but working on
> > inlined code from a flag_non_call_exceptions == false function.  At least
> > for calls in the source(s) that works, for other stmts I think we do
> > not have an explicit nothrow on the stmt.
> 
> So just
>   gimple_call_set_nothrow (as_a  (g), !throws);
> then?

Hmm, no.  That will result in stmt_can_throw_external to return false.
Simply

gimple_call_set_nothrow (g, gimple_call_nothrow (orig-call))


> > > +  if (oldlhs)
> > > +{
> > > +  if (!may_throw)
> > > + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > >g = gimple_build_assign (make_ssa_name (itype), IMAGPART_EXPR,
> > >  build1 (IMAGPART_EXPR, itype, lhs));
> > > -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > > -  g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR,
> > > -gimple_assign_lhs (g));
> > > +  if (may_throw)
> > > + {
> > > +   gsi_insert_on_edge_immediate (e, g);
> > > +   *gsi = gsi_for_stmt (g);
> > > + }
> > > +  else
> > > + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > > +  g = gimple_build_assign (oldlhs, NOP_EXPR, gimple_assign_lhs (g));
> > > +  if (may_throw)
> > > + gsi_insert_after (gsi, g, GSI_NEW_STMT);
> > >  }
> > > -  gsi_replace (gsi, g, true);
> > > +  if (!may_throw)
> > > +gsi_replace (gsi, g, true);
> > >g = gimple_build_assign (make_ssa_name (itype), REALPART_EXPR,
> > >  build1 (REALPART_EXPR, itype, lhs));
> > > -  gsi_insert_after (gsi, g, GSI_NEW_STMT);
> > > +  if (may_throw && oldlhs == NULL_TREE)
> > > +{
> > > +  gsi_insert_on_edge_immediate (e, g);
> > > +  *gsi = gsi_for_stmt (g);
> > > +}
> > > +  else
> > > +gsi_insert_after (gsi, g, GSI_NEW_STMT);
> > 
> > rather than these dances with if (may_throw) isn't it easier to
> > compute an insert gsi after finding the fallthru edge, splitting
> > the edge if it is a critical one?  (basically doing a less
> > fancy gimple_find_edge_insert_loc)
> 
> I can't see how that could simplify the code, gimple_find_edge_insert_loc
> is lots of code, not exported and can return ins_after/!ins_after which
> afterwards needs to be honored.
> 
> I've simplified the code a little bit by doing gsi_replace early, then
> the difference is pretty much just:
> -  gsi_insert_after (gsi, g, GSI_NEW_STMT);
> +  if (throws)
> +   {
> + gsi_insert_on_edge_immediate (e, g);
> + *gsi = gsi_for_stmt (g);
> +   }
> +  else
> +   gsi_insert_after (gsi, g, GSI_NEW_STMT);
> that has to be done on the first insertion after the call (which
> unfortunately means two locations).  Another option I see is to
> gimple_seq_add_stmt_without_update all the stmts following the call
> into a seq and then conditionally insert that seq on the edge (immediate)
> or insert the seq after the call.  It would make sense only for
> gimple-fold.c, in tree-ssa-ccp.c there is just one stmt added this way,
> the debug stmt needs extra care.

As you prefer.

Looks good apart from the set_nothrow issue above.

Thanks,
Richard.

> 2017-03-03  Jakub Jelinek  
> 
>   PR middle-end/79805
>   * internal-fn.def 

Re: Fix ICE in use-after-scope w/ -fno-tree-dce (PR, sanitize/79783).

2017-03-03 Thread Jakub Jelinek
On Thu, Mar 02, 2017 at 06:49:32PM +0100, Martin Liška wrote:
> It can happen with inlining and -fno-tree-dce that VAR_DECL for a SSA
> NAME was removed and thus the poisoning should not have any usage.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> >From d8aa72dc1d696f5500c00b6c2f532f2a87cf58d2 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 2 Mar 2017 11:55:00 +0100
> Subject: [PATCH] Fix ICE in use-after-scope w/ -fno-tree-dce (PR
>  sanitize/79783).
> 
> gcc/ChangeLog:
> 
> 2017-03-02  Martin Liska  
> 
>   PR sanitize/79783
>   * asan.c (asan_expand_poison_ifn): Do not expand ASAN_POISON
>   when having a SSA NAME w/o VAR_DECL assigned to it.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-03-02  Martin Liska  
> 
>   PR sanitize/79783
>   * g++.dg/asan/pr79783.C: New test.
> ---
>  gcc/asan.c  |  5 -
>  gcc/testsuite/g++.dg/asan/pr79783.C | 19 +++
>  2 files changed, 23 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/asan/pr79783.C
> 
> diff --git a/gcc/asan.c b/gcc/asan.c
> index 6cdd59b7ea7..307423ced03 100644
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -3107,7 +3107,10 @@ asan_expand_poison_ifn (gimple_stmt_iterator *iter,
>  {
>gimple *g = gsi_stmt (*iter);
>tree poisoned_var = gimple_call_lhs (g);
> -  if (!poisoned_var)
> +
> +  /* It can happen with inlining and -fno-tree-dce that VAR_DECL for a SSA
> + NAME was removed and thus the poisoning should not have any usage.  */
> +  if (!poisoned_var || SSA_NAME_VAR (poisoned_var) == NULL_TREE)

I wonder if it wouldn't be better to do:
  if (!poisoned_var || has_zero_uses (poisoned_var))

perhaps with -fno-tree-dce we could end up with SSA_NAME_VAR being
non-NULL, yet no uses; in that case there is nothing to warn about.
On the other side, in theory we could also end up with anonymous SSA_NAME
and some uses - in that case perhaps it would be better to warn.
So do:
  if (SSA_NAME_VAR (poisoned_var) == NULL_TREE)
SSA_NAME_VAR (poisoned_var) = create_tmp_var (TREE_TYPE (poisoned_var));
  tree shadow_var = create_asan_shadow_var (SSA_NAME_VAR (poisoned_var),
shadow_vars_mapping);
or so?  We'll need SSA_NAME_VAR non-NULL so that we can use a default def
later.

Jakub
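
For reference, the kind of source that introduces ASAN_POISON in the first
place is a use-after-scope; a minimal sketch (illustrative only, not the
testcase from the PR, which additionally needs inlining and -fno-tree-dce to
lose the VAR_DECL) is:

int *p;

void
f (void)
{
  {
    int x = 0;
    p = &x;
  }
  *p = 1;   /* use after x has gone out of scope */
}

int
main (void)
{
  f ();
  return 0;
}

Built with -fsanitize=address -fsanitize-address-use-after-scope, the store
through p is diagnosed at run time.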


[OBVIOUS] [PATCH] Remove unused variable.

2017-03-03 Thread Martin Liška
Hello.

This fixes a copy-and-paste error; it's trivial, thus I'm going to install it.

Thanks,
Martin


[PATCH, testsuite]: Disable some tests on x32

2017-03-03 Thread Uros Bizjak
x32 does not support -mmpx.

2017-03-03  Uros Bizjak  

* g++.dg/pr71624.C: Disable for x32.
* g++.dg/pr71633.C: Ditto.

Tested on x86_64-linux-gnu {,32}, committed.

Uros.
Index: g++.dg/pr71624.C
===
--- g++.dg/pr71624.C(revision 245860)
+++ g++.dg/pr71624.C(working copy)
@@ -1,5 +1,5 @@
 /* PR71624 */
-// { dg-do compile { target i?86-*-* x86_64-*-* } }
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! x32 } } } } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
 
 class c1
Index: g++.dg/pr71633.C
===
--- g++.dg/pr71633.C(revision 245860)
+++ g++.dg/pr71633.C(working copy)
@@ -1,5 +1,5 @@
 /* PR71633 */
-// { dg-do compile { target i?86-*-* x86_64-*-* } }
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! x32 } } } } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
 
 class c1


Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)

2017-03-03 Thread Uros Bizjak
On Fri, Mar 3, 2017 at 12:30 PM, Jakub Jelinek  wrote:

>> Regarding the implementation, I'd rather see another flag for
>> i386-builtin.def entries, where we can mark if matching memory operand
>> is allowed. The expander can then look at the flag, and do some
>> special magic.
>
> I think maintaining such a flag would be a nightmare, I think more insns
> will need it in the future and it will be hard to remember.  Maybe it is 
> better
> not to do anything about those (i.e. apply my patch as is), after all -O0
> is not optimized code.

Yes, I agree.

Uros.
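
A hedged guess at the shape of user code involved (illustrative only, not the
actual testcase): at -O0 with -ffloat-store the vector variables live in stack
slots, so the builtin expander behind intrinsics like these can be handed
memory for more operands than the insn patterns accept.  Compile with
-mavx -O0 -ffloat-store.

#include <immintrin.h>

__m256d
combine (const double *p)
{
  __m128d lo = _mm_loadu_pd (p);
  __m128d hi = _mm_loadu_pd (p + 2);
  /* The 128-to-256 cast should map to the avx_pd256_pd style patterns in
     the "missing !(MEM && MEM)" list quoted later in this thread, if the
     builtin mapping is read right.  */
  return _mm256_insertf128_pd (_mm256_castpd128_pd256 (lo), hi, 1);
}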


[PATCH] Properly handle __cxa_pure_virtual visibility (PR lto/79760).

2017-03-03 Thread Martin Liška
Hello.

I'm sending Honza's patch (which is pre-approved) that I've just tested on 
ppc64le-redhat-linux.
It also fixes the issue spotted in the PR.

Martin
>From 10050e47af7dbf217469da083666b16dbbbf606d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Mar 2017 18:46:25 +0100
Subject: [PATCH] Properly handle __cxa_pure_virtual visibility (PR lto/79760).

gcc/ChangeLog:

2017-03-02  Jan Hubicka  

	PR lto/79760
	* ipa-devirt.c (maybe_record_node): Properly handle
	__cxa_pure_virtual visibility.
---
 gcc/ipa-devirt.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 4428d679722..0e5eb85b61d 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -2462,10 +2462,19 @@ maybe_record_node (vec  ,
 	  nodes.safe_push (target_node);
 	}
 }
-  else if (completep
-	   && (!type_in_anonymous_namespace_p
-		 (DECL_CONTEXT (target))
-	   || flag_ltrans))
+  else if (!completep)
+;
+  /* We have definition of __cxa_pure_virtual that is not accessible (it is
+ optimized out or partitioned to other unit) so we can not add it.  When
+ not sanitizing, there is nothing to do.
+ Otherwise declare the list incomplete.  */
+  else if (pure_virtual)
+{
+  if (flag_sanitize & SANITIZE_UNREACHABLE)
+	*completep = false;
+}
+  else if (flag_ltrans
+	   || !type_in_anonymous_namespace_p (DECL_CONTEXT (target)))
 *completep = false;
 }
 
-- 
2.11.1
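
To make the case concrete, here is a minimal sketch (not the PR's testcase) of
where __cxa_pure_virtual becomes a possible polymorphic call target: it is the
function the C++ front end places in the vtable slot of a pure virtual.

struct Base
{
  virtual int f () = 0;		// slot filled with __cxa_pure_virtual
  virtual ~Base () {}
};

struct Derived : Base
{
  virtual int f () { return 42; }
};

int
call (Base *b)
{
  return b->f ();		// maybe_record_node collects targets here
}

With LTO the definition of __cxa_pure_virtual can be optimized out or end up
in another partition; per the hunk above, the target list is then only marked
incomplete when SANITIZE_UNREACHABLE is enabled.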



Re: [C++ Patch] PR 70266

2017-03-03 Thread Jason Merrill
OK.

On Fri, Mar 3, 2017 at 12:26 AM, Paolo Carlini  wrote:
> Hi,
>
> one more case of OVERLOAD filtering to cxx_eval_constant_expression and
> causing an ICE.
>
> In order to have a proper diagnostic early enough I think that in
> build_must_not_throw_expr we want to use perform_implicit_conversion_flags
> on the condition before calling cxx_constant_value for it. That would be
> also analogous to the code we already have in build_noexcept_spec, for
> example. Tested x86_64-linux.
>
> Thanks, Paolo.
>
> //
>


[PATCH] Revert part of fixed-value fix

2017-03-03 Thread Richard Biener

The following reverts uhigh () back to elt (1); the two are not 100%
semantically equivalent (and the change was not needed to avoid the
warning-prone elt (0)).

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-03-03  Richard Biener  

* fixed-value.c (fixed_from_string): Restore use of elt (1)
in place of uhigh ().
(fixed_convert_from_real): Likewise.

Index: gcc/fixed-value.c
===
--- gcc/fixed-value.c   (revision 245865)
+++ gcc/fixed-value.c   (working copy)
@@ -131,7 +131,7 @@ fixed_from_string (FIXED_VALUE_TYPE *f,
   wide_int w = real_to_integer (_value, ,
GET_MODE_PRECISION (mode));
   f->data.low = w.ulow ();
-  f->data.high = w.uhigh ();
+  f->data.high = w.elt (1);
 
   if (temp == FIXED_MAX_EPS && ALL_FRACT_MODE_P (f->mode))
 {
@@ -1050,7 +1050,7 @@ fixed_convert_from_real (FIXED_VALUE_TYP
   wide_int w = real_to_integer (_value, ,
GET_MODE_PRECISION (mode));
   f->data.low = w.ulow ();
-  f->data.high = w.uhigh ();
+  f->data.high = w.elt (1);
   temp = check_real_for_fixed_mode (_value, mode);
   if (temp == FIXED_UNDERFLOW) /* Minimum.  */
 {


Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)

2017-03-03 Thread Jakub Jelinek
On Fri, Mar 03, 2017 at 12:18:09PM +0100, Uros Bizjak wrote:
> > as the constraint require that both operands aren't memory, shall I create a
> > patch for that?  This is the first category below.
> 
> Yes. Although expander takes care not to generate two memory
> references, combine can propagate memory to the other operand,
> creating semi-invalid RTX that is later resolved by RA.

Ok, will work on a patch.

> > The second category is where matching operand is ok, so my patch can
> > pessimize stuff.  I wonder if we couldn't handle this like:
> >   /* If we aren't optimizing, only allow one memory operand to be
> >  generated.  */
> >   if (memory_operand (op, mode))
> > {
> >   const char *const constraint
> > = insn_data[icode].operand[i + adjust + 1].constraint;
> >   if (optimize
> >   || num_memory != 1
> >   || !rtx_equal_p (real_target, op))
> > num_memory++;
> >   /* sse2_movsd allows matching operand.  */
> >   else if (icode == CODE_FOR_sse2_movsd)
> > ;
> >   /* Various masked insns allow matching operand.  */
> >   else if (insn_data[icode].operand[i + adjust + 1].predicate
> >== vector_move_operand
> >&& (strcmp (constraint, "0C") == 0
> >|| strcmp (constraint, "0C,0") == 0))
> > ;
> >   else
> > num_memory++;
> > }
> > (though perhaps sse2_movsd is still too simplistic, because it has just
> > =m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is
> > fine as is, if it is 1, then only if the memory is offsettable; though
> > perhaps LRA can just turn non-offsettable memory into offsettable one
> > through secondary reload).
> 
> There is similarity with arithmetic operations. Please note that arith
> ops fixup their arguments using ix86_expand_binary_operator, and their
> insn patterns have to satisfy ix86_binary_operator_ok insn predicate.
> However, even using this infrastructure, I have noticed that the
> compiler is quite sloppy with memory operands, and we mostly get
> matched memory after combine pass (if we are lucky).
> 
> Regarding the implementation, I'd rather see another flag for
> i386-builtin.def entries, where we can mark if matching memory operand
> is allowed. The expander can then look at the flag, and do some
> special magic.

I think maintaining such a flag would be a nightmare, I think more insns
will need it in the future and it will be hard to remember.  Maybe it is better
not to do anything about those (i.e. apply my patch as is), after all -O0
is not optimized code.

Jakub


Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)

2017-03-03 Thread Uros Bizjak
On Fri, Mar 3, 2017 at 11:35 AM, Jakub Jelinek  wrote:
> On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote:
>> > Or are there any define_insn/define_expand where it is desirable to have
>> > both input and output operand a MEM (and does it have to be matching)?
>> > For various scalar binary and unary expanders the backend already uses a 
>> > helper
>> > that will force something into memory if dest and src are both memory and
>> > not rtx_equal_p, but do we have anything like that in anything these two
>> > builtin expanders emit?
>>
>> Any insn with matched memory (movlps, movhps and similar) can also
>> operate with matched register. To my knowledge, all insn patterns are
>> written in this way, since in the past we had plenty of problems with
>> matched memory operands.
>>
>> Also, I really don't remember what benefit brings us to *not* force
>> input operand to a register at -O0 at expand time and leave to RA to
>> find matched memory. OTOH, with -O0 we already have so many
>> unnecessary moves that you get tears in the eyes...
>
> So, I've tried:
> make mddump
> for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' 
> ../../gcc/config/i386/i386-builtin.def`; do sed -n 
> '/^(define_\(insn\|expand\) ("'$i'"/,/^$/p' tmp-mddump.md; done
> and looked for "=.*[mo] in there.
> Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])),
> e.g.
> (define_insn_and_split "avx512f__"
>   [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
> (unspec:AVX512MODE2P
>   [(match_operand: 1 "nonimmediate_operand" "xm,x")]
>   UNSPEC_CAST))]
>   "TARGET_AVX512F"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 0) (match_dup 1))]
> as the constraint require that both operands aren't memory, shall I create a
> patch for that?  This is the first category below.

Yes. Although expander takes care not to generate two memory
references, combine can propagate memory to the other operand,
creating semi-invalid RTX that is later resolved by RA.

> The second category is where matching operand is ok, so my patch can
> pessimize stuff.  I wonder if we couldn't handle this like:
>   /* If we aren't optimizing, only allow one memory operand to be
>  generated.  */
>   if (memory_operand (op, mode))
> {
>   const char *const constraint
> = insn_data[icode].operand[i + adjust + 1].constraint;
>   if (optimize
>   || num_memory != 1
>   || !rtx_equal_p (real_target, op))
> num_memory++;
>   /* sse2_movsd allows matching operand.  */
>   else if (icode == CODE_FOR_sse2_movsd)
> ;
>   /* Various masked insns allow matching operand.  */
>   else if (insn_data[icode].operand[i + adjust + 1].predicate
>== vector_move_operand
>&& (strcmp (constraint, "0C") == 0
>|| strcmp (constraint, "0C,0") == 0))
> ;
>   else
> num_memory++;
> }
> (though perhaps sse2_movsd is still too simplistic, because it has just
> =m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is
> fine as is, if it is 1, then only if the memory is offsettable; though
> perhaps LRA can just turn non-offsettable memory into offsettable one
> through secondary reload).

There is similarity with arithmetic operations. Please note that arith
ops fixup their arguments using ix86_expand_binary_operator, and their
insn patterns have to satisfy ix86_binary_operator_ok insn predicate.
However, even using this infrastructure, I have noticed that the
compiler is quite sloppy with memory operands, and we mostly get
matched memory after combine pass (if we are lucky).

Regarding the implementation, I'd rather see another flag for
i386-builtin.def entries, where we can mark if matching memory operand
is allowed. The expander can then look at the flag, and do some
special magic.

> And the last category is with "=m" destination (not allowing anything else)
> and (match_dup 0) somewhere in the pattern, I believe those have to be
> expanded specially, because otherwise one e.g. wouldn't get MEM_P target
> if optimize at all (I think the builtins are supposed to pass pointer to the
> result in that case) and the match_dup just doesn't appear as another operand.

It looks to me that the expander already takes care of matching where
strictly required.  OTOH, a matching-memory requirement would also fit
in the proposed implementation with a flag in the builtin entries: we
can mark whether a matching memory operand is required, not only allowed.

However, as said above, expanding with memory operands is a somewhat
neglected area, desperately looking for some love.

Uros.

> missing !(MEM && MEM) ?
> ===
> avx512f_pd512_256pd
> avx512f_pd512_pd
> avx512f_ps512_256ps
> 

Re: [PATCH] Fix IFN_ATOMIC* handling with -fnon-call-exceptions (PR middle-end/79805)

2017-03-03 Thread Jakub Jelinek
On Fri, Mar 03, 2017 at 09:21:55AM +0100, Richard Biener wrote:
> > --- gcc/gimple-fold.c.jj2017-02-07 16:40:45.0 +0100
> > +++ gcc/gimple-fold.c   2017-03-02 16:04:51.304850077 +0100
> > @@ -3533,6 +3533,8 @@ fold_builtin_atomic_compare_exchange (gi
> >tree itype = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (parmt)));
> >tree ctype = build_complex_type (itype);
> >tree expected = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
> > +  bool may_throw = false;
> > +  edge e = NULL;
> >gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (expected)),
> >expected);
> >gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > @@ -3558,19 +3560,43 @@ fold_builtin_atomic_compare_exchange (gi
> >gimple_set_vdef (g, gimple_vdef (stmt));
> >gimple_set_vuse (g, gimple_vuse (stmt));
> >SSA_NAME_DEF_STMT (gimple_vdef (g)) = g;
> > -  if (gimple_call_lhs (stmt))
> > +  tree oldlhs = gimple_call_lhs (stmt);
> > +  if (flag_non_call_exceptions && stmt_ends_bb_p (stmt))
> 
> I think a more appropriate check is stmt_can_throw_internal (stmt)

Ok.  I suppose without "flag_non_call_exceptions && ".

> >  {
> > -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > +  may_throw = true;
> 
> and 'throws' rather than 'may_throw'.

Ack.

> > +  gimple_call_set_lhs (stmt, NULL_TREE);
> > +  gsi_replace (gsi, g, true);
> > +  e = find_fallthru_edge (gsi_bb (*gsi)->succs);
> > +}
> > +  gimple_call_set_nothrow (as_a  (g), flag_non_call_exceptions == 
> > 0);
> 
> it should copy nothrow state from the source(s) of the folding if you
> consider flag_non_call_exceptions == true in this fn but working on
> inlined code from a flag_non_call_exceptions == false function.  At least
> for calls in the source(s) that works, for other stmts I think we do
> not have an explicit nothrow on the stmt.

So just
  gimple_call_set_nothrow (as_a  (g), !throws);
then?

> > +  if (oldlhs)
> > +{
> > +  if (!may_throw)
> > +   gsi_insert_before (gsi, g, GSI_SAME_STMT);
> >g = gimple_build_assign (make_ssa_name (itype), IMAGPART_EXPR,
> >build1 (IMAGPART_EXPR, itype, lhs));
> > -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > -  g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR,
> > -  gimple_assign_lhs (g));
> > +  if (may_throw)
> > +   {
> > + gsi_insert_on_edge_immediate (e, g);
> > + *gsi = gsi_for_stmt (g);
> > +   }
> > +  else
> > +   gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > +  g = gimple_build_assign (oldlhs, NOP_EXPR, gimple_assign_lhs (g));
> > +  if (may_throw)
> > +   gsi_insert_after (gsi, g, GSI_NEW_STMT);
> >  }
> > -  gsi_replace (gsi, g, true);
> > +  if (!may_throw)
> > +gsi_replace (gsi, g, true);
> >g = gimple_build_assign (make_ssa_name (itype), REALPART_EXPR,
> >build1 (REALPART_EXPR, itype, lhs));
> > -  gsi_insert_after (gsi, g, GSI_NEW_STMT);
> > +  if (may_throw && oldlhs == NULL_TREE)
> > +{
> > +  gsi_insert_on_edge_immediate (e, g);
> > +  *gsi = gsi_for_stmt (g);
> > +}
> > +  else
> > +gsi_insert_after (gsi, g, GSI_NEW_STMT);
> 
> rather than these dances with if (may_throw) isn't it easier to
> compute an insert gsi after finding the fallthru edge, splitting
> the edge if it is a critical one?  (basically doing a less
> fancy gimple_find_edge_insert_loc)

I can't see how that could simplify the code, gimple_find_edge_insert_loc
is lots of code, not exported and can return ins_after/!ins_after which
afterwards needs to be honored.

I've simplified the code a little bit by doing gsi_replace early, then
the difference is pretty much just:
-  gsi_insert_after (gsi, g, GSI_NEW_STMT);
+  if (throws)
+   {
+ gsi_insert_on_edge_immediate (e, g);
+ *gsi = gsi_for_stmt (g);
+   }
+  else
+   gsi_insert_after (gsi, g, GSI_NEW_STMT);
that has to be done on the first insertion after the call (which
unfortunately means two locations).  Another option I see is to
gimple_seq_add_stmt_without_update all the stmts following the call
into a seq and then conditionally insert that seq on the edge (immediate)
or insert the seq after the call.  It would make sense only for
gimple-fold.c, in tree-ssa-ccp.c there is just one stmt added this way,
the debug stmt needs extra care.

2017-03-03  Jakub Jelinek  

PR middle-end/79805
* internal-fn.def (ATOMIC_BIT_TEST_AND_SET, ATOMIC_BIT_TEST_AND_RESET,
ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_COMPARE_EXCHANGE): Remove
ECF_NOTHROW.
* gimple-fold.c (fold_builtin_atomic_compare_exchange): Set
gimple_call_nothrow_p flag based on whether original builtin can throw.
If it can, emit following stmts on the fallthrough edge.
* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Similarly, except
don't create new bb if inserting just 
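
For readers without the PR handy, the user-level shapes that reach these two
folders look roughly like this (illustrative only, not the PR's testcase);
with -fnon-call-exceptions the memory accesses involved may throw, which is
why the calls can end a basic block:

int
cas (int *p, int *expected, int desired)
{
  return __atomic_compare_exchange_n (p, expected, desired, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

int
bit_test_and_set (unsigned *p)
{
  /* optimize_atomic_bit_test_and can turn this into a single lock bts
     on x86.  */
  return (__atomic_fetch_or (p, 4u, __ATOMIC_SEQ_CST) & 4u) != 0;
}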

Re: GCSE: Use HOST_WIDE_INT instead of int (PR, rtl-optimization/79574).

2017-03-03 Thread Richard Biener
On Thu, Mar 2, 2017 at 8:58 PM, Bernd Schmidt  wrote:
> On 03/02/2017 06:50 PM, Martin Liška wrote:
>>
>> Hello.
>>
>> This is the second part of the fixes needed to avoid triggering integer
>> overflow in the gcse pass.
>
>
> So, how is this intended to work? The min/max stored in the param is an int,
> and by using a HOST_WIDE_INT here, we expect that it is a larger type and
> therefore won't overflow?

HOST_WIDE_INT is equal to uint64_t (and we could mass-replace where we can
do so consistently)

Richard.

>>{
>> expr = flat_table[i];
>> fprintf (file, "Index %d (hash value %d; max distance %d)\n  ",
>> -expr->bitmap_index, hash_val[i], expr->max_distance);
>> +expr->bitmap_index, hash_val[i],
>> (int)expr->max_distance);
>> print_rtl (file, expr->expr);
>> fprintf (file, "\n");
>
>
> Use HOST_WIDE_INT_PRINT_DEC maybe? Otherwise OK, I guess.
>
>
> Bernd


Re: [PATCH] free MPFR caches in gimple-ssa-sprintf.c (PR 79699)

2017-03-03 Thread Richard Biener
On Thu, Mar 2, 2017 at 10:33 PM, Martin Sebor  wrote:
> On 03/02/2017 01:08 AM, Richard Biener wrote:
>>
>> On Thu, Mar 2, 2017 at 2:01 AM, Joseph Myers 
>> wrote:
>>>
>>> On Wed, 1 Mar 2017, Martin Sebor wrote:
>>>
 Joseph, since you commented on the bug, do you have a suggestion
 for a different site for it or a guard?  The only other call to
 the function is in the Fortran FE and it's neither guarded nor
 does it appear to ever be called.
>>>
>>>
>>> I don't think a guard is needed.  Arguably it should be called from an
>>> atexit handler, but since we don't have such a handler calling it from
>>> the
>>> relevant pass seems reasonable (and I'm not sure what the right way to
>>> handle such freeing of memory in the JIT context is).
>>
>>
>> IMHO we should call it from gcc::~context
>
>
> Thanks, that makes sense to me.  The attached patch does that.

"mpfr.h" as include is wrong, please use "realmpfr.h" as all other code in GCC
(which includes  and , ideally those would move to system.h
guarded with a INCLUDE_XXX guard or always included as we include gmp.h).

Ok with that change,
Richard.

> Martin


Re: Add -Wdisabled-optimization to loop prefetching pass (PR, tree-optimization/79803).

2017-03-03 Thread Richard Biener
On Fri, Mar 3, 2017 at 11:57 AM, Richard Biener
 wrote:
> On Thu, Mar 2, 2017 at 6:48 PM, Martin Liška  wrote:
>> Hello.
>>
>> I've just followed Richi's advice in the PR.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>
> I think the check tests for a power of two, not for evenness.  Also you should mention
> this is a --param, otherwise people might think their CPU is broken ;)
>
> Ok otherwise.

Oh, and maybe only warn once, not for each function...

>
> Thanks,
> Richard.
>
>> Martin


Re: Add -Wdisabled-optimization to loop prefetching pass (PR, tree-optimization/79803).

2017-03-03 Thread Richard Biener
On Thu, Mar 2, 2017 at 6:48 PM, Martin Liška  wrote:
> Hello.
>
> I've just followed Richi's advice in the PR.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

I think the check tests for a power of two, not for evenness.  Also you should mention
this is a --param, otherwise people might think their CPU is broken ;)

Ok otherwise.

Thanks,
Richard.

> Martin
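
A standalone sketch of the distinction made above: the (x & (x - 1)) idiom
tests for a power of two, not for evenness (illustrative code, not part of
the patch).

#include <stdio.h>

static int
is_power_of_two (unsigned int x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

int
main (void)
{
  unsigned int vals[] = { 6, 8, 12, 64 };
  for (int i = 0; i < 4; i++)
    printf ("%u: even=%d power-of-two=%d\n",
            vals[i], vals[i] % 2 == 0, is_power_of_two (vals[i]));
  return 0;
}

Hence the suggestion: the warning should say the --param value is not a power
of two, and (per the follow-up) be emitted only once.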


Re: [PATCH] S/390: Disable vectorization for loops with few iterations

2017-03-03 Thread Richard Biener
On Thu, Mar 2, 2017 at 4:39 PM, Robin Dapp  wrote:
> Hi,
>
> the following patch defines the PARAM_MIN_VECT_LOOP_BOUND parameter in
> the s390 backend.  It helps with the vectorization epilogue problem
> described here [1].
> I see an overall performance increase of > 1% in SPECfp2006, yet some
> cases like cactusADM regress.  This seems to be caused by the vectorizer
> creating an epilogue guard for one more iteration than before, which, in
> turn, causes e.g. predcom to run on the epilogue that it used to ignore
> before ("Loop iterates only 1 time, nothing to do.").  Subsequent,
> minor, effects cause an eventual slowdown.
>
> Until the reason for the bad epilogue code is understood, this patch
> mitigates the problem.  When investigating the issue, I stumbled across
> an attempt to vectorize the epilogue itself as well as combine it with
> the vectorized loop in addition to vector masking [2].  A similar
> approach might also help here.  My original observation of high register
> pressure within the epilogue still stands.  In this specific case, it
> would most likely suffice to save all registers once, run the epilogue
> and restore the registers.  I'm pretty sure this would be faster than
> the "spill fest" that's currently happening.

Note it also honors PARAM_MIN_VECT_LOOP_BOUND if there's no epilouge.

Richard.

> Regards
>  Robin
>
> [1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html
>
> --
>
> gcc/ChangeLog:
>
> 2017-03-02  Robin Dapp  
>
> * config/s390/s390.c (s390_option_override_internal): Set
> PARAM_MIN_VECT_LOOP_BOUND
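
A sketch of the kind of loop this parameter is aimed at (illustrative only,
not taken from the benchmarks mentioned above): with a vectorization factor
of, say, 4 and only a handful of iterations at run time, nearly all of the
work lands in the scalar epilogue, so vectorizing is not profitable and a
nonzero PARAM_MIN_VECT_LOOP_BOUND makes the vectorizer back off.

void
axpy_small (double *restrict a, const double *restrict b, int n)
{
  /* n is small at run time, e.g. 5: at most one vector iteration runs
     and the epilogue dominates.  */
  for (int i = 0; i < n; i++)
    a[i] += 2.0 * b[i];
}

For experiments, the same knob can be set from the command line via
--param min-vect-loop-bound=N (assuming the usual params.def spelling).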


[PATCH] rs6000: Fix for -mwarn-cell-microcode (PR43763)

2017-03-03 Thread Segher Boessenkool
If using -mwarn-cell-microcode, rs6000_final_prescan_insn calls
get_insn_template to get the name of the machine instruction.  But,
get_insn_template calls the output template if that is code, and that
then can modify recog_data (it is normal to change the operands, for
example).

This patch saves and restores recog_data around the call to
get_insn_template to fix the problems this causes.

Bootstrap and test running on powerpc64-linux; I'll commit it to
trunk if that goes well (and 6 and 5 later).


Segher


2017-03-03  Segher Boessenkool  

PR target/43763
* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Save and
restore recog_data (including the operand rtxes inside it) around
the call to get_insn_data.

---
 gcc/config/rs6000/rs6000.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 3f13ce5..ecc7823 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -39041,7 +39041,12 @@ rs6000_final_prescan_insn (rtx_insn *insn, rtx 
*operand ATTRIBUTE_UNUSED,
   if (insn_code_number < 0)
return;
 
+  /* get_insn_data can modify recog_data, so save and restore it.  */
+  struct recog_data_d recog_data_save = recog_data;
+  for (int i = 0; i < recog_data.n_operands; i++)
+   recog_data.operand[i] = copy_rtx (recog_data.operand[i]);
   temp = get_insn_template (insn_code_number, insn);
+  recog_data = recog_data_save;
 
   if (get_attr_cell_micro (insn) == CELL_MICRO_ALWAYS)
warning_at (location, OPT_mwarn_cell_microcode,
-- 
1.9.3



Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)

2017-03-03 Thread Jakub Jelinek
On Fri, Mar 03, 2017 at 09:10:03AM +0100, Uros Bizjak wrote:
> > Or are there any define_insn/define_expand where it is desirable to have
> > both input and output operand a MEM (and does it have to be matching)?
> > For various scalar binary and unary expanders the backend already uses a 
> > helper
> > that will force something into memory if dest and src are both memory and
> > not rtx_equal_p, but do we have anything like that in anything these two
> > builtin expanders emit?
> 
> Any insn with matched memory (movlps, movhps and similar) can also
> operate with matched register. To my knowledge, all insn patterns are
> written in this way, since in the past we had plenty of problems with
> matched memory operands.
> 
> Also, I really don't remember what benefit brings us to *not* force
> input operand to a register at -O0 at expand time and leave to RA to
> find matched memory. OTOH, with -O0 we already have so many
> unnecessary moves that you get tears in the eyes...

So, I've tried:
make mddump
for i in `sed -n -e 's/^.*CODE_FOR_\([^,]*\),.*$/\1/p' 
../../gcc/config/i386/i386-builtin.def`; do sed -n '/^(define_\(insn\|expand\) 
("'$i'"/,/^$/p' tmp-mddump.md; done
and looked for "=.*[mo] in there.
Some insns might want that && !(MEM_P (operands[0]) && MEM_P (operands[1])),
e.g.
(define_insn_and_split "avx512f__"
  [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
(unspec:AVX512MODE2P
  [(match_operand: 1 "nonimmediate_operand" "xm,x")]
  UNSPEC_CAST))]
  "TARGET_AVX512F"
  "#"
  "&& reload_completed"
  [(set (match_dup 0) (match_dup 1))]
as the constraint require that both operands aren't memory, shall I create a
patch for that?  This is the first category below.

The second category is where matching operand is ok, so my patch can
pessimize stuff.  I wonder if we couldn't handle this like:
  /* If we aren't optimizing, only allow one memory operand to be
 generated.  */
  if (memory_operand (op, mode))
{
  const char *const constraint
= insn_data[icode].operand[i + adjust + 1].constraint;
  if (optimize
  || num_memory != 1
  || !rtx_equal_p (real_target, op))
num_memory++;
  /* sse2_movsd allows matching operand.  */
  else if (icode == CODE_FOR_sse2_movsd)
;
  /* Various masked insns allow matching operand.  */
  else if (insn_data[icode].operand[i + adjust + 1].predicate
   == vector_move_operand
   && (strcmp (constraint, "0C") == 0
   || strcmp (constraint, "0C,0") == 0))
;
  else
num_memory++;
}
(though perhaps sse2_movsd is still too simplistic, because it has just
=m v 0 and =o 0 v alternatives, so if i + adjust + 1 is 2, then it is
fine as is, if it is 1, then only if the memory is offsettable; though
perhaps LRA can just turn non-offsettable memory into offsettable one
through secondary reload).

And the last category is with "=m" destination (not allowing anything else)
and (match_dup 0) somewhere in the pattern, I believe those have to be
expanded specially, because otherwise one e.g. wouldn't get MEM_P target
if optimize at all (I think the builtins are supposed to pass pointer to the
result in that case) and the match_dup just doesn't appear as another operand.

missing !(MEM && MEM) ?
===
avx512f_pd512_256pd
avx512f_pd512_pd
avx512f_ps512_256ps
avx512f_ps512_ps
avx512f_si512_256si
avx512f_si512_si
avx_pd256_pd
avx_ps256_ps
avx_si256_si
sse_storehps
sse_storelps

valid matching mem
==
avx512f_ss_truncatev16siv16hi2_mask
avx512f_ss_truncatev16siv16qi2_mask
avx512f_ss_truncatev8div8hi2_mask
avx512f_ss_truncatev8div8si2_mask
avx512f_truncatev16siv16hi2_mask
avx512f_truncatev16siv16qi2_mask
avx512f_truncatev8div8hi2_mask
avx512f_truncatev8div8si2_mask
avx512f_us_truncatev16siv16hi2_mask
avx512f_us_truncatev16siv16qi2_mask
avx512f_us_truncatev8div8hi2_mask
avx512f_us_truncatev8div8si2_mask
avx512f_vcvtps2ph512_mask
avx512vl_ss_truncatev16hiv16qi2_mask
avx512vl_ss_truncatev4div4si2_mask
avx512vl_ss_truncatev8siv8hi2_mask
avx512vl_truncatev16hiv16qi2_mask
avx512vl_truncatev4div4si2_mask
avx512vl_truncatev8siv8hi2_mask
avx512vl_us_truncatev16hiv16qi2_mask
avx512vl_us_truncatev4div4si2_mask
avx512vl_us_truncatev8siv8hi2_mask
sse2_movsd
vcvtps2ph256_mask

match_dup on mem target
===
avx512f_ss_truncatev8div16qi2_mask_store
avx512f_storev16sf_mask
avx512f_storev16si_mask
avx512f_storev8df_mask
avx512f_storev8di_mask
avx512f_truncatev8div16qi2_mask_store
avx512f_us_truncatev8div16qi2_mask_store
avx512vl_compressstorev2df_mask
avx512vl_compressstorev2di_mask
avx512vl_compressstorev4df_mask
avx512vl_compressstorev4di_mask
avx512vl_compressstorev4sf_mask
avx512vl_compressstorev4si_mask

Re: GCSE: Use HOST_WIDE_INT instead of int (PR, rtl-optimization/79574).

2017-03-03 Thread Martin Liška
On 03/02/2017 08:58 PM, Bernd Schmidt wrote:
> On 03/02/2017 06:50 PM, Martin Liška wrote:
>> Hello.
>>
>> This is the second part of the fixes needed to avoid triggering integer
>> overflow in the gcse pass.
> 
> So, how is this intended to work? The min/max stored in the param is an int, 
> and by using a HOST_WIDE_INT here, we expect that it is a larger type and 
> therefore won't overflow?

Using the int type is fine, but gcse.c does arithmetic with the distance:

  /* Prefer to hoist EXPR if register pressure is decreased.  */
  if (decreased_pressure > *nregs)
distance += bb_size[bb->index];

...

max_distance += (bb_size[dominated->index]
 - to_bb_head[INSN_UID (occr->insn)]);
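
A standalone illustration of the overflow being avoided (not GCC code, all
names made up): if the --param maximum is already close to INT_MAX, adding a
basic block size in 32-bit int arithmetic would wrap, while a 64-bit type
such as HOST_WIDE_INT keeps the sum representable.

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  int param_max_distance = INT_MAX;   /* a huge --param value */
  int bb_size = 1000;                 /* pretend size of one basic block */

  /* Accumulating in int would overflow (undefined behaviour for signed
     int), so do the addition in 64 bits.  */
  int64_t distance = (int64_t) param_max_distance + bb_size;

  printf ("distance = %lld, fits in int: %s\n",
          (long long) distance, distance <= INT_MAX ? "yes" : "no");
  return 0;
}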


> 
>>{
>>  expr = flat_table[i];
>>  fprintf (file, "Index %d (hash value %d; max distance %d)\n  ",
>> - expr->bitmap_index, hash_val[i], expr->max_distance);
>> + expr->bitmap_index, hash_val[i], (int)expr->max_distance);
>>  print_rtl (file, expr->expr);
>>  fprintf (file, "\n");
> 
> Use HOST_WIDE_INT_PRINT_DEC maybe? Otherwise OK, I guess.

That's for nit, I'll install the patch with that.

Martin

> 
> 
> Bernd
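
For completeness, a sketch of what the dump change with Bernd's suggestion
could look like (an assumption about the final form, not the committed hunk):
HOST_WIDE_INT_PRINT_DEC is a format-string fragment, so it is spliced into
the literal instead of casting the value down to int.

  fprintf (file,
           "Index %d (hash value %d; max distance "
           HOST_WIDE_INT_PRINT_DEC ")\n  ",
           expr->bitmap_index, hash_val[i], expr->max_distance);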



[C++ Patch] PR 70266

2017-03-03 Thread Paolo Carlini

Hi,

one more case of OVERLOAD filtering to cxx_eval_constant_expression and 
causing an ICE.


In order to have a proper diagnostic early enough I think that in 
build_must_not_throw_expr we want to use 
perform_implicit_conversion_flags on the condition before calling 
cxx_constant_value for it. That would be also analogous to the code we 
already have in build_noexcept_spec, for example. Tested x86_64-linux.


Thanks, Paolo.

//

/cp
2017-03-03  Paolo Carlini  

PR c++/70266
* except.c (build_must_not_throw_expr): Perform the implicit
conversions on the condition.

/testsuite
2017-03-03  Paolo Carlini  

PR c++/70266
* g++.dg/tm/pr70266.C: New.
Index: cp/except.c
===
--- cp/except.c (revision 245836)
+++ cp/except.c (working copy)
@@ -268,6 +268,9 @@ build_must_not_throw_expr (tree body, tree cond)
 
   if (cond && !value_dependent_expression_p (cond))
 {
+  cond = perform_implicit_conversion_flags (boolean_type_node, cond,
+   tf_warning_or_error,
+   LOOKUP_NORMAL);
   cond = cxx_constant_value (cond);
   if (integer_zerop (cond))
return body;
Index: testsuite/g++.dg/tm/pr70266.C
===
--- testsuite/g++.dg/tm/pr70266.C   (revision 0)
+++ testsuite/g++.dg/tm/pr70266.C   (working copy)
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-fgnu-tm" }
+
+template < typename T > 
+int foo (int x, T t)
+{
+  return __transaction_atomic noexcept (foo) (1);  // { dg-error "cannot 
resolve" }
+}


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-03 Thread Richard Biener
On Fri, 3 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener  writes:
> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >> >
> >> >> >> Richard Biener  writes:
> >> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >> >> >
> >> >> >> >> Sorry for the late reply, but:
> >> >> >> >> 
> >> >> >> >> Richard Biener  writes:
> >> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >> >> >> >
> >> >> >> >> >> 
> >> >> >> >> >> Currently we force peeling for gaps whenever element
> >> >> >> >> >> overrun can occur
> >> >> >> >> >> but for aligned accesses we know that the loads won't trap
> >> >> >> >> >> and thus
> >> >> >> >> >> we can avoid this.
> >> >> >> >> >> 
> >> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu
> >> >> >> >> >> (I expect
> >> >> >> >> >> some testsuite fallout here so didn't bother to invent a
> >> >> >> >> >> new testcase).
> >> >> >> >> >> 
> >> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in 
> >> >> >> >> >> general
> >> >> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >> >> >> >
> >> >> >> >> > This is what I applied.
> >> >> >> >> >
> >> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >> >> >> >
> >> >> >> >> > Richard.
> >> >> >> >> [...]
> >> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> >> >> >> > index 15aec21..c29e73d 100644
> >> >> >> >> > --- a/gcc/tree-vect-stmts.c
> >> >> >> >> > +++ b/gcc/tree-vect-stmts.c
> >> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, 
> >> >> >> >> > tree vectype, bool slp,
> >> >> >> >> >/* If there is a gap at the end of the group then these 
> >> >> >> >> > optimizations
> >> >> >> >> >  would access excess elements in the last iteration.  */
> >> >> >> >> >bool would_overrun_p = (gap != 0);
> >> >> >> >> > +  /* If the access is aligned an overrun is fine.  */
> >> >> >> >> > +  if (would_overrun_p
> >> >> >> >> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> >> >> >> > +   would_overrun_p = false;
> >> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >> >> >> >   && (can_overrun_p || !would_overrun_p)
> >> >> >> >> >   && compare_step_with_zero (stmt) > 0)
> >> >> >> >> 
> >> >> >> >> ...is this right for all cases?  I think it only looks for
> >> >> >> >> single-vector
> >> >> >> >> alignment, but the gap can in principle be vector-sized or larger,
> >> >> >> >> at least for load-lanes.
> >> >> >> >>
> >> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> >> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the 
> >> >> >> >> group
> >> >> >> >> spans two vectors and we have no guarantee that the second one
> >> >> >> >> is mapped.
> >> >> >> >
> >> >> >> > The check assumes that if aligned_access_p () returns true then the
> >> >> >> > whole access is aligned in a way that it can't cross page 
> >> >> >> > boundaries.
> >> >> >> > That's of course not the case if alignment is 16 bytes but the 
> >> >> >> > access
> >> >> >> > will be a multiple of that.
> >> >> >> >  
> >> >> >> >> I haven't been able to come up with a testcase though.  We seem 
> >> >> >> >> to be
> >> >> >> >> overly conservative when computing alignments.
> >> >> >> >
> >> >> >> > Not sure if we can run into this with load-lanes given that bumps 
> >> >> >> > the
> >> >> >> > vectorization factor.  Also does load-lane work with gaps?
> >> >> >> >
> >> >> >> > I think that gap can never be larger than nunits-1 so it is by 
> >> >> >> > definition
> >> >> >> > in the last "vector" independent of the VF.
> >> >> >> >
> >> >> >> > Classical gap case is
> >> >> >> >
> >> >> >> > for (i=0; i<n; ++i)
> >> >> >> >  {
> >> >> >> >y[3*i + 0] = x[4*i + 0];
> >> >> >> >y[3*i + 1] = x[4*i + 1];
> >> >> >> >y[3*i + 2] = x[4*i + 2];
> >> >> >> >  }
> >> >> >> >
> >> >> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> >> >> >> > the y's different streams and you should get the perfect case for
> >> >> >> > load-lane:
> >> >> >> >
> >> >> >> > for (i=0; i<n; ++i)
> >> >> >> >  {
> >> >> >> >y[i] = x[4*i + 0];
> >> >> >> >z[i] = x[4*i + 1];
> >> >> >> >w[i] = x[4*i + 2];
> >> >> >> >  } 
> >> >> >> >
> >> >> >> > previously we'd peel at least 4 iterations into the epilogue for
> >> >> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> >> >> >> > ok.
> >> >> >> 
> >> >> >> The case I was thinking of was like the second, but with the
> >> >> >> element type being DI or DF and with the + 2 statement removed.
> >> >> >> E.g.:
> >> >> >> 
> >> >> >> double 

[PATCH][C++] Fix PR79825

2017-03-03 Thread Richard Biener

The following obvious patch fixes PR79825, we were not treating
EMPTY_CLASS_EXPR as simple_empty_class_p.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-03-03  Richard Biener  

PR c++/79825
* cp-gimplify.c (simple_empty_class_p): Handle EMPTY_CLASS_EXPR.

* g++.dg/warn/Wuninitialized-8.C: New testcase.

Index: gcc/cp/cp-gimplify.c
===
*** gcc/cp/cp-gimplify.c(revision 245863)
--- gcc/cp/cp-gimplify.c(working copy)
*** simple_empty_class_p (tree type, tree op
*** 549,554 
--- 549,555 
return
  ((TREE_CODE (op) == COMPOUND_EXPR
&& simple_empty_class_p (type, TREE_OPERAND (op, 1)))
+  || TREE_CODE (op) == EMPTY_CLASS_EXPR
   || is_gimple_lvalue (op)
   || INDIRECT_REF_P (op)
   || (TREE_CODE (op) == CONSTRUCTOR
Index: gcc/testsuite/g++.dg/warn/Wuninitialized-8.C
===
*** gcc/testsuite/g++.dg/warn/Wuninitialized-8.C(nonexistent)
--- gcc/testsuite/g++.dg/warn/Wuninitialized-8.C(working copy)
***
*** 0 
--- 1,18 
+ // { dg-do compile }
+ // { dg-options "-Wuninitialized" }
+ 
+ struct A;
+ struct B
+ {
+   B(A);
+ };
+ struct C
+ {
+   template  void m_fn1(PassT p1) { new B(p1); }  // { 
dg-bogus "uninitialized" }
+ };
+ struct A {};
+ void fn1()
+ {
+   C a;
+   a.m_fn1(A());
+ }


[wwwdocs] gcc-4.7/changes.html -- "typed DWARF stack"

2017-03-03 Thread Gerald Pfeifer
This link broke, but I found a replacement that is actually more
in line with the other references in that section (in that it uses
the ShowIssue facility on www.dwarfstd.org).

Applied.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.145
diff -u -r1.145 changes.html
--- changes.html28 Dec 2016 23:42:42 -  1.145
+++ changes.html3 Mar 2017 10:06:00 -
@@ -997,7 +997,7 @@
 value and http://www.dwarfstd.org/ShowIssue.php?issue=100909.2;>call
 site information, http://www.dwarfstd.org/doc/040408.1.html;>typed DWARF stack
+href="http://www.dwarfstd.org/ShowIssue.php?issue=140425.1;>typed DWARF 
stack
 or http://www.dwarfstd.org/ShowIssue.php?issue=110722.1;>a
 more compact macro representation.  Support for these extensions


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-03 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >
>> >> >> Richard Biener  writes:
>> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >> >
>> >> >> >> Sorry for the late reply, but:
>> >> >> >> 
>> >> >> >> Richard Biener  writes:
>> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >> >> >
>> >> >> >> >> 
>> >> >> >> >> Currently we force peeling for gaps whenever element
>> >> >> >> >> overrun can occur
>> >> >> >> >> but for aligned accesses we know that the loads won't trap
>> >> >> >> >> and thus
>> >> >> >> >> we can avoid this.
>> >> >> >> >> 
>> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu
>> >> >> >> >> (I expect
>> >> >> >> >> some testsuite fallout here so didn't bother to invent a
>> >> >> >> >> new testcase).
>> >> >> >> >> 
>> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >> >> >
>> >> >> >> > This is what I applied.
>> >> >> >> >
>> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >> >> >
>> >> >> >> > Richard.
>> >> >> >> [...]
>> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> >> >> > index 15aec21..c29e73d 100644
>> >> >> >> > --- a/gcc/tree-vect-stmts.c
>> >> >> >> > +++ b/gcc/tree-vect-stmts.c
>> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, 
>> >> >> >> > tree vectype, bool slp,
>> >> >> >> >/* If there is a gap at the end of the group then these 
>> >> >> >> > optimizations
>> >> >> >> >would access excess elements in the last iteration.  */
>> >> >> >> >bool would_overrun_p = (gap != 0);
>> >> >> >> > +  /* If the access is aligned an overrun is fine.  */
>> >> >> >> > +  if (would_overrun_p
>> >> >> >> > +   && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> >> >> > + would_overrun_p = false;
>> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >> >> > && (can_overrun_p || !would_overrun_p)
>> >> >> >> > && compare_step_with_zero (stmt) > 0)
>> >> >> >> 
>> >> >> >> ...is this right for all cases?  I think it only looks for
>> >> >> >> single-vector
>> >> >> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> >> >> at least for load-lanes.
>> >> >> >>
>> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the 
>> >> >> >> group
>> >> >> >> spans two vectors and we have no guarantee that the second one
>> >> >> >> is mapped.
>> >> >> >
>> >> >> > The check assumes that if aligned_access_p () returns true then the
>> >> >> > whole access is aligned in a way that it can't cross page boundaries.
>> >> >> > That's of course not the case if alignment is 16 bytes but the access
>> >> >> > will be a multiple of that.
>> >> >> >  
>> >> >> >> I haven't been able to come up with a testcase though.  We seem to 
>> >> >> >> be
>> >> >> >> overly conservative when computing alignments.
>> >> >> >
>> >> >> > Not sure if we can run into this with load-lanes given that bumps the
>> >> >> > vectorization factor.  Also does load-lane work with gaps?
>> >> >> >
>> >> >> > I think that gap can never be larger than nunits-1 so it is by 
>> >> >> > definition
>> >> >> > in the last "vector" independent of the VF.
>> >> >> >
>> >> >> > Classical gap case is
>> >> >> >
>> >> >> > for (i=0; i<n; ++i)
>> >> >> >  {
>> >> >> >y[3*i + 0] = x[4*i + 0];
>> >> >> >y[3*i + 1] = x[4*i + 1];
>> >> >> >y[3*i + 2] = x[4*i + 2];
>> >> >> >  }
>> >> >> >
>> >> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> >> >> > the y's different streams and you should get the perfect case for
>> >> >> > load-lane:
>> >> >> >
>> >> >> > for (i=0; i<n; ++i)
>> >> >> >  {
>> >> >> >y[i] = x[4*i + 0];
>> >> >> >z[i] = x[4*i + 1];
>> >> >> >w[i] = x[4*i + 2];
>> >> >> >  } 
>> >> >> >
>> >> >> > previously we'd peel at least 4 iterations into the epilogue for
>> >> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> >> >> > ok.
>> >> >> 
>> >> >> The case I was thinking of was like the second, but with the
>> >> >> element type being DI or DF and with the + 2 statement removed.
>> >> >> E.g.:
>> >> >> 
>> >> >> double __attribute__((noinline))
>> >> >> foo (double *a)
>> >> >> {
>> >> >>   double res = 0.0;
>> >> >>   for (int n = 0; n < 256; n += 4)
>> >> >> res += a[n] + a[n + 1];
>> >> >>   return res;
>> >> >> }
>> >> >> 
>> >> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> >> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> >> 

Re: [PATCH, i386]: Fix PR79514, ICE with -m96bit-long-double on x86_64

2017-03-03 Thread Uros Bizjak
On Thu, Mar 2, 2017 at 7:52 PM, Uros Bizjak  wrote:
> The attached patch adds a new insn_and_split pattern to handle the
> pre_modify XFmode push variant.  The -m96bit-long-double option sets the
> XFmode size to 12 bytes, which is not divisible by the push size (8 bytes),
> resulting in the pre_modify push variant.

The attached patch is also needed to avoid an ICE on x32.

2017-03-03  Uros Bizjak  

PR target/79514
* config/i386/i386.md (*pushxf_rounded): Use Pmode instead of DImode.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 245860)
+++ config/i386/i386.md (working copy)
@@ -3017,14 +3017,14 @@
 
 (define_insn_and_split "*pushxf_rounded"
   [(set (mem:XF
- (pre_modify:DI
-   (reg:DI SP_REG)
-   (plus:DI (reg:DI SP_REG) (const_int -16
+ (pre_modify:P
+   (reg:P SP_REG)
+   (plus:P (reg:P SP_REG) (const_int -16
(match_operand:XF 0 "nonmemory_no_elim_operand" "f,r,*r,C"))]
   "TARGET_64BIT"
   "#"
   "&& 1"
-  [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -16)))
+  [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (const_int -16)))
(set (match_dup 1) (match_dup 0))]
 {
   rtx pat = PATTERN (curr_insn);


Re: [PATCH docs] remove Java from GCC 7 release criteria

2017-03-03 Thread Richard Biener
On Thu, 2 Mar 2017, Martin Sebor wrote:

> On 03/02/2017 12:58 AM, Richard Biener wrote:
> > On Wed, 1 Mar 2017, Martin Sebor wrote:
> > 
> > > On 02/28/2017 11:41 PM, Richard Biener wrote:
> > > > On March 1, 2017 3:34:46 AM GMT+01:00, Martin Sebor 
> > > > wrote:
> > > > > On 02/28/2017 01:41 PM, Richard Biener wrote:
> > > > > > On February 28, 2017 7:00:39 PM GMT+01:00, Jeff Law 
> > > > > > 
> > > > > wrote:
> > > > > > > On 02/28/2017 10:54 AM, Martin Sebor wrote:
> > > > > > > > The GCC 7 release criteria page mentions Java even though
> > > > > > > > the front end has been removed.  The attached patch removes Java
> > > > > > > > from the criteria page.  While reviewing the rest of the text I
> > > > > > > > noticed a few minor typos that I corrected in the patch as well.
> > > > > > > > 
> > > > > > > > Btw., as an aside, I read the page to see if I could find out
> > > > > > > > more
> > > > > > > > about the "magic" bug counts that are being aimed for to decide
> > > > > when
> > > > > > > > to cut the release.  Can someone say what those are and where to
> > > > > > > > find them?  I understand from the document that they're not
> > > > > > > > exact
> > > > > > > > but even ballpark numbers would be useful.
> > > > > > > 
> > > > > > > OK.
> > > > > > > 
> > > > > > > WRT the bug counts.  0 P1 regressions, < 100 P1-P3 regressions.
> > > > > > > I'm not sure if that's documented anywhere though.
> > > > > > 
> > > > > > Actually the only criteria is zero P1 regressions.  Those are
> > > > > documented to block a release.
> > > > > 
> > > > > Yes, that is mentioned in the document.  Would it be fair to say
> > > > > that the number of P2 bugs (or regressions) or their nature plays
> > > > > into the decision in some way as well?  If so, what can the release
> > > > > criteria say about it?
> > > > 
> > > > Ultimately, P2 bugs do not play a role and 'time' will trump them.  OTOH
> > > > we never were in an uncomfortable situation with P2s at the desired point
> > > > of release.
> > > > 
> > > > Also note that important P2 bugs can be promoted to P1, and unimportant
> > > > P1 bugs demoted to P2.
> > > > 
> > > > > I'm trying to get a better idea which bugs to work on and where
> > > > > my help might have the biggest impact.  I think having better
> > > > > visibility into the bug triage process (such as bug priorities
> > > > > and how they impact the release schedule) might help others
> > > > > focus too.
> > > > 
> > > > In order of importance:
> > > > - P1
> > > > - wrong-code, rejects-valid, ice-on-valid (even if not regressions,
> > > > regressions more important)
> > > > - P2 regressions, more recent ones first (newest working version)
> > > 
> > > I see.  This is helpful, thanks.
> > > 
> > > The kinds of problems you mention are discussed in the document
> > > so just to make the importance clear, would adding the following
> > > after this sentence
> > > 
> > >   In general bugs blocking the release are marked with priority P1
> > >   (Maintaining the GCC Bugzilla database).
> > > 
> > > accurately reflect what you described?
> > > 
> > >   As a general rule of thumb, within each priority level, bugs that
> > >   result in incorrect code are considered more urgent than those
> > >   that lead to rejecting valid code, which in turn are viewed as
> > >   more severe than ice-on-valid code (compiler crashes).  More
> > >   recently reported bugs are also prioritized over very old ones.
> > 
> > I'd rather see this clarified in bugs/management.html.  Note
> > that wrong-code, rejects-valid, ice-on-valid are equally important.
> > Less important would be accepts-invalid or ice-on-invalid or, of course,
> > missed-optimization.  Also, it's not more recently _reported_ bugs that
> > matter: a [6/7 Regression] is more important to fix than a [5/6/7 Regression]
> > (this is also why [7 Regression]s are P1 by default).
> 
> Got it.  Attached is a rewording of the paragraph above added to
> bugs/management.html.  I put it under the Importance (Severity
> and Priority) heading but it could probably fit just as well under
> Procedures and Policies.
> 
> Please let me know if there's anything else that can be said to
> further clarify things, or if you have any other suggestions.

Looks fine!

Thus, ok.

Richard.


Re: [PATCH] Fix IFN_ATOMIC* handling with -fnon-call-exceptions (PR middle-end/79805)

2017-03-03 Thread Richard Biener
On Thu, 2 Mar 2017, Jakub Jelinek wrote:

> Hi!
> 
> The __atomic/__sync builtins have nothrow conditional on
> -fnon-call-exceptions, but internal fns have right now only constant flags.
> As per discussions on IRC, this patch removes ECF_NOTHROW from those 4 ifns
> and instead calls gimple_call_set_nothrow when creating those to say
> if it can or can't throw.  Furthermore, if we need to add statements after
> the former builtin, now ifn with -fnon-call-exceptions, we need to insert
> them on the fallthru edge instead of right after it.
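
To picture the kind of code this affects, a sketch along these lines (an
illustration only; the actual g++.dg/opt/pr79805.C may differ), compiled
with -O2 -fnon-call-exceptions:

/* The load through p inside the __atomic_* call may trap; with
   -fnon-call-exceptions that trap can become an exception, so the folded
   internal call has to be allowed to end its basic block, and the statements
   extracting its result belong on the fallthrough edge.  */
int
foo (int *p, int desired)
{
  int expected = 0;
  try
    {
      return __atomic_compare_exchange_n (p, &expected, desired, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }
  catch (...)
    {
      return -1;
    }
}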
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-03-02  Jakub Jelinek  
> 
>   PR middle-end/79805
>   * internal-fn.def (ATOMIC_BIT_TEST_AND_SET, ATOMIC_BIT_TEST_AND_RESET,
>   ATOMIC_BIT_TEST_AND_COMPLEMENT, ATOMIC_COMPARE_EXCHANGE): Remove
>   ECF_NOTHROW.
>   * gimple-fold.c (fold_builtin_atomic_compare_exchange): Set
>   gimple_call_nothrow_p flag based on flag_non_call_exceptions.  For
>   -fnon-call-exceptions emit following stmts on the fallthrough edge.
>   (optimize_atomic_bit_test_and): Similarly, except don't create new
>   bb if inserting just debug stmts on the edge, try to insert them
>   on the fallthru bb or just reset debug stmts.
> 
>   * g++.dg/opt/pr79805.C: New test.
> 
> --- gcc/internal-fn.def.jj  2017-02-09 14:55:38.0 +0100
> +++ gcc/internal-fn.def   2017-03-02 13:14:43.659419232 +0100
> @@ -205,11 +205,13 @@ DEF_INTERNAL_FN (GOACC_TILE, ECF_NOTHROW
> current target.  */
>  DEF_INTERNAL_FN (SET_EDOM, ECF_LEAF | ECF_NOTHROW, NULL)
>  
> -/* Atomic functions.  */
> -DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF | ECF_NOTHROW, NULL)
> -DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF | ECF_NOTHROW, NULL)
> -DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF | ECF_NOTHROW, NULL)
> -DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF | ECF_NOTHROW, NULL)
> +/* Atomic functions.  These don't have ECF_NOTHROW because for
> +   -fnon-call-exceptions they can throw, otherwise we set
> +   gimple_call_nothrow_p on it.  */
> +DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF, NULL)
>  
>  /* To implement [[fallthrough]].  */
>  DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/gimple-fold.c.jj  2017-02-07 16:40:45.0 +0100
> +++ gcc/gimple-fold.c 2017-03-02 16:04:51.304850077 +0100
> @@ -3533,6 +3533,8 @@ fold_builtin_atomic_compare_exchange (gi
>tree itype = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (parmt)));
>tree ctype = build_complex_type (itype);
>tree expected = TREE_OPERAND (gimple_call_arg (stmt, 1), 0);
> +  bool may_throw = false;
> +  edge e = NULL;
>gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (expected)),
>  expected);
>gsi_insert_before (gsi, g, GSI_SAME_STMT);
> @@ -3558,19 +3560,43 @@ fold_builtin_atomic_compare_exchange (gi
>gimple_set_vdef (g, gimple_vdef (stmt));
>gimple_set_vuse (g, gimple_vuse (stmt));
>SSA_NAME_DEF_STMT (gimple_vdef (g)) = g;
> -  if (gimple_call_lhs (stmt))
> +  tree oldlhs = gimple_call_lhs (stmt);
> +  if (flag_non_call_exceptions && stmt_ends_bb_p (stmt))

I think a more appropriate check is stmt_can_throw_internal (stmt)

>  {
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  may_throw = true;

and 'throws' rather than 'may_throw'.

> +  gimple_call_set_lhs (stmt, NULL_TREE);
> +  gsi_replace (gsi, g, true);
> +  e = find_fallthru_edge (gsi_bb (*gsi)->succs);
> +}
> +  gimple_call_set_nothrow (as_a <gcall *> (g), flag_non_call_exceptions == 0);

it should copy nothrow state from the source(s) of the folding if you
consider flag_non_call_exceptions == true in this fn but working on
inlined code from a flag_non_call_exceptions == false function.  At least
for calls in the source(s) that works, for other stmts I think we do
not have an explicit nothrow on the stmt.

> +  if (oldlhs)
> +{
> +  if (!may_throw)
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
>g = gimple_build_assign (make_ssa_name (itype), IMAGPART_EXPR,
>  build1 (IMAGPART_EXPR, itype, lhs));
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> -  g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR,
> -gimple_assign_lhs (g));
> +  if (may_throw)
> + {
> +   gsi_insert_on_edge_immediate (e, g);
> +   *gsi = gsi_for_stmt (g);
> + }
> +  else
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  g = gimple_build_assign (oldlhs, NOP_EXPR, gimple_assign_lhs (g));
> +  if (may_throw)
> + gsi_insert_after (gsi, g, GSI_NEW_STMT);
>  }
> -  gsi_replace (gsi, g, true);
> +  if (!may_throw)
> 

Re: [PATCH] Fix -O0 -ffloat-store -mavx ICE (PR target/79807)

2017-03-03 Thread Uros Bizjak
On Thu, Mar 2, 2017 at 10:44 PM, Jakub Jelinek  wrote:
> Hi!
>
> For -O1 and above, we force all operands and targets of x86 builtins into
> registers during expansion, because we expect combiner will do its job.
> But for -O0, we try to keep them in MEM if the predicate allows it, we just
> make sure that at most one input operand is a MEM.  We still allow
> MEM destination and one MEM input operand, which is why we ICE - most
> insns/expanders don't allow that in the predicates and it is fine, but
> various insns/expanders have e.g. nonimmediate_operand on both target
> and some input operand and just have a condition that requires that one
> of those is a register or that both of them are not memory.  The expanders
> don't check the condition though, it is checked only when the insn is being
> recognized in vregs pass and that is too late.
> The following patch fixes that by forcing all input operands at -O0 into
> non-memory if the target is memory.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Or are there any define_insn/define_expand where it is desirable to have
> both input and output operand a MEM (and does it have to be matching)?
> For various scalar binary and unary expanders the backend already uses a 
> helper
> that will force something into memory if dest and src are both memory and
> not rtx_equal_p, but do we have anything like that in anything these two
> builtin expanders emit?

Any insn with a matched memory operand (movlps, movhps and similar) can also
operate with a matched register. To my knowledge, all insn patterns are
written in this way, since in the past we had plenty of problems with
matched memory operands.

Also, I really don't remember what benefit it brings us to *not* force
input operands to a register at -O0 at expand time and leave it to the RA to
find matched memory. OTOH, with -O0 we already have so many
unnecessary moves that you get tears in the eyes...

> 2017-03-02  Jakub Jelinek  
>
> PR target/79807
> * config/i386/i386.c (ix86_expand_multi_arg_builtin): If target
> is a memory operand, increase num_memory.
> (ix86_expand_args_builtin): Likewise.
>
> * gcc.target/i386/pr79807.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.c.jj   2017-03-02 08:08:43.0 +0100
> +++ gcc/config/i386/i386.c  2017-03-02 17:58:28.396999033 +0100
> @@ -34249,6 +34249,8 @@ ix86_expand_multi_arg_builtin (enum insn
>|| GET_MODE (target) != tmode
>|| !insn_data[icode].operand[0].predicate (target, tmode))
>  target = gen_reg_rtx (tmode);
> +  else if (memory_operand (target, tmode))
> +num_memory++;
>
>gcc_assert (nargs <= 4);
>
> @@ -35534,6 +35536,8 @@ ix86_expand_args_builtin (const struct b
>   || GET_MODE (target) != tmode
>   || !insn_p->operand[0].predicate (target, tmode))
> target = gen_reg_rtx (tmode);
> +  else if (memory_operand (target, tmode))
> +   num_memory++;
>real_target = target;
>  }
>else
> --- gcc/testsuite/gcc.target/i386/pr79807.c.jj  2017-03-02 18:03:30.032023436 +0100
> +++ gcc/testsuite/gcc.target/i386/pr79807.c 2017-03-02 18:02:53.0 +0100
> @@ -0,0 +1,12 @@
> +/* PR target/79807 */
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -mavx -ffloat-store" } */
> +
> +typedef double __v2df __attribute__ ((__vector_size__ (16)));
> +typedef double __v4df __attribute__ ((__vector_size__ (32)));
> +
> +__v2df
> +foo (__v4df x)
> +{
> +  return __builtin_ia32_pd_pd256 (x);
> +}
>
> Jakub