Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-08-28 Thread Jeff Law
On 08/28/2017 04:33 PM, Alan Modra wrote:
> On Mon, Aug 28, 2017 at 08:27:35AM -0600, Jeff Law wrote:
>> So sorry for the horrible delay.  What was the final resolution here?  I
>> saw a lot of back and forth with HJ and yourself.  80044 is
>> CLOSED/WONTFIX and 81170 has a patch attached to it, but is still in the
>> ASSIGNED state.
> 
> The patch went in trunk as 5a402d649 (r250974) and a9b2df6cc
> (r251065).  PR81170 and PR81295 are still open due to needing a fix
> for powerpc --enable-default-pie on the branches.  Last I checked,
> both patches apply without any difficulty.
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00831.html
I think if you want to backport, go ahead.

jeff


Improve DOM's ability to derive equivalences when traversing edges

2017-08-28 Thread Jeff Law
This is a two part patchkit to improve DOM's ability to derive constant
equivalences that arise as a result of traversing a particular edge in
the CFG.

Until now we only allowed a single NAME = NAME|CONST equivalence to be
associated with an edge in the CFG.  Patch #1 generalizes that code so
that we can record multiple simple equivalences on an edge.  Much like
expression equivalences, we just shove them into a vec and iterate on
the vec in the appropriate places.

Patch #2 has the interesting bits.

Back in the gcc-7 effort I added the ability to look at the operands of
a BIT_IOR_EXPR that had a zero result.  In that case each operand of the
BIT_IOR must have a zero value.  This was to address a missed
optimization regression bug during stage4.

The plan was always to add analogous BIT_AND support, but I didn't feel
like handling BIT_AND was appropriate at the time (no bz entry and no
regressions related to that capability).

I'd also had the sense that further improvements could be made here. For
example, it is common for the BIT_IOR or BIT_AND to be fed by a
comparison and we ought to be able to record the result of the
comparison.  If the comparison happened to be an equality test, then we
may ultimately derive a constant for on operand of the equality test as
well.

It also seemed like the NOP/CONVERT_EXPR handling could be incorporated
into such a change.

So I pulled together some instrumentation.  Lots of things generate
equivalences -- but a much smaller subset of those equivalences are
ultimately useful.

Probably the most surprising was BIT_XOR, which allows us to generate
all kinds of equivalences, but none that were useful for ultimate
simplification in any of the tests I looked at.


The most subtle was COND_EXPRs.  We might have something like

res = (a != 5) ? x : 1;


We can't actually derive anything useful for "a" here, even if we know
the result is one.  That's because "x" could have the value 1.  So you
end up only being able to derive equivalences for COND_EXPRs when both
arms have a constant value.  That restriction dramatically reduces the
utility of handling COND_EXPR -- to the point where I'm not including it.

So what we end up with is BIT_AND/BIT_IOR, conversions, plus/minus,
comparisons and neg/not.

So when we determine that a particular SSA_NAME has a constant value, we
look at the defining statement and essentially try to derive a value for
the input operand(s) based on knowing the result value.  If we can
derive a constant value for an input operand, we record that value and
recurse.

In cases where we walk backwards to a condition.  We will record the
condition into the available expression table.


The code is written such that if we find cases where the equivalences
for other nodes are useful, they're easy to add.


These equivalences are most useful to the threader, but I've seen them
help in other cases as well.  There's a half-dozen or so new tests
reduced from GCC itself.

Bootstrapped and regression tested on x86_64, lightly tested on ppc64le
via bootstrapping and running the new tests to verify they do the right
thing on a !logical_op_short_circuit target.

Installing on the trunk.

Jeff

commit 506ac60cacbc4c4e5e166513ea83c1d2e14eaf3b
Author: law 
Date:   Tue Aug 29 05:03:22 2017 +

* tree-ssa-dom.c (class edge_info): Changed from a struct
to a class.  Add ctor/dtor, methods and data members.
(edge_info::edge_info): Renamed from allocate_edge_info.
Initialize additional members.
(edge_info::~edge_info): New.
(free_dom_edge_info): Delete the edge info.
(record_edge_info): Use new class & associated member functions.
Tighten forms for testing for edge equivalences.
(record_temporary_equivalences): Iterate over the simple
equivalences rather than assuming there's only one per edge.
(cprop_into_successor_phis): Iterate over the simple
equivalences rather than assuming there's only one per edge.
(optimize_stmt): Use operand_equal_p rather than pointer
equality for mini-DSE code.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251396 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index abe7d855260..258d4242f30 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,20 @@
+2017-08-28  Jeff Law  
+
+   * tree-ssa-dom.c (class edge_info): Changed from a struct
+   to a class.  Add ctor/dtor, methods and data members.
+   (edge_info::edge_info): Renamed from allocate_edge_info.
+   Initialize additional members.
+   (edge_info::~edge_info): New.
+   (free_dom_edge_info): Delete the edge info.
+   (record_edge_info): Use new class & associated member functions.
+   Tighten forms for testing for edge equivalences.
+   (record_temporary_equivalences): Iterate over 

Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341

2017-08-28 Thread Kugan Vivekanandarajah
ping^3

Thanks,
Kugan

On 11 August 2017 at 16:09, Kugan Vivekanandarajah
 wrote:
> Ping^2?
>
> Thanks,
> Kugan
>
> On 21 July 2017 at 20:12, Kugan Vivekanandarajah
>  wrote:
>> Ping ?
>>
>> Thanks,
>> Kugan
>>
>> On 27 June 2017 at 11:20, Kugan Vivekanandarajah
>>  wrote:
>>> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html  added this
>>> workaround to get kernel building with when TARGET_FIX_ERR_A53_843419
>>> is enabled.
>>>
>>> This was added to support building kernel loadable modules. In kernel,
>>> when CONFIG_ARM64_ERRATUM_843419 is selected, the relocation needed
>>> for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and
>>> R_AARCH64_ADR_PREL_PG_HI21_NC are removed from the kernel to avoid
>>> loading objects with possibly offending sequence). Thus, it could only
>>> support pc relative literal loads.
>>>
>>> However, the following patch was posted to kernel to add
>>> -mpc-relative-literal-loads
>>> http://www.spinics.net/lists/arm-kernel/msg476149.html
>>>
>>> -mpc-relative-literal-loads is unconditionally added to the kernel
>>> build as can be seen from:
>>> https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile
>>>
>>> Therefore this patch removes the hunk so that applications like
>>> SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without
>>> -mno-pc-relative-literal-loads
>>>
>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
>>> regressions.
>>>
>>> Is this OK for trunk?
>>>
>>> Thanks,
>>> Kugan
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2017-06-27  Kugan Vivekanandarajah  
>>>
>>> * gcc.target/aarch64/pr63304_1.c: Remove-mno-fix-cortex-a53-843419.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-06-27  Kugan Vivekanandarajah  
>>>
>>> * config/aarch64/aarch64.c (aarch64_override_options_after_change_1):
>>> Disable pc relative literal load irrespective of 
>>> TARGET_FIX_ERR_A53_84341
>>> for default.


[PATCH], PR target/82015, add PowerPC warning for unpack_vector_int128 with illegal 2nd argument

2017-08-28 Thread Michael Meissner
One of the local programmers tried to use the __builtin_unpack_vector_int128
function, but his second argument was not the constant 0 or 1.  The compiler
put the 2nd argument into a register, but there wasn't a valid insn for this,
and raised an insn not found message.  GCC should warn about this illegal
usage.

This patch adds such a warning.  While I was mucking about with this built-in
function, I fixed the constraints to allow the 64-bit integer for unpack result
and pack inputs to be in the traditional Altivec registers as well as the
traditional floating point registers.

I did a bootstrap of the compiler, and there were no regressions on a little
endian power8 system.  I verified that the new test is run.  Can I apply this
patch to the trunk?

Can I apply the part of the patch from rs6000.c to the existing GCC 6/7
branches as well after a shake down period?  The patch to rs6000.md is not
appropriate for GCC 6 (since DImode can't go into Altivec registers).  For GCC
7, it would need to be modified to use the 'wi' constraint instead of 'wa',
since GCC 7 still had support for the -mno-upper-regs-di option to control
access to the traditional Altivec register.

[gcc]
2017-08-28  Michael Meissner  

PR target/82015
* config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Insure
that the second argument of the built-in functions to unpack
128-bit scalar types to 64-bit values is 0 or 1.  Change to use a
switch statement instead a lot of if statements.
* config/rs6000/rs6000.md (unpack, FMOVE128_VSX iterator):
Allow 64-bit values to be in Altivec registers as well as
traditional floating point registers.
(pack, FMOVE128_VSX iterator): Likewise.

[gcc/testsuite]
2017-08-28  Michael Meissner  

PR target/82015
* gcc.target/powerpc/pr82015.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 251390)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -14001,14 +14001,17 @@ rs6000_expand_binop_builtin (enum insn_c
   if (arg0 == error_mark_node || arg1 == error_mark_node)
 return const0_rtx;
 
-  if (icode == CODE_FOR_altivec_vcfux
-  || icode == CODE_FOR_altivec_vcfsx
-  || icode == CODE_FOR_altivec_vctsxs
-  || icode == CODE_FOR_altivec_vctuxs
-  || icode == CODE_FOR_altivec_vspltb
-  || icode == CODE_FOR_altivec_vsplth
-  || icode == CODE_FOR_altivec_vspltw)
+  switch (icode)
 {
+default:
+  break;
+case CODE_FOR_altivec_vcfux:
+case CODE_FOR_altivec_vcfsx:
+case CODE_FOR_altivec_vctsxs:
+case CODE_FOR_altivec_vctuxs:
+case CODE_FOR_altivec_vspltb:
+case CODE_FOR_altivec_vsplth:
+case CODE_FOR_altivec_vspltw:
   /* Only allow 5-bit unsigned literals.  */
   STRIP_NOPS (arg1);
   if (TREE_CODE (arg1) != INTEGER_CST
@@ -14017,16 +14020,15 @@ rs6000_expand_binop_builtin (enum insn_c
  error ("argument 2 must be a 5-bit unsigned literal");
  return CONST0_RTX (tmode);
}
-}
-  else if (icode == CODE_FOR_dfptstsfi_eq_dd
-  || icode == CODE_FOR_dfptstsfi_lt_dd
-  || icode == CODE_FOR_dfptstsfi_gt_dd
-  || icode == CODE_FOR_dfptstsfi_unordered_dd
-  || icode == CODE_FOR_dfptstsfi_eq_td
-  || icode == CODE_FOR_dfptstsfi_lt_td
-  || icode == CODE_FOR_dfptstsfi_gt_td
-  || icode == CODE_FOR_dfptstsfi_unordered_td)
-{
+  break;
+case CODE_FOR_dfptstsfi_eq_dd:
+case CODE_FOR_dfptstsfi_lt_dd:
+case CODE_FOR_dfptstsfi_gt_dd:
+case CODE_FOR_dfptstsfi_unordered_dd:
+case CODE_FOR_dfptstsfi_eq_td:
+case CODE_FOR_dfptstsfi_lt_td:
+case CODE_FOR_dfptstsfi_gt_td:
+case CODE_FOR_dfptstsfi_unordered_td:
   /* Only allow 6-bit unsigned literals.  */
   STRIP_NOPS (arg0);
   if (TREE_CODE (arg0) != INTEGER_CST
@@ -14035,13 +14037,12 @@ rs6000_expand_binop_builtin (enum insn_c
  error ("argument 1 must be a 6-bit unsigned literal");
  return CONST0_RTX (tmode);
}
-}
-  else if (icode == CODE_FOR_xststdcqp
-  || icode == CODE_FOR_xststdcdp
-  || icode == CODE_FOR_xststdcsp
-  || icode == CODE_FOR_xvtstdcdp
-  || icode == CODE_FOR_xvtstdcsp)
-{
+  break;
+case CODE_FOR_xststdcqp:
+case CODE_FOR_xststdcdp:
+case CODE_FOR_xststdcsp:
+case CODE_FOR_xvtstdcdp:
+case CODE_FOR_xvtstdcsp:
   /* Only allow 7-bit unsigned literals. */
   STRIP_NOPS (arg1);
   if (TREE_CODE (arg1) != INTEGER_CST
@@ -14050,6 +14051,21 @@ rs6000_expand_binop_builtin (enum insn_c
  error ("argument 2 must be a 7-bit unsigned literal");
  return CONST0_RTX (tmode);
}

Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)

2017-08-28 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01087.html

On 08/24/2017 02:43 PM, Martin Sebor wrote:

The bulk of this patch touches what I think is considered
the middle-end (attribs.c) so let me include its maintainers,
Ian, Jeff, and Richard:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01087.html

The high-level description and rationale for the change are
here:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00602.html

Thanks
Martin

On 08/17/2017 10:03 AM, Martin Sebor wrote:

Attached is an updated patch with just the minor editorial
tweaks for the issues pointed out by Marek (stray comments
and spaces), and with the C++ and libstdc++ bits removed
and posted separately.  I also added some text to the manual
to clarify the const/pure effects.

I thought quite a bit more about the const/pure attributes we
discussed and tried the approach of warning only on a const
declaration after one with attribute pure has been used, but
otherwise allowing and silently ignoring pure after const.
In the end I decided against it, for a few reasons (most of
which I already mentioned but just to summarize).

First, there is the risk that someone will write code based
on the pure declaration even if there's no intervening call
to the function between it and the const one.  Code tends to
be sloppy, and it's also not uncommon to declare the same
function more than once, for whatever reason.  (The ssa-ccp-2.c
test is an example of the latter.)

Second, there are cases of attribute conflicts that GCC already
points out that are much more benign in their effects (the ones
I know about are always_inline/noinline and cold/hot).  In light
of the risk above it seems only helpful to include const/pure in
the same set.

Third, I couldn't find another pair of attributes that GCC would
deliberately handle this way (silently accept both but prefer one
over the other), and introducing a special case just for these
two seemed like a wart.

Finally, compiling Binutils, GDB, Glkibc, and the Linux kernel
with the enhanced warning didn't turn up any code that does this
sort of thing, either intentionally or otherwise.

With that, I've left the handling unchanged.

I do still have the question whether diagnosing attribute
conflicts under -Wattributes is right.  The manual implies
that -Wattributes is for attributes in the wrong places or
on the wrong entities, and that -Wignored-attributes should
be expected instead when GCC decides to drop one for some
reason.

It is a little unfortunate that many -Wattributes warnings
print just "attribute ignored" (and have done so for years).
I think they should all be enhanced to also print why the
attribute is ignored (e.g., "'packed' attribute on function
declarations ignored/not valid/not supported" or something
like that).  Those that ignore attributes that would
otherwise be valid e.g., because they conflict with other
specifiers of the same attribute but with a different
operand might then be candidate for changing to
-Wignored-attributes.

Thanks
Martin







[PING 2] [PATCH 3/4] enhance overflow and truncation detection in strncpy and strncat (PR 81117)

2017-08-28 Thread Martin Sebor

Ping #2: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html

On 08/23/2017 01:46 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html

Jeff, is this version good to commit or are there any other
changes you'd like to see?

Martin

On 08/14/2017 04:40 PM, Martin Sebor wrote:

On 08/10/2017 01:29 PM, Martin Sebor wrote:

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 016f68d..1aa9e22 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c

[ ... ]

+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+{
+  /* Return the constant size unless it's zero (that's a
zero-length
+ array likely at the end of a struct).  */
+  tree size = TYPE_SIZE_UNIT (type);
+  if (size && TREE_CODE (size) == INTEGER_CST
+  && !integer_zerop (size))
+return size;
+}

Q. Do we have a canonical test for the trailing array idiom?   In some
contexts isn't it size 1?  ISTM This test needs slight improvement.
Ideally we'd use some canonical test for detect the trailing array
idiom
rather than open-coding it here.  You might look at the array index
warnings in tree-vrp.c to see if it's got a canonical test you can call
or factor and use.


You're right, there is an API for this (array_at_struct_end_p,
as Richard pointed out).  I didn't want to use it because it
treats any array at the end of a struct as a flexible array
member, but simple tests show that that's what -Wstringop-
overflow does now, and it wasn't my intention to tighten up
the checking under this change.  It surprises me that no tests
exposed this. Let me relax the check and think about proposing
to tighten it up separately.


Done in the attached patch.  (I opened bug 81849 for the enhancement
to have -Wstringop-overflow diagnose overflows when writing to member
arrays bigger than 1 element even if they're last).

(I've left the handling for zero size in place because GCC allows
global arrays to be declared to have zero elements.)




@@ -3883,6 +3920,30 @@ expand_builtin_strncat (tree exp, rtx)
   return NULL_RTX;
 }

+/* Helper to check the sizes of sequences and the destination of
calls
+   to __builtin_strncpy (DST, SRC, LEN) and __builtin___strncpy_chk.
+   Returns true on success (no overflow warning), false
otherwise.  */
+
+static bool
+check_strncpy_sizes (tree exp, tree dst, tree src, tree len)
+{
+  tree dstsize = compute_objsize (dst, warn_stringop_overflow - 1);
+
+  if (!check_sizes (OPT_Wstringop_overflow_,
+exp, len, /*maxlen=*/NULL_TREE, src, dstsize))
+return false;
+
+  if (!dstsize || TREE_CODE (len) != INTEGER_CST)
+return true;
+
+  if (tree_int_cst_lt (dstsize, len))
+warning_at (EXPR_LOCATION (exp), OPT_Wstringop_truncation,
+"%K%qD specified bound %E exceeds destination size %E",
+exp, get_callee_fndecl (exp), len, dstsize);
+
+  return true;

So in the case where you issue the warning, what should the return
value
be?  According to the comment it should be false.  It looks like you
got
the wrong return value for the tree_int_cst_lt (dstsize, len) test.


Corrected.  The return value is unused by the only caller so
there is no test to exercise it.


Done in the attached patch.


+/* A helper of handle_builtin_stxncpy.  Check to see if the specified
+   bound is a) equal to the size of the destination DST and if so, b)
+   if it's immediately followed by DST[LEN - 1] = '\0'.  If a) holds
+   and b) does not, warn.  Otherwise, do nothing.  Return true if
+   diagnostic has been issued.
+
+   The purpose is to diagnose calls to strncpy and stpncpy that do
+   not nul-terminate the copy while allowing for the idiom where
+   such a call is immediately followed by setting the last element
+   to nul, as in:
+ char a[32];
+ strncpy (a, s, sizeof a);
+ a[sizeof a - 1] = '\0';
+*/

So using gsi_next to find the next statement could make the heuristic
fail to find the a[sizeof a - 1] = '\0'; statement when debugging is
enabled.

gsi_next_nondebug would be better as it would skip over any debug
insns.


Thanks.  I'll have to remember this.


I went with this simple approach for now since it worked for GDB.
If it turns out that there are important instances of this idiom
that rely on intervening statements the warning can be relaxed.


What might be even better would be to use the immediate uses of the
memory tag.  For your case there should be only one immediate use
and it
should point to the statement which NUL terminates the destination.  Or
maybe that would be worse in that you only want to allow this exception
when the statements are consecutive.



 /* Handle a memcpy-like ({mem{,p}cpy,__mem{,p}cpy_chk}) call.
If strlen of the second argument is known and length of the third
argument
is that plus one, strlen of the first argument is the same after
this
@@ -2512,6 +2789,19 @@ strlen_optimize_stmt (gimple_stmt_iterator
*gsi)

You still need to rename strlen_optimize_stmt since after your changes
it does both optimizations and warnings.



Re: [PATCH, rs6000] Fix PR81833 (incorrect code gen for vec_msum)

2017-08-28 Thread Bill Schmidt
On Aug 28, 2017, at 3:56 PM, Bill Schmidt  wrote:
> 
> Hi, 
> 
> PR81833 identifies a problem with the little-endian vector multiply-sum
> instructions.  The original implementation is quite poor (and I am allowed
> to say that, since it was mine).  This patch fixes the code properly.
> 
> The revised code still uses UNSPECs for these ops, which is not strictly
> necessary, although descriptive rtl for them would be pretty complex.  I've
> put in a FIXME to make note of that for a future cleanup.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  I am
> currently testing on powerpc64-linux-gnu for 32- and 64-bit.  Provided that
> testing succeeds, is this ok for trunk, and for eventual backport to all
> supported releases?

FYI, big-endian tests have completed successfully.

Bill
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2017-08-28  Bill Schmidt  
> 
>   PR target/81833
>   * config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
>   define_insn to a define_expand.
>   (altivec_vsum2sws_direct): New define_insn.
>   (altivec_vsumsws): Convert from a define_insn to a define_expand.
> 
> [gcc/testsuite]
> 
> 2017-08-28  Bill Schmidt  
> 
>   PR target/81833
>   * gcc.target/powerpc/pr81833.c: New file.
> 
> 
> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md  (revision 251369)
> +++ gcc/config/rs6000/altivec.md  (working copy)
> @@ -1804,51 +1804,61 @@
>   "vsum4ss %0,%1,%2"
>   [(set_attr "type" "veccomplex")])
> 
> -;; FIXME: For the following two patterns, the scratch should only be
> -;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should
> -;; be emitted separately.
> -(define_insn "altivec_vsum2sws"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> -  (match_operand:V4SI 2 "register_operand" "v")]
> -  UNSPEC_VSUM2SWS))
> -   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
> -   (clobber (match_scratch:V4SI 3 "=v"))]
> +(define_expand "altivec_vsum2sws"
> +  [(use (match_operand:V4SI 0 "register_operand"))
> +   (use (match_operand:V4SI 1 "register_operand"))
> +   (use (match_operand:V4SI 2 "register_operand"))]
>   "TARGET_ALTIVEC"
> {
>   if (VECTOR_ELT_ORDER_BIG)
> -return "vsum2sws %0,%1,%2";
> +emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1],
> +operands[2]));
>   else
> -return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4";
> -}
> -  [(set_attr "type" "veccomplex")
> -   (set (attr "length")
> - (if_then_else
> -   (match_test "VECTOR_ELT_ORDER_BIG")
> -   (const_string "4")
> -   (const_string "12")))])
> +{
> +  rtx tmp1 = gen_reg_rtx (V4SImode);
> +  rtx tmp2 = gen_reg_rtx (V4SImode);
> +  emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2],
> +  operands[2], GEN_INT (12)));
> +  emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1));
> +  emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
> +  GEN_INT (4)));
> +}
> +  DONE;
> +})
> 
> -(define_insn "altivec_vsumsws"
> +; FIXME: This can probably be expressed without an UNSPEC.
> +(define_insn "altivec_vsum2sws_direct"
>   [(set (match_operand:V4SI 0 "register_operand" "=v")
> (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> -  (match_operand:V4SI 2 "register_operand" "v")]
> -  UNSPEC_VSUMSWS))
> -   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
> -   (clobber (match_scratch:V4SI 3 "=v"))]
> +   (match_operand:V4SI 2 "register_operand" "v")]
> +  UNSPEC_VSUM2SWS))]
>   "TARGET_ALTIVEC"
> +  "vsum2sws %0,%1,%2"
> +  [(set_attr "type" "veccomplex")
> +   (set_attr "length" "4")])
> +
> +(define_expand "altivec_vsumsws"
> +  [(use (match_operand:V4SI 0 "register_operand"))
> +   (use (match_operand:V4SI 1 "register_operand"))
> +   (use (match_operand:V4SI 2 "register_operand"))]
> +  "TARGET_ALTIVEC"
> {
>   if (VECTOR_ELT_ORDER_BIG)
> -return "vsumsws %0,%1,%2";
> +emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1],
> +   operands[2]));
>   else
> -return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
> -}
> -  [(set_attr "type" "veccomplex")
> -   (set (attr "length")
> - (if_then_else
> -   (match_test "(VECTOR_ELT_ORDER_BIG)")
> -   (const_string "4")
> -   (const_string "12")))])
> +{
> +  rtx tmp1 = gen_reg_rtx (V4SImode);
> +  rtx tmp2 = gen_reg_rtx (V4SImode);
> +  emit_insn 

Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-08-28 Thread Martin Sebor

On 08/24/2017 04:09 PM, Jeff Law wrote:

On 08/22/2017 02:45 AM, Richard Biener wrote:

On Mon, Aug 21, 2017 at 10:10 PM, Martin Sebor  wrote:

On 08/09/2017 10:14 AM, Jeff Law wrote:


On 08/06/2017 05:08 PM, Martin Sebor wrote:



Well, simply because the way as implemented isn't a must-alias query
but maybe one that's good enough for warnings (reduces false positives
but surely doesn't eliminate them).



I'm very interested in reducing the rate of false positives in
these and all other warnings.  As I mentioned in my comments,
I did my best to weed them out of the implementation by building
GDB, Glibc, Busybox, and the Linux kernel.  That of course isn't
a guarantee that there aren't any.  But the first implementation
of any non-trivial feature is never perfect, and hardly any
warning of sufficient complexity is free of false positives, no
matter here it's implemented (the middle-end, front-end, or
a standalone static analysis tool).  If you spotted some cases
I had missed I'd certainly be grateful for examples.  Otherwise,
they will undoubtedly be reported as more software is exposed
to the warning and, if possible, fixed, as happens with all
other warnings.


I think Richi is saying that the must alias query you've built isn't
proper/correct.  It's less about false positives for Richi and more
about building a proper must alias query if I understand him correctly.

I suspect he's also saying that you can't reasonably build must-alias on
top of a may-alias query framework.  They're pretty different queries.

If you need something that is close to, but not quite a must alias
query, then you're going to have to make a argument for that and you
can't call it a must alias query.



Attached is an updated and simplified patch that avoids making
changes to any of the may-alias functions.  It turns out that
all the information the logic needs to determine the overlap
is already in the ao_ref structures populated by
ao_ref_init_from_ptr_and_size.  The only changes to the pass
are the enhancement to ao_ref_init_from_ptr_and_size to make
use of range info and the addition of the two new functions
used by the -Wrestrict clients outside the pass.


Warning for memcpy (p, p, ...) is going to fire false positives all around
given the C++ FE emits those in some cases and optimization can
expose that we are dealing with self-assignments.  And *p = *p is
valid.

Correct.  I wound my way through this mess a while back.  Essentially
Red Hat had a customer with code that had overlapped memcpy arguments.
We had them use the memstomp interposition library to start tracking
these problems down.

One of the things that popped up was structure/class copies which were
implemented via calls to memcpy.In the case of self assignment, the
interposition library would note the overlap and (rightly IMHO) complain.


Is this bug 32667?  I'm not having any luck reproducing it with
any of the test cases there and varying struct sizes, or with
the test case in the duplicate bug 65029 I filed for the same
thing last year.  It would be nice to have a test case.


One could argue that GCC should emit memmove by default for structure
assignments, only using memcpy when it knows its not doing self
assignment (which may be hard to determine).  Furthermore, GCC should
eliminate self structure/class assignment.


If it's still a problem emitting memmove seems like the right
thing to do.  From what I've read the performance advantage of
memcpy over memmove seems debatable at best.  Most performance
sensitive code avoids making copies of very large objects so
the only code that can be impacted doesn't care about efficiency
quite so much.  For small enough objects, inlining the copy as
GCC already does would obviate the efficiency concern altogether.






@@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
}
}

+  /* Avoid folding the call if overlap is detected.  */
+  if (check_overlap && detect_overlap (loc, stmt, dest, src, len))
+   return false;
+

no, please not.  You diagnosed the call (which might be a false positive)
so why keep it undefined?  The folded stmt will either have the same
semantics (aggregate copies expanded as memcpy) or have all reads
performed before writes.

So can we distinguish here between overlap and the self-copy case?


Yes, but only in a limited subset of cases.



A self-copy should just be folded away.  It's no different than x = x on
scalars except that we've dropped it to a memcpy in the IL.  Doing so
makes the code more efficient and removes false positives from tools
like the memstomp interposition library, making those tools more useful.


It's possible to do in simple cases but not in general.  I agree
that in the general case when overlap is possible the only safe
solution, short of actually testing for it at runtime, is to call
memmove.

Martin


Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-08-28 Thread Martin Sebor

On 08/22/2017 02:45 AM, Richard Biener wrote:

On Mon, Aug 21, 2017 at 10:10 PM, Martin Sebor  wrote:

On 08/09/2017 10:14 AM, Jeff Law wrote:


On 08/06/2017 05:08 PM, Martin Sebor wrote:



Well, simply because the way as implemented isn't a must-alias query
but maybe one that's good enough for warnings (reduces false positives
but surely doesn't eliminate them).



I'm very interested in reducing the rate of false positives in
these and all other warnings.  As I mentioned in my comments,
I did my best to weed them out of the implementation by building
GDB, Glibc, Busybox, and the Linux kernel.  That of course isn't
a guarantee that there aren't any.  But the first implementation
of any non-trivial feature is never perfect, and hardly any
warning of sufficient complexity is free of false positives, no
matter here it's implemented (the middle-end, front-end, or
a standalone static analysis tool).  If you spotted some cases
I had missed I'd certainly be grateful for examples.  Otherwise,
they will undoubtedly be reported as more software is exposed
to the warning and, if possible, fixed, as happens with all
other warnings.


I think Richi is saying that the must alias query you've built isn't
proper/correct.  It's less about false positives for Richi and more
about building a proper must alias query if I understand him correctly.

I suspect he's also saying that you can't reasonably build must-alias on
top of a may-alias query framework.  They're pretty different queries.

If you need something that is close to, but not quite a must alias
query, then you're going to have to make a argument for that and you
can't call it a must alias query.



Attached is an updated and simplified patch that avoids making
changes to any of the may-alias functions.  It turns out that
all the information the logic needs to determine the overlap
is already in the ao_ref structures populated by
ao_ref_init_from_ptr_and_size.  The only changes to the pass
are the enhancement to ao_ref_init_from_ptr_and_size to make
use of range info and the addition of the two new functions
used by the -Wrestrict clients outside the pass.


Warning for memcpy (p, p, ...) is going to fire false positives all around
given the C++ FE emits those in some cases and optimization can
expose that we are dealing with self-assignments.  And *p = *p is
valid.


I changed it to only warn for calls to the library function and
not to the __builtin_xxx.  Though I haven't been able to reproduce
the problem you referring to (bug 32667) which makes me wonder if
it's been fixed.



@@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
}
}

+  /* Avoid folding the call if overlap is detected.  */
+  if (check_overlap && detect_overlap (loc, stmt, dest, src, len))
+   return false;
+

no, please not.  You diagnosed the call (which might be a false positive)
so why keep it undefined?  The folded stmt will either have the same
semantics (aggregate copies expanded as memcpy) or have all reads
performed before writes.


My goal was to follow the approach reflected in the comments
elsewhere in the file:

/* Out of bound array access.  Value is undefined,
   but don't fold.  */

While gimple_fold_builtin_memory_op may be able to provide well-
defined behavior for the otherwise undefined semantics in a small
subset of cases, it doesn't attempt to fold many more that it
otherwise could (it only folds calls with sizes that are powers
of 2).  So it seems of dubious value to make an effort in this
relatively small subset of cases.

In my experience, users also don't appreciate optimizations that
"rely on" undefined behavior one way or the other.  What they would
like to see instead is that when their compiler detects undefined
behavior it diagnoses it but either doesn't use it to make
optimization decisions, or uses it to disable them.  For calls to
library functions, that in my view means making the call and not
folding it.  (Btw., do we have some sort of a policy or guideline
for how to handle such cases in general?)

With all that said, I don't see a big problem with proceeding with
the folding as you suggest either, so I did and added a comment
documenting it and a test to verify this guarantee.

I should also acknowledge that in my approach I forgot that once
the overlap has been diagnosed and the no-warning bit set, the
next call to gimple_fold_builtin_memory_op() with the same
statement would just go ahead and fold it anyway.  So the tests
were ineffective regardless.


The ao_ref_init_from_ptr_and_size change misses a changelog entry.

+detect_overlap (location_t loc, gimple *stmt, tree dst, tree src, tree size,
+   bool adjust /* = false */)
+{
+  ao_ref dstref, srcref;
+  unsigned HOST_WIDE_INT range[2];
+
+  /* Initialize and store the lower bound of a constant offset (in
+ bits), disregarding the offset for the destination.  */
+  ao_ref_init_from_ptr_and_size (, 

Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-08-28 Thread Alan Modra
On Mon, Aug 28, 2017 at 08:27:35AM -0600, Jeff Law wrote:
> So sorry for the horrible delay.  What was the final resolution here?  I
> saw a lot of back and forth with HJ and yourself.  80044 is
> CLOSED/WONTFIX and 81170 has a patch attached to it, but is still in the
> ASSIGNED state.

The patch went in trunk as 5a402d649 (r250974) and a9b2df6cc
(r251065).  PR81170 and PR81295 are still open due to needing a fix
for powerpc --enable-default-pie on the branches.  Last I checked,
both patches apply without any difficulty.

https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00831.html

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH, rs6000] Stop non-volatile CR usage from killing shrink-wrap

2017-08-28 Thread Pat Haugen
The following patch allows shrink-wrapping to succeed in the presence of
non-volatile CR save/restore. The movesi_from_cr define_insn used to
list all CRs as used, even though it's only the non-volatile values that
we are interested in saving/restoring. This prevented the prolog from
being moved past the early exit test because that compare was defining a
register used in the prolog (a volatile CR). The patch removes the
mentions of the volatile CRs and renames the functions involved so that
it's hopefully clear they are for prolog generation only.

Bootstrap/regtest on powerpc64le-linux with no new regressions. Ok for
trunk?

-Pat



2017-08-28  Pat Haugen  

* config/rs6000/rs6000.c (rs6000_emit_prolog_move_from_cr): Rename from
rs6000_emit_move_from_cr and call renamed function.
(rs6000_emit_prologue): Call renamed functions.
* config/rs6000/rs6000.md (prolog_movesi_from_cr): Rename from
prolog_movesi_from_cr, remove volatile CRs.


testsuite/ChangeLog:
2017-08-28  Pat Haugen  

* gcc.target/powerpc/cr_shrink-wrap.c: New.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 251389)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -26083,10 +26083,14 @@ rs6000_emit_savres_rtx (rs6000_stack_t *
   return insn;
 }
 
-/* Emit code to store CR fields that need to be saved into REG.  */
+/* Emit prolog code to store CR fields that need to be saved into REG. This
+   function should only be called when moving the non-volatile CRs to REG, it
+   is not a general purpose routine to move the entire set of CRs to REG.
+   Specifically, gen_prolog_movesi_from_cr() does not contain uses of the
+   volatile CRs.  */
 
 static void
-rs6000_emit_move_from_cr (rtx reg)
+rs6000_emit_prolog_move_from_cr (rtx reg)
 {
   /* Only the ELFv2 ABI allows storing only selected fields.  */
   if (DEFAULT_ABI == ABI_ELFv2 && TARGET_MFCRF)
@@ -26117,7 +26121,7 @@ rs6000_emit_move_from_cr (rtx reg)
 	 as well, using logical operations to combine the values.  */
 }
 
-  emit_insn (gen_movesi_from_cr (reg));
+  emit_insn (gen_prolog_movesi_from_cr (reg));
 }
 
 /* Return whether the split-stack arg pointer (r12) is used.  */
@@ -26857,7 +26861,7 @@ rs6000_emit_prologue (void)
 {
   cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno);
   START_USE (cr_save_regno);
-  rs6000_emit_move_from_cr (cr_save_rtx);
+  rs6000_emit_prolog_move_from_cr (cr_save_rtx);
 }
 
   /* Do any required saving of fpr's.  If only one or two to save, do
@@ -27095,7 +27099,7 @@ rs6000_emit_prologue (void)
 	{
 	  START_USE (0);
 	  cr_save_rtx = gen_rtx_REG (SImode, 0);
-	  rs6000_emit_move_from_cr (cr_save_rtx);
+	  rs6000_emit_prolog_move_from_cr (cr_save_rtx);
 	}
 
   /* Saving CR requires a two-instruction sequence: one instruction
@@ -27182,7 +27186,7 @@ rs6000_emit_prologue (void)
   /* ??? We might get better performance by using multiple mfocrf
 	 instructions.  */
   crsave = gen_rtx_REG (SImode, 0);
-  emit_insn (gen_movesi_from_cr (crsave));
+  emit_insn (gen_prolog_movesi_from_cr (crsave));
 
   for (i = 0; i < 8; i++)
 	if (!call_used_regs[CR0_REGNO + i])
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md	(revision 251389)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -13032,12 +13032,14 @@ (define_insn "*movesi_from_cr_one"
 }"
   [(set_attr "type" "mfcrf")])
 
-(define_insn "movesi_from_cr"
+;; Don't include the volatile CRs since their values are not used wrt CR save
+;; in the prolog and doing so prevents shrink-wrapping because we can't move the
+;; prolog past an insn (early exit test) that defines a register used in the
+;; prolog.
+(define_insn "prolog_movesi_from_cr"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
-(unspec:SI [(reg:CC CR0_REGNO) (reg:CC CR1_REGNO)
-		(reg:CC CR2_REGNO) (reg:CC CR3_REGNO)
-		(reg:CC CR4_REGNO) (reg:CC CR5_REGNO)
-		(reg:CC CR6_REGNO) (reg:CC CR7_REGNO)]
+(unspec:SI [(reg:CC CR2_REGNO) (reg:CC CR3_REGNO)
+		(reg:CC CR4_REGNO)]
 		   UNSPEC_MOVESI_FROM_CR))]
   ""
   "mfcr %0"
Index: gcc/testsuite/gcc.target/powerpc/cr_shrink-wrap.c
===
--- gcc/testsuite/gcc.target/powerpc/cr_shrink-wrap.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/cr_shrink-wrap.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+void foo(int i)
+{
+  if (i > 0)
+/* Non-volatile CR kill on true path should not prevent shrink-wrap.  */
+asm ("" : : : "cr2", "cr3");
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */


[PATCH, rs6000] Fix PR81833 (incorrect code gen for vec_msum)

2017-08-28 Thread Bill Schmidt
Hi, 

PR81833 identifies a problem with the little-endian vector multiply-sum
instructions.  The original implementation is quite poor (and I am allowed
to say that, since it was mine).  This patch fixes the code properly.

The revised code still uses UNSPECs for these ops, which is not strictly
necessary, although descriptive rtl for them would be pretty complex.  I've
put in a FIXME to make note of that for a future cleanup.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  I am
currently testing on powerpc64-linux-gnu for 32- and 64-bit.  Provided that
testing succeeds, is this ok for trunk, and for eventual backport to all
supported releases?

Thanks,
Bill


[gcc]

2017-08-28  Bill Schmidt  

PR target/81833
* config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
define_insn to a define_expand.
(altivec_vsum2sws_direct): New define_insn.
(altivec_vsumsws): Convert from a define_insn to a define_expand.

[gcc/testsuite]

2017-08-28  Bill Schmidt  

PR target/81833
* gcc.target/powerpc/pr81833.c: New file.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 251369)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -1804,51 +1804,61 @@
   "vsum4ss %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; FIXME: For the following two patterns, the scratch should only be
-;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should
-;; be emitted separately.
-(define_insn "altivec_vsum2sws"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-  (match_operand:V4SI 2 "register_operand" "v")]
-UNSPEC_VSUM2SWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+(define_expand "altivec_vsum2sws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-return "vsum2sws %0,%1,%2";
+emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1],
+operands[2]));
   else
-return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
- (if_then_else
-   (match_test "VECTOR_ELT_ORDER_BIG")
-   (const_string "4")
-   (const_string "12")))])
+{
+  rtx tmp1 = gen_reg_rtx (V4SImode);
+  rtx tmp2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2],
+  operands[2], GEN_INT (12)));
+  emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1));
+  emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+  GEN_INT (4)));
+}
+  DONE;
+})
 
-(define_insn "altivec_vsumsws"
+; FIXME: This can probably be expressed without an UNSPEC.
+(define_insn "altivec_vsum2sws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-  (match_operand:V4SI 2 "register_operand" "v")]
-UNSPEC_VSUMSWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+ (match_operand:V4SI 2 "register_operand" "v")]
+UNSPEC_VSUM2SWS))]
   "TARGET_ALTIVEC"
+  "vsum2sws %0,%1,%2"
+  [(set_attr "type" "veccomplex")
+   (set_attr "length" "4")])
+
+(define_expand "altivec_vsumsws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
+  "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-return "vsumsws %0,%1,%2";
+emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1],
+   operands[2]));
   else
-return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
- (if_then_else
-   (match_test "(VECTOR_ELT_ORDER_BIG)")
-   (const_string "4")
-   (const_string "12")))])
+{
+  rtx tmp1 = gen_reg_rtx (V4SImode);
+  rtx tmp2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_altivec_vspltw_direct (tmp1, operands[2], const0_rtx));
+  emit_insn (gen_altivec_vsumsws_direct (tmp2, operands[1], tmp1));
+  emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+  GEN_INT (12)));
+}
+  DONE;
+})
 
+; FIXME: This can probably be expressed without an UNSPEC.
 (define_insn 

Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers

2017-08-28 Thread Janus Weil
Hi Thomas,

>>> I think an unconditional warning is OK
>>> in this case because
>>>
>>> - Assigning to a pointer from an obvious non-contiguous target
>>>is not useful at all, that I can see
>>
>>
>> I guess you're talking about a *contiguous* pointer here,
>
> Correct.
>
>> and in that
>> case I would argue that, beyond being "not useful", it's even illegal,
>> so why not throw a hard error, if we can infer at compile-time that
>> the target is non-contiguous?
>
> Problem is, we cannot infer this from the tests done.
> We would also have to add a test if the array is empty
> or that it contains only a single element, and that (I think)
> is a) impossible in the general case, and b) not worth the bother.

I'm not sure I understand which cases you're worried about here. Maybe
you can give an example?

In any case, I think your test case is a bit short, so I extended it
somewhat (see attachment) and found two cases along the way where your
patch throws a warning but shouldn't:

r => x(::-1)
r => x2(2:3,1)

The first one is a stride of minus one, which is valid and contiguous
AFAICS. The second one is obviously also contiguous.


>> As explained above, I'd vote for an error (or at least a conditional
>> warning, so that it can be disabled, like most(?) other gfortran
>> warnings).
>
> Attached is a version which makes this a warning enabled by -Wall;
> this should be enough to give people a heads-up.
>
> I have introduced most of your comments into the revised patch.

Thank you, looks much better already.

Apart from the two mishandled cases above, one other thing comes to my
mind: It might be a good idea to apply your checks not only to pointer
assignments, but also to dummy arguments (passing a non-contiguous
array to a contiguous dummy pointer), where the same rules should
apply.


> So, here is the updated patch.  Regression-tested on
> powerpc-linux, make dvi and make pdf also passed.
> OK for trunk?

Not quite, but we're getting closer :)

Sorry for being such a nag ...

Cheers,
Janus
! { dg-do compile }
! { dg-additional-options "-Wcontiguous" }
program cont_01_neg
  implicit none
  real, pointer, contiguous :: r(:)
  real, pointer, contiguous :: r2(:,:)
  real, target :: x(45)
  real, target :: x2(5,9)
  integer :: i

  x = (/ (real(i),i=1,45) /)
  x2 = reshape(x,shape(x2))
  i = 4

  r => x(::1)
  r => x(::-1)  ! bogus warning here
  r => x(::3)   ! { dg-warning "Assignment to contiguous pointer" }
  r => x(::-3)  ! { dg-warning "Assignment to contiguous pointer" }
  r => x(::i)   ! { dg-warning "Assignment to contiguous pointer" }

  r2 => x2(:,2:)
  r2 => x2(1:,:)
  r2 => x2(2:,:)! { dg-warning "Assignment to contiguous pointer" }

  r2 => x2(:,:4)
  r2 => x2(:5,:)
  r2 => x2(:4,:)! { dg-warning "Assignment to contiguous pointer" }

  r2 => x2(:,2:3)
  r2 => x2(1:5,:)
  r2 => x2(2:3,:)   ! { dg-warning "Assignment to contiguous pointer" }

  r => x2(3,:)  ! { dg-warning "Assignment to contiguous pointer" }
  r => x2(:,3)

  r2 => x2(3:3,:)   ! { dg-warning "Assignment to contiguous pointer" }
  r2 => x2(:,3:3)

  r2 => x2(2:3:2,:) ! { dg-warning "Assignment to contiguous pointer" }
  r2 => x2(:,2:3:2) ! { dg-warning "Assignment to contiguous pointer" }

  r => x2(1,2:3)
  r => x2(2:3,1)! bogus warning here

end program


[PATCH v3] Fix PR81503 (SLSR invalid fold)

2017-08-28 Thread Bill Schmidt
> On Aug 28, 2017, at 7:37 AM, Richard Biener  
> wrote:
> 
> Not sure, but would it be fixed in a similar way when writing

> ?

Thanks, Richard, that works very well.  I decided this was a good opportunity 
to also
simplify the control flow a little with early returns.  Here's the result.  
Bootstrapped and
tested on powerpc64le-linux-gnu with no regressions.  Is this ok for trunk, and
eventually for backport to gcc 5, 6, and 7?  (I can omit the control flow 
cleanups for
the older releases if desired.)

Thanks,
Bill

[gcc]

2017-08-03  Bill Schmidt  
Jakub Jelinek  
Richard Biener  

PR tree-optimization/81503
* gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
folded constant fits in the target type; reorder tests for clarity.

[gcc/testsuite]

2017-08-03  Bill Schmidt  
Jakub Jelinek  
Richard Biener  

PR tree-optimization/81503
* gcc.c-torture/execute/pr81503.c: New file.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 251369)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2089,104 +2089,104 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
   tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
   enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
 
-  /* It is highly unlikely, but possible, that the resulting
- bump doesn't fit in a HWI.  Abandon the replacement
- in this case.  This does not affect siblings or dependents
- of C.  Restriction to signed HWI is conservative for unsigned
- types but allows for safe negation without twisted logic.  */
-  if (wi::fits_shwi_p (bump)
-  && bump.to_shwi () != HOST_WIDE_INT_MIN
-  /* It is not useful to replace casts, copies, negates, or adds of
-an SSA name and a constant.  */
-  && cand_code != SSA_NAME
-  && !CONVERT_EXPR_CODE_P (cand_code)
-  && cand_code != PLUS_EXPR
-  && cand_code != POINTER_PLUS_EXPR
-  && cand_code != MINUS_EXPR
-  && cand_code != NEGATE_EXPR)
+  /* It is not useful to replace casts, copies, negates, or adds of
+ an SSA name and a constant.  */
+  if (cand_code == SSA_NAME
+  || CONVERT_EXPR_CODE_P (cand_code)
+  || cand_code == PLUS_EXPR
+  || cand_code == POINTER_PLUS_EXPR
+  || cand_code == MINUS_EXPR
+  || cand_code == NEGATE_EXPR)
+return;
+
+  enum tree_code code = PLUS_EXPR;
+  tree bump_tree;
+  gimple *stmt_to_print = NULL;
+
+  if (wi::neg_p (bump))
 {
-  enum tree_code code = PLUS_EXPR;
-  tree bump_tree;
-  gimple *stmt_to_print = NULL;
+  code = MINUS_EXPR;
+  bump = -bump;
+}
 
-  /* If the basis name and the candidate's LHS have incompatible
-types, introduce a cast.  */
-  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
-   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
-  if (wi::neg_p (bump))
+  /* It is possible that the resulting bump doesn't fit in target_type.
+ Abandon the replacement in this case.  This does not affect
+ siblings or dependents of C.  */
+  if (bump != wi::ext (bump, TYPE_PRECISION (target_type),
+  TYPE_SIGN (target_type)))
+return;
+
+  bump_tree = wide_int_to_tree (target_type, bump);
+
+  /* If the basis name and the candidate's LHS have incompatible types,
+ introduce a cast.  */
+  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
+basis_name = introduce_cast_before_cand (c, target_type, basis_name);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fputs ("Replacing: ", dump_file);
+  print_gimple_stmt (dump_file, c->cand_stmt, 0);
+}
+
+  if (bump == 0)
+{
+  tree lhs = gimple_assign_lhs (c->cand_stmt);
+  gassign *copy_stmt = gimple_build_assign (lhs, basis_name);
+  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
+  slsr_cand_t cc = c;
+  gimple_set_location (copy_stmt, gimple_location (c->cand_stmt));
+  gsi_replace (, copy_stmt, false);
+  c->cand_stmt = copy_stmt;
+  while (cc->next_interp)
{
- code = MINUS_EXPR;
- bump = -bump;
+ cc = lookup_cand (cc->next_interp);
+ cc->cand_stmt = copy_stmt;
}
-
-  bump_tree = wide_int_to_tree (target_type, bump);
-
   if (dump_file && (dump_flags & TDF_DETAILS))
+   stmt_to_print = copy_stmt;
+}
+  else
+{
+  tree rhs1, rhs2;
+  if (cand_code != NEGATE_EXPR) {
+   rhs1 = gimple_assign_rhs1 (c->cand_stmt);
+   rhs2 = gimple_assign_rhs2 (c->cand_stmt);
+  }
+  if (cand_code != NEGATE_EXPR
+ && ((operand_equal_p (rhs1, basis_name, 

Rb_tree constructor optimization

2017-08-28 Thread François Dumont

Hi

Here is the always equal allocator optimization for associative 
containers.


Tested under Linux x86_64.

* include/bits/stl_tree.h
(_Rb_tree_impl(_Rb_tree_impl&&, _Node_allocator&&)): New.
(_Rb_tree(_Rb_tree&&, _Node_allocator&&, std::true_type)): New.
(_Rb_tree(_Rb_tree&&, _Node_allocator&&, std::false_type)): Likewise.
(_Rb_tree(_Rb_tree&&, _Node_allocator&&)): Adapt, use latters.

Ok to apply ?

François


diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index c2417f1..f7d34e3 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -704,6 +704,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
 	  _Rb_tree_impl(_Rb_tree_impl&&) = default;
 
+	  _Rb_tree_impl(_Rb_tree_impl&& __x, _Node_allocator&& __a) noexcept
+	: _Node_allocator(std::move(__a)),
+	  _Base_key_compare(std::move(__x)),
+	  _Rb_tree_header(std::move(__x))
+	  { }
+
 	  _Rb_tree_impl(const _Key_compare& __comp, _Node_allocator&& __a)
 	  : _Node_allocator(std::move(__a)), _Base_key_compare(__comp)
 	  { }
@@ -947,7 +953,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _Rb_tree(std::move(__x), _Node_allocator(__a))
   { }
 
-  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a);
+private:
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, std::true_type) noexcept
+  : _M_impl(std::move(__x._M_impl), std::move(__a))
+  { }
+
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, std::false_type);
+
+public:
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a)
+  : _Rb_tree(std::move(__x), std::move(__a),
+		 typename _Alloc_traits::is_always_equal{})
+  { }
 #endif
 
   ~_Rb_tree() _GLIBCXX_NOEXCEPT
@@ -1591,12 +1608,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
-_Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a)
+_Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, std::false_type __eq)
 : _M_impl(__x._M_impl._M_key_compare, std::move(__a))
 {
-  using __eq = typename _Alloc_traits::is_always_equal;
   if (__x._M_root() != nullptr)
-	_M_move_data(__x, __eq());
+	_M_move_data(__x, __eq);
 }
 
   template

Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-28 Thread Jeff Law
On 08/28/2017 01:05 PM, Richard Sandiford wrote:
> 
>> As for the name, get_nonvoid?  Ugh.  Not sure.  Open to suggestions.
> 
> I'd rather avoid "nonvoid", since the use of VOIDmode for "no mode" is
> really an implementation detail in things like opt_mode .
> Other possiblities might be:
Yea, good point on encoding the implementation detail not being a good idea.

> 
>   - require
>   - demand
>   - mode
>   - get_mode
>   - require_mode
>   - demand_mode
>   - else_fail (to go with else_void and else_blk)
>   - noelse
require, demand with or without the _mode suffix seem good to me.

jeff


Re: std::forward_list optim for always equal allocator

2017-08-28 Thread François Dumont

Hi

Any news for this patch ?

It does remove a constructor:
-_Fwd_list_impl(const _Node_alloc_type& __a)
-: _Node_alloc_type(__a), _M_head()

 It was already unused before the patch. Do you think it has ever 
been used and so do I need to restore it ?


I eventually restore the _M_head() in _Fwd_list_impl constructors 
cause IMO it is the inline init of _M_next in _Fwd_list_node_base which 
should be removed. But I remember Jonathan that you didn't want to do so 
because gcc was not good enough in detecting usage of uninitialized 
variables, is it still true ?


François


On 17/07/2017 22:10, François Dumont wrote:

Hi

Here is the patch to implement the always equal alloc optimization 
for forward_list. With this version there is no abi issue.


I also prefer to implement the _Fwd_list_node_base move operator 
for consistency with the move constructor and used it where applicable.



* include/bits/forward_list.h
(_Fwd_list_node_base& operator=(_Fwd_list_node_base&&)): Implement.
(_Fwd_list_impl(_Fwd_list_impl&&, _Node_alloc_type&&)): New.
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, 
std::true_type)):

New, use latter.
(forward_list(forward_list&&, _Node_alloc_type&&, std::false_type)):
New.
(forward_list(forward_list&&, _Node_alloc_type&&, std::true_type)):
New.
(forward_list(forward_list&&, const _Alloc&)): Adapt to use latters.
* include/bits/forward_list.tcc
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&)): Adapt to use
_M_impl._M_head move assignment.
(forward_list<>::merge(forward_list<>&&, _Comp)): Likewise.

Tested under Linux x86_64, ok to commit ?

François



diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 9d86fcc..772e9a0 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -54,6 +54,20 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   struct _Fwd_list_node_base
   {
 _Fwd_list_node_base() = default;
+_Fwd_list_node_base(_Fwd_list_node_base&& __x) noexcept
+  : _M_next(__x._M_next)
+{ __x._M_next = nullptr; }
+
+_Fwd_list_node_base(const _Fwd_list_node_base&) = delete;
+_Fwd_list_node_base& operator=(const _Fwd_list_node_base&) = delete;
+
+_Fwd_list_node_base&
+operator=(_Fwd_list_node_base&& __x) noexcept
+{
+  _M_next = __x._M_next;
+  __x._M_next = nullptr;
+  return *this;
+}
 
 _Fwd_list_node_base* _M_next = nullptr;
 
@@ -68,7 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	  __end->_M_next = _M_next;
 	}
   else
-	__begin->_M_next = 0;
+	__begin->_M_next = nullptr;
   _M_next = __keep;
   return __end;
 }
@@ -173,7 +187,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	if (_M_node)
 	  return _Fwd_list_iterator(_M_node->_M_next);
 	else
-  return _Fwd_list_iterator(0);
+	  return _Fwd_list_iterator(nullptr);
   }
 
   _Fwd_list_node_base* _M_node;
@@ -244,7 +258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	if (this->_M_node)
 	  return _Fwd_list_const_iterator(_M_node->_M_next);
 	else
-  return _Fwd_list_const_iterator(0);
+	  return _Fwd_list_const_iterator(nullptr);
   }
 
   const _Fwd_list_node_base* _M_node;
@@ -285,11 +299,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	_Fwd_list_node_base _M_head;
 
 	_Fwd_list_impl()
+	  noexcept( noexcept(_Node_alloc_type()) )
 	: _Node_alloc_type(), _M_head()
 	{ }
 
-_Fwd_list_impl(const _Node_alloc_type& __a)
-: _Node_alloc_type(__a), _M_head()
+	_Fwd_list_impl(_Fwd_list_impl&&) = default;
+
+	_Fwd_list_impl(_Fwd_list_impl&& __fl, _Node_alloc_type&& __a)
+	: _Node_alloc_type(std::move(__a)), _M_head(std::move(__fl._M_head))
 	{ }
 
 	_Fwd_list_impl(_Node_alloc_type&& __a)
@@ -312,26 +329,26 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_get_Node_allocator() const noexcept
   { return this->_M_impl; }
 
-  _Fwd_list_base()
-  : _M_impl() { }
+  _Fwd_list_base() = default;
 
   _Fwd_list_base(_Node_alloc_type&& __a)
   : _M_impl(std::move(__a)) { }
 
+  // When allocators are always equal.
+  _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a,
+		 std::true_type)
+  : _M_impl(std::move(__lst._M_impl), std::move(__a))
+  { }
+
+  // When allocators are not always equal.
   _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a);
 
-  _Fwd_list_base(_Fwd_list_base&& __lst)
-  : _M_impl(std::move(__lst._M_get_Node_allocator()))
-  {
-	this->_M_impl._M_head._M_next = __lst._M_impl._M_head._M_next;
-	__lst._M_impl._M_head._M_next = 0;
-  }
+  _Fwd_list_base(_Fwd_list_base&&) = default;
 
   ~_Fwd_list_base()
-  { _M_erase_after(&_M_impl._M_head, 0); }
+  { _M_erase_after(&_M_impl._M_head, nullptr); }
 
 protected:
-
   _Node*
   _M_get_node()
   {
@@ -437,10 +454,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   /**

Re: [C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS

2017-08-28 Thread Nathan Sidwell
Some quick tests show that memory use increased by 1.5%  Compilation 
time reduced by 1% to 3%


Your comment on IRC about not needing a identifier->decl map, just a 
decl hash, because the decl already has the identifier is a good one.  I 
recall considering that when converting namespaces, but there the 
anonymous namespace has a NULL name, which is annoying.  I just used the 
same map table here.


However, it's probably worth the effort.

It'd also be nice if the hash table primitive didn't unconditionally 
hold instrumentation counters.  AFAICT they only get folded to global 
counters upon destruction of the hash table.  Which for these things 
never happens.


Remember, when I succeed in folding METHOD_VEC into the same structure, 
we'll have several members in this table for all but the simplest of 
structures.


nathan

--
Nathan Sidwell


Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-28 Thread Richard Sandiford
Jeff Law  writes:
> On 08/11/2017 12:24 PM, Richard Sandiford wrote:
>> Jeff Law  writes:
>>> On 07/13/2017 02:40 AM, Richard Sandiford wrote:
 GET_MODE_WIDER previously returned VOIDmode if no wider mode existed.
 That would cause problems with stricter mode classes, since VOIDmode
 isn't for example a valid scalar integer or floating-point mode.
 This patch instead makes it return a new opt_mode class, which
 holds either a T or nothing.

 2017-07-13  Richard Sandiford  
Alan Hayward  
David Sherwood  

 gcc/
* coretypes.h (opt_mode): New class.
* machmode.h (opt_mode): Likewise.
(opt_mode::else_void): New function.
(opt_mode::operator *): Likewise.
(opt_mode::exists): Likewise.
(GET_MODE_WIDER_MODE): Turn into a function and return an opt_mode.
(GET_MODE_2XWIDER_MODE): Likewise.
(mode_iterator::get_wider): Update accordingly.
(mode_iterator::get_2xwider): Likewise.
(mode_iterator::get_known_wider): Likewise, turning into a template.
* combine.c (make_extraction): Update use of GET_MODE_WIDER_MODE,
forcing a wider mode to exist.
* config/cr16/cr16.h (LONG_REG_P): Likewise.
* rtlanal.c (init_num_sign_bit_copies_in_rep): Likewise.
* config/c6x/c6x.c (c6x_rtx_costs): Update use of
GET_MODE_2XWIDER_MODE, forcing a wider mode to exist.
* lower-subreg.c (init_lower_subreg): Likewise.
* optabs-libfuncs.c (init_sync_libfuncs_1): Likewise, but not
on the final iteration.
* config/i386/i386.c (ix86_expand_set_or_movmem): Check whether
a wider mode exists before asking for a move pattern.
(get_mode_wider_vector): Update use of GET_MODE_WIDER_MODE,
forcing a wider mode to exist.
(expand_vselect_vconcat): Update use of GET_MODE_2XWIDER_MODE,
returning false if no such mode exists.
* config/ia64/ia64.c (expand_vselect_vconcat): Likewise.
* config/mips/mips.c (mips_expand_vselect_vconcat): Likewise.
* expmed.c (init_expmed_one_mode): Update use of GET_MODE_WIDER_MODE.
Avoid checking for a MODE_INT if we already know the mode is not a
SCALAR_INT_MODE_P.
(extract_high_half): Update use of GET_MODE_WIDER_MODE,
forcing a wider mode to exist.
(expmed_mult_highpart_optab): Likewise.
(expmed_mult_highpart): Likewise.
* expr.c (expand_expr_real_2): Update use of GET_MODE_WIDER_MODE,
using else_void.
* lto-streamer-in.c (lto_input_mode_table): Likewise.
* optabs-query.c (find_widening_optab_handler_and_mode): Likewise.
* stor-layout.c (bit_field_mode_iterator::next_mode): Likewise.
* internal-fn.c (expand_mul_overflow): Update use of
GET_MODE_2XWIDER_MODE.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* tree-ssa-math-opts.c (convert_mult_to_widen): Update use of
GET_MODE_WIDER_MODE.
(convert_plusminus_to_widen): Likewise.
* tree-switch-conversion.c (array_value_type): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* tree-vrp.c (simplify_float_conversion_using_ranges): Likewise.
Return false inside rather than outside the loop if no wider mode
exists
* optabs.c (expand_binop): Update use of GET_MODE_WIDER_MODE
and GET_MODE_2XWIDER_MODE
(can_compare_p): Use else_void.
* gdbhooks.py (OptMachineModePrinter): New class.
(build_pretty_printer): Use it for opt_mode.

 gcc/ada/
* gcc-interface/decl.c (validate_size): Update use of
GET_MODE_WIDER_MODE, forcing a wider mode to exist.
>>> I'm not a big fan of the API here, particularly using operator* to
>>> handle asserting the mode exists.  I'd prefer to just use a member
>>> function rather than overloading operator*.
>>>
>>> What's the rationale behind using operator* to imply the assertion?
>>>
>>> THe changes themsleves look fine, it's really just a question of the API
>>> we present.
>> 
>> The original idea was to make opt_mode look pointer-ish, so that
>> the dyn_cast <...> result could be used in the same way as for
>> dyn_cast  etc.  The first cut therefore had operator bool ()
>> to test whether there was a mode and operator * to dereference it.
>> 
>> However, operator bool () created various subtle problems (as it always
>> seems to) so we dropped it in favour of exists ().  I was neutral
>> on whether we should keep '*' or switch to a function, so in the
>> end the status quo won out.  I'm happy to change it to a named
>> accessor though.
>> 
>> Any better ideas than "get ()" for the name?  Maybe something
>> to emphasis that it is asserting for non-nullness/non-emptiness
>> (which '*' does implicitly)?
>
> Yea, when I was 

Re: [Patch, Fortran] PR 81770: [5/6/7 Regression] Bogus warning: Pointer in pointer assignment might outlive the pointer target

2017-08-28 Thread Janus Weil
2017-08-28 10:31 GMT+02:00 Thomas Koenig :
> Hi Janus,
>
>> the attached patch fixes a bogus warning. The purpose of the warning
>> is to detect cases where a pointer lives longer than its target. If
>> the target itself is (1) a pointer or (2) a component of a DT pointer,
>> we do not know about the lifetime of the target at compile time and no
>> warning should be thrown. The existing check only handles case (1) and
>> my patch adds the handling of case (2).
>>
>> Regtestes cleanly on x86_64-linux-gnu. Ok for trunk and the release
>> branches?
>
>
> OK, and thanks for the patch!

Thanks, Thomas! Committed as r251390 (together with a small typo fix
in a related test case):

https://gcc.gnu.org/viewcvs/gcc?view=revision=251390

Will take care of the release branches in a couple of days (probably
on the weekend).

Cheers,
Janus


[PATCH], Fix PR 81959 (power9 IEEE 128-bit float convert from 32-bit memory)

2017-08-28 Thread Michael Meissner
When I added the optimization for loading 32-bit values directly into the
vector registers from memory to convert to IEEE 128-bit floating point, I
forgot to make sure the address did not have PRE_INCREMENT, etc. addressing.

I checked the compiler on a little endian power8 system.  Is it ok to check
this patch into the trunk and back port it GCC 7?  GCC 6 did not have the
optimization.

[gcc]
2017-08-28  Michael Meissner  

PR target/81959
* config/rs6000/rs6000.md (float_si2_hw): If register
allocation hasn't been done, make sure the memory address is
X-FORM (register+register).
(floatuns_si2_hw2): Likewise.

[gcct/testsuite]
2017-08-28  Michael Meissner  

PR target/81959
* gcc.target/powerpc/pr81959.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 251358)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -14505,6 +14505,9 @@ (define_insn_and_split "float_si2_
 {
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (DImode);
+
+  if (MEM_P (operands[1]) && !reload_completed)
+operands[1] = rs6000_address_for_fpconvert (operands[1]);
 })
 
 (define_insn_and_split "float2"
@@ -14568,6 +14571,9 @@ (define_insn_and_split "floatuns_s
 {
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (DImode);
+
+  if (MEM_P (operands[1]) && !reload_completed)
+operands[1] = rs6000_address_for_fpconvert (operands[1]);
 })
 
 (define_insn_and_split "floatuns2"
Index: gcc/testsuite/gcc.target/powerpc/pr81959.c
===
--- gcc/testsuite/gcc.target/powerpc/pr81959.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr81959.c  (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mpower9-vector -O2 -mfloat128" } */
+
+/* PR 81959, the compiler raised on unrecognizable insn message in converting
+   int to __float128, where the int had a PRE_INC in the address.  */
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE 1024
+#endif
+
+void
+convert_int_to_float128 (__float128 * __restrict__ p,
+int * __restrict__ q)
+{
+  unsigned long i;
+
+  for (i = 0; i < ARRAY_SIZE; i++)
+p[i] = (__float128)q[i];
+}
+
+/* { dg-final { scan-assembler {\mlfiwax\M|\mlxsiwax\M} } } */
+/* { dg-final { scan-assembler {\mxscvsdqp\M}   } } */
+/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mmtvsrw[sz]\M} } } */


Re: [PATCH] Fix PR81503 (SLSR invalid fold)

2017-08-28 Thread Bill Schmidt
On Aug 28, 2017, at 1:40 PM, Bill Schmidt  wrote:
> 
>> 
>> On Aug 28, 2017, at 12:57 PM, Bill Schmidt  
>> wrote:
>> 
>> On Aug 28, 2017, at 7:37 AM, Richard Biener  
>> wrote:
>>> 
>>> On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt
>>>  wrote:
 Hi,
 
 Here's v2 of the patch with Jakub's suggestions incorporated.  Bootstrapped
 and tested on powerpc64le-linux-gnu with no regressions.  Is this ok for
 trunk?
 
 Eventually this should be backported to all active releases as well.
 Ok for that after a week or so of burn-in? (And after 7.2, I imagine.)
 
 Thanks,
 Bill
 
 
 [gcc]
 
 2017-08-03  Bill Schmidt  
  Jakub Jelinek  
 
  PR tree-optimization/81503
  * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
  folded constant fits in the target type.
 
 [gcc/testsuite]
 
 2017-08-03  Bill Schmidt  
  Jakub Jelinek  
 
  PR tree-optimization/81503
  * gcc.c-torture/execute/pr81503.c: New file.
 
 
 Index: gcc/gimple-ssa-strength-reduction.c
 ===
 --- gcc/gimple-ssa-strength-reduction.c (revision 250791)
 +++ gcc/gimple-ssa-strength-reduction.c (working copy)
 @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
 {
 tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
 enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
 +  unsigned int prec = TYPE_PRECISION (target_type);
 +  tree maxval = (POINTER_TYPE_P (target_type)
 +? TYPE_MAX_VALUE (sizetype)
 +: TYPE_MAX_VALUE (target_type));
 
 /* It is highly unlikely, but possible, that the resulting
bump doesn't fit in a HWI.  Abandon the replacement
 @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
types but allows for safe negation without twisted logic.  */
 if (wi::fits_shwi_p (bump)
 && bump.to_shwi () != HOST_WIDE_INT_MIN
 +  /* It is more likely that the bump doesn't fit in the target
 +type, so check whether constraining it to that type changes
 +the value.  For a signed type, the value mustn't change.
 +For an unsigned type, the value may only change to a
 +congruent value (for negative bumps).  */
 +  && (TYPE_UNSIGNED (target_type)
 + ? wi::eq_p (wi::neg_p (bump)
 + ? bump + wi::to_widest (maxval) + 1
 + : bump,
 + wi::zext (bump, prec))
 + : wi::eq_p (bump, wi::sext (bump, prec)))
>>> 
>>> Not sure, but would it be fixed in a similar way when writing
>>> 
>>> @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t
>>> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>>> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
>>> 
>>> -  /* It is highly unlikely, but possible, that the resulting
>>> - bump doesn't fit in a HWI.  Abandon the replacement
>>> - in this case.  This does not affect siblings or dependents
>>> - of C.  Restriction to signed HWI is conservative for unsigned
>>> - types but allows for safe negation without twisted logic.  */
>>> -  if (wi::fits_shwi_p (bump)
>>> -  && bump.to_shwi () != HOST_WIDE_INT_MIN
>>> -  /* It is not useful to replace casts, copies, negates, or adds of
>>> -an SSA name and a constant.  */
>>> -  && cand_code != SSA_NAME
>>> +  /* It is not useful to replace casts, copies, negates, or adds of
>>> + an SSA name and a constant.  */
>>> +  if (cand_code != SSA_NAME
>>> && !CONVERT_EXPR_CODE_P (cand_code)
>>> && cand_code != PLUS_EXPR
>>> && cand_code != POINTER_PLUS_EXPR
>>> @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t
>>> tree bump_tree;
>>> gimple *stmt_to_print = NULL;
>>> 
>>> -  /* If the basis name and the candidate's LHS have incompatible
>>> -types, introduce a cast.  */
>>> -  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
>>> -   basis_name = introduce_cast_before_cand (c, target_type, 
>>> basis_name);
>>> if (wi::neg_p (bump))
>>>  {
>>>code = MINUS_EXPR;
>>>bump = -bump;
>>>  }
>>> +  /* It is possible that the resulting bump doesn't fit in target_type.
>>> +Abandon the replacement in this case.  This does not affect
>>> +siblings or dependents of C.  */
>>> +  if (bump != wi::ext (bump, TYPE_PRECISION (target_type),
>>> +  TYPE_SIGN (target_type)))
>>> +   return;
>>> 
>>> 

Re: [PATCH] Fix PR81503 (SLSR invalid fold)

2017-08-28 Thread Bill Schmidt

> On Aug 28, 2017, at 12:57 PM, Bill Schmidt  
> wrote:
> 
> On Aug 28, 2017, at 7:37 AM, Richard Biener  
> wrote:
>> 
>> On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt
>>  wrote:
>>> Hi,
>>> 
>>> Here's v2 of the patch with Jakub's suggestions incorporated.  Bootstrapped
>>> and tested on powerpc64le-linux-gnu with no regressions.  Is this ok for
>>> trunk?
>>> 
>>> Eventually this should be backported to all active releases as well.
>>> Ok for that after a week or so of burn-in? (And after 7.2, I imagine.)
>>> 
>>> Thanks,
>>> Bill
>>> 
>>> 
>>> [gcc]
>>> 
>>> 2017-08-03  Bill Schmidt  
>>>   Jakub Jelinek  
>>> 
>>>   PR tree-optimization/81503
>>>   * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
>>>   folded constant fits in the target type.
>>> 
>>> [gcc/testsuite]
>>> 
>>> 2017-08-03  Bill Schmidt  
>>>   Jakub Jelinek  
>>> 
>>>   PR tree-optimization/81503
>>>   * gcc.c-torture/execute/pr81503.c: New file.
>>> 
>>> 
>>> Index: gcc/gimple-ssa-strength-reduction.c
>>> ===
>>> --- gcc/gimple-ssa-strength-reduction.c (revision 250791)
>>> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
>>> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>>> {
>>>  tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>>>  enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
>>> +  unsigned int prec = TYPE_PRECISION (target_type);
>>> +  tree maxval = (POINTER_TYPE_P (target_type)
>>> +? TYPE_MAX_VALUE (sizetype)
>>> +: TYPE_MAX_VALUE (target_type));
>>> 
>>>  /* It is highly unlikely, but possible, that the resulting
>>> bump doesn't fit in a HWI.  Abandon the replacement
>>> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>>> types but allows for safe negation without twisted logic.  */
>>>  if (wi::fits_shwi_p (bump)
>>>  && bump.to_shwi () != HOST_WIDE_INT_MIN
>>> +  /* It is more likely that the bump doesn't fit in the target
>>> +type, so check whether constraining it to that type changes
>>> +the value.  For a signed type, the value mustn't change.
>>> +For an unsigned type, the value may only change to a
>>> +congruent value (for negative bumps).  */
>>> +  && (TYPE_UNSIGNED (target_type)
>>> + ? wi::eq_p (wi::neg_p (bump)
>>> + ? bump + wi::to_widest (maxval) + 1
>>> + : bump,
>>> + wi::zext (bump, prec))
>>> + : wi::eq_p (bump, wi::sext (bump, prec)))
>> 
>> Not sure, but would it be fixed in a similar way when writing
>> 
>> @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t
>>  tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>>  enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
>> 
>> -  /* It is highly unlikely, but possible, that the resulting
>> - bump doesn't fit in a HWI.  Abandon the replacement
>> - in this case.  This does not affect siblings or dependents
>> - of C.  Restriction to signed HWI is conservative for unsigned
>> - types but allows for safe negation without twisted logic.  */
>> -  if (wi::fits_shwi_p (bump)
>> -  && bump.to_shwi () != HOST_WIDE_INT_MIN
>> -  /* It is not useful to replace casts, copies, negates, or adds of
>> -an SSA name and a constant.  */
>> -  && cand_code != SSA_NAME
>> +  /* It is not useful to replace casts, copies, negates, or adds of
>> + an SSA name and a constant.  */
>> +  if (cand_code != SSA_NAME
>>  && !CONVERT_EXPR_CODE_P (cand_code)
>>  && cand_code != PLUS_EXPR
>>  && cand_code != POINTER_PLUS_EXPR
>> @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t
>>  tree bump_tree;
>>  gimple *stmt_to_print = NULL;
>> 
>> -  /* If the basis name and the candidate's LHS have incompatible
>> -types, introduce a cast.  */
>> -  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
>> -   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
>>  if (wi::neg_p (bump))
>>   {
>> code = MINUS_EXPR;
>> bump = -bump;
>>   }
>> +  /* It is possible that the resulting bump doesn't fit in target_type.
>> +Abandon the replacement in this case.  This does not affect
>> +siblings or dependents of C.  */
>> +  if (bump != wi::ext (bump, TYPE_PRECISION (target_type),
>> +  TYPE_SIGN (target_type)))
>> +   return;
>> 
>>  bump_tree = wide_int_to_tree (target_type, bump);
>> 
>> +  /* If the basis name and the candidate's LHS have incompatible
>> +types, introduce a cast.  */
>> +  

Re: Fix inconsistent section flags

2017-08-28 Thread Joerg Sonnenberger
On Mon, Aug 28, 2017 at 11:42:53AM -0600, Jeff Law wrote:
> On 07/17/2017 08:35 AM, Joerg Sonnenberger wrote:
> > Hello,
> > attached patch fixes inconsistent handling of section flags when using
> > the section attribute, i.e.:
> > 
> > __attribute__((section("writable1"))) int foo1;
> > __attribute__((section("readonly1"))) const int foo1c;
> > __attribute__((section("writable2"))) int foo2 = 42;
> > __attribute__((section("readonly2"))) const int foo2c = 42;
> > 
> > should give section attributes of "aw", "a", "aw", "a" in that order.
> > Currently, "foo1c" is classified as BSS though and therefore put into a
> > writable section.
> ISTM the test we need here is whether or not the underlying DECL is
> readonly.  If it READONLY, then it shouldn't go into .bss, but should
> instead to into a readable section.
> 
> Testing based on names seems wrong.
> 
> Does the attached (untested) patch work for you?

The intention should work, will test it.  The attached patch will likely
also be needed on top.

Joerg
Index: varasm.c
===
RCS file: /home/joerg/repo/netbsd/src/external/gpl3/gcc/dist/gcc/varasm.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -p -r1.2 -r1.3
--- varasm.c17 Jul 2017 19:53:10 -  1.2
+++ varasm.c22 Jul 2017 20:52:52 -  1.3
@@ -6428,7 +6428,8 @@ categorize_decl_for_section (const_tree 
   location.  -fmerge-all-constants allows even that (at the
   expense of not conforming).  */
ret = SECCAT_RODATA;
-  else if (TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
+  else if (DECL_INITIAL (decl) != NULL
+   && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
ret = SECCAT_RODATA_MERGE_STR_INIT;
   else
ret = SECCAT_RODATA_MERGE_CONST;


[PATCH] Fix bug in simplify_ternary_operation

2017-08-28 Thread Tom de Vries

Hi,

I think I found a bug in r17465:
...

   * cse.c (simplify_ternary_operation): Handle more IF_THEN_ELSE
   simplifications.

diff --git a/gcc/cse.c b/gcc/cse.c
index e001597..3c27387 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4713,6 +4713,17 @@ simplify_ternary_operation (code, mode, op0_mode, op0, 
op1, op2)


Note: the parameters of simplify_ternary_operation have the following 
meaning:

...
/* Simplify CODE, an operation with result mode MODE and three operands,
   OP0, OP1, and OP2.  OP0_MODE was the mode of OP0 before it became
   a constant.  Return 0 if no simplifications is possible.  */

rtx
simplify_ternary_operation (code, mode, op0_mode, op0, op1, op2)
 enum rtx_code code;
 enum machine_mode mode, op0_mode;
 rtx op0, op1, op2;
...


  && rtx_equal_p (XEXP (op0, 1), op1)
  && rtx_equal_p (XEXP (op0, 0), op2))
return op2;
+  else if (! side_effects_p (op0))
+   {
+ rtx temp;
+ temp = simplify_relational_operation (GET_CODE (op0), op0_mode,
+   XEXP (op0, 0), XEXP (op0, 1));


We're handling code == IF_THEN_ELSE here, so op0 is the condition, op1 
is the 'then expr' and op2 is the 'else expr'.


The parameters of simplify_relational_operation have the following meaning:
...
/* Like simplify_binary_operation except used for relational operators.
   MODE is the mode of the operands, not that of the result.  If MODE
   is VOIDmode, both operands must also be VOIDmode and we compare the
   operands in "infinite precision".

   If no simplification is possible, this function returns zero.
   Otherwise, it returns either const_true_rtx or const0_rtx.  */

rtx
simplify_relational_operation (code, mode, op0, op1)
 enum rtx_code code;
 enum machine_mode mode;
 rtx op0, op1;
...

The problem in the patch is that we use op0_mode argument for the mode 
parameter. The mode parameter of simplify_relational_operation needs to 
be the mode of the operands of the condition, while op0_mode is the mode 
of the condition.


Patch below fixes this on current trunk.

[ I found this by running into an ICE in 
gcc.c-torture/compile/pr28776-2.c for gcn target. I haven't been able to 
reproduce this with an upstream branch yet. ]


OK for trunk if bootstrap and reg-test for x86_64 succeeds?

Thanks,
- Tom

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 0133d43..fbf979b 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5567,8 +5567,6 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 	  XEXP (op0, 0), XEXP (op0, 1));
 	}
 
-	  if (cmp_mode == VOIDmode)
-	cmp_mode = op0_mode;
 	  temp = simplify_relational_operation (GET_CODE (op0), op0_mode,
 			  			cmp_mode, XEXP (op0, 0),
 		XEXP (op0, 1));


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-08-28 Thread Jeff Law
On 08/10/2017 03:14 PM, Michael Collison wrote:
> Hi all,
> 
> One issue that we keep encountering on aarch64 is GCC not making good use of 
> the flag-setting arithmetic instructions
> like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare 
> the result against zero.
> They are represented in a fairly standard way in the backend as PARALLEL 
> patterns:
> (parallel [(set (reg x1) (op (reg x2) (reg x3)))
>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
> 
> GCC isn't forming these from separate arithmetic and comparison instructions 
> as aggressively as it could.
> A particular pain point is when the result of the arithmetic insn is used 
> before the comparison instruction.
> The testcase in this patch is one such example where we have:
> (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73  ] [73])
> (plus:SI (reg:SI 0 x0 [ x ])
> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
>  (nil))
> (insn 33 7 34 2 (set (reg:SI 1 x1 [77])
> (plus:SI (reg/v:SI 0 x0 [orig:73  ] [73])
> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
>  (nil))
> (insn 34 33 17 2 (set (reg:CC 66 cc)
> (compare:CC (reg/v:SI 0 x0 [orig:73  ] [73])
> (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
>  (nil))
> 
> This scares combine away as x0 is used in insn 33 as well as the comparison 
> in insn 34.
> I think the compare-elim pass can help us here.
Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.

> 
> This patch extends it by handling comparisons against zero, finding the 
> defining instruction of the compare
> and merging the comparison with the defining instruction into a PARALLEL that 
> will hopefully match the form
> described above. If between the comparison and the defining insn we find an 
> instruction that uses the condition
> registers or any control flow we bail out, and we don't cross basic blocks.
> This simple technique allows us to catch cases such as this example.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu 
> and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
> 
>   * compare-elim.c: Include emit-rtl.h.
>   (can_merge_compare_into_arith): New function.
>   (try_validate_parallel): Likewise.
>   (try_merge_compare): Likewise.
>   (try_eliminate_compare): Call the above when no previous clobber
>   is available.
>   (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
>   dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
>   
>   * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
> 
> 
> pr5198v1.patch
> 
> 
> diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
> index 7e557a2..c65d155 100644
> --- a/gcc/compare-elim.c
> +++ b/gcc/compare-elim.c
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "insn-config.h"
>  #include "recog.h"
> +#include "emit-rtl.h"
>  #include "cfgrtl.h"
>  #include "tree-pass.h"
>  #include "domwalk.h"
> @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, 
> rtx_insn *start)
>return reg;
>  }
>  
> +/* Return true if it is okay to merge the comparison CMP_INSN with
> +   the instruction ARITH_INSN.  Both instructions are assumed to be in the
> +   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
> +   that there are no uses or defs of the condition flags or control flow
> +   changes between the two instructions.  */
> +
> +static bool
> +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
> +{
> +  for (rtx_insn *insn = PREV_INSN (cmp_insn);
> +   insn && insn != arith_insn;
> +   insn = PREV_INSN (insn))
> +{
> +  if (!NONDEBUG_INSN_P (insn))
> + continue;
> +  /* Bail if there are jumps or calls in between.  */
> +  if (!NONJUMP_INSN_P (insn))
> + return false;
> +
> +  df_ref ref;
> +  /* Find a USE of the flags register.  */
> +  FOR_EACH_INSN_USE (ref, insn)
> + if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +   return false;
> +
> +  /* Find a DEF of the flags register.  */
> +  FOR_EACH_INSN_DEF (ref, insn)
> + if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +   return false;
> +}
> +  return true;
> +}
What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.

Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


> +
> +/* Given two SET expressions, SET_A and SET_B determine whether they form
> +   a recognizable pattern when emitted in parallel.  Return that parallel
> +   if so.  Otherwise return NULL.  This tries 

Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64

2017-08-28 Thread Steve Ellcey
On Fri, 2017-08-25 at 15:37 +0100, Szabolcs Nagy wrote:

> the use of ifunc in gcc target libraries was a mistake
> in my opinion, there are several known bugs in the ifunc
> design and uclibc/musl/bionic don't plan to support it.
> but at this point i dont have a better proposal for doing
> runtime selection of optimal atomics code.
> 
> however in this patch i don't see why would the ctor run
> before ifunc resolvers. how does this work on x86_64?
> (there the different 16byte atomics are not even compatible,
> so if ifunc resolvers in different dsos return different
> result because one ran before the ctor, another after then
> the runtime behaviour is broken. this can happen when one
> dso is bindnow so ifunc relocation is processed before
> ctors and another runs resolvers lazily or dlopened later..
> but i didnt test it just looks broken)

I am not sure I followed everything here.  My basic testing all
worked, init_cpu_revision got run when libatomic was loaded and
then the IFUNC resolver gets called after that when one of the IFUNC
atomic functions is called.  init_cpu_revision sets libat_have_lse
which, I assume, will not be used by any other libraries.  If other
libraries have IFUNCs they would have their own static constructors and
set their own variables to be checked by their own IFUNCs.  So I am
not sure how one DSO is going to break another DSO.

> note that aarch64 ifunc resolvers get hwcap as an argument
> so all this brokenness can be avoided (and if we want to
> disable hwcap flags then probably glibc should take care
> of that before passing hwcap to the ifunc resolver).

I looked at the IFUNC memcpy resolver in glibc and it does not look
like it gets hwcap as an argument.  When I preprocess everything I see:

void *__libc_memcpy_ifunc (void) __asm__ ("__libc_memcpy");
void *__libc_memcpy_ifunc (void)
{
  uint64_t __attribute__((unused)) midr = _dl_aarch64_cpu_features.midr_el1;
  *res = ** expression that looks at midr value and returns function pointer **;
  return res;
}
__asm__ (".type " "__libc_memcpy" ", %gnu_indirect_function");
extern __typeof (__libc_memcpy) memcpy __attribute__ ((alias 
("__libc_memcpy")));


Steve Ellcey
sell...@cavium.com


Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-28 Thread Jeff Law
On 08/11/2017 12:24 PM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 07/13/2017 02:40 AM, Richard Sandiford wrote:
>>> GET_MODE_WIDER previously returned VOIDmode if no wider mode existed.
>>> That would cause problems with stricter mode classes, since VOIDmode
>>> isn't for example a valid scalar integer or floating-point mode.
>>> This patch instead makes it return a new opt_mode class, which
>>> holds either a T or nothing.
>>>
>>> 2017-07-13  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * coretypes.h (opt_mode): New class.
>>> * machmode.h (opt_mode): Likewise.
>>> (opt_mode::else_void): New function.
>>> (opt_mode::operator *): Likewise.
>>> (opt_mode::exists): Likewise.
>>> (GET_MODE_WIDER_MODE): Turn into a function and return an opt_mode.
>>> (GET_MODE_2XWIDER_MODE): Likewise.
>>> (mode_iterator::get_wider): Update accordingly.
>>> (mode_iterator::get_2xwider): Likewise.
>>> (mode_iterator::get_known_wider): Likewise, turning into a template.
>>> * combine.c (make_extraction): Update use of GET_MODE_WIDER_MODE,
>>> forcing a wider mode to exist.
>>> * config/cr16/cr16.h (LONG_REG_P): Likewise.
>>> * rtlanal.c (init_num_sign_bit_copies_in_rep): Likewise.
>>> * config/c6x/c6x.c (c6x_rtx_costs): Update use of
>>> GET_MODE_2XWIDER_MODE, forcing a wider mode to exist.
>>> * lower-subreg.c (init_lower_subreg): Likewise.
>>> * optabs-libfuncs.c (init_sync_libfuncs_1): Likewise, but not
>>> on the final iteration.
>>> * config/i386/i386.c (ix86_expand_set_or_movmem): Check whether
>>> a wider mode exists before asking for a move pattern.
>>> (get_mode_wider_vector): Update use of GET_MODE_WIDER_MODE,
>>> forcing a wider mode to exist.
>>> (expand_vselect_vconcat): Update use of GET_MODE_2XWIDER_MODE,
>>> returning false if no such mode exists.
>>> * config/ia64/ia64.c (expand_vselect_vconcat): Likewise.
>>> * config/mips/mips.c (mips_expand_vselect_vconcat): Likewise.
>>> * expmed.c (init_expmed_one_mode): Update use of GET_MODE_WIDER_MODE.
>>> Avoid checking for a MODE_INT if we already know the mode is not a
>>> SCALAR_INT_MODE_P.
>>> (extract_high_half): Update use of GET_MODE_WIDER_MODE,
>>> forcing a wider mode to exist.
>>> (expmed_mult_highpart_optab): Likewise.
>>> (expmed_mult_highpart): Likewise.
>>> * expr.c (expand_expr_real_2): Update use of GET_MODE_WIDER_MODE,
>>> using else_void.
>>> * lto-streamer-in.c (lto_input_mode_table): Likewise.
>>> * optabs-query.c (find_widening_optab_handler_and_mode): Likewise.
>>> * stor-layout.c (bit_field_mode_iterator::next_mode): Likewise.
>>> * internal-fn.c (expand_mul_overflow): Update use of
>>> GET_MODE_2XWIDER_MODE.
>>> * omp-low.c (omp_clause_aligned_alignment): Likewise.
>>> * tree-ssa-math-opts.c (convert_mult_to_widen): Update use of
>>> GET_MODE_WIDER_MODE.
>>> (convert_plusminus_to_widen): Likewise.
>>> * tree-switch-conversion.c (array_value_type): Likewise.
>>> * var-tracking.c (emit_note_insn_var_location): Likewise.
>>> * tree-vrp.c (simplify_float_conversion_using_ranges): Likewise.
>>> Return false inside rather than outside the loop if no wider mode
>>> exists
>>> * optabs.c (expand_binop): Update use of GET_MODE_WIDER_MODE
>>> and GET_MODE_2XWIDER_MODE
>>> (can_compare_p): Use else_void.
>>> * gdbhooks.py (OptMachineModePrinter): New class.
>>> (build_pretty_printer): Use it for opt_mode.
>>>
>>> gcc/ada/
>>> * gcc-interface/decl.c (validate_size): Update use of
>>> GET_MODE_WIDER_MODE, forcing a wider mode to exist.
>> I'm not a big fan of the API here, particularly using operator* to
>> handle asserting the mode exists.  I'd prefer to just use a member
>> function rather than overloading operator*.
>>
>> What's the rationale behind using operator* to imply the assertion?
>>
>> THe changes themsleves look fine, it's really just a question of the API
>> we present.
> 
> The original idea was to make opt_mode look pointer-ish, so that
> the dyn_cast <...> result could be used in the same way as for
> dyn_cast  etc.  The first cut therefore had operator bool ()
> to test whether there was a mode and operator * to dereference it.
> 
> However, operator bool () created various subtle problems (as it always
> seems to) so we dropped it in favour of exists ().  I was neutral
> on whether we should keep '*' or switch to a function, so in the
> end the status quo won out.  I'm happy to change it to a named
> accessor though.
> 
> Any better ideas than "get ()" for the name?  Maybe something
> to emphasis that it is asserting for non-nullness/non-emptiness
> (which '*' does implicitly)?Yea, when I was reading the first few patches it 
> felt like you trying to
do a pointer-ish 

Re: [PATCH] Fix PR81503 (SLSR invalid fold)

2017-08-28 Thread Bill Schmidt
On Aug 28, 2017, at 7:37 AM, Richard Biener  wrote:
> 
> On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt
>  wrote:
>> Hi,
>> 
>> Here's v2 of the patch with Jakub's suggestions incorporated.  Bootstrapped
>> and tested on powerpc64le-linux-gnu with no regressions.  Is this ok for
>> trunk?
>> 
>> Eventually this should be backported to all active releases as well.
>> Ok for that after a week or so of burn-in? (And after 7.2, I imagine.)
>> 
>> Thanks,
>> Bill
>> 
>> 
>> [gcc]
>> 
>> 2017-08-03  Bill Schmidt  
>>Jakub Jelinek  
>> 
>>PR tree-optimization/81503
>>* gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
>>folded constant fits in the target type.
>> 
>> [gcc/testsuite]
>> 
>> 2017-08-03  Bill Schmidt  
>>Jakub Jelinek  
>> 
>>PR tree-optimization/81503
>>* gcc.c-torture/execute/pr81503.c: New file.
>> 
>> 
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===
>> --- gcc/gimple-ssa-strength-reduction.c (revision 250791)
>> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
>> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>> {
>>   tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>>   enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
>> +  unsigned int prec = TYPE_PRECISION (target_type);
>> +  tree maxval = (POINTER_TYPE_P (target_type)
>> +? TYPE_MAX_VALUE (sizetype)
>> +: TYPE_MAX_VALUE (target_type));
>> 
>>   /* It is highly unlikely, but possible, that the resulting
>>  bump doesn't fit in a HWI.  Abandon the replacement
>> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>>  types but allows for safe negation without twisted logic.  */
>>   if (wi::fits_shwi_p (bump)
>>   && bump.to_shwi () != HOST_WIDE_INT_MIN
>> +  /* It is more likely that the bump doesn't fit in the target
>> +type, so check whether constraining it to that type changes
>> +the value.  For a signed type, the value mustn't change.
>> +For an unsigned type, the value may only change to a
>> +congruent value (for negative bumps).  */
>> +  && (TYPE_UNSIGNED (target_type)
>> + ? wi::eq_p (wi::neg_p (bump)
>> + ? bump + wi::to_widest (maxval) + 1
>> + : bump,
>> + wi::zext (bump, prec))
>> + : wi::eq_p (bump, wi::sext (bump, prec)))
> 
> Not sure, but would it be fixed in a similar way when writing
> 
> @@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t
>   tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>   enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
> 
> -  /* It is highly unlikely, but possible, that the resulting
> - bump doesn't fit in a HWI.  Abandon the replacement
> - in this case.  This does not affect siblings or dependents
> - of C.  Restriction to signed HWI is conservative for unsigned
> - types but allows for safe negation without twisted logic.  */
> -  if (wi::fits_shwi_p (bump)
> -  && bump.to_shwi () != HOST_WIDE_INT_MIN
> -  /* It is not useful to replace casts, copies, negates, or adds of
> -an SSA name and a constant.  */
> -  && cand_code != SSA_NAME
> +  /* It is not useful to replace casts, copies, negates, or adds of
> + an SSA name and a constant.  */
> +  if (cand_code != SSA_NAME
>   && !CONVERT_EXPR_CODE_P (cand_code)
>   && cand_code != PLUS_EXPR
>   && cand_code != POINTER_PLUS_EXPR
> @@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t
>   tree bump_tree;
>   gimple *stmt_to_print = NULL;
> 
> -  /* If the basis name and the candidate's LHS have incompatible
> -types, introduce a cast.  */
> -  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
> -   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
>   if (wi::neg_p (bump))
>{
>  code = MINUS_EXPR;
>  bump = -bump;
>}
> +  /* It is possible that the resulting bump doesn't fit in target_type.
> +Abandon the replacement in this case.  This does not affect
> +siblings or dependents of C.  */
> +  if (bump != wi::ext (bump, TYPE_PRECISION (target_type),
> +  TYPE_SIGN (target_type)))
> +   return;
> 
>   bump_tree = wide_int_to_tree (target_type, bump);
> 
> +  /* If the basis name and the candidate's LHS have incompatible
> +types, introduce a cast.  */
> +  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
> +   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
> +
>   if (dump_file && 

Re: Fix inconsistent section flags

2017-08-28 Thread Jeff Law
On 07/17/2017 08:35 AM, Joerg Sonnenberger wrote:
> Hello,
> attached patch fixes inconsistent handling of section flags when using
> the section attribute, i.e.:
> 
> __attribute__((section("writable1"))) int foo1;
> __attribute__((section("readonly1"))) const int foo1c;
> __attribute__((section("writable2"))) int foo2 = 42;
> __attribute__((section("readonly2"))) const int foo2c = 42;
> 
> should give section attributes of "aw", "a", "aw", "a" in that order.
> Currently, "foo1c" is classified as BSS though and therefore put into a
> writable section.
ISTM the test we need here is whether or not the underlying DECL is
readonly.  If it READONLY, then it shouldn't go into .bss, but should
instead to into a readable section.

Testing based on names seems wrong.

Does the attached (untested) patch work for you?

JEff
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 91680d6..37438c1 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -976,16 +976,16 @@ decode_reg_name (const char *name)
 bool
 bss_initializer_p (const_tree decl)
 {
-  return (DECL_INITIAL (decl) == NULL
- /* In LTO we have no errors in program; error_mark_node is used
-to mark offlined constructors.  */
- || (DECL_INITIAL (decl) == error_mark_node
- && !in_lto_p)
- || (flag_zero_initialized_in_bss
- /* Leave constant zeroes in .rodata so they
-can be shared.  */
- && !TREE_READONLY (decl)
- && initializer_zerop (DECL_INITIAL (decl;
+  /* Do not put constants into the .bss section, they belong in a readonly
+ section.  */
+  return (!TREE_READONLY (decl)
+ && (DECL_INITIAL (decl) == NULL
+ /* In LTO we have no errors in program; error_mark_node is used
+to mark offlined constructors.  */
+ || (DECL_INITIAL (decl) == error_mark_node
+ && !in_lto_p)
+ || (flag_zero_initialized_in_bss
+ && initializer_zerop (DECL_INITIAL (decl);
 }
 
 /* Compute the alignment of variable specified by DECL.
@@ -6508,7 +6508,8 @@ categorize_decl_for_section (const_tree decl, int reloc)
ret = SECCAT_BSS;
   else if (! TREE_READONLY (decl)
   || TREE_SIDE_EFFECTS (decl)
-  || ! TREE_CONSTANT (DECL_INITIAL (decl)))
+  || (DECL_INITIAL (decl)
+  && ! TREE_CONSTANT (DECL_INITIAL (decl
{
  /* Here the reloc_rw_mask is not testing whether the section should
 be read-only or not, but whether the dynamic link will have to
@@ -6528,7 +6529,8 @@ categorize_decl_for_section (const_tree decl, int reloc)
   location.  -fmerge-all-constants allows even that (at the
   expense of not conforming).  */
ret = SECCAT_RODATA;
-  else if (TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
+  else if (DECL_INITIAL (decl)
+  && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
ret = SECCAT_RODATA_MERGE_STR_INIT;
   else
ret = SECCAT_RODATA_MERGE_CONST;


Re: [PATCH] Add -static-pie to GCC driver to create static PIE

2017-08-28 Thread H.J. Lu
On Mon, Aug 28, 2017 at 9:10 AM, Joseph Myers  wrote:
> On Tue, 8 Aug 2017, H.J. Lu wrote:
>
>> This patch adds -static-pie to GCC driver to create static PIE.  A static
>> position independent executable (PIE) is similar to static executable,
>> but can be loaded at any address without a dynamic linker.  All linker
>> input files must be compiled with -fpie or -fPIE and linker must support
>> --no-dynamic-linker to avoid linking with dynamic linker.  "-z text" is
>> also needed to prevent dynamic relocations in read-only segments.
>>
>> OK for trunk?
>
> I think the documentation for various options needs updating to clarify
> exactly what they mean.  (And potentially help text, which for driver
> options is in gcc.c:display_help with the common.opt text being ignored in
> that case.)

Done.

> -static is no longer just "prevents linking with the shared libraries" as
> the documentation says, given it's also overriding (explicit or
> configure-time default) -pie.  -pie is no longer just "Produce a position
> independent executable", it's producing a *dynamically linked* PIE.

Done.

>> +@item -static-pie
>> +@opindex static-pie
>> +Produce a static position independent executable on targets that support
>> +it.  A static position independent executable is similar to static
>> +executable, but can be loaded at any address without a dynamic linker.
>
> "to a static executable".
>

Done.

Here is the updated patch.   OK for trunk?

Thanks.


-- 
H.J.
From da4fdd54c53cfafbd45c7a7fe996f907b6d141f4 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 20 Jul 2017 14:08:18 -0700
Subject: [PATCH] Add -static-pie to GCC driver to create static PIE

This patch adds -static-pie to GCC driver to create static PIE.  A static
position independent executable (PIE) is similar to static executable,
but can be loaded at any address without a dynamic linker.  All linker
input files must be compiled with -fpie or -fPIE and linker must support
--no-dynamic-linker to avoid linking with dynamic linker.  "-z text" is
also needed to prevent dynamic relocations in read-only segments.

	PR driver/81498
	* common.opt (-static-pie): New alias.
	(shared): Negate static-pie.
	(-no-pie): Update help text.
	(-pie): Likewise.
	(static-pie): New option.
	* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add
	-static-pie support.
	(GNU_USER_TARGET_ENDFILE_SPEC): Likewise.
	(LINK_EH_SPEC): Likewise.
	(LINK_GCC_C_SEQUENCE_SPEC): Likewise.
	* config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): Likewise.
	* config/i386/gnu-user64.h (GNU_USER_TARGET_LINK_SPEC): Likewise.
	* gcc.c (LINK_COMMAND_SPEC): Likewise.
	(init_gcc_specs): Likewise.
	(init_spec): Likewise.
	(display_help): Update help message for -pie.
	* doc/invoke.texi: Update -pie, -no-pie and -static.  Document
	-static-pie.
---
 gcc/common.opt   | 13 ++---
 gcc/config/gnu-user.h| 15 ---
 gcc/config/i386/gnu-user.h   |  7 ---
 gcc/config/i386/gnu-user64.h | 11 ++-
 gcc/doc/invoke.texi  | 24 +---
 gcc/gcc.c| 20 
 6 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1331008f811..fb88ed655fe 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -353,6 +353,9 @@ Common Alias(pedantic-errors)
 -pie
 Driver Alias(pie)
 
+-static-pie
+Driver Alias(static-pie)
+
 -pipe
 Driver Alias(pipe)
 
@@ -3062,7 +3065,7 @@ x
 Driver Joined Separate
 
 shared
-Driver RejectNegative Negative(pie)
+Driver RejectNegative Negative(static-pie)
 Create a shared library.
 
 shared-libgcc
@@ -3108,11 +3111,15 @@ Driver
 
 no-pie
 Driver RejectNegative Negative(shared)
-Don't create a position independent executable.
+Don't create a dynamically linked position independent executable.
 
 pie
 Driver RejectNegative Negative(no-pie)
-Create a position independent executable.
+Create a dynamically linked position independent executable.
+
+static-pie
+Driver RejectNegative Negative(pie)
+Create a static position independent executable.
 
 z
 Driver Joined Separate
diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index de605b0c466..a967b69a350 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -53,11 +53,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   "%{shared:; \
  pg|p|profile:gcrt1.o%s; \
  static:crt1.o%s; \
- " PIE_SPEC ":Scrt1.o%s; \
+ static-pie|" PIE_SPEC ":Scrt1.o%s; \
  :crt1.o%s} \
crti.o%s \
%{static:crtbeginT.o%s; \
- shared|" PIE_SPEC ":crtbeginS.o%s; \
+ shared|static-pie|" PIE_SPEC ":crtbeginS.o%s; \
  :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
@@ -70,7 +70,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  :crt1.o%s} \
crti.o%s \
%{static:crtbeginT.o%s; \
- shared|pie:crtbeginS.o%s; \
+ 

Re: [C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS

2017-08-28 Thread Michael Matz
Hi,

On Mon, 28 Aug 2017, Nathan Sidwell wrote:

> This patch replaces the sorted field vector with a hash map.  Lookup for 
> non-function members of a complete class is now O(1), not O(log(n)).  
> We still do linear lookup when the class is incomplete.  Fixing that is 
> still on the todo list.

Have you any memory measurement on a non-trivial C++ codebase with many 
classes (template instantiations?)?  On LP64 platforms the sorted list 
for n members was 8+8*n bytes, the hash-map is 48 bytes itself plus 16*n 
bytes for the entries, where you have at least 13 entries always, next 31, 
next 61 entries (and so on).  You pay the price for the next larger chunk 
already at about 2/3 fill rate, so e.g a class with 10 members was 88 
bytes before and is 544 bytes now, which doesn't look that attractive 
given that the lookup before took four tries and now takes one, especially 
taking into account cache effects.


Ciao,
Michael.


Re: [PATCH] Fix PR81921

2017-08-28 Thread Joseph Myers
On Mon, 28 Aug 2017, Joseph Myers wrote:

> On Sat, 26 Aug 2017, Richard Biener wrote:
> 
> > On August 26, 2017 12:51:57 AM GMT+02:00, Joseph Myers 
> >  wrote:
> > >I'm seeing a build failure for s390x-linux-gnu that looks like it could
> > >be 
> > >related to this change (build was OK at r251332, failed at r251358).
> > 
> > Can you please open a bug? Can you confirm it fails the same way before 
> > the patch if you use - flto?
> 
> Bug 82012 filed (as noted there, I don't know if the problem is in the 
> compiler or libitm).  I can confirm the same failure appears building the 
> preprocessed source with a GCC 7 branch compiler with -flto added to the 
> options, rebuilding trunk r251332 to check with that.

Now confirmed with -flto with trunk r251332.

-- 
Joseph S. Myers
jos...@codesourcery.com


[C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS

2017-08-28 Thread Nathan Sidwell
This patch replaces the sorted field vector with a hash map.  Lookup for 
non-function members of a complete class is now O(1), not O(log(n)).  We 
still do linear lookup when the class is incomplete.  Fixing that is 
still on the todo list.


This permits moving a bunch of sorted_field_vec handling from c-common 
to the c FE, and I'll post that in a subsequent patch.


committed to trunk.

nathan
--
Nathan Sidwell
2017-08-28  Nathan Sidwell  

	* cp-tree.h (lang_type): Replace sorted_fields vector with
	bindings map.
	(CLASSTYPE_SORTED_FIELDS): Delete.
	(CLASSTYPE_BINDINGS): New.
	* decl.c (finish_enum_value_list): Swap args of
	insert_late_enum_def_bindings.
	* name-lookup.c (lookup_field_1): Replace binary search of sorted
	fields with map->get.
	(sorted_fields_type_new, count_fields,
	add_fields_to_record_type, add_enum_fields_to_record_type): Delete.
	(add_class_member, add_class_members): New.
	(set_class_bindings): Create map and insert.
	(insert_late_enum_def_binding): Swap parms.  Use add_clasS_member.
	* ptree.c (cxx_print_type): Delete sorted fields printing.

Index: cp-tree.h
===
--- cp-tree.h	(revision 251387)
+++ cp-tree.h	(working copy)
@@ -2014,10 +2014,10 @@ struct GTY(()) lang_type {
  as a list of adopted protocols or a pointer to a corresponding
  @interface.  See objc/objc-act.h for details.  */
   tree objc_info;
-  /* sorted_fields is sorted based on a pointer, so we need to be able
- to resort it if pointers get rearranged.  */
-  struct sorted_fields_type * GTY ((reorder ("resort_sorted_fields")))
-sorted_fields;
+
+  /* Map from IDENTIFIER nodes to DECLS.  */
+  hash_map *bindings;
+
   /* FIXME reuse another field?  */
   tree lambda_expr;
 };
@@ -3243,10 +3243,9 @@ extern void decl_shadowed_for_var_insert
&& TREE_CODE (TYPE_NAME (NODE)) == TYPE_DECL	\
&& TYPE_DECL_ALIAS_P (TYPE_NAME (NODE)))
 
-/* For a class type: if this structure has many fields, we'll sort them
-   and put them into a TREE_VEC.  */
-#define CLASSTYPE_SORTED_FIELDS(NODE) \
-  (LANG_TYPE_CLASS_CHECK (NODE)->sorted_fields)
+/* The binding map for a class (not always present).  */
+#define CLASSTYPE_BINDINGS(NODE) \
+  (LANG_TYPE_CLASS_CHECK (NODE)->bindings)
 
 /* If non-NULL for a VAR_DECL, FUNCTION_DECL, TYPE_DECL or
TEMPLATE_DECL, the entity is either a template specialization (if
Index: decl.c
===
--- decl.c	(revision 251387)
+++ decl.c	(working copy)
@@ -14316,7 +14316,8 @@ finish_enum_value_list (tree enumtype)
   && COMPLETE_TYPE_P (current_class_type)
   && UNSCOPED_ENUM_P (enumtype))
 {
-  insert_late_enum_def_bindings (enumtype, current_class_type);
+  insert_late_enum_def_bindings (current_class_type, enumtype);
+  /* TYPE_FIELDS needs fixup.  */
   fixup_type_variants (current_class_type);
 }
 
Index: name-lookup.c
===
--- name-lookup.c	(revision 251387)
+++ name-lookup.c	(working copy)
@@ -1183,58 +1183,33 @@ lookup_fnfields_slot_nolazy (tree type,
 tree
 lookup_field_1 (tree type, tree name, bool want_type)
 {
-  tree field;
+  tree field = NULL_TREE;
 
   gcc_assert (identifier_p (name) && RECORD_OR_UNION_TYPE_P (type));
 
-  if (CLASSTYPE_SORTED_FIELDS (type))
+  if (CLASSTYPE_BINDINGS (type))
 {
-  tree *fields = _SORTED_FIELDS (type)->elts[0];
-  int lo = 0, hi = CLASSTYPE_SORTED_FIELDS (type)->len;
-  int i;
-
-  while (lo < hi)
-	{
-	  i = (lo + hi) / 2;
-
-	  if (DECL_NAME (fields[i]) > name)
-	hi = i;
-	  else if (DECL_NAME (fields[i]) < name)
-	lo = i + 1;
-	  else
-	{
-	  field = NULL_TREE;
+  tree *slot = CLASSTYPE_BINDINGS (type)->get (name);
+
+  if (slot)
+	{
+	  field = *slot;
 
-	  /* We might have a nested class and a field with the
-		 same name; we sorted them appropriately via
-		 field_decl_cmp, so just look for the first or last
-		 field with this name.  */
+	  if (STAT_HACK_P (field))
+	{
 	  if (want_type)
-		{
-		  do
-		field = fields[i--];
-		  while (i >= lo && DECL_NAME (fields[i]) == name);
-		  if (!DECL_DECLARES_TYPE_P (field))
-		field = NULL_TREE;
-		}
+		field = STAT_TYPE (field);
 	  else
-		{
-		  do
-		field = fields[i++];
-		  while (i < hi && DECL_NAME (fields[i]) == name);
-		}
-
-	  if (field)
-	  	{
-	  	  field = strip_using_decl (field);
-	  	  if (is_overloaded_fn (field))
-	  	field = NULL_TREE;
-	  	}
-
-	  return field;
+		field = STAT_DECL (field);
 	}
+
+	  field = strip_using_decl (field);
+	  if (OVL_P (field))
+	field = NULL_TREE;
+	  else if (want_type && !DECL_DECLARES_TYPE_P (field))
+	field = NULL_TREE;
 	}
-  return NULL_TREE;
+  return field;
 }
 
   field = TYPE_FIELDS (type);
@@ -1312,113 +1287,62 @@ lookup_fnfields_slot (tree type, tree na
   

Re: [PATCH] Fix PR81921

2017-08-28 Thread Joseph Myers
On Sat, 26 Aug 2017, Richard Biener wrote:

> On August 26, 2017 12:51:57 AM GMT+02:00, Joseph Myers 
>  wrote:
> >I'm seeing a build failure for s390x-linux-gnu that looks like it could
> >be 
> >related to this change (build was OK at r251332, failed at r251358).
> 
> Can you please open a bug? Can you confirm it fails the same way before 
> the patch if you use - flto?

Bug 82012 filed (as noted there, I don't know if the problem is in the 
compiler or libitm).  I can confirm the same failure appears building the 
preprocessed source with a GCC 7 branch compiler with -flto added to the 
options, rebuilding trunk r251332 to check with that.

Options used with GCC 7 branch to get the same error: 
s390x-glibc-linux-gnu-g++ -S -o /dev/null -ftls-model=initial-exec -mzarch 
-mhtm -std=gnu++0x -funwind-tables -fno-exceptions -fno-rtti 
-fabi-version=4 -g -O2 -fPIC beginend.ii -flto

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add -static-pie to GCC driver to create static PIE

2017-08-28 Thread Joseph Myers
On Tue, 8 Aug 2017, H.J. Lu wrote:

> This patch adds -static-pie to GCC driver to create static PIE.  A static
> position independent executable (PIE) is similar to static executable,
> but can be loaded at any address without a dynamic linker.  All linker
> input files must be compiled with -fpie or -fPIE and linker must support
> --no-dynamic-linker to avoid linking with dynamic linker.  "-z text" is
> also needed to prevent dynamic relocations in read-only segments.
> 
> OK for trunk?

I think the documentation for various options needs updating to clarify 
exactly what they mean.  (And potentially help text, which for driver 
options is in gcc.c:display_help with the common.opt text being ignored in 
that case.)

-static is no longer just "prevents linking with the shared libraries" as 
the documentation says, given it's also overriding (explicit or 
configure-time default) -pie.  -pie is no longer just "Produce a position 
independent executable", it's producing a *dynamically linked* PIE.

> +@item -static-pie
> +@opindex static-pie
> +Produce a static position independent executable on targets that support
> +it.  A static position independent executable is similar to static
> +executable, but can be loaded at any address without a dynamic linker.

"to a static executable".

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][RFC] Patch candidate for other/39851

2017-08-28 Thread Joseph Myers
On Tue, 8 Aug 2017, Martin Liška wrote:

> Hi.
> 
> As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39851#c0 we need
> to call targetm.target_option.override () in order to properly report which
> ISA options are enabled for a -march/-mtune. Currently, opts.c uses just
> #include "common/common-target.h" we are unable to call the function direct.
> One solution might be to put the hook to gcc/common/common-target.def, but
> that would require huge refactoring of i386.c and i386-common.c files.
> 
> Thus I came with a small hook that lives in cl_option_handler_func.
> With that I see proper results for test-case mentioned in the PR.

This patch is OK.  As you note, making the targetm.target_option.override 
hook into a common option would involve much refactoring, because many of 
those hooks mix things acting purely on the options structure (which could 
readily become common) and things relating to other back-end state (which 
would need to stay in a hook that's only used in the compiler proper, not 
the driver).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [testsuite, i386] Require -static support in gcc.dg/pie-static-[12].c (PR testsuite/81793)

2017-08-28 Thread Iain Sandoe
Hi Rainer,

> On 28 Aug 2017, at 16:33, Rainer Orth  wrote:
> 
> Hi Mike,
> 
>> On Aug 12, 2017, at 9:03 AM, Rainer Orth  
>> wrote:
>>> 
>>> The new gcc.dg/pie-static-[12].c testcases FAIL on several systems:
>>> 
>>> * Solaris 11.4 has PIE support, but lacks static libc, libm
>>> 
>>> * Linux without the static libc, libm installed
>>> 
>>> The following patch fixes this by requiring both PIE and -static
>>> support.
>>> 
>>> Tested with the appropriate runtest invocations on i386-pc-solaris2.11
>>> and x86_64-pc-linux-gnu (where the tests come up as UNSUPPORTED; I don't
>>> have a Linux system with static libc/libm installed), installed on
>>> mainline.
>>> 
>>> The tests also FAIL on Darwin/x86_64, but the failure mode is different:
>>> for -static, the executable is linked with -lcrt0.o (done this way to
>>> locate crt0.o in the linker's search path, cf. config/darwin.h
>>> (STARTFILE_SPEC)), but neither on Darwin 11 nor on Darwin 17 could I
>>> find where crt0.o would come from, so I've left this part alone.
>> 
>> darwin isn't exactly like other systems.  There is no crt0.o and static is
>> more special than you can imagine.  There was once 1 program that did a
>> static link, but one was exceptionally special.
>> 
>> Indeed, one way to implement it would be as a request option, and then
>> ignore the parts that don't make sense for the platform.  In that case,
>> -static-libgcc and friends might be the end semantics.
> 
> All this begs the question why on earth would darwin.h (STARTFILE_SPEC)
> accept -static and try to link libcrt0.o it the latter doesn't exist.
> 
> However, I've just found that the bundled clang on Darwin 11 does the
> same (and fails in the same way: "ld: library not found for -lcrt0.o”).

I think it’s because when one is bringing up the system, then one does (or at 
least used to) have a static libc/crt set.
Thus the compiler did/does need to support it for that case.  I’ve not done a 
kernel bootstrap since ≈ Darwin9 era, so things might have changed and this 
could be history leaking through.

Iain

Iain Sandoe
CodeSourcery / Mentor Embedded / Siemens





Re: [testsuite, i386] Require -static support in gcc.dg/pie-static-[12].c (PR testsuite/81793)

2017-08-28 Thread Rainer Orth
Hi Mike,

> On Aug 12, 2017, at 9:03 AM, Rainer Orth  
> wrote:
>> 
>> The new gcc.dg/pie-static-[12].c testcases FAIL on several systems:
>> 
>> * Solaris 11.4 has PIE support, but lacks static libc, libm
>> 
>> * Linux without the static libc, libm installed
>> 
>> The following patch fixes this by requiring both PIE and -static
>> support.
>> 
>> Tested with the appropriate runtest invocations on i386-pc-solaris2.11
>> and x86_64-pc-linux-gnu (where the tests come up as UNSUPPORTED; I don't
>> have a Linux system with static libc/libm installed), installed on
>> mainline.
>> 
>> The tests also FAIL on Darwin/x86_64, but the failure mode is different:
>> for -static, the executable is linked with -lcrt0.o (done this way to
>> locate crt0.o in the linker's search path, cf. config/darwin.h
>> (STARTFILE_SPEC)), but neither on Darwin 11 nor on Darwin 17 could I
>> find where crt0.o would come from, so I've left this part alone.
>
> darwin isn't exactly like other systems.  There is no crt0.o and static is
> more special than you can imagine.  There was once 1 program that did a
> static link, but one was exceptionally special.
>
> Indeed, one way to implement it would be as a request option, and then
> ignore the parts that don't make sense for the platform.  In that case,
> -static-libgcc and friends might be the end semantics.

All this begs the question why on earth would darwin.h (STARTFILE_SPEC)
accept -static and try to link libcrt0.o it the latter doesn't exist.

However, I've just found that the bundled clang on Darwin 11 does the
same (and fails in the same way: "ld: library not found for -lcrt0.o").

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-08-28 Thread Jeff Law
On 07/03/2017 12:37 PM, David Malcolm wrote:
> This patch improves our C/C++ frontends' handling of missing
> symbols, by making c_parser_require and cp_parser_require use
> "better" locations for the diagnostic, and insert fix-it hints,
> under certain circumstances (see the comments in the patch for
> full details).
> 
> For example, for this code with a missing semicolon:
> 
>   $ cat test.c
>   int missing_semicolon (void)
>   {
> return 42
>   }
> 
> trunk currently emits:
> 
>   test.c:4:1: error: expected ‘;’ before ‘}’ token
>}
>^
> 
> This patch adds a fix-it hint for the missing semicolon, and puts
> the error at the location of the missing semicolon, printing the
> followup token as a secondary location:
> 
>   test.c:3:12: error: expected ‘;’ before ‘}’ token
>  return 42
>   ^
>   ;
>}
>~
> 
> More examples can be seen in the test cases.
> 
> For reference, clang prints the following:
> 
>   test.c:3:12: error: expected ';' after return statement
> return 42
>  ^
>  ;
> 
> i.e. describing what syntactic thing came before, which
> I think is likely to be more meaningful to the user.
> 
> clang can also print notes about matching opening symbols
> e.g. the note here:
> 
>   missing-symbol-2.c:25:22: error: expected ']'
> const char test [42;
>^
>   missing-symbol-2.c:25:19: note: to match this '['
> const char test [42;
> ^
> which, although somewhat redundant for this example, seems much more
> useful if there's non-trivial nesting of constructs, or more than a few
> lines separating the open/close symbols (e.g. showing a stray "namespace {"
> that the user forgot to close).
> 
> I'd like to implement both of these ideas as followups, but in
> the meantime, is the fix-it hint patch OK for trunk?
> (successfully bootstrapped & regrtested on x86_64-pc-linux-gnu)
> 
> gcc/c-family/ChangeLog:
>   * c-common.c (c_parse_error): Add RICHLOC param, and use it rather
>   than implicitly using input_location.
>   (enum missing_token_insertion_kind): New enum.
>   (get_missing_token_insertion_kind): New function.
>   (maybe_suggest_missing_token_insertion): New function.
>   * c-common.h (c_parse_error): Add RICHLOC param.
>   (maybe_suggest_missing_token_insertion): New decl.
> 
> gcc/c/ChangeLog:
>   * c-parser.c (struct c_parser): Add "previous_token_loc" field.
>   (c_parser_consume_token): Set parser->previous_token_loc.
>   (c_parser_error): Rename to...
>   (c_parser_error_richloc): ...this, making static, and adding
>   "richloc" parameter, passing it to the c_parse_error call,
>   rather than calling c_parser_set_source_position_from_token.
>   (c_parser_error): Reintroduce, reimplementing in terms of the
>   above.
>   (c_parser_require): Add "type_is_unique" param.  Use
>   c_parser_error_richloc rather than c_parser_error, calling
>   maybe_suggest_missing_token_insertion.
>   (c_parser_parms_list_declarator): Override default value of new
>   "type_is_unique" param to c_parser_require.
>   (c_parser_asm_statement): Likewise.
>   * c-parser.h (c_parser_require): Add "type_is_unique" param,
>   defaulting to true.
> 
> gcc/cp/ChangeLog:
>   * parser.c (cp_parser_error): Add rich_location to call to
>   c_parse_error.
>   (get_required_cpp_ttype): New function.
>   (cp_parser_required_error): Remove calls to cp_parser_error,
>   instead setting a non-NULL gmsgid, and handling it if set by
>   calling c_parse_error, potentially with a fix-it hint.
> 
> gcc/testsuite/ChangeLog:
>   * c-c++-common/cilk-plus/AN/parser_errors.c: Update expected
>   output to reflect changes to reported locations of missing
>   symbols.
>   * c-c++-common/cilk-plus/AN/parser_errors2.c: Likewise.
>   * c-c++-common/cilk-plus/AN/parser_errors3.c: Likewise.
>   * c-c++-common/cilk-plus/AN/pr61191.c: Likewise.
>   * c-c++-common/gomp/pr63326.c: Likewise.
>   * c-c++-common/missing-symbol.c: New test case.
>   * g++.dg/cpp1y/digit-sep-neg.C: Update expected output to reflect
>   changes to reported locations of missing symbols.
>   * g++.dg/cpp1y/pr65202.C: Likewise.
>   * g++.dg/other/do1.C: Likewise.
>   * g++.dg/missing-symbol-2.C: New test case.
>   * g++.dg/parse/error11.C: Update expected output to reflect
>   changes to reported locations of missing symbols.
>   * g++.dg/parse/pragma2.C: Likewise.
>   * g++.dg/template/error11.C: Likewise.
>   * gcc.dg/missing-symbol-2.c: New test case.
>   * gcc.dg/missing-symbol-3.c: New test case.
>   * gcc.dg/noncompile/940112-1.c: Update expected output to reflect
>   changes to reported locations of missing symbols.
>   * gcc.dg/noncompile/971104-1.c: Likewise.
>   * obj-c++.dg/exceptions-6.mm: Likewise.
>   * obj-c++.dg/pr48187.mm: Likewise.

[C++ PATCH] Move field_vec creation

2017-08-28 Thread Nathan Sidwell
this patch moves the FIELD_VEC creation routines from class.c to 
name-lookup.c


It's slightly more than a simple move as I include the following renaming:
insert_into_classtype_sorted_fields -> set_class_bindings
insert_late_enum_def_into_classtype_sorted_fields -> 
insert_late_enum_def_bindings


Most of this should eventually evaporate, as I intend adding the 
bindings progressively during class definition. (and late bindings 
during their definition).  But we'll see how far that gets ...


nathan
--
Nathan Sidwell
2017-08-28  Nathan Sidwell  

	* cp-tree.h (insert_late_enum_def_into_classtype_sorted_fields):
	Delete.
	* name-lookup.h (set_class_bindings,
	insert_late_enum_def_bindings): Declare.
	* decl.c (finish_enum_value_list): Adjust for
	insert_late_enum_def_bindings name change.
	* class.c (finish_struct_1): Call set_class_bindings.
	(count_fields, add_fields_to_record_type,
	add_enum_fields_to_record_type, sorted_fields_type_new,
	insert_into_classtype_sorted_fields,
	insert_late_enum_def_into_classtype_sorted_fields): Move to ...
	* name-lookup.h (count_fields, add_fields_to_record_type,
	add_enum_fields_to_record_type, sorted_fields_type_new,
	set_class_bindings, insert_late_enum_def_bindings): ... here.

Index: class.c
===
--- class.c	(revision 251385)
+++ class.c	(working copy)
@@ -139,9 +139,6 @@ static tree build_simple_base_path (tree
 static tree build_vtbl_ref_1 (tree, tree);
 static void build_vtbl_initializer (tree, tree, tree, tree, int *,
 vec **);
-static int count_fields (tree);
-static int add_fields_to_record_type (tree, struct sorted_fields_type*, int);
-static void insert_into_classtype_sorted_fields (tree, tree, int);
 static bool check_bitfield_decl (tree);
 static bool check_field_decl (tree, tree, int *, int *);
 static void check_field_decls (tree, tree *, int *, int *);
@@ -3378,61 +3375,6 @@ add_implicitly_declared_members (tree t,
 }
 }
 
-/* Subroutine of insert_into_classtype_sorted_fields.  Recursively
-   count the number of fields in TYPE, including anonymous union
-   members.  */
-
-static int
-count_fields (tree fields)
-{
-  tree x;
-  int n_fields = 0;
-  for (x = fields; x; x = DECL_CHAIN (x))
-{
-  if (DECL_DECLARES_FUNCTION_P (x))
-	/* Functions are dealt with separately.  */;
-  else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x)))
-	n_fields += count_fields (TYPE_FIELDS (TREE_TYPE (x)));
-  else
-	n_fields += 1;
-}
-  return n_fields;
-}
-
-/* Subroutine of insert_into_classtype_sorted_fields.  Recursively add
-   all the fields in the TREE_LIST FIELDS to the SORTED_FIELDS_TYPE
-   elts, starting at offset IDX.  */
-
-static int
-add_fields_to_record_type (tree fields, struct sorted_fields_type *field_vec, int idx)
-{
-  tree x;
-  for (x = fields; x; x = DECL_CHAIN (x))
-{
-  if (DECL_DECLARES_FUNCTION_P (x))
-	/* Functions are handled separately.  */;
-  else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x)))
-	idx = add_fields_to_record_type (TYPE_FIELDS (TREE_TYPE (x)), field_vec, idx);
-  else
-	field_vec->elts[idx++] = x;
-}
-  return idx;
-}
-
-/* Add all of the enum values of ENUMTYPE, to the FIELD_VEC elts,
-   starting at offset IDX.  */
-
-static int
-add_enum_fields_to_record_type (tree enumtype,
-struct sorted_fields_type *field_vec,
-int idx)
-{
-  tree values;
-  for (values = TYPE_VALUES (enumtype); values; values = TREE_CHAIN (values))
-  field_vec->elts[idx++] = TREE_VALUE (values);
-  return idx;
-}
-
 /* FIELD is a bit-field.  We are finishing the processing for its
enclosing type.  Issue any appropriate messages and set appropriate
flags.  Returns false if an error has been diagnosed.  */
@@ -6592,21 +6534,6 @@ determine_key_method (tree type)
   return;
 }
 
-
-/* Allocate and return an instance of struct sorted_fields_type with
-   N fields.  */
-
-static struct sorted_fields_type *
-sorted_fields_type_new (int n)
-{
-  struct sorted_fields_type *sft;
-  sft = (sorted_fields_type *) ggc_internal_alloc (sizeof (sorted_fields_type)
-  + n * sizeof (tree));
-  sft->len = n;
-
-  return sft;
-}
-
 /* Helper of find_flexarrays.  Return true when FLD refers to a non-static
class data member of non-zero size, otherwise false.  */
 
@@ -7145,14 +7072,7 @@ finish_struct_1 (tree t)
 	&& same_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (x)), t))
   SET_DECL_MODE (x, TYPE_MODE (t));
 
-  /* Done with FIELDS...now decide whether to sort these for
- faster lookups later.
-
- We use a small number because most searches fail (succeeding
- ultimately as the search bores through the inheritance
- hierarchy), and we want this failure to occur quickly.  */
-
-  insert_into_classtype_sorted_fields (TYPE_FIELDS (t), t, 8);
+  set_class_bindings (t, TYPE_FIELDS (t));
 
   /* Complain if one of the field types requires 

Re: [PATCH] Fix pr80044, -static and -pie insanity, and pr81170

2017-08-28 Thread Jeff Law
On 06/22/2017 09:28 AM, Alan Modra wrote:
> PR80044 notes that -static and -pie together behave differently when
> gcc is configured with --enable-default-pie as compared to configuring
> without (or --disable-default-pie).  This patch removes that
> difference.  In both cases you now will have -static completely
> overriding -pie.
> 
> Fixing this wasn't quite as simple as you'd expect, due to poor
> separation of functionality.  PIE_SPEC didn't just mean that -pie was
> on explicitly or by default, but also -r and -shared were *not* on.
> Fortunately the three files touched by this patch are the only places
> PIE_SPEC and NO_PIE_SPEC are used, so it isn't too hard to see that
> the reason PIE_SPEC and NO_PIE_SPEC are not inverses is the use of
> PIE_SPEC in LINK_PIE_SPEC.  So, move the inelegant symmetry breaking
> addition, to LINK_PIE_SPEC where it belongs.  Doing that showed
> another problem in gnu-user.h, with PIE_SPEC and NO_PIE_SPEC selection
> of crtbegin*.o not properly hooked into a chain of if .. elseif ..
> conditions, which required both PIE_SPEC and NO_PIE_SPEC to exclude
> -static and -shared.  Fixing that particular problem finally allows
> PIE_SPEC to serve just one purpose, and NO_PIE_SPEC to disappear.
> 
> Bootstrapped and regression tested powerpc64le-linux c,c++.  No
> regressions and a bunch of --enable-default-pie failures squashed.
> OK mainline and active branches?
> 
> Incidentally, there is a fairly strong case to be made for adding
> -static to the -shared, -pie, -no-pie chain of RejectNegative's in
> common.opt.  Since git 0d6378a9e (svn r48039) 2001-11-15, -static has
> done more than just the traditional "prevent linking with dynamic
> libraries", as -static selects crtbeginT.o rather than crtbegin.o
> on GNU systems.  Realizing this is what led me to close pr80044, which
> I'd opened with the aim of making -pie -static work together (with the
> traditional meaning of -static).  I don't that is worth doing, but
> mention pr80044 in the changelog due to fixing the insane output
> produced by -pie -static with --disable-default-pie.
> 
>   PR driver/80044
>   PR target/81170
>   * gcc.c (NO_PIE_SPEC): Delete.
>   (PIE_SPEC): Define as !no-pie/pie.  Move static|shared|r exclusion..
>   (LINK_PIE_SPEC): ..to here.
>   * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Correct
>   chain of crtbegin*.o selection, update for PIE_SPEC changes and format.
>   (GNU_USER_TARGET_ENDFILE_SPEC): Similarly.
>   * config/sol2.h (STARTFILE_CRTBEGIN_SPEC): Similarly.
>   (ENDFILE_CRTEND_SPEC): Similarly.
>   * config/rs6000/sysv4.h (STARTFILE_LINUX_SPEC): Upgrade to
>   match gnu-user.h startfile.
>   (ENDFILE_LINUX_SPEC): Similarly.
So sorry for the horrible delay.  What was the final resolution here?  I
saw a lot of back and forth with HJ and yourself.  80044 is
CLOSED/WONTFIX and 81170 has a patch attached to it, but is still in the
ASSIGNED state.

jeff


[PATCH] small gcc.c cleanup

2017-08-28 Thread Nathan Sidwell
I committed this update to my post of 
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01011.html, deciding it 
now sufficiently obvious :)


nathan
--
Nathan Sidwell
2017-08-28  Nathan Sidwell  

	* gcc.c (execute): Fold SIGPIPE handling into switch
	statement.  Adjust internal error message.

Index: gcc.c
===
--- gcc.c	(revision 251380)
+++ gcc.c	(working copy)
@@ -3135,49 +3135,50 @@ execute (void)
 	int status = statuses[i];
 
 	if (WIFSIGNALED (status))
-	  {
-#ifdef SIGPIPE
-	/* SIGPIPE is a special case.  It happens in -pipe mode
-	   when the compiler dies before the preprocessor is done,
-	   or the assembler dies before the compiler is done.
-	   There's generally been an error already, and this is
-	   just fallout.  So don't generate another error unless
-	   we would otherwise have succeeded.  */
-	if (WTERMSIG (status) == SIGPIPE
-		&& (signal_count || greatest_status >= MIN_FATAL_STATUS))
-	  {
-		signal_count++;
-		ret_code = -1;
-	  }
-	else
-#endif
-	  switch (WTERMSIG (status))
-		{
-		case SIGINT:
-		/* SIGQUIT and SIGKILL are not available on MinGW.  */
+	  switch (WTERMSIG (status))
+	{
+	case SIGINT:
+	case SIGTERM:
+	  /* SIGQUIT and SIGKILL are not available on MinGW.  */
 #ifdef SIGQUIT
-		case SIGQUIT:
+	case SIGQUIT:
 #endif
 #ifdef SIGKILL
-		case SIGKILL:
+	case SIGKILL:
 #endif
-		case SIGTERM:
-		  /* The user (or environment) did something to the
-		 inferior.  Making this an ICE confuses the user
-		 into thinking there's a compiler bug.  Much more
-		 likely is the user or OOM killer nuked it.  */
-		  fatal_error (input_location,
-			   "%s signal terminated program %s",
-			   strsignal (WTERMSIG (status)),
-			   commands[i].prog);
+	  /* The user (or environment) did something to the
+		 inferior.  Making this an ICE confuses the user into
+		 thinking there's a compiler bug.  Much more likely is
+		 the user or OOM killer nuked it.  */
+	  fatal_error (input_location,
+			   "%s signal terminated program %s",
+			   strsignal (WTERMSIG (status)),
+			   commands[i].prog);
+	  break;
+
+#ifdef SIGPIPE
+	case SIGPIPE:
+	  /* SIGPIPE is a special case.  It happens in -pipe mode
+		 when the compiler dies before the preprocessor is
+		 done, or the assembler dies before the compiler is
+		 done.  There's generally been an error already, and
+		 this is just fallout.  So don't generate another
+		 error unless we would otherwise have succeeded.  */
+	  if (signal_count || greatest_status >= MIN_FATAL_STATUS)
+		{
+		  signal_count++;
+		  ret_code = -1;
 		  break;
-		default:
-		  /* The inferior failed to catch the signal.  */
-		  internal_error_no_backtrace ("%s (program %s)",
-	   strsignal (WTERMSIG (status)),
-	   commands[i].prog);
 		}
-	  }
+#endif
+	  /* FALLTHROUGH */
+
+	default:
+	  /* The inferior failed to catch the signal.  */
+	  internal_error_no_backtrace ("%s signal terminated program %s",
+	   strsignal (WTERMSIG (status)),
+	   commands[i].prog);
+	}
 	else if (WIFEXITED (status)
 		 && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
 	  {


Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers

2017-08-28 Thread Thomas Koenig

Hi Janus,





I think an unconditional warning is OK
in this case because

- Assigning to a pointer from an obvious non-contiguous target
   is not useful at all, that I can see


I guess you're talking about a *contiguous* pointer here,


Correct.


and in that
case I would argue that, beyond being "not useful", it's even illegal,
so why not throw a hard error, if we can infer at compile-time that
the target is non-contiguous?


Problem is, we cannot infer this from the tests done.
We would also have to add a test if the array is empty
or that it contains only a single element, and that (I think)
is a) impossible in the general case, and b) not worth the bother.




- Some language laywer will come up with the fact that it is,
   in fact, legal if the target is empty or has a single
   element only, so a hard error would be a rejects-valid.


Should be possible to detect and ignore such cases, shouldn't it?


Not in general.




However, I can also make this into a warning depending on
-Wall, if this is preferred.


As explained above, I'd vote for an error (or at least a conditional
warning, so that it can be disabled, like most(?) other gfortran
warnings).


Attached is a version which makes this a warning enabled by -Wall;
this should be enough to give people a heads-up.

I have introduced most of your comments into the revised patch.


Also I noticed that there is a function called
"gfc_is_simply_contiguous" in expr.c. Could that be useful for your
purpose? (Some of the code in there looks very similar to the logic
you're adding.)


There is a subtle distinction between "simply contiguous" where
the compiler _has_ to know at compile-time that the expression
is contigous (and failure to diagnose is is a compiler error)
and "contiguous" where the burden is on the programmer.
My patch is intended to be an aid to the programmer for
the "continuous" case.

Because of the "simple contiguous" thing, the function does
not quite match the requirements.

So, here is the updated patch.  Regression-tested on
powerpc-linux, make dvi and make pdf also passed.
OK for trunk?

Regards

Thomas

2017-08-27  Thomas Koenig  

PR fortran/49232
* expr.c (gfc_check_pointer_assign): Warn for
suspicious assignments with contiguous pointrs if
-Wcontiguous is given.
* lang.opt (Wcontiguous): New option, implied
by -Wall.
* invoke.texi: Document -Wcontiguous.

2017-08-27  Thomas Koenig  

PR fortran/49232
* gfortran.dg/contiguous_4.f90: New test.
Index: expr.c
===
--- expr.c	(Revision 251375)
+++ expr.c	(Arbeitskopie)
@@ -3802,6 +3802,54 @@
 	  }
 }
 
+  /* Warn for assignments of contiguous pointers to targets which might
+ not be contiguous.  */
+
+  if (warn_contiguous && lhs_attr.contiguous)
+{
+  gfc_array_ref *ar;
+  int i;
+  bool do_warn = false;
+  
+  ar = gfc_find_array_ref (rvalue, true);
+  if (ar && ar->type == AR_SECTION)
+	{
+	  for (i = 0; i < ar->dimen; i++)
+	{
+	  /* Check for p => a(:,:,2).  */
+	  if (ar->dimen_type[i] == DIMEN_RANGE && ar->stride[i]
+		  && (ar->stride[i]->expr_type != EXPR_CONSTANT
+		  || (ar->stride[i]->expr_type == EXPR_CONSTANT
+			  && mpz_cmp_si (ar->stride[i]->value.integer, 1
+		{
+		  do_warn = true;
+		  break;
+		}
+	}
+	  if (!do_warn && ar->dimen > 1)
+	{
+	  /* Check for p => a(2:4,:) */
+	  for (i = 0; i < ar->dimen - 1; i++)
+		{
+		  if ((ar->start[i] && ar->as->lower[i]
+		   && gfc_dep_compare_expr (ar->start[i], ar->as->lower[i])
+		   != 0)
+		  || (ar->end[i] && ar->as->upper[i]
+			  && gfc_dep_compare_expr (ar->end[i], ar->as->upper[i])
+			  != 0))
+		{
+		  do_warn = true;
+		  break;
+		}
+		}
+	}
+	}
+  if (do_warn)
+	gfc_warning (OPT_Wcontiguous, "Assignment to contiguous pointer "
+		 "from possibly non-contiguous target at %L",
+		 >where);
+}
+
   /* Warn if it is the LHS pointer may lives longer than the RHS target.  */
   if (warn_target_lifetime
   && rvalue->expr_type == EXPR_VARIABLE
Index: invoke.texi
===
--- invoke.texi	(Revision 251375)
+++ invoke.texi	(Arbeitskopie)
@@ -145,7 +145,7 @@
 @xref{Error and Warning Options,,Options to request or suppress errors
 and warnings}.
 @gccoptlist{-Waliasing -Wall -Wampersand -Wargument-mismatch -Warray-bounds
--Wc-binding-type -Wcharacter-truncation @gol
+-Wc-binding-type -Wcharacter-truncation -Wcontiguous @gol
 -Wconversion -Wfunction-elimination -Wimplicit-interface @gol
 -Wimplicit-procedure -Wintrinsic-shadow -Wuse-without-only -Wintrinsics-std @gol
 -Wline-truncation -Wno-align-commons -Wno-tabs -Wreal-q-constant @gol
@@ -797,7 +797,7 @@
 This currently includes @option{-Waliasing}, @option{-Wampersand},
 @option{-Wconversion}, 

[PATCH] ira-costs: avoid missing base registers in record_address_regs

2017-08-28 Thread Alexander Monakov
Hello,

The code in record_address_regs shown in the following patch assumes that
if a given target cannot have two registers in a memory address, then the
sole register, if present, must be the leftmost operand in the PLUS chain.

I think this is not true if the target uses unspecs to signify special
addressing modes such as TLS.  In that case the unspec can be to the left
of the register, and this function won't see the register.

The proposed fix is to always recurse into non-constant operands like in the
adjacent case of index registers being the same as base registers. OK to apply?

(most popular targets have MAX_REGS_PER_ADDRESS == 2, so this problem doesn't
arise)

Alexander

* ira-costs.c (record_address_regs): Handle both operands of PLUS for
MAX_REGS_PER_ADDRESS == 1.

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 2cd102a0810..1690e889471 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -1138,17 +1138,12 @@ record_address_regs (machine_mode mode, addr_space_t 
as, rtx x,
if (code1 == SUBREG)
  arg1 = SUBREG_REG (arg1), code1 = GET_CODE (arg1);
 
-   /* If this machine only allows one register per address, it
-  must be in the first operand.  */
-   if (MAX_REGS_PER_ADDRESS == 1)
- record_address_regs (mode, as, arg0, 0, PLUS, code1, scale);
-
-   /* If index and base registers are the same on this machine,
+   /* If index registers do not appear, or coincide with base registers,
   just record registers in any non-constant operands.  We
   assume here, as well as in the tests below, that all
   addresses are in canonical form.  */
-   else if (INDEX_REG_CLASS
-== base_reg_class (VOIDmode, as, PLUS, SCRATCH))
+   if (MAX_REGS_PER_ADDRESS == 1
+   || INDEX_REG_CLASS == base_reg_class (VOIDmode, as, PLUS, SCRATCH))
  {
record_address_regs (mode, as, arg0, context, PLUS, code1, scale);
if (! CONSTANT_P (arg1))


Re: [PATCH] Add libgcc support for cache clearing on ARM VxWorks

2017-08-28 Thread Olivier Hainque
Hi Richard,

I discussed with the original author, have looked into the ARM cache matters
in greater detail and certainly agree with your general concerns.

There are pieces related to the insn cache clearing in the code, but I can
see how architecture specific everything is here, indeed not appropriate as
a single non-specialized entry point in libgcc.

Thanks again for your arm-expert input on the topic.

Still working on this. 

In the meantime, opinion on the two patches below ?

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00278.html
  (Handle DWARF2_UNWIND_INFO in arm_except_unwind_info)

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00271.html
  (Fix .cfi inconsistency out of builtin_eh_return)

Maybe the dwarf2 unwind support could go away when support for the old ABI
gets removed. From the VxWorks perspective, dwarf2 is only needed by versions
of VxWorks which rely on the old ABI.

Thanks in advance for your feedback,

With Kind Regards,

Olivier



Re: [WIP] Possible Bug in vect_bb_slp_scalar_cost?

2017-08-28 Thread Richard Biener
On Mon, Aug 28, 2017 at 10:57 AM, Dominik Inführ
 wrote:
> Hi,
>
> As discussed on IRC: This patches fixes the calculation of the scalar costs 
> of SLP vectorization. I’ve added a test case and the auto_vec allocation is 
> now reused for all children of a node.

Looks good to me and thanks for the testcase.  I'll include this in my
next round of testing and make sure it's fixed for GCC 8.

Thanks,
Richard.

> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c
> new file mode 100644
> index 000..5121a88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-fdump-tree-slp-details" } */
> +
> +#define N 4
> +
> +int s1[N], s2[N], s3[N];
> +void escape(int, int, int, int);
> +
> +void
> +foo ()
> +{
> +  int t1, t2, t3, t4;
> +
> +  t1 = s1[0] + s2[0] + s3[0];
> +  t2 = s1[1] + s2[1] + s3[1];
> +  t3 = s1[2] + s2[2] + s3[2];
> +  t4 = s1[3] + s2[3] + s3[3];
> +
> +  s3[0] = t1 - s1[0] * s2[0];
> +  s3[1] = t2 - s1[1] * s2[1];
> +  s3[2] = t3 - s1[2] * s2[2];
> +  s3[3] = t4 - s1[3] * s2[3];
> +
> +  escape (t1, t2, t3, t4);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "vectorization is not profitable" "slp2" 
> } } */
> +/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 04ecaab..cffd19b 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2690,9 +2690,17 @@ vect_bb_slp_scalar_cost (basic_block bb,
>   scalar_cost += stmt_cost;
> }
>
> +  auto_vec subtree_life;
> +
>   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
> -  scalar_cost += vect_bb_slp_scalar_cost (bb, child, life);
> +  {
> +   /* Do not directly pass LIFE to the recursive call, copy it to
> +  confine changes in the callee to the current child/subtree.  */
> +   subtree_life.safe_splice (*life);
> +   scalar_cost += vect_bb_slp_scalar_cost (bb, child, _life);
> +   subtree_life.truncate (0);
> +  }
>
>   return scalar_cost;
> }
>
>> On 04 Aug 2017, at 12:19, Richard Biener  wrote:
>>
>> On Fri, Aug 4, 2017 at 12:08 PM, Dominik Inführ
>>  wrote:
>>> Hi,
>>>
>>> vect_bb_slp_scalar_cost computes the scalar cost of a SLP node. If there 
>>> are non-scalar uses of a definition, the costs for it and its operands 
>>> (children) are ignored. The vector LIFE is used to keep track of this and 
>>> an element is set to true, such that the def and its children are ignored. 
>>> But as soon as an element is set to true it is never undone, that means the 
>>> following sibling and parent nodes of the current node also stay ignored.
>>
>> Yes, that's intended.  They are live as well because they are needed
>> to compute the live scalar.
>>
>> This seems wrong to me, a simple fix would be to clone LIFE for every
>> vector, such that changes to LIFE stay in the subtree:
>>>
>>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>>> index 032a9444a5a..f919645f28b 100644
>>> --- a/gcc/tree-vect-slp.c
>>> +++ b/gcc/tree-vect-slp.c
>>> @@ -2590,6 +2590,11 @@ vect_bb_slp_scalar_cost (basic_block bb,
>>>   gimple *stmt;
>>>   slp_tree child;
>>>
>>> +  auto_vec subtree_life;
>>> +  subtree_life.safe_splice (*life);
>>> +
>>> +  life = _life;
>>> +
>>>   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
>>> {
>>>   unsigned stmt_cost;
>>>
>>>
>


Re: Improve ECF_NOTHROW flags for direct internal functions

2017-08-28 Thread Richard Biener
On Mon, Aug 28, 2017 at 10:16 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Aug 17, 2017 at 1:06 PM, Richard Sandiford
>>  wrote
>>> Richard Biener  writes:
 On Thu, Aug 17, 2017 at 11:49 AM, Richard Sandiford
  wrote:
> Internal functions that map directly to an optab can only throw an
> exception for -fnon-call-exceptions.  This patch handles that in
> internal_fn_flags, in a similar way to ATTR_*NOTHROW in builtins.def.
>
> (Functions that don't throw even for flag_non_call_exceptions should be
> explicitly marked ECF_NOTHROW in internal-fn.def.)
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

 Hmm.  Note the outcome of flag_non_call_exceptions depends on the
 current function and thus IPA passes querying flags would need to
 push an appropriate function context.  This means the function should
 get a struct function * argument and opt_for_fn (fn,
 flag_non_call_exceptions)
 should be used.  It doesn't help very much that all callers don't have
 any such context either which means this "optimization" looks like
 in the wrong place :/  (the global value of flag_non_call_exceptions in
 the IPA case isn't necessarily conservative).

 So if you insist then add a comment and add a && cfun check so
 we're sure we are in non-IPA context (or in properly setup context).
>>>
>>> Bah.  In that case, what should happen if a -fno-non-call-exceptions
>>> function is inlined into an -fnon-call-exceptions one?  Should the call
>>> keep the NOTHROWness of the original function, or should it lose
>>> NOTHROWness (and thus gain an exception edge)?
>>
>> nothrow-ness is tracked on the GIMPLE stmt via gimple_call_[set_]nothrow_p,
>> GIMPLE shouldn't look at flag_non_call_exceptions, it is basically part
>> of the IL.
>>
>>> I guess the path of least resistance would be to add an extra check
>>> for this case in the places that need it, rather than relying solely
>>> on gimple_call_flags.
>>
>> Well, gimple_call_flags works fine already (looking at the above in
>> addition to internal_fn_flags).  call_expr_flags looks like it might not.
>
> OK, how does this look?  Only the gimple flags matter for the use
> case I've seen (which is covered by the SVE tests, but hard to
> test as-is).
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2017-08-28  Richard Sandiford  
>
> gcc/
> * gimplify.c (gimplify_call_expr): Copy the nothrow flag to
> calls to internal functions.
> (gimplify_modify_expr): Likewise.
> * tree-call-cdce.c (use_internal_fn): Likewise.
> * tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Likewise.
> (convert_to_divmod): Set the nothrow flag.
> * tree-if-conv.c (predicate_mem_writes):  Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Likewise.
> (vectorizable_call): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> * tree-vect-patterns.c (vect_recog_pow_pattern): Likewise.
> (vect_recog_mask_conversion_pattern): Likewise.
>
> Index: gcc/gimplify.c
> ===
> *** gcc/gimplify.c  2017-08-17 13:17:21.266412322 +0100
> --- gcc/gimplify.c  2017-08-28 09:10:13.246297839 +0100
> *** gimplify_call_expr (tree *expr_p, gimple
> *** 3150,3156 
>
> if (EXPR_CILK_SPAWN (*expr_p))
>   gimplify_cilk_detach (pre_p);
> !   gimple *call = gimple_build_call_internal_vec (ifn, vargs);
> gimplify_seq_add_stmt (pre_p, call);
> return GS_ALL_DONE;
>   }
> --- 3150,3157 
>
> if (EXPR_CILK_SPAWN (*expr_p))
>   gimplify_cilk_detach (pre_p);
> !   gcall *call = gimple_build_call_internal_vec (ifn, vargs);
> !   gimple_call_set_nothrow (call, TREE_NOTHROW (*expr_p));
> gimplify_seq_add_stmt (pre_p, call);
> return GS_ALL_DONE;
>   }
> *** gimplify_modify_expr (tree *expr_p, gimp
> *** 5636,5641 
> --- 5637,5643 
>   vargs.quick_push (CALL_EXPR_ARG (*from_p, i));
> }
>   call_stmt = gimple_build_call_internal_vec (ifn, vargs);
> + gimple_call_set_nothrow (call_stmt, TREE_NOTHROW (*from_p));
>   gimple_set_location (call_stmt, EXPR_LOCATION (*expr_p));
> }
> else
> Index: gcc/tree-call-cdce.c
> ===
> *** gcc/tree-call-cdce.c2017-06-30 12:50:38.243662675 +0100
> --- gcc/tree-call-cdce.c2017-08-28 09:10:13.246297839 +0100
> *** use_internal_fn (gcall *call)
> *** 1019,1024 
> --- 

Re: [PATCH] Fix PR81503 (SLSR invalid fold)

2017-08-28 Thread Richard Biener
On Thu, Aug 3, 2017 at 9:34 PM, Bill Schmidt
 wrote:
> Hi,
>
> Here's v2 of the patch with Jakub's suggestions incorporated.  Bootstrapped
> and tested on powerpc64le-linux-gnu with no regressions.  Is this ok for
> trunk?
>
> Eventually this should be backported to all active releases as well.
> Ok for that after a week or so of burn-in? (And after 7.2, I imagine.)
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2017-08-03  Bill Schmidt  
> Jakub Jelinek  
>
> PR tree-optimization/81503
> * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
> folded constant fits in the target type.
>
> [gcc/testsuite]
>
> 2017-08-03  Bill Schmidt  
> Jakub Jelinek  
>
> PR tree-optimization/81503
> * gcc.c-torture/execute/pr81503.c: New file.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===
> --- gcc/gimple-ssa-strength-reduction.c (revision 250791)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>  {
>tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
> +  unsigned int prec = TYPE_PRECISION (target_type);
> +  tree maxval = (POINTER_TYPE_P (target_type)
> +? TYPE_MAX_VALUE (sizetype)
> +: TYPE_MAX_VALUE (target_type));
>
>/* It is highly unlikely, but possible, that the resulting
>   bump doesn't fit in a HWI.  Abandon the replacement
> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>   types but allows for safe negation without twisted logic.  */
>if (wi::fits_shwi_p (bump)
>&& bump.to_shwi () != HOST_WIDE_INT_MIN
> +  /* It is more likely that the bump doesn't fit in the target
> +type, so check whether constraining it to that type changes
> +the value.  For a signed type, the value mustn't change.
> +For an unsigned type, the value may only change to a
> +congruent value (for negative bumps).  */
> +  && (TYPE_UNSIGNED (target_type)
> + ? wi::eq_p (wi::neg_p (bump)
> + ? bump + wi::to_widest (maxval) + 1
> + : bump,
> + wi::zext (bump, prec))
> + : wi::eq_p (bump, wi::sext (bump, prec)))

Not sure, but would it be fixed in a similar way when writing

@@ -2089,16 +2089,9 @@ replace_mult_candidate (slsr_cand_t c, t
   tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
   enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);

-  /* It is highly unlikely, but possible, that the resulting
- bump doesn't fit in a HWI.  Abandon the replacement
- in this case.  This does not affect siblings or dependents
- of C.  Restriction to signed HWI is conservative for unsigned
- types but allows for safe negation without twisted logic.  */
-  if (wi::fits_shwi_p (bump)
-  && bump.to_shwi () != HOST_WIDE_INT_MIN
-  /* It is not useful to replace casts, copies, negates, or adds of
-an SSA name and a constant.  */
-  && cand_code != SSA_NAME
+  /* It is not useful to replace casts, copies, negates, or adds of
+ an SSA name and a constant.  */
+  if (cand_code != SSA_NAME
   && !CONVERT_EXPR_CODE_P (cand_code)
   && cand_code != PLUS_EXPR
   && cand_code != POINTER_PLUS_EXPR
@@ -2109,18 +2102,25 @@ replace_mult_candidate (slsr_cand_t c, t
   tree bump_tree;
   gimple *stmt_to_print = NULL;

-  /* If the basis name and the candidate's LHS have incompatible
-types, introduce a cast.  */
-  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
-   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
   if (wi::neg_p (bump))
{
  code = MINUS_EXPR;
  bump = -bump;
}
+  /* It is possible that the resulting bump doesn't fit in target_type.
+Abandon the replacement in this case.  This does not affect
+siblings or dependents of C.  */
+  if (bump != wi::ext (bump, TYPE_PRECISION (target_type),
+  TYPE_SIGN (target_type)))
+   return;

   bump_tree = wide_int_to_tree (target_type, bump);

+  /* If the basis name and the candidate's LHS have incompatible
+types, introduce a cast.  */
+  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
+   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  fputs ("Replacing: ", dump_file);

?

>/* It is not useful to replace casts, copies, negates, or adds of
>  an SSA name and a constant.  */
>&& cand_code != 

Re: [RFC] [PATCH] Introduce configure flag --with-stage1-cflags.

2017-08-28 Thread Richard Biener
On Fri, Aug 25, 2017 at 9:51 PM, Jeff Law  wrote:
> On 07/31/2017 01:47 AM, Martin Liška wrote:
>> I would like to ping this. Input from other people will be appreciated ;)
> I think the thing to keep in mind here is that IIUC this only affects
> things when we've configured using the --with-stage1-cflags option.
>
> So questions about is -O1 more stable than -O2, should we restrict -O2
> to newer compilers, etc are really more about the defaults we set.
>
> My understanding is the patch is just adding the capability and does not
> change the default.  Assuming that's the case, then I'm comfortable
> acking the raw infrastructure.

OTOH you can simply set STAGE1_CFLAGS so the value of this
as a configure option is somewhat questionable.

> jeff


Re: Statement Frontier Notes, Location Views, and Inlined Entry Point Markers

2017-08-28 Thread Richard Biener
On Fri, Aug 25, 2017 at 4:26 PM, Alexandre Oliva  wrote:
> On Aug 23, 2017, Richard Biener  wrote:
>
 if they are not a problem up until here why care now?
>
>>> IIRC we do have a limit for VTA notes too, but there's a C++ testcase
>>> (g++.dg/tree-ssa/pr14703.C) that expands and inlines fibonacci template
>>> functions so deep, more than doubling the number of statements at all
>>> but the base recursion levels, so we'd end up with over 2^{85+} debug
>>> stmts if we didn't cut them off somehow.
>
>> Yeah, but I meant we've kept them throughout GIMPLE (for all functions!)
>> but are dropping them here at RTL expansion (which we'll have only a
>> single live RTL function at a time).  That looks odd ;)
>
> Aah, yeah, the point is, if we find we exceeded the limit, we don't
> bother to clean up the gimple, we just refrain from wasting further time
> with it, which we would if we converted them to RTL (and then threw them
> away), or copied them all when inlining into some other function.  We
> could clean up at some point, just as we could stop emitting further
> markers once the limit is reached, but it didn't seem important enough
> to do so.  Should it prove to be, I guess it wouldn't be too hard to add
> it to gimple verification passes that walk over all stmts.
>
>> You're already dropping them at inlining as well so the RTL expansion
>> check should be superfluous IMHO (yeah, unrolling might push it over
>> the edge for example but all real issues should come from inlining).
>
> The RTL expansion check is indeed not essential, but if we're over the
> limit, we'll to throw it all away, so why bother expanding it and
> carrying it through all RTL passes just to throw away at the end?  Or
> should we not throw it away in this case, and make the limit apply only
> to inlining?  But then, what if we inline lots of very large functions
> into a single one, do we still want to use markers for that function?
> That's not how I designed it, but I guess it might work that way too.

Hmm, all existing markers should be valid, no?  So throwing them away
at RTL expansion time will only make the function "consistent" but drop
still useful stuff?

>
>> Hmm, yeah.  I guess we'd have to have a multi-DEBUG_STMT that covers
>> not only multiple markers but also multiple binds.  High GIMPLE has
>> nested stmts so it might be tempting to wrap adjacent debug-stmts into
>> a single one (basically make the IL walking overhead with debug stmts 
>> smaller).
>> Costs extra memory instead of less when compared to my idea of course.
>
> Yeah.  I guess that's doable and it won't make gimple passes much
> trickier: in most cases all that matters are the SSA uses in bind value
> expressions, so as long as the update function can efficiently pick the
> SSA uses from the op array, it could be a significant win.
>
> We may need some way to reset one specific bind given a use that is no
> longer valid, which I don't immediately see how to implement efficiently
> in a multi-debug pack .
>
> Now, I spent some time trying to think of how to pack multiple debug
> stmts in a way that made them also save memory.
>
> For each packed stmt, we need at least one bit to indicate whether it's
> a bind or just a marker.  Markers then need a locus, and another bit
> indicating whether it's a begin stmt marker or an inline entry point
> marker.  Debug bind stmts need one bit to indicate tell src binds from
> regular ones, and two trees (no locus).
>
> It is unlikely that it would make sense to allocate extra memory, be it
> trees holding integral values, be it other arrays to hold them.  I'm
> thinking we'd be better off storing some of these bits in an analogous
> of the trailing op VLA, that would be present in gdebug but that would
> deal with GGC and ssa updates in its own way.
>
> For packs with few stmts, we could use bits from the subcode to indicate
> the count and the kinds.  We could use the gimple locus for the first
> marker, and then perhaps pack pairs of loci in tree pointer operands (if
> their sizes are 1:2, as in lp64).
>
> When packing more than few stmts, we could then define a format for a
> 32-bit word to hold the bits for an additional set of stmts, possibly
> packed in the same word as a locus or another such bit pack.  Ideally,
> should we need more than one of these, we should indicate upfront how
> many of these there are, or at least how many ops are used.
>
> I was thinking it would be ideal if combining two many-stmts debug stmts
> could require little more than allocating a gimple with a larger ops
> array and copying (most of) the original op arrays to the right places.
>
> But...  this all feels far too hackish and not very maintainable or
> forward-looking.  E.g., if we add more kinds of debug stmts, the bit
> counting suddenly no longer applies, and needs to be reworked.
>
> So I guess that's also doable, and would save some memory indeed,
> but...  do you think it's 

[PING][PATCH 2/3] retire mem_signal_fence pattern

2017-08-28 Thread Alexander Monakov
Ping (for this and patch 3/3 in the thread).

On Wed, 2 Aug 2017, Alexander Monakov wrote:

> Similar to mem_thread_fence issue from the patch 1/3, RTL representation of
> __atomic_signal_fence must be a compiler barrier.  We have just one backend
> offering this pattern, and it does not place a compiler barrier.
> 
> It does not appear useful to expand signal_fence to some kind of hardware
> instruction, its documentation is wrong/outdated, and we are using it
> incorrectly anyway.  So just remove the whole thing and simply emit a compiler
> memory barrier in the optabs.c handler.
> 
>   * config/s390/s390.md (mem_signal_fence): Remove.
> * doc/md.texi (mem_signal_fence): Remove.
> * optabs.c (expand_mem_signal_fence): Remove uses of mem_signal_fence.
> Update comments.
> * target-insns.def (mem_signal_fence): Remove.
> 
> ---
>  gcc/config/s390/s390.md |  9 -
>  gcc/doc/md.texi | 13 -
>  gcc/optabs.c| 17 +
>  gcc/target-insns.def|  1 -
>  4 files changed, 5 insertions(+), 35 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index d1ac0b8395d..1d63523d3b0 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -10084,15 +10084,6 @@ (define_insn "*basr_tls"
>  ; memory barrier patterns.
>  ;
>  
> -(define_expand "mem_signal_fence"
> -  [(match_operand:SI 0 "const_int_operand")] ;; model
> -  ""
> -{
> -  /* The s390 memory model is strong enough not to require any
> - barrier in order to synchronize a thread with itself.  */
> -  DONE;
> -})
> -
>  (define_expand "mem_thread_fence"
>[(match_operand:SI 0 "const_int_operand")] ;; model
>""
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 0be161a08dc..73d0e7d83c0 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -7058,19 +7058,6 @@ If this pattern is not specified, all memory models 
> except
>  @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
>  barrier pattern.
>  
> -@cindex @code{mem_signal_fence@var{mode}} instruction pattern
> -@item @samp{mem_signal_fence@var{mode}}
> -This pattern emits code required to implement a signal fence with
> -memory model semantics.  Operand 0 is the memory model to be used.
> -
> -This pattern should impact the compiler optimizers the same way that
> -mem_signal_fence does, but it does not need to issue any barrier
> -instructions.
> -
> -If this pattern is not specified, all memory models except
> -@code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
> -barrier pattern.
> -
>  @cindex @code{get_thread_pointer@var{mode}} instruction pattern
>  @cindex @code{set_thread_pointer@var{mode}} instruction pattern
>  @item @samp{get_thread_pointer@var{mode}}
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index d568ca376fe..6884fd4b8aa 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6315,22 +6315,15 @@ expand_mem_thread_fence (enum memmodel model)
>  }
>  }
>  
> -/* This routine will either emit the mem_signal_fence pattern or issue a 
> -   sync_synchronize to generate a fence for memory model MEMMODEL.  */
> +/* Emit a signal fence with given memory model.  */
>  
>  void
>  expand_mem_signal_fence (enum memmodel model)
>  {
> -  if (targetm.have_mem_signal_fence ())
> -emit_insn (targetm.gen_mem_signal_fence (GEN_INT (model)));
> -  else if (!is_mm_relaxed (model))
> -{
> -  /* By default targets are coherent between a thread and the signal
> -  handler running on the same thread.  Thus this really becomes a
> -  compiler barrier, in that stores must not be sunk past
> -  (or raised above) a given point.  */
> -  expand_asm_memory_barrier ();
> -}
> +  /* No machine barrier is required to implement a signal fence, but
> + a compiler memory barrier must be issued, except for relaxed MM.  */
> +  if (!is_mm_relaxed (model))
> +expand_asm_memory_barrier ();
>  }
>  
>  /* This function expands the atomic load operation:
> diff --git a/gcc/target-insns.def b/gcc/target-insns.def
> index fb92f72ac29..4669439c7e1 100644
> --- a/gcc/target-insns.def
> +++ b/gcc/target-insns.def
> @@ -58,7 +58,6 @@ DEF_TARGET_INSN (indirect_jump, (rtx x0))
>  DEF_TARGET_INSN (insv, (rtx x0, rtx x1, rtx x2, rtx x3))
>  DEF_TARGET_INSN (jump, (rtx x0))
>  DEF_TARGET_INSN (load_multiple, (rtx x0, rtx x1, rtx x2))
> -DEF_TARGET_INSN (mem_signal_fence, (rtx x0))
>  DEF_TARGET_INSN (mem_thread_fence, (rtx x0))
>  DEF_TARGET_INSN (memory_barrier, (void))
>  DEF_TARGET_INSN (movstr, (rtx x0, rtx x1, rtx x2))
> 


[PATCH] Improve on PR81968

2017-08-28 Thread Richard Biener

The following avoids invalid flag combinations on removed sections
and fixes a word-size bug I noticed when doing so.

LTO bootstrap / testing in progress on x86_64-unknown-linux-gnu.

Richard.

2017-08-28  Richard Biener  

PR lto/81968
* simple-object-elf.c (simple_object_elf_copy_lto_debug_section):
Adjust field with for sh_type write, set SHF_EXCLUDE only for
removed sections.

Index: libiberty/simple-object-elf.c
===
--- libiberty/simple-object-elf.c   (revision 251377)
+++ libiberty/simple-object-elf.c   (working copy)
@@ -1382,7 +1382,7 @@ simple_object_elf_copy_lto_debug_section
 unused.  That allows the link editor to remove it in a partial
 link.  */
  ELF_SET_FIELD (type_functions, ei_class, Shdr,
-shdr, sh_type, Elf_Addr, SHT_NULL);
+shdr, sh_type, Elf_Word, SHT_NULL);
}
 
   flags = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
@@ -1390,7 +1390,7 @@ simple_object_elf_copy_lto_debug_section
   if (ret == 0)
flags &= ~SHF_EXCLUDE;
   else if (ret == -1)
-   flags |= SHF_EXCLUDE;
+   flags = SHF_EXCLUDE;
   ELF_SET_FIELD (type_functions, ei_class, Shdr,
 shdr, sh_flags, Elf_Addr, flags);
 }


[PATCH] Fix PR81993

2017-08-28 Thread Richard Biener

I am testing the following to cure -gsplit-dwarf with early type
pruning.

It looks like we have zero testsuite coverage for -gsplit-dwarf
(or my grep skills are broken).

No testcase, g++.dg/[dwarf2/] isn't set up to do link tests.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-08-28  Richard Biener  

PR debug/81993
* dwarf2out.c (gen_remaining_tmpl_value_param_die_attributes):
Do nothing for removed DIEs.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251374)
+++ gcc/dwarf2out.c (working copy)
@@ -26037,7 +26037,8 @@ gen_remaining_tmpl_value_param_die_attri
   j = 0;
   FOR_EACH_VEC_ELT (*tmpl_value_parm_die_table, i, e)
{
- if (!tree_add_const_value_attribute (e->die, e->arg))
+ if (!e->die->removed
+ && !tree_add_const_value_attribute (e->die, e->arg))
{
  dw_loc_descr_ref loc = NULL;
  if (! early_dwarf


[PATCH] Fix PR81977

2017-08-28 Thread Richard Biener

The following fixes a failure to account for lhs_offset (or rather
to consistently use the same offset in op0 and off).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-08-28  Richard Biener  

PR tree-optimization/81977
* tree-ssa-sccvn.c (vn_reference_lookup_3): Fix look through
memcpy.

* g++.dg/torture/pr81977.C: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 251377)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2334,7 +2334,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   memset (, 0, sizeof (op));
   op.type = vr->type;
   op.opcode = MEM_REF;
-  op.op0 = build_int_cst (ptr_type_node, at - rhs_offset);
+  op.op0 = build_int_cst (ptr_type_node, at - lhs_offset + rhs_offset);
   op.off = at - lhs_offset + rhs_offset;
   vr->operands[0] = op;
   op.type = TREE_TYPE (rhs);
Index: gcc/testsuite/g++.dg/torture/pr81977.C
===
--- gcc/testsuite/g++.dg/torture/pr81977.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr81977.C  (working copy)
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
+
+#include 
+
+typedef struct
+{
+  uint16_t  x ;
+  uint16_t  y ;
+  uint64_t  z ;
+} __attribute__((packed, aligned(1))) TestMsgType;
+
+struct Payload
+{
+  uint16_t header_info[2];
+  TestMsgType _pref;
+  void Pack(uint8_t *buffer)
+{
+  __builtin_memcpy(buffer, &_pref, sizeof(_pref));
+}
+  void UnPack(uint8_t *buffer)
+{
+  __builtin_memcpy(&_pref, buffer, sizeof(_pref));
+}
+};
+
+
+struct Msg
+{
+  Payload _payload;
+  void Pack(uint8_t *buffer)
+{
+  _payload.Pack(buffer);
+}
+
+  void UnPack(uint8_t *buffer)
+{
+  _payload.UnPack(buffer);
+}
+};
+
+int main()
+{
+  uint8_t * buffer = new uint8_t [30];
+  Msg msg;
+  Msg msg1;
+  msg._payload._pref.x = 0xabcd;
+  msg._payload._pref.y = 0xa;
+  msg._payload._pref.z = 0x0001020304051617;
+  msg.Pack([0]);
+  msg1.UnPack([0]);
+  if (msg1._payload._pref.x != 0xabcd)
+__builtin_abort ();
+  delete [] buffer;
+}


Turn FUNCTION_ARG_PADDING into a target hook

2017-08-28 Thread Richard Sandiford
This involved renaming the rather general-sounding "enum direction" to
"enum pad_direction" to avoid a conflict with the Fortran frontend.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by checking that there were no extra warnings or changes in
testsuite assembly output for at least one target per CPU.  OK to install?

Richard


2017-08-28  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* coretypes.h (pad_direction): New enum.
* defaults.h (DEFAULT_FUNCTION_ARG_PADDING): Delete.
(FUNCTION_ARG_PADDING): Likewise.
* target.def (function_arg_padding): New hook.
* targhooks.h (default_function_arg_padding): Declare.
* targhooks.c (default_function_arg_padding): New function.
* doc/tm.texi.in (FUNCTION_ARG_PADDING): Replace with...
(TARGET_FUNCTION_ARG_PADDING): ...this.
* doc/tm.texi: Regenerate.
* calls.c (store_unaligned_arguments_into_pseudos): Use pad_direction
instead of direction.
(compute_argument_addresses): Likewise.
(load_register_parameters): Likewise.
(emit_library_call_value_1): Likewise.
(store_one_arg): Use targetm.calls.function_arg_padding instead
of FUNCTION_ARG_PADDING.
(must_pass_in_stack_var_size_or_pad): Likewise.
* expr.c (emit_group_load_1): Use pad_direction instead of direction.
(emit_group_store): Likewise.
(emit_single_push_insn_1): Use targetm.calls.function_arg_padding
instead of FUNCTION_ARG_PADDING.
(emit_push_insn): Likewise, and propagate enum change throughout
function.
* function.h (direction): Delete.
(locate_and_pad_arg_data::where_pad): Use pad_direction instead
of direction.
* function.c (assign_parm_find_stack_rtl): Likewise.
(assign_parm_setup_block_p): Likewise.
(assign_parm_setup_block): Likewise.
(gimplify_parameters): Likewise.
(locate_and_pad_parm): Use targetm.calls.function_arg_padding
instead of FUNCTION_ARG_PADDING, and propagate enum change throughout
function.
* config/aarch64/aarch64.h (FUNCTION_ARG_PADDING): Delete.
(BLOCK_REG_PADDING): Use pad_direction instead of direction.
* config/aarch64/aarch64-protos.h (aarch64_pad_arg_upward): Delete.
* config/aarch64/aarch64.c (aarch64_pad_arg_upward): Replace with...
(aarch64_function_arg_padding): ...this new function.
(aarch64_gimplify_va_arg_expr): Use pad_direction instead of direction.
(TARGET_FUNCTION_ARG_PADDING): Redefine.
* config/arm/arm.h (FUNCTION_ARG_PADDING): Delete.
(BLOCK_REG_PADDING): Use pad_direction instead of direction.
* config/arm/arm-protos.h (arm_pad_arg_upward): Delete.
* config/arm/arm.c (TARGET_FUNCTION_ARG_PADDING): Redefine.
(arm_pad_arg_upward): Replace with...
(arm_function_arg_padding): ...this new function.
* config/c6x/c6x.h (BLOCK_REG_PADDING): Use pad_direction instead
of direction.
* config/ia64/hpux.h (FUNCTION_ARG_PADDING): Delete.
* config/ia64/ia64-protos.h (ia64_hpux_function_arg_padding): Delete.
* config/ia64/ia64.c (TARGET_FUNCTION_ARG_PADDING): Redefine.
(ia64_hpux_function_arg_padding): Replace with...
(ia64_function_arg_padding): ...this new function.  Use pad_direction
instead of direction.  Check for TARGET_HPUX.
* config/iq2000/iq2000.h (FUNCTION_ARG_PADDING): Delete.
* config/iq2000/iq2000.c (TARGET_FUNCTION_ARG_PADDING): Redefine.
(iq2000_function_arg_padding): New function.
* config/mips/mips-protos.h (mips_pad_arg_upward): Delete.
* config/mips/mips.c (mips_pad_arg_upward): Replace with...
(mips_function_arg_padding): ...this new function.
(mips_pad_reg_upward): Update accordingly.
(TARGET_FUNCTION_ARG_PADDING): Redefine.
* config/mips/mips.h (PAD_VARARGS_DOWN): Use
targetm.calls.function_arg_padding.
(FUNCTION_ARG_PADDING): Delete.
(BLOCK_REG_PADDING): Use pad_direction instead of direction.
* config/nios2/nios2.h (FUNCTION_ARG_PADDING): Delete.
(PAD_VARARGS_DOWN): Use targetm.calls.function_arg_padding.
* config/nios2/nios2-protos.h (nios2_function_arg_padding): Delete.
(nios2_block_reg_padding): Return pad_direction instead of direction.
* config/nios2/nios2.c (nios2_block_reg_padding): Return pad_direction
instead of direction.
(nios2_function_arg_padding): Likewise.  Make static.
(TARGET_FUNCTION_ARG_PADDING): Redefine.
* config/pa/pa.h (FUNCTION_ARG_PADDING): Delete.
(BLOCK_REG_PADDING): Use targetm.calls.function_arg_padding.
* config/pa/pa-protos.h (pa_function_arg_padding): Delete.
* 

Turn MODES_TIEABLE_P into a target hook

2017-08-28 Thread Richard Sandiford
The lack of function comments in msp430.c and rl78.c is deliberate;
the local style there is to define the hook macro immediately before
the function as a form of documentation.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by checking that there were no extra warnings or changes in
testsuite assembly output for at least one target per CPU.  OK to install?

Richard


2017-08-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.def (modes_tieable_p): New hook.
* doc/tm.texi (MODES_TIEABLE_P): Replace with...
(TARGET_MODES_TIEABLE_P): ...this.
* doc/tm.texi.in: Regenerate.
* hooks.h (hook_bool_mode_mode_true): Declare.
* hooks.c (hook_bool_mode_mode_true): New function.
* combine.c (subst): Use targetm.modes_tieable_p instead of
MODES_TIEABLE_P.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (extract_low_bits): Likewise.
* lower-subreg.c: Include target.h.
(find_decomposable_subregs): Use targetm.modes_tieable_p instead of
MODES_TIEABLE_P.
* rtlanal.c (rtx_cost): Likewise.
* config/aarch64/aarch64.h (MODES_TIEABLE_P): Delete.
* config/aarch64/aarch64-protos.h (aarch64_modes_tieable_p): Delete.
* config/aarch64/aarch64.c (aarch64_modes_tieable_p): Make static.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/alpha/alpha.h (MODES_TIEABLE_P): Delete.
* config/alpha/alpha.c (alpha_modes_tieable_p): New function.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/arc/arc.h (MODES_TIEABLE_P): Delete.
* config/arc/arc.c (TARGET_MODES_TIEABLE_P): Redefine.
(arc_modes_tieable_p): New function.
* config/arm/arm.h (MODES_TIEABLE_P): Delete.
* config/arm/arm-protos.h (arm_modes_tieable_p): Delete.
* config/arm/arm.c (TARGET_MODES_TIEABLE_P): Redefine.
(arm_modes_tieable_p): Make static.
* config/avr/avr.h (MODES_TIEABLE_P): Delete.
* config/bfin/bfin.h (MODES_TIEABLE_P): Delete.
* config/bfin/bfin.c (bfin_modes_tieable_p): New function.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/c6x/c6x.h (MODES_TIEABLE_P): Delete.
* config/c6x/c6x.c (c6x_modes_tieable_p): New function.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/cr16/cr16.h (MODES_TIEABLE_P): Delete.
* config/cr16/cr16.c (TARGET_MODES_TIEABLE_P): Redefine.
(cr16_modes_tieable_p): New function.
* config/cris/cris.h (MODES_TIEABLE_P): Delete.
* config/epiphany/epiphany.h (MODES_TIEABLE_P): Delete.
* config/fr30/fr30.h (MODES_TIEABLE_P): Delete.
(TRULY_NOOP_TRUNCATION): Update comment.
* config/frv/frv.h (MODES_TIEABLE_P): Delete.
(TRULY_NOOP_TRUNCATION): Update comment.
* config/frv/frv.c (TARGET_MODES_TIEABLE_P): Redefine.
(frv_modes_tieable_p): New function.
* config/ft32/ft32.h (MODES_TIEABLE_P): Delete.
* config/h8300/h8300.h (MODES_TIEABLE_P): Delete.
* config/h8300/h8300.c (h8300_modes_tieable_p): New function.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/i386/i386.h (MODES_TIEABLE_P): Delete.
* config/i386/i386-protos.h (ix86_modes_tieable_p): Delete.
* config/i386/i386.c (ix86_modes_tieable_p): Make static.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/ia64/ia64.h (MODES_TIEABLE_P): Delete.
* config/ia64/ia64.c (TARGET_MODES_TIEABLE_P): Redefine.
(ia64_modes_tieable_p): New function.
* config/iq2000/iq2000.h (MODES_TIEABLE_P): Delete.
* config/iq2000/iq2000.c (TARGET_MODES_TIEABLE_P): Redefine.
(iq2000_modes_tieable_p): New function.
* config/lm32/lm32.h (MODES_TIEABLE_P): Delete.
* config/lm32/lm32.c (TARGET_MODES_TIEABLE_P): Redefine.
(lm32_modes_tieable_p): New function.
* config/m32c/m32c.h (MODES_TIEABLE_P): Delete.
* config/m32c/m32c-protos.h (m32c_modes_tieable_p): Delete.
* config/m32c/m32c.c (m32c_modes_tieable_p): Make static.
(TARGET_MODES_TIEABLE_P): Redefine.
* config/m32r/m32r.h (MODES_TIEABLE_P): Delete.
* config/m32r/m32r.c (TARGET_MODES_TIEABLE_P): Redefine.
(m32r_modes_tieable_p): New function.
* config/m68k/m68k.h (MODES_TIEABLE_P): Delete.
* config/m68k/m68k.c (TARGET_MODES_TIEABLE_P): Redefine.
(m68k_modes_tieable_p): New function.
* config/mcore/mcore.h (MODES_TIEABLE_P): Delete.
* config/mcore/mcore.c (TARGET_MODES_TIEABLE_P): Redefine.
(mcore_modes_tieable_p): New function.
* config/microblaze/microblaze.h (MODES_TIEABLE_P): Delete.
* config/microblaze/microblaze.c (microblaze_modes_tieable_p): New
function.
(TARGET_MODES_TIEABLE_P): Redefine.
* 

RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-28 Thread Richard Biener
On Mon, 28 Aug 2017, Jon Beniston wrote:

> Hi Richard,
> 
> >-  if (nunits < 1) /* Support V1SI.  */
> >+  if (nunits < 1 || (nunits == 1 && !reduct_p))
> > return NULL_TREE;
> >
> >doesn't seem to be against trunk which has
> >
> >  if (nunits <= 1)
> >return NULL_TREE;
> >
> >so what happens if you just change that to
> >
> >  if (nunits < 1)
> >return NULL_TREE;
> 
> Ah, that's what I first tried, and mistakenly sent the patch against that.
> 
> That did help the initial problem, but then I needed to add a lot of support
> for the V1SI type in the backend (which just duplicated SI patterns) and
> there are a few places where GCC seems to have sanity checks against vectors
> that are only one element wide. A dot product producing a scalar result
> seems natural.

Where do you need V1SI types?  The vectorizer should perform a SI extract
from V1SI here.  You need to elaborate a bit here.

> 
> Also, as well as ARC benefitting from this, I think the TI c6x port would
> too. That currently has a sdot_prodv2hi pattern that uses SI and V2HI types.

The vectorizer doesn't really support vector->scalar ops so V1SI feels
more natural here.  That is, your patch looks a bit ugly -- there's
nothing wrong with V1SI vector types.  So maybe instead of adjusting
that function the respective caller when facing a lane-reducing op
such as DOT_PROD needs to consider scalar result types.

Richard.

> Cheers,
> Jon
> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [WIP] Possible Bug in vect_bb_slp_scalar_cost?

2017-08-28 Thread Dominik Inführ
Hi,

As discussed on IRC: This patches fixes the calculation of the scalar costs of 
SLP vectorization. I’ve added a test case and the auto_vec allocation is now 
reused for all children of a node.

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c
new file mode 100644
index 000..5121a88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-slp-details" } */
+
+#define N 4
+
+int s1[N], s2[N], s3[N];
+void escape(int, int, int, int);
+
+void
+foo ()
+{
+  int t1, t2, t3, t4;
+
+  t1 = s1[0] + s2[0] + s3[0];
+  t2 = s1[1] + s2[1] + s3[1];
+  t3 = s1[2] + s2[2] + s3[2];
+  t4 = s1[3] + s2[3] + s3[3];
+
+  s3[0] = t1 - s1[0] * s2[0];
+  s3[1] = t2 - s1[1] * s2[1];
+  s3[2] = t3 - s1[2] * s2[2];
+  s3[3] = t4 - s1[3] * s2[3];
+
+  escape (t1, t2, t3, t4);
+}
+
+/* { dg-final { scan-tree-dump-not "vectorization is not profitable" "slp2" } 
} */
+/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 04ecaab..cffd19b 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2690,9 +2690,17 @@ vect_bb_slp_scalar_cost (basic_block bb,
  scalar_cost += stmt_cost;
}

+  auto_vec subtree_life;
+
  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
-  scalar_cost += vect_bb_slp_scalar_cost (bb, child, life);
+  {
+   /* Do not directly pass LIFE to the recursive call, copy it to
+  confine changes in the callee to the current child/subtree.  */
+   subtree_life.safe_splice (*life);
+   scalar_cost += vect_bb_slp_scalar_cost (bb, child, _life);
+   subtree_life.truncate (0);
+  }

  return scalar_cost;
}

> On 04 Aug 2017, at 12:19, Richard Biener  wrote:
> 
> On Fri, Aug 4, 2017 at 12:08 PM, Dominik Inführ
>  wrote:
>> Hi,
>> 
>> vect_bb_slp_scalar_cost computes the scalar cost of a SLP node. If there are 
>> non-scalar uses of a definition, the costs for it and its operands 
>> (children) are ignored. The vector LIFE is used to keep track of this and an 
>> element is set to true, such that the def and its children are ignored. But 
>> as soon as an element is set to true it is never undone, that means the 
>> following sibling and parent nodes of the current node also stay ignored.
> 
> Yes, that's intended.  They are live as well because they are needed
> to compute the live scalar.
> 
> This seems wrong to me, a simple fix would be to clone LIFE for every
> vector, such that changes to LIFE stay in the subtree:
>> 
>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>> index 032a9444a5a..f919645f28b 100644
>> --- a/gcc/tree-vect-slp.c
>> +++ b/gcc/tree-vect-slp.c
>> @@ -2590,6 +2590,11 @@ vect_bb_slp_scalar_cost (basic_block bb,
>>   gimple *stmt;
>>   slp_tree child;
>> 
>> +  auto_vec subtree_life;
>> +  subtree_life.safe_splice (*life);
>> +
>> +  life = _life;
>> +
>>   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
>> {
>>   unsigned stmt_cost;
>> 
>> 



signature.asc
Description: Message signed with OpenPGP using GPGMail


Turn HARD_REGNO_CALL_PART_CLOBBERED into a target hook

2017-08-28 Thread Richard Sandiford
The SVE patches change the size of a machine_mode from a compile-time
constant to a runtime invariant.  However, target-specific code can
continue to treat the modes as constant-sized if the target only has
constant-sized modes.

The main snag with this approach is that target-independent code still
uses macros from the target .h file.  This patch is one of several that
converts a target macro to a hook.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by checking that there were no extra warnings or changes in
testsuite assembly output for at least one target per CPU.  OK to install?

Richard


2017-08-28  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.def (hard_regno_call_part_clobbered): New hook.
* doc/tm.texi.in (HARD_REGNO_CALL_PART_CLOBBERED): Replace with...
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): ...this hook.
* doc/tm.texi: Regenerate.
* hooks.h (hook_bool_uint_mode_false): Declare.
* hooks.c (hook_bool_uint_mode_false): New function.
* regs.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* cselib.c (cselib_process_insn): Use
targetm.hard_regno_call_part_clobbered instead of
HARD_REGNO_CALL_PART_CLOBBERED.
* ira-conflicts.c (ira_build_conflicts): Likewise.
* ira-costs.c (ira_tune_allocno_costs): Likewise.
* lra-constraints.c (need_for_call_save_p): Likewise.
* lra-lives.c: Include target.h.
(check_pseudos_live_through_calls): Use
targetm.hard_regno_call_part_clobbered instead of
HARD_REGNO_CALL_PART_CLOBBERED.
* regcprop.c: Include target.h.
(copyprop_hardreg_forward_1): Use
targetm.hard_regno_call_part_clobbered instead of
HARD_REGNO_CALL_PART_CLOBBERED.
* reginfo.c (choose_hard_reg_mode): Likewise.
* regrename.c (check_new_reg_p): Likewise.
* reload.c (find_equiv_reg): Likewise.
* reload1.c (emit_reload_insns): Likewise.
* sched-deps.c (deps_analyze_insn): Likewise.
* sel-sched.c (init_regs_for_mode): Likewise.
(mark_unavailable_hard_regs): Likewise.
* targhooks.c (default_dwarf_frame_reg_mode): Likewise.
* config/aarch64/aarch64.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* config/aarch64/aarch64.c (aarch64_hard_regno_call_part_clobbered):
New function.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/avr/avr.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* config/avr/avr-protos.h (avr_hard_regno_call_part_clobbered):
Delete.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Make static
and return a bool.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/i386/i386.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): New
function.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/mips/mips.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): New
function.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/powerpcspe/powerpcspe.h (HARD_REGNO_CALL_PART_CLOBBERED):
Delete.
* config/powerpcspe/powerpcspe.c
(rs6000_hard_regno_call_part_clobbered): New function.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/rs6000/rs6000.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered):
New function.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/s390/s390.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): New
function.
(TARGET_HARD_REGNO_CALL_PART_CLOBBERED): Redefine.
* config/sh/sh.h (HARD_REGNO_CALL_PART_CLOBBERED): Delete.
* system.h (HARD_REGNO_CALL_PART_CLOBBERED): Poison.

Index: gcc/target.def
===
--- gcc/target.def  2017-08-28 09:36:09.610827107 +0100
+++ gcc/target.def  2017-08-28 09:36:10.075827140 +0100
@@ -5395,6 +5395,19 @@ The default version of this hook always
  bool, (unsigned int regno),
  default_hard_regno_scratch_ok)
 
+DEFHOOK
+(hard_regno_call_part_clobbered,
+ "This hook should return true if @var{regno} is partly call-saved and\n\
+partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
+clobbered by a call.  For example, if the low 32 bits of @var{regno} are\n\
+preserved across a call but higher bits are clobbered, this hook should\n\
+return true for a 64-bit mode but false for a 32-bit mode.\n\
+\n\
+The default implementation returns false, which is correct\n\
+for targets 

Re: [Patch, Fortran] PR 81770: [5/6/7 Regression] Bogus warning: Pointer in pointer assignment might outlive the pointer target

2017-08-28 Thread Thomas Koenig

Hi Janus,


the attached patch fixes a bogus warning. The purpose of the warning
is to detect cases where a pointer lives longer than its target. If
the target itself is (1) a pointer or (2) a component of a DT pointer,
we do not know about the lifetime of the target at compile time and no
warning should be thrown. The existing check only handles case (1) and
my patch adds the handling of case (2).

Regtestes cleanly on x86_64-linux-gnu. Ok for trunk and the release branches?


OK, and thanks for the patch!

Regards

Thomas


RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-28 Thread Jon Beniston
Hi Richard,

>-  if (nunits < 1) /* Support V1SI.  */
>+  if (nunits < 1 || (nunits == 1 && !reduct_p))
> return NULL_TREE;
>
>doesn't seem to be against trunk which has
>
>  if (nunits <= 1)
>return NULL_TREE;
>
>so what happens if you just change that to
>
>  if (nunits < 1)
>return NULL_TREE;

Ah, that's what I first tried, and mistakenly sent the patch against that.

That did help the initial problem, but then I needed to add a lot of support
for the V1SI type in the backend (which just duplicated SI patterns) and
there are a few places where GCC seems to have sanity checks against vectors
that are only one element wide. A dot product producing a scalar result
seems natural. 

Also, as well as ARC benefitting from this, I think the TI c6x port would
too. That currently has a sdot_prodv2hi pattern that uses SI and V2HI types.

Cheers,
Jon




Add wider_subreg_mode helper functions

2017-08-28 Thread Richard Sandiford
This patch adds helper functions that say which of the two modes
involved in a subreg is the larger, preferring the outer mode in
the event of a tie.  It also converts IRA and reload to track modes
instead of byte sizes, since this is slightly more convenient when
variable-sized modes are added later.

Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by checking
there was no change in the testsuite assembly output for at least
one target per CPU.  OK to install?

Richard


2017-08-28  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* rtl.h (wider_subreg_mode): New function.
* ira.h (ira_sort_regnos_for_alter_reg): Take a machine_mode *
rather than an unsigned int *.
* ira-color.c (regno_max_ref_width): Replace with...
(regno_max_ref_mode): ...this new variable.
(coalesced_pseudo_reg_slot_compare): Update accordingly.
Use wider_subreg_mode.
(ira_sort_regnos_for_alter_reg): Likewise.  Take a machine_mode *
rather than an unsigned int *.
* lra-constraints.c (uses_hard_regs_p): Use wider_subreg_mode.
(process_alt_operands): Likewise.
(invariant_p): Likewise.
* lra-spills.c (assign_mem_slot): Likewise.
(add_pseudo_to_slot): Likewise.
* lra.c (collect_non_operand_hard_regs): Likewise.
(add_regs_to_insn_regno_info): Likewise.
* reload1.c (regno_max_ref_width): Replace with...
(regno_max_ref_mode): ...this new variable.
(reload): Update accordingly.  Update call to
ira_sort_regnos_for_alter_reg.
(alter_reg): Update to use regno_max_ref_mode.  Call wider_subreg_mode.
(init_eliminable_invariants): Update to use regno_max_ref_mode.
(scan_paradoxical_subregs): Likewise.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2017-08-28 09:23:51.181220876 +0100
+++ gcc/rtl.h   2017-08-28 09:23:56.230220860 +0100
@@ -2840,6 +2840,24 @@ subreg_lowpart_offset (machine_mode oute
 GET_MODE_SIZE (innermode));
 }
 
+/* Given that a subreg has outer mode OUTERMODE and inner mode INNERMODE,
+   return the mode that is big enough to hold both the outer and inner
+   values.  Prefer the outer mode in the event of a tie.  */
+
+inline machine_mode
+wider_subreg_mode (machine_mode outermode, machine_mode innermode)
+{
+  return partial_subreg_p (outermode, innermode) ? innermode : outermode;
+}
+
+/* Likewise for subreg X.  */
+
+inline machine_mode
+wider_subreg_mode (const_rtx x)
+{
+  return wider_subreg_mode (GET_MODE (x), GET_MODE (SUBREG_REG (x)));
+}
+
 extern unsigned int subreg_size_highpart_offset (unsigned int, unsigned int);
 
 /* Return the SUBREG_BYTE for an OUTERMODE highpart of an INNERMODE value.  */
Index: gcc/ira.h
===
--- gcc/ira.h   2017-08-28 09:22:51.676686496 +0100
+++ gcc/ira.h   2017-08-28 09:23:56.227520860 +0100
@@ -195,7 +195,7 @@ extern void ira_set_pseudo_classes (bool
 extern void ira_expand_reg_equiv (void);
 extern void ira_update_equiv_info_by_shuffle_insn (int, int, rtx_insn *);
 
-extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
+extern void ira_sort_regnos_for_alter_reg (int *, int, machine_mode *);
 extern void ira_mark_allocation_change (int);
 extern void ira_mark_memory_move_deletion (int, int);
 extern bool ira_reassign_pseudos (int *, int, HARD_REG_SET, HARD_REG_SET *,
Index: gcc/ira-color.c
===
--- gcc/ira-color.c 2017-08-28 09:22:51.676686496 +0100
+++ gcc/ira-color.c 2017-08-28 09:23:56.226620860 +0100
@@ -3908,7 +3908,7 @@ coalesced_pseudo_reg_freq_compare (const
 
 /* Widest width in which each pseudo reg is referred to (via subreg).
It is used for sorting pseudo registers.  */
-static unsigned int *regno_max_ref_width;
+static machine_mode *regno_max_ref_mode;
 
 /* Sort pseudos according their slot numbers (putting ones with
   smaller numbers first, or last when the frame pointer is not
@@ -3921,7 +3921,7 @@ coalesced_pseudo_reg_slot_compare (const
   ira_allocno_t a1 = ira_regno_allocno_map[regno1];
   ira_allocno_t a2 = ira_regno_allocno_map[regno2];
   int diff, slot_num1, slot_num2;
-  int total_size1, total_size2;
+  machine_mode mode1, mode2;
 
   if (a1 == NULL || ALLOCNO_HARD_REGNO (a1) >= 0)
 {
@@ -3936,11 +3936,11 @@ coalesced_pseudo_reg_slot_compare (const
   if ((diff = slot_num1 - slot_num2) != 0)
 return (frame_pointer_needed
|| (!FRAME_GROWS_DOWNWARD) == STACK_GROWS_DOWNWARD ? diff : -diff);
-  total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
-regno_max_ref_width[regno1]);
-  total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
-regno_max_ref_width[regno2]);
-  if ((diff = 

Add subreg_memory_offset helper functions

2017-08-28 Thread Richard Sandiford
This patch adds routines for converting a SUBREG_BYTE offset into a
memory address offset.  The two only differ for paradoxical subregs,
where SUBREG_BYTE is always 0 but the memory address offset can be
negative.

Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by checking
there was no change in the testsuite assembly output for at least
one target per CPU.  OK to install?

Richard


2017-08-28  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* rtl.h (subreg_memory_offset): Declare.
* emit-rtl.c (subreg_memory_offset): New function.
* expmed.c (store_bit_field_1): Use it.
* expr.c (undefined_operand_subword_p): Likewise.
* simplify-rtx.c (simplify_subreg): Likewise.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2017-08-27 23:19:56.284397205 +0100
+++ gcc/rtl.h   2017-08-28 09:19:13.737714836 +0100
@@ -2827,6 +2827,8 @@ subreg_highpart_offset (machine_mode out
 }
 
 extern int byte_lowpart_offset (machine_mode, machine_mode);
+extern int subreg_memory_offset (machine_mode, machine_mode, unsigned int);
+extern int subreg_memory_offset (const_rtx);
 extern rtx make_safe_from (rtx, rtx);
 extern rtx convert_memory_address_addr_space_1 (machine_mode, rtx,
addr_space_t, bool, bool);
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2017-08-22 17:14:30.335896832 +0100
+++ gcc/emit-rtl.c  2017-08-28 09:19:13.736814836 +0100
@@ -1013,6 +1013,33 @@ byte_lowpart_offset (machine_mode outer_
   else
 return subreg_lowpart_offset (outer_mode, inner_mode);
 }
+
+/* Return the offset of (subreg:OUTER_MODE (mem:INNER_MODE X) OFFSET)
+   from address X.  For paradoxical big-endian subregs this is a
+   negative value, otherwise it's the same as OFFSET.  */
+
+int
+subreg_memory_offset (machine_mode outer_mode, machine_mode inner_mode,
+ unsigned int offset)
+{
+  if (paradoxical_subreg_p (outer_mode, inner_mode))
+{
+  gcc_assert (offset == 0);
+  return -subreg_lowpart_offset (inner_mode, outer_mode);
+}
+  return offset;
+}
+
+/* As above, but return the offset that existing subreg X would have
+   if SUBREG_REG (X) were stored in memory.  The only significant thing
+   about the current SUBREG_REG is its mode.  */
+
+int
+subreg_memory_offset (const_rtx x)
+{
+  return subreg_memory_offset (GET_MODE (x), GET_MODE (SUBREG_REG (x)),
+  SUBREG_BYTE (x));
+}
 
 /* Generate a REG rtx for a new pseudo register of mode MODE.
This pseudo is assigned the next sequential register number.  */
Index: gcc/expmed.c
===
--- gcc/expmed.c2017-08-27 23:19:56.284397205 +0100
+++ gcc/expmed.c2017-08-28 09:19:13.736814836 +0100
@@ -726,29 +726,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   while (GET_CODE (op0) == SUBREG)
 {
-  /* The following line once was done only if WORDS_BIG_ENDIAN,
-but I think that is a mistake.  WORDS_BIG_ENDIAN is
-meaningful at a much higher level; when structures are copied
-between memory and regs, the higher-numbered regs
-always get higher addresses.  */
-  int inner_mode_size = GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0)));
-  int outer_mode_size = GET_MODE_SIZE (GET_MODE (op0));
-  int byte_offset = 0;
-
-  /* Paradoxical subregs need special handling on big-endian machines.  */
-  if (paradoxical_subreg_p (op0))
-   {
- int difference = inner_mode_size - outer_mode_size;
-
- if (WORDS_BIG_ENDIAN)
-   byte_offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
- if (BYTES_BIG_ENDIAN)
-   byte_offset += difference % UNITS_PER_WORD;
-   }
-  else
-   byte_offset = SUBREG_BYTE (op0);
-
-  bitnum += byte_offset * BITS_PER_UNIT;
+  bitnum += subreg_memory_offset (op0) * BITS_PER_UNIT;
   op0 = SUBREG_REG (op0);
 }
 
Index: gcc/expr.c
===
--- gcc/expr.c  2017-08-27 23:19:56.149397206 +0100
+++ gcc/expr.c  2017-08-28 09:19:13.737714836 +0100
@@ -3524,30 +3524,12 @@ emit_move_ccmode (machine_mode mode, rtx
 static bool
 undefined_operand_subword_p (const_rtx op, int i)
 {
-  machine_mode innermode, innermostmode;
-  int offset;
   if (GET_CODE (op) != SUBREG)
 return false;
-  innermode = GET_MODE (op);
-  innermostmode = GET_MODE (SUBREG_REG (op));
-  offset = i * UNITS_PER_WORD + SUBREG_BYTE (op);
-  /* The SUBREG_BYTE represents offset, as if the value were stored in
- memory, except for a paradoxical subreg where we define
- SUBREG_BYTE to be 0; undo this exception as in
- simplify_subreg.  */
-  if (SUBREG_BYTE (op) == 0
-   

Re: Improve ECF_NOTHROW flags for direct internal functions

2017-08-28 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Aug 17, 2017 at 1:06 PM, Richard Sandiford
>  wrote
>> Richard Biener  writes:
>>> On Thu, Aug 17, 2017 at 11:49 AM, Richard Sandiford
>>>  wrote:
 Internal functions that map directly to an optab can only throw an
 exception for -fnon-call-exceptions.  This patch handles that in
 internal_fn_flags, in a similar way to ATTR_*NOTHROW in builtins.def.

 (Functions that don't throw even for flag_non_call_exceptions should be
 explicitly marked ECF_NOTHROW in internal-fn.def.)

 Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> Hmm.  Note the outcome of flag_non_call_exceptions depends on the
>>> current function and thus IPA passes querying flags would need to
>>> push an appropriate function context.  This means the function should
>>> get a struct function * argument and opt_for_fn (fn,
>>> flag_non_call_exceptions)
>>> should be used.  It doesn't help very much that all callers don't have
>>> any such context either which means this "optimization" looks like
>>> in the wrong place :/  (the global value of flag_non_call_exceptions in
>>> the IPA case isn't necessarily conservative).
>>>
>>> So if you insist then add a comment and add a && cfun check so
>>> we're sure we are in non-IPA context (or in properly setup context).
>>
>> Bah.  In that case, what should happen if a -fno-non-call-exceptions
>> function is inlined into an -fnon-call-exceptions one?  Should the call
>> keep the NOTHROWness of the original function, or should it lose
>> NOTHROWness (and thus gain an exception edge)?
>
> nothrow-ness is tracked on the GIMPLE stmt via gimple_call_[set_]nothrow_p,
> GIMPLE shouldn't look at flag_non_call_exceptions, it is basically part
> of the IL.
>
>> I guess the path of least resistance would be to add an extra check
>> for this case in the places that need it, rather than relying solely
>> on gimple_call_flags.
>
> Well, gimple_call_flags works fine already (looking at the above in
> addition to internal_fn_flags).  call_expr_flags looks like it might not.

OK, how does this look?  Only the gimple flags matter for the use
case I've seen (which is covered by the SVE tests, but hard to
test as-is).

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard


2017-08-28  Richard Sandiford  

gcc/
* gimplify.c (gimplify_call_expr): Copy the nothrow flag to
calls to internal functions.
(gimplify_modify_expr): Likewise.
* tree-call-cdce.c (use_internal_fn): Likewise.
* tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Likewise.
(convert_to_divmod): Set the nothrow flag.
* tree-if-conv.c (predicate_mem_writes):  Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Likewise.
(vectorizable_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* tree-vect-patterns.c (vect_recog_pow_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.

Index: gcc/gimplify.c
===
*** gcc/gimplify.c  2017-08-17 13:17:21.266412322 +0100
--- gcc/gimplify.c  2017-08-28 09:10:13.246297839 +0100
*** gimplify_call_expr (tree *expr_p, gimple
*** 3150,3156 
  
if (EXPR_CILK_SPAWN (*expr_p))
  gimplify_cilk_detach (pre_p);
!   gimple *call = gimple_build_call_internal_vec (ifn, vargs);
gimplify_seq_add_stmt (pre_p, call);
return GS_ALL_DONE;
  }
--- 3150,3157 
  
if (EXPR_CILK_SPAWN (*expr_p))
  gimplify_cilk_detach (pre_p);
!   gcall *call = gimple_build_call_internal_vec (ifn, vargs);
!   gimple_call_set_nothrow (call, TREE_NOTHROW (*expr_p));
gimplify_seq_add_stmt (pre_p, call);
return GS_ALL_DONE;
  }
*** gimplify_modify_expr (tree *expr_p, gimp
*** 5636,5641 
--- 5637,5643 
  vargs.quick_push (CALL_EXPR_ARG (*from_p, i));
}
  call_stmt = gimple_build_call_internal_vec (ifn, vargs);
+ gimple_call_set_nothrow (call_stmt, TREE_NOTHROW (*from_p));
  gimple_set_location (call_stmt, EXPR_LOCATION (*expr_p));
}
else
Index: gcc/tree-call-cdce.c
===
*** gcc/tree-call-cdce.c2017-06-30 12:50:38.243662675 +0100
--- gcc/tree-call-cdce.c2017-08-28 09:10:13.246297839 +0100
*** use_internal_fn (gcall *call)
*** 1019,1024 
--- 1019,1025 
  args.safe_push (gimple_call_arg (call, i));
gcall *new_call = gimple_build_call_internal_vec (ifn, args);
gimple_set_location (new_call, gimple_location (call));
+   gimple_call_set_nothrow (new_call, gimple_call_nothrow_p (call));
  
/* Transfer the 

Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-28 Thread Richard Biener
On Sun, 27 Aug 2017, Jon Beniston wrote:

> Hi,
> 
> I have an out-of-tree GCC port and it is struggling supporting
> auto-vectorization on some dot product instructions.  For example, I have an
> instruction that takes three operands which are all 32-bit general
> registers. The second and third operands will be treated as V2HI then do dot
> product, and then generate an SI result which is then added to the first
> operand which is SI as well.
> 
> I do see there is dot product recognizer in tree-vect-patters.c, however, I
> found the following testcase still can't be auto-vectorized on my port which
> has implemented all necessary dot product standard patterns.  This testcase
> can't be auto-vectorized on other targets that have similar V2HI dot product
> instructions as well, for example ARC.
> 
> === test.c ===
> #define K 4
> #define M 4
> #define N 256
> int in[N*K][M];
> int out[K];
> int coeff[N][M];
> void
> foo (void)
> {
>   int i, j, k;
>   int sum;
>   for (k = 0; k < K; k++)
> {
>   sum = 0;
>   for (j = 0; j < M; j++)
> for (i = 0; i < N; i++)
>   sum += in[i+k][j] * coeff[i][j];
>   out[k] = sum;
> }
> }
> ===
>   The reason that auto-vectorizer doesn't work seems to be that GCC doesn't
> support single-element vector types in get_vectype_for_scalar_type_and_size.
> tree-vect-stmts.c: get_vectype_for_scalar_type_and_size
>   ...
>   if (nunits <= 1)
> return NULL_TREE;
> 
> So, I am thinking this actually should be relaxed to support more cases.  At
> least on vector reduction operations which normally will have scalar result
> with wider types than the element type of input operands.
> 
> I have tried to make the auto-vectorizer work for my V2HI dot product case,
> with the patch attached. Is this the correct approach?

Hum,

@@ -9039,7 +9040,7 @@
   else
 simd_mode = mode_for_vector (inner_mode, size / nbytes);
   nunits = GET_MODE_SIZE (simd_mode) / nbytes;
-  if (nunits < 1) /* Support V1SI.  */
+  if (nunits < 1 || (nunits == 1 && !reduct_p))
 return NULL_TREE;

   vectype = build_vector_type (scalar_type, nunits);

doesn't seem to be against trunk which has

  if (nunits <= 1)
return NULL_TREE;

so what happens if you just change that to

  if (nunits < 1)
return NULL_TREE;

?

Richard.

> Cheers,
> Jon
> 
> gcc/
> 2017-08-27  Jon Beniston 
> 
> * tree-vectorizer.h (get_vectype_for_scalar_type): New optional
> parameter declaration.
> * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Add new
> optional parameter "reduct_p".  Support single element vector types
> if it is true.
> (get_vectype_for_scalar_type): Add new parameter "reduct_p".
> * tree-vect-patterns.c (vect_pattern_recog_1): Pass new parameter
> "reduct_p".
> * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
> (vect_model_reduction_cost): Likewise.
> (get_initial_def_for_induction): Likewise.
> (vect_create_epilog_for_reduction): Likewise.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)