Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Segher Boessenkool
On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote:
> This is part 3/3 for contributing PPC64LE support for X86 SSE
> intrinsics. This patch includes testsuite/gcc.target tests for the
> intrinsics included by xmmintrin.h. 

> +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT)   \

Should that be UNION_TYPE?

Looks fine otherwise :-)


Segher


Re: [PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Segher Boessenkool
Hi!

On Wed, Aug 16, 2017 at 03:35:40PM -0500, Steven Munroe wrote:
> +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_add_ss (__m128 __A, __m128 __B)
> +{
> +#ifdef _ARCH_PWR7
> +  __m128 a, b, c;
> +  static const __vector unsigned int mask = {0x, 0, 0, 0};
> +  /* PowerISA VSX does not allow partial (for just lower double)
> +   * results. So to insure we don't generate spurious exceptions
> +   * (from the upper double values) we splat the lower double
> +   * before we to the operation. */

No leading stars in comments please.

> +  a = vec_splat (__A, 0);
> +  b = vec_splat (__B, 0);
> +  c = a + b;
> +  /* Then we merge the lower float result with the original upper
> +   * float elements from __A.  */
> +  return (vec_sel (__A, c, mask));
> +#else
> +  __A[0] = __A[0] + __B[0];
> +  return (__A);
> +#endif
> +}

It would be nice if we could just write the #else version and get the
more optimised code, but I guess we get something horrible going through
memory, instead?

> +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_rcp_ps (__m128 __A)
> +{
> +  __v4sf result;
> +
> +  __asm__(
> +  "xvresp %x0,%x1;\n"
> +  : "=v" (result)
> +  : "v" (__A)
> +  : );
> +
> +  return (result);
> +}

There is a builtin for this (__builtin_vec_re).

> +/* Convert the lower SPFP value to a 32-bit integer according to the current
> +   rounding mode.  */
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_cvtss_si32 (__m128 __A)
> +{
> +  __m64 res = 0;
> +#ifdef _ARCH_PWR8
> +  __m128 vtmp;
> +  __asm__(
> +  "xxsldwi %x1,%x2,%x2,3;\n"
> +  "xscvspdp %x1,%x1;\n"
> +  "fctiw  %1,%1;\n"
> +  "mfvsrd  %0,%x1;\n"
> +  : "=r" (res),
> + "=" (vtmp)
> +  : "wa" (__A)
> +  : );
> +#endif
> +  return (res);
> +}

Maybe it could do something better than return the wrong answer for non-p8?

> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))

> +#ifdef __LITTLE_ENDIAN__
> +  return result[1];
> +#elif __BIG_ENDIAN__
> +  return result [0];

Remove the extra space here?

> +_mm_max_pi16 (__m64 __A, __m64 __B)

> +  res.as_short[0] = (m1.as_short[0] > m2.as_short[0])? m1.as_short[0]: 
> m2.as_short[0];
> +  res.as_short[1] = (m1.as_short[1] > m2.as_short[1])? m1.as_short[1]: 
> m2.as_short[1];
> +  res.as_short[2] = (m1.as_short[2] > m2.as_short[2])? m1.as_short[2]: 
> m2.as_short[2];
> +  res.as_short[3] = (m1.as_short[3] > m2.as_short[3])? m1.as_short[3]: 
> m2.as_short[3];

Space before ? and : .

> +_mm_min_pi16 (__m64 __A, __m64 __B)

In this function, too.

> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_m_pmovmskb (__m64 __A)
> +{
> +  return _mm_movemask_pi8 (__A);
> +}
> +/* Multiply four unsigned 16-bit values in A by four unsigned 16-bit values
> +   in B and produce the high 16 bits of the 32-bit results.  */
> +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))

Newline before the comment?

> +_mm_sad_pu8 (__m64  __A, __m64  __B)

> +  /* Sum four groups of bytes into integers.  */
> +  vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
> +  /* sum across four integers with integer result.  */
> +  vsum = vec_sums (vsum, (__vector signed int) zero);
> +  /* the sum is in the right most 32-bits of the vector result.
> +   Transfer the a GPR and truncate to 16 bits.  */

That last line has an indentation problem.  Sentences should start with
a capital, in those last two comments.

> +/* Stores the data in A to the address P without polluting the caches.  */
> +extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_stream_pi (__m64 *__P, __m64 __A)
> +{
> +  /* Use the data cache block touch for store transient.  */
> +  __asm__ (
> +"dcbtstt 0,%0;"

No need for the ";".  dcbtstt isn't really the same semantics but I don't
think you can do better.

The rest looks great.

Please do something about the _mm_cvtss_si32, have a look at the other
comments, and it is ready to commit.  Thanks,


Segher


Ping on target independent stack clash protection patches

2017-08-16 Thread Jeff Law

#01 of #08:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01971.html

#02 of #08:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01972.html

#03 of #08:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01974.html

Need to reach some kind of closure on these, then I can start pinging
the target maintainers for the rest of the bits...

Jeff


[PATCH 5/3] C++ bits to improve detection of attribute conflicts (PR 81544)

2017-08-16 Thread Martin Sebor

To make review easier I broke out the C++ changes for the attributes
work into a patch of their own.  I also found the API I had asked
about, to look up a declaration based on one that's about to be
added/merged.

This patch depends on the foundation bits posted here:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00599.html

Thanks
Martin

PS The decls_match changes were required only because the x86_64
target's targetm.target_option.function_versions() fails with
a hard error when it finds a conflict in the target attribute
while comparing declarations.  That doesn't seem quite right
for a predicate-style API but I didn't see a better way
around it than to introduce a new argument and avoid calling
into the back end when comparing attributes.

PPS I found a surprising number of bugs in the C++ front end
while working on this project: 81873, 81762, 81761, and 81609
so I split up the tests I had initially added under c-c++-common
between C and C++ and left in c-c++-common only the subset that
passes in both languages.
PR c/81544 - attribute noreturn and warn_unused_result on the same function accepted

gcc/cp/ChangeLog:

	PR c/81544
	* cp-tree.h (decls_match): Add default argument.
	* decl.c (decls_match): Avoid calling into the target back end
	and triggering an error.
	* decl2.c (cplus_decl_attributes): Look up existing declaration and
	pass it to decl_attributes.
	* tree.c (cxx_attribute_table): Initialize new member of struct
	attribute_spec.

gcc/testsuite/ChangeLog:

	PR c/81544
	* g++.dg/Wattributes-2.C: New test.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 115cdaf..f2bac2a 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6080,7 +6080,7 @@ extern void finish_scope			(void);
 extern void push_switch(tree);
 extern void pop_switch(void);
 extern tree make_lambda_name			(void);
-extern int decls_match(tree, tree);
+extern int decls_match(tree, tree, bool = true);
 extern tree duplicate_decls			(tree, tree, bool);
 extern tree declare_local_label			(tree);
 extern tree define_label			(location_t, tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index aab2019..c9c8cd1 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -993,7 +993,7 @@ push_local_name (tree decl)
`const int&'.  */
 
 int
-decls_match (tree newdecl, tree olddecl)
+decls_match (tree newdecl, tree olddecl, bool record_versions /* = true */)
 {
   int types_match;
 
@@ -1088,6 +1088,7 @@ decls_match (tree newdecl, tree olddecl)
   if (types_match
 	  && !DECL_EXTERN_C_P (newdecl)
 	  && !DECL_EXTERN_C_P (olddecl)
+	  && record_versions
 	  && targetm.target_option.function_versions (newdecl, olddecl))
 	{
 	  /* Mark functions as versions if necessary.  Modify the mangled decl
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 2a52f8c..82df5e3 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1404,7 +1404,31 @@ cplus_decl_attributes (tree *decl, tree attributes, int flags)
 		   attributes, flags);
 }
   else
-decl_attributes (decl, attributes, flags);
+{
+  tree last_decl = (DECL_P (*decl) && DECL_NAME (*decl)
+			? lookup_name (DECL_NAME (*decl)) : NULL_TREE);
+
+  if (last_decl && TREE_CODE (last_decl) == OVERLOAD)
+	for (ovl_iterator iter (last_decl, true); ; ++iter)
+	  {
+	if (!iter)
+	  {
+		last_decl = NULL_TREE;
+		break;
+	  }
+
+	if (TREE_CODE (*iter) == OVERLOAD)
+	  continue;
+
+	if (decls_match (*decl, *iter, /*record_decls=*/false))
+	  {
+		last_decl = *iter;
+		break;
+	  }
+	  }
+
+  decl_attributes (decl, attributes, flags, last_decl);
+}
 
   if (TREE_CODE (*decl) == TYPE_DECL)
 SET_IDENTIFIER_TYPE_VALUE (DECL_NAME (*decl), TREE_TYPE (*decl));
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 8f18665..6cbf4b2 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -4314,10 +4314,10 @@ const struct attribute_spec cxx_attribute_table[] =
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
affects_type_identity } */
   { "init_priority",  1, 1, true,  false, false,
-handle_init_priority_attribute, false },
+handle_init_priority_attribute, false, NULL },
   { "abi_tag", 1, -1, false, false, false,
-handle_abi_tag_attribute, true },
-  { NULL,	  0, 0, false, false, false, NULL, false }
+handle_abi_tag_attribute, true, NULL },
+  { NULL, 0, 0, false, false, false, NULL, false, NULL }
 };
 
 /* Table of C++ standard attributes.  */
@@ -4326,10 +4326,10 @@ const struct attribute_spec std_attribute_table[] =
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
affects_type_identity } */
   { "maybe_unused", 0, 0, false, false, false,
-handle_unused_attribute, false },
+handle_unused_attribute, false, NULL },
   { "nodiscard", 0, 0, false, false, false,
-handle_nodiscard_attribute, false },
-  { NULL,	  0, 0, false, false, false, NULL, false }
+handle_nodiscard_attribute, false, NULL },
+  { NULL, 0, 0, false, false, false, NULL, false, NULL }

Re: [RS6000] PR 80938, Don't emit eh_frame for regs that don't need saving

2017-08-16 Thread Segher Boessenkool
On Thu, Aug 17, 2017 at 11:25:27AM +0930, Alan Modra wrote:
> On Wed, Aug 16, 2017 at 06:23:13PM -0500, Segher Boessenkool wrote:
> > Maybe whatever is creating those instructions should set RTX_FRAME_RELATED_P
> > by itself?  Not sure if that is nicer.
> > 
> > Both this CR2 and R0 handling are pretty nasty hacks.  Could you add a
> > comment saying that?
> 
> I wouldn't say the R0 handling is a nasty hack at all.  You can't save
> LR directly, storing to the stack must go via a gpr.  I'm 100% sure
> you know that, and so would anyone else working on powerpc gcc
> support.  It so happens that we use r0 in every case we hit this 
> code.  *That* fact is commented.  I don't really know what else you
> want.

But R0 isn't necessarily used for LR here.  That is true currently in
all cases I can see, but it can be used for other things.  The ABI
doesn't force things here.

The separate shrink-wrapping code does use R0 for other things, but it
manually sets the CFI notes anyway, so no problem there.  Maybe I should
just stop worrying.

> (Incidentally, the dwarf2cfi.c behaviour is described by
> 
>   In addition, if a register has previously been saved to a different
>   register,
> 
> Yup, great comment that one!  Dates back to 2004, commit 60ea93bb72.)

Perhaps inspired by the REG_CFA_REGISTER name for the RTL note ;-)

> The CR2 thing is a long-standing convention for frame info about CR, a
> wart fixed by ELFv2.  See elsewhere

That ELFv2 fix is why I still haven't submitted the separate shrink-wrapping
for CR fields code -- it complicates things a lot :-/

> I'll go with:
> 
>   /* If we see CR2 then we are here on a Darwin world save.  Saves of
>  CR2 signify the whole CR is being saved.  This is a long-standing
>  ABI wart fixed by ELFv2.  As for r0/lr there is no need to check
>  that CR needs to be saved.  */

That is fine, thanks again.


Segher


Re: [RS6000] PR 80938, Don't emit eh_frame for regs that don't need saving

2017-08-16 Thread Alan Modra
On Wed, Aug 16, 2017 at 06:23:13PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 16, 2017 at 08:05:04AM +0930, Alan Modra wrote:
> > Repost with requested changes.  I've extracted the logic that omits
> > frame saves from rs6000_frame_related to a new function, because it's
> > easier to document that way.  The logic has been simplified a little
> > too: fixed_reg_p doesn't need to be called there.
> > 
> > PR target/80938
> > * config/rs6000/rs6000.c (rs6000_savres_strategy): Revert 2017-08-09.
> > Don't use store multiple if only one reg needs saving.
> > (interesting_frame_related_regno): New function.
> > (rs6000_frame_related): Don't emit eh_frame for regs that
> > don't need saving.
> > (rs6000_emit_epilogue): Likewise.
> 
> It's not just eh_frame, it's all call frame information.

Sure, I meant to fix that.

> > +/* This function is called when rs6000_frame_related is processing
> > +   SETs within a PARALLEL, and returns whether the REGNO save ought to
> > +   be marked RTX_FRAME_RELATED_P.  The PARALLELs involved are those
> > +   for out-of-line register save functions, store multiple, and the
> > +   Darwin world_save.  They may contain registers that don't really
> > +   need saving.  */
> > +
> > +static bool
> > +interesting_frame_related_regno (unsigned int regno)
> > +{
> > +  /* Saves apparently of r0 are actually saving LR.  */
> > +  if (regno == 0)
> > +return true;
> > +  /* If we see CR2 then we are here on a Darwin world save.  Saves of
> > + CR2 signify the whole CR is being saved.  */
> > +  if (regno == CR2_REGNO)
> > +return true;
> > +  /* Omit frame info for any user-defined global regs.  If frame info
> > + is supplied for them, frame unwinding will restore a user reg.
> > + Also omit frame info for any reg we don't need to save, as that
> > + bloats eh_frame and can cause problems with shrink wrapping.
> > + Since global regs won't be seen as needing to be saved, both of
> > + these conditions are covered by save_reg_p.  */
> > +  return save_reg_p (regno);
> > +}
> 
> The function name isn't so great, doesn't say what it does at all.  Not
> that I can think of anything better.
> 
> Maybe whatever is creating those instructions should set RTX_FRAME_RELATED_P
> by itself?  Not sure if that is nicer.
> 
> Both this CR2 and R0 handling are pretty nasty hacks.  Could you add a
> comment saying that?

I wouldn't say the R0 handling is a nasty hack at all.  You can't save
LR directly, storing to the stack must go via a gpr.  I'm 100% sure
you know that, and so would anyone else working on powerpc gcc
support.  It so happens that we use r0 in every case we hit this 
code.  *That* fact is commented.  I don't really know what else you
want.  Hmm, maybe I'm just too close to this code.  I'll go with
expounding some of the things known, as follows. 

  /* Saves apparently of r0 are actually saving LR.  It doesn't make
 sense to substitute the regno here to test save_reg_p (LR_REGNO).
 We *know* LR needs saving, and dwarf2cfi.c is able to deduce that
 (set (mem) (r0)) is saving LR from a prior (set (r0) (lr)) marked
 as frame related.  */

(Incidentally, the dwarf2cfi.c behaviour is described by

  In addition, if a register has previously been saved to a different
  register,

Yup, great comment that one!  Dates back to 2004, commit 60ea93bb72.)

The CR2 thing is a long-standing convention for frame info about CR, a
wart fixed by ELFv2.  See elsewhere

  /* CR register traditionally saved as CR2.  */
and
  /* In other ABIs, by convention, we use a single CR regnum to
 represent the fact that all call-saved CR fields are saved.
 We use CR2_REGNO to be compatible with gcc-2.95 on Linux.  */

I'll go with:

  /* If we see CR2 then we are here on a Darwin world save.  Saves of
 CR2 signify the whole CR is being saved.  This is a long-standing
 ABI wart fixed by ELFv2.  As for r0/lr there is no need to check
 that CR needs to be saved.  */

> Okay for trunk with those improvements (eh_frame and hack comment).
> Thanks!
> 
> 
> Segher

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH, rs6000] Fix PR target/72804: Poor code gen with -mvsx-timode

2017-08-16 Thread Segher Boessenkool
On Wed, Aug 16, 2017 at 05:56:09PM -0500, Peter Bergner wrote:
> On 8/16/17 5:30 PM, Segher Boessenkool wrote:
> > On Mon, Aug 14, 2017 at 04:28:25PM -0500, Peter Bergner wrote:
> >> +   mr %0,%L1; mr %L0,%1
> > 
> >mr %0,%L1\;mr %L0,%1
> 
> So you want the ';' escaped and the space removed?  Ok.

From doc/md.texi:

"""
The template may generate multiple assembler instructions.  Write the text
for the instructions, with @samp{\;} between them.
"""

(It actually expands to "\n\t", see read-md.c).


Segher


Re: [PATCH, rs6000] 1/3 Add x86 SSE <xmmintrin.h> intrinsics to GCC PPC64LE target

2017-08-16 Thread Segher Boessenkool
Hi!

On Wed, Aug 16, 2017 at 02:11:31PM -0500, Steven Munroe wrote:
> This is the third major contribution of X86 intrinsic equivalent
> headers for PPC64LE.

> This patch just adds the mm_malloc.h header, which will be needed by
> xmmintrin.h and cleans up some noisy warnings from the previous MMX
> commit.

> +static __inline void *
> +_mm_malloc (size_t size, size_t alignment)
> +{
> +  /* PowerPC64 ELF V2 ABI requires quadword alignment. */

(Two spaces after full stop; there's one in the changelog as well).

Okay for trunk, thanks,


Segher


Re: [RS6000] PR 80938, Don't emit eh_frame for regs that don't need saving

2017-08-16 Thread Segher Boessenkool
Hi!

On Wed, Aug 16, 2017 at 08:05:04AM +0930, Alan Modra wrote:
> Repost with requested changes.  I've extracted the logic that omits
> frame saves from rs6000_frame_related to a new function, because it's
> easier to document that way.  The logic has been simplified a little
> too: fixed_reg_p doesn't need to be called there.
> 
>   PR target/80938
>   * config/rs6000/rs6000.c (rs6000_savres_strategy): Revert 2017-08-09.
>   Don't use store multiple if only one reg needs saving.
>   (interesting_frame_related_regno): New function.
>   (rs6000_frame_related): Don't emit eh_frame for regs that
>   don't need saving.
>   (rs6000_emit_epilogue): Likewise.

It's not just eh_frame, it's all call frame information.

> +/* This function is called when rs6000_frame_related is processing
> +   SETs within a PARALLEL, and returns whether the REGNO save ought to
> +   be marked RTX_FRAME_RELATED_P.  The PARALLELs involved are those
> +   for out-of-line register save functions, store multiple, and the
> +   Darwin world_save.  They may contain registers that don't really
> +   need saving.  */
> +
> +static bool
> +interesting_frame_related_regno (unsigned int regno)
> +{
> +  /* Saves apparently of r0 are actually saving LR.  */
> +  if (regno == 0)
> +return true;
> +  /* If we see CR2 then we are here on a Darwin world save.  Saves of
> + CR2 signify the whole CR is being saved.  */
> +  if (regno == CR2_REGNO)
> +return true;
> +  /* Omit frame info for any user-defined global regs.  If frame info
> + is supplied for them, frame unwinding will restore a user reg.
> + Also omit frame info for any reg we don't need to save, as that
> + bloats eh_frame and can cause problems with shrink wrapping.
> + Since global regs won't be seen as needing to be saved, both of
> + these conditions are covered by save_reg_p.  */
> +  return save_reg_p (regno);
> +}

The function name isn't so great, doesn't say what it does at all.  Not
that I can think of anything better.

Maybe whatever is creating those instructions should set RTX_FRAME_RELATED_P
by itself?  Not sure if that is nicer.

Both this CR2 and R0 handling are pretty nasty hacks.  Could you add a
comment saying that?

Okay for trunk with those improvements (eh_frame and hack comment).
Thanks!


Segher


[PATCH], Enable -mfloat128 by default on PowerPC VSX systems

2017-08-16 Thread Michael Meissner
This patch enables -mfloat128 to be the default on PowerPC Linux VSX systems.

This patch depends on the libquadmatch/81848 patch being approved and
installed:
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00977.html

In this patch, I removed the old undocumented -mfloat128-type switch, and made
the default to be on for PowerPC Linux systems.  The default is off for other
systems (such as AIX) because they don't build the float128 emulation routines
in libgcc.

This patch also supersedes the patch for target/70589 that was previously
submitted.  It includes the changes for that bug to enable/disable -mfloat128
via target pragmas/attributes with the exception that the option
-mfloat128-type is now deleted:
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00894.html

I fixed up all of the float128 tests to remove the explicit -mfloat128 option
and add -mno-float128 as needed.

I've checked this on a big endian power7 system (both 32-bit and 64-bit) and a
little endian power8 system.  It bootstrapped fine and had no regressions in
the test suite.  I also built the latest boost library and got the same results
for the pre -mfloat128 compiler and the compiler with -mfloat128 as the
default, and there were no changes.  I also built and ran spec 2006 FP tests
with the two compilers, and there were no changes in runtime of the tests.

Can I check this into the trunk once the patch for 81848 has been applied?

[gcc]
2017-08-16  Michael Meissner  

PR target/81872
* config/rs6000/rs6000-cpus.def (OTHER_VSX_VECTOR_MASKS): Delete
OPTION_MASK_FLOAT128_TYPE.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.opt (TARGET_FLOAT128_ENABLE_TYPE): Make
this a variable instead of a target switch.
(-mfloat128-type): Delete switch.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Set
-mfloat128 by default on Linux VSX systems.  Remove support for
an explicit -mfloat128-type option.  Rework how
-mfloat128-hardware is set on ISA 3.0 systems.  Even if
-mno-float128 is used, enable the IEEE 128-bit type support
internally.
(rs6000_opt_masks): Delete -mfloat128-type.
* config/rs6000/rs6000.h (MASK_FLOAT128_TYPE): Delete.
(MASK_FLOAT128_KEYWORD): Define.
(RS6000_BTM_FLOAT128): Use the mask for -mfloat128 instead of
-mfloat128-type to signal we should build the IEEE 128-bit
built-in functions.  The function rs6000_builtin_mask_calculate in
rs6000.c still uses TARGET_FLOAT128_ENABLE_TYPE to determine
whether to enable the built-in functions or not.
* doc/invoke.texi (RS/6000 and PowerPC Options): Update -mfloat128
and -mfloat128-hardware documentation.

PR target/70589
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Make the
real keyword for IEEE 128-bit be __ieee128 for VSX runs, and define
__float128 to be __ieee128 if -mfloat128 is used or a float128
target attribute/pragma is used.  Enable _Float128 all of the
time.
(rs6000_cpu_cpp_builtins): Likewise.
* config/rs6000/rs6000.c (rs6000_init_builtins): Likewise.
(rs6000_floatn_mode): Likewise.
(rs6000_opt_masks): Likewise.

[gcc/testsuite]
2017-08-16  Michael Meissner  

PR target/81872
* gcc.target/powerpc/float128-1.c: Change float128 tests to assume
-mfloat128 is on by default.  Explicitly use -mno-float128 to test
when the __float128 keyword should not be available.  Use -mvsx,
-mpower8-vector, or -mpower9-vector instead of -mcpu=power7/8/9 in
many cases.
* gcc.target/powerpc/float128-2.c: Likewise.
* gcc.target/powerpc/float128-cmp.c: Likewise.
* gcc.target/powerpc/float128-complex-1.c: Likewise.
* gcc.target/powerpc/float128-complex-2.c: Likewise.
* gcc.target/powerpc/float128-hw.c: Likewise.
* gcc.target/powerpc/float128-mix.c: Likewise.
* gcc.target/powerpc/float128-type-1.c: Likewise.
* gcc.target/powerpc/float128-type-2.c: Likewise.

PR target/70589
* gcc.target/powerpc/float128-3.c: New test.
* gcc.target/powerpc/float128-4.c: Likewise.
* gcc.target/powerpc/float128-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 251056)
+++ gcc/config/rs6000/rs6000-cpus.def   (.../gcc/config/rs6000) (working copy)
@@ -92,7 +92,6 @@
 #define OTHER_VSX_VECTOR_MASKS (OTHER_P8_VECTOR_MASKS  \
 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  

Re: [PATCH, rs6000] Fix endianness issue with vmrgew and vmrgow permute constant recognition

2017-08-16 Thread Segher Boessenkool
Hi!

On Tue, Aug 15, 2017 at 04:14:21PM -0500, Bill Schmidt wrote:
> One of Carl Love's proposed built-in function patches exposed a bug in the 
> Power
> code that recognizes specific permute control vector patterns for a permute, 
> and
> changes the permute to a more specific and more efficient instruction.  The
> patterns for p8_vmrgew_v4si and p8_vmrgow are generated regardless of 
> endianness,
> leading to problems on the little-endian port.

Ouch.

> The use in rs6000.c of p8_vmrgew_v4sf_direct, rather than 
> p8_vmrgew_v4si_direct,
> is arbitrary.  The existing code already handles converting (for free) a V4SI
> operand to a V4SF one, so there's no need to specify the mode directly; and it
> would actually complicate the code to extract the mode so the "proper" pattern
> would match.  I think what I have here is better, but if you disagree I can
> change it.

It's fine I think.

This is okay for trunk, thanks!


Segher


Re: [PATCH, rs6000] Fix PR target/72804: Poor code gen with -mvsx-timode

2017-08-16 Thread Peter Bergner
On 8/16/17 5:30 PM, Segher Boessenkool wrote:
> On Mon, Aug 14, 2017 at 04:28:25PM -0500, Peter Bergner wrote:
>> +   mr %0,%L1; mr %L0,%1
> 
>mr %0,%L1\;mr %L0,%1


So you want the ';' escaped and the space removed?  Ok.



>> +  [(set (match_operand:VSX_TI 0 "int_reg_operand" "")
> 
> You can leave out the default ""?

Cut/paste of another peephole2.  I'll try without it, thanks.



>> +/* { dg-do compile { target { powerpc64*-*-* } } } */
> 
> powerpc64*-*-* is never correct, we are biarch.  { target lp64 } instead?

Ah yes, that is better.


I'll make the above changes and commit after another quick test cycle.
Thanks!

Peter



Re: [PATCH, rs6000] Remove TARGET_VSX_TIMODE and -mno-vsx-timode usage

2017-08-16 Thread Segher Boessenkool
On Mon, Aug 14, 2017 at 06:02:51PM -0500, Peter Bergner wrote:
> The undocumented option -mvsx-timode was added because there were reload
> bugs we couldn't fix when we tried allowing TImode values in VSX registers.
> We used the option to allow TImode values in VSX registers when LRA was
> being used, but not when reload was being used.  Now that GCC 8 has removed
> the option of using reload, the TARGET_VSX_TIMODE flag is now logically the
> same as TARGET_VSX.  This patch replaces the TARGET_VSX_TIMODE flag with
> uses of TARGET_VSX and removes the ability of using -mno-vsx-timode.
> The option -mvsx-timode is now just a dummy stub similar to what we did
> with -mlra.

> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md  (revision 251056)
> +++ gcc/config/rs6000/altivec.md  (working copy)
> @@ -218,7 +218,7 @@ (define_mode_attr VS_sxwsp [(V4SI "sxw")
>  (define_mode_iterator VParity [V4SI
>  V2DI
>  V1TI
> -(TI "TARGET_VSX_TIMODE")])
> +(TI "TARGET_VSX")])

You can completely remove the condition here as far as I see.

> --- gcc/config/rs6000/rs6000.md   (revision 251056)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -399,7 +399,7 @@ (define_mode_iterator FMOVE128_FPR [(TF
>   (TD "TARGET_HARD_FLOAT")])
>  
>  ; Iterators for 128 bit types for direct move
> -(define_mode_iterator FMOVE128_GPR [(TI"TARGET_VSX_TIMODE")
> +(define_mode_iterator FMOVE128_GPR [(TI"TARGET_VSX")
>   (V16QI "")
>   (V8HI  "")
>   (V4SI  "")

Same here I think (less sure).

> --- gcc/config/rs6000/vector.md   (revision 251056)
> +++ gcc/config/rs6000/vector.md   (working copy)
> @@ -31,7 +31,7 @@ (define_mode_iterator VEC_IP [V8HI
> V4SI
> V2DI
> V1TI
> -   (TI "TARGET_VSX_TIMODE")])
> +   (TI "TARGET_VSX")])

This one, too.

> --- gcc/config/rs6000/vsx.md  (revision 251056)
> +++ gcc/config/rs6000/vsx.md  (working copy)
> @@ -34,11 +34,11 @@ (define_mode_iterator VSX_D [V2DF V2DI])
>  ;; types that goes in a single vector register.
>  (define_mode_iterator VSX_LE_128 [(KF   "FLOAT128_VECTOR_P (KFmode)")
> (TF   "FLOAT128_VECTOR_P (TFmode)")
> -   (TI   "TARGET_VSX_TIMODE")
> +   (TI   "TARGET_VSX")
> V1TI])

And this.

>  ;; Iterator for 128-bit integer types that go in a single vector register.
> -(define_mode_iterator VSX_TI [(TI "TARGET_VSX_TIMODE") V1TI])
> +(define_mode_iterator VSX_TI [(TI "TARGET_VSX") V1TI])

More :-)

> @@ -71,7 +71,7 @@ (define_mode_iterator VSX_M [V16QI
>V1TI
>(KF"FLOAT128_VECTOR_P (KFmode)")
>(TF"FLOAT128_VECTOR_P (TFmode)")
> -  (TI"TARGET_VSX_TIMODE")])
> +  (TI"TARGET_VSX")])

And the last.

(In most of those cases we have V1TI without condition already).

Okay for trunk with that simplification.  Thanks!


Segher


[PATCH 4/3] improve detection of attribute conflicts (PR 81544)

2017-08-16 Thread Martin Sebor

Jon,

Attached is the libstdc++ only patch to remove the pointless
const attribute from __pool::_M_destroy_thread_key(void*).

  https://gcc.gnu.org/ml/gcc/2017-08/msg00027.html

I only belatedly now broke it out of the larger patch under
review here:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00599.html

Thanks
Martin
libstdc++-v3/ChangeLog:

	PR c/81544
	* include/ext/mt_allocator.h (_M_destroy_thread_key): Remove
	pointless attribute const.

diff --git a/libstdc++-v3/include/ext/mt_allocator.h b/libstdc++-v3/include/ext/mt_allocator.h
index effb13b..f349ff8 100644
--- a/libstdc++-v3/include/ext/mt_allocator.h
+++ b/libstdc++-v3/include/ext/mt_allocator.h
@@ -355,7 +355,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   // XXX GLIBCXX_ABI Deprecated
-  _GLIBCXX_CONST void 
+  void
   _M_destroy_thread_key(void*) throw ();
 
   size_t 


Re: [PATCH, rs6000] Fix PR target/72804: Poor code gen with -mvsx-timode

2017-08-16 Thread Segher Boessenkool
On Mon, Aug 14, 2017 at 04:28:25PM -0500, Peter Bergner wrote:
> The following patch fixes a performance issue when loading/storing/moving
> TImode values when using -mvsx-timode -mcpu=power7 with LRA.  The problem is
> that the vsx_le_permute_ and vsx_le_perm_{load,store}_ patterns
> do not support TImode values in GPRs, and LRA is using these patterns to
> fixup constraints, which ends up leading to really bad code gen as seen by
> the test cases in the bug report.
> 
> This patch fixes the bug by adding GPR support to the above patterns, as
> well as a couple of peepholes that improve the code for loads and stores
> to/from GPRs.
> 
> This passed bootstrapping and regtesting with no regressions and Mike
> ran this on SPEC2006 and found no performance regressions with it.
> 
> Ok for trunk?  Do we want this on the GCC 7 branch where LRA is on by default?

Okay for trunk (see nits below).  For 7, okay after waiting a week or so
for fallout, if you think it really helps there.

>"@
> xxpermdi %x0,%x1,%x1,2
> lxvd2x %x0,%y1
> -   stxvd2x %x1,%y0"
> -  [(set_attr "length" "4")
> -   (set_attr "type" "vecperm,vecload,vecstore")])
> +   stxvd2x %x1,%y0
> +   mr %0,%L1; mr %L0,%1

   mr %0,%L1\;mr %L0,%1

etc.

> +;; Peepholes to catch loads and stores for TImode if TImode landed in
> +;; GPR registers on a little endian system.
> +(define_peephole2
> +  [(set (match_operand:VSX_TI 0 "int_reg_operand" "")

You can leave out the default ""?

> Index: gcc/testsuite/gcc.target/powerpc/pr72804.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr72804.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr72804.c(working copy)
> @@ -0,0 +1,25 @@
> +/* { dg-do compile { target { powerpc64*-*-* } } } */

powerpc64*-*-* is never correct, we are biarch.  { target lp64 } instead?

Thanks,


Segher


Re: [PATCH], Enable _Float128 in VSX PowerPC system and enable #pragma GCC target "float128"

2017-08-16 Thread Segher Boessenkool
Hi Mike,

On Mon, Aug 14, 2017 at 03:21:42PM -0400, Michael Meissner wrote:
> This patch enables the _Float128 keyword for the C language all of the time
> for PowerPC VSX systems.  The __float128 keyword continues to be only enabled
> if you use the -mfloat128 option.

Looks good, please commit.  Thanks!


Segher


>   PR target/70589
>   * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Make the
>   real keyword for IEEE 128-bit be __ieee128 for VSX runs, and define
>   __float128 to be __ieee128 if -mfloat128 is used or a float128
>   target attribute/pragma is used.  Enable _Float128 all of the
>   time.
>   (rs6000_cpu_cpp_builtins): Likewise.
>   * config/rs6000/rs6000.c (rs6000_init_builtins): Likewise.
>   (rs6000_floatn_mode): Likewise.
>   (rs6000_opt_masks): Likewise.
> 
> [gcc/testsuite]
> 2017-08-14  Michael Meissner  
> 
>   PR target/70589
>   * gcc.target/powerpc/float128-1.c: Eliminate using the
>   -static-libgcc option.  Use -mvsx instead of -mcpu=power7 to run
>   the tests.
>   * gcc.target/powerpc/float128-2.c: Likewise.
>   * gcc.target/powerpc/float128-3.c: Test enabling -mfloat128 via
>   #pragma target.
>   * gcc.target/powerpc/float128-4.c: New test, clone float128-1.c,
>   using the _Float128 keyword.  Do not use the -mfloat128 option.


libgo patch committed: improvements for AIX netpoll

2017-08-16 Thread Ian Lance Taylor
This patch by Tony Reix improves the AIX netpoll code used by libgo.
Bootstrapped on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251127)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-f02183eb66f5718769f3f6541dcc6744ae1771c0
+152164a7249ecc5c2bfd4a091450dc7c2855f609
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/netpoll_aix.go
===
--- libgo/go/runtime/netpoll_aix.go (revision 250873)
+++ libgo/go/runtime/netpoll_aix.go (working copy)
@@ -7,10 +7,10 @@ package runtime
 import "unsafe"
 
 // This is based on the former libgo/runtime/netpoll_select.c implementation
-// except that it uses poll instead of select and is written in Go.
+// except that it uses AIX pollset_poll instead of select and is written in Go.
+
+type pollset_t int32
 
-// These definitions should come from sysinfo.go as they may be OS-dependent.
-// These are the definitions for the AIX operating system.
 type pollfd struct {
fd  int32
events  int16
@@ -22,8 +22,23 @@ const _POLLOUT = 0x0002
 const _POLLHUP = 0x2000
 const _POLLERR = 0x4000
 
-//extern poll
-func libc_poll(pfds *pollfd, npfds uintptr, timeout uintptr) int32
+type poll_ctl struct {
+	cmd    int16
+	events int16
+	fd     int32
+}
+
+const _PS_ADD = 0x0
+const _PS_DELETE = 0x2
+
+//extern pollset_create
+func pollset_create(maxfd int32) pollset_t
+
+//extern pollset_ctl
+func pollset_ctl(ps pollset_t, pollctl_array *poll_ctl, array_length int32) int32
+
+//extern pollset_poll
+func pollset_poll(ps pollset_t, polldata_array *pollfd, array_length int32, timeout int32) int32
 
 //extern pipe
 func libc_pipe(fd *int32) int32
@@ -31,60 +46,100 @@ func libc_pipe(fd *int32) int32
 //extern __go_fcntl_uintptr
 func fcntlUintptr(fd, cmd, arg uintptr) (uintptr, uintptr)
 
-func closeonexec(fd int32) {
-   fcntlUintptr(uintptr(fd), _F_SETFD, _FD_CLOEXEC)
+func fcntl(fd, cmd int32, arg uintptr) uintptr {
+   r, _ := fcntlUintptr(uintptr(fd), uintptr(cmd), arg)
+   return r
 }
 
 var (
-   allocated int
-   pfds  []pollfd
-   mpfds map[uintptr]*pollDesc
-   pmtx  mutex
-   rdwakeint32
-   wrwakeint32
+   ps  pollset_t = -1
+   mpfds   map[int32]*pollDesc
+   pmtxmutex
+   rdwake  int32
+   wrwake  int32
+   needsUpdate bool
 )
 
 func netpollinit() {
var p [2]int32
 
-   // Create the pipe we use to wakeup poll.
+   if ps = pollset_create(-1); ps < 0 {
+   throw("netpollinit: failed to create pollset")
+   }
+   // It is not possible to add or remove descriptors from
+   // the pollset while pollset_poll is active.
+   // We use a pipe to wakeup pollset_poll when the pollset
+   // needs to be updated.
	if err := libc_pipe(&p[0]); err < 0 {
throw("netpollinit: failed to create pipe")
}
rdwake = p[0]
wrwake = p[1]
 
-   closeonexec(rdwake)
-   closeonexec(wrwake)
-
-   // Pre-allocate array of pollfd structures for poll.
-   allocated = 128
-   pfds = make([]pollfd, allocated)
+   fl := fcntl(rdwake, _F_GETFL, 0)
+   fcntl(rdwake, _F_SETFL, fl|_O_NONBLOCK)
+   fcntl(rdwake, _F_SETFD, _FD_CLOEXEC)
+
+   fl = fcntl(wrwake, _F_GETFL, 0)
+   fcntl(wrwake, _F_SETFL, fl|_O_NONBLOCK)
+   fcntl(wrwake, _F_SETFD, _FD_CLOEXEC)
+
+   // Add the read side of the pipe to the pollset.
+   var pctl poll_ctl
+   pctl.cmd = _PS_ADD
+   pctl.fd = rdwake
+   pctl.events = _POLLIN
+   if pollset_ctl(ps, &pctl, 1) != 0 {
+   throw("netpollinit: failed to register pipe")
+   }
 
-   mpfds = make(map[uintptr]*pollDesc)
+   mpfds = make(map[int32]*pollDesc)
 }
 
 func netpollopen(fd uintptr, pd *pollDesc) int32 {
+   // pollset_ctl will block if pollset_poll is active
+   // so wakeup pollset_poll first.
	lock(&pmtx)
-   mpfds[fd] = pd
+   needsUpdate = true
	unlock(&pmtx)
-
-   // Wakeup poll.
b := [1]byte{0}
	write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
 
+   var pctl poll_ctl
+   pctl.cmd = _PS_ADD
+   pctl.fd = int32(fd)
+   pctl.events = _POLLIN | _POLLOUT
+   if pollset_ctl(ps, &pctl, 1) != 0 {
+   return int32(errno())
+   }
+   lock(&pmtx)
+   mpfds[int32(fd)] = pd
+   needsUpdate = false
+   unlock(&pmtx)
+
return 0
 }
 
 func netpollclose(fd uintptr) int32 {
+   // pollset_ctl will block if pollset_poll is active
+   // so wakeup pollset_poll first.
	lock(&pmtx)
-   delete(mpfds, fd)
+   needsUpdate = true
	unlock(&pmtx)
-
-   // Wakeup poll.
   

Re: [PATCH] use strnlen in pretty printer for "%.*s" (PR 81859)

2017-08-16 Thread David Malcolm
On Wed, 2017-08-16 at 10:12 -0600, Martin Sebor wrote:
> PR c/81859 - [8 Regression] valgrind error from
> warn_about_normalization
> 
> gcc/ChangeLog:
> 
>   PR c/81859
>   * pretty-print.c (pp_format): Use strnlen in %.*s to avoid
> reading
>   past the end of an array.
>   (test_pp_format): Add test cases.
> 
> Index: gcc/pretty-print.c
> ===
> --- gcc/pretty-print.c(revision 251100)
> +++ gcc/pretty-print.c(working copy)
> @@ -668,15 +668,11 @@ pp_format (pretty_printer *pp, text_info *text)
>  
>   s = va_arg (*text->args_ptr, const char *);
>  
> -	  /* Negative precision is treated as if it were omitted.  */
> -	  if (n < 0)
> -	    n = INT_MAX;
> +	  /* Append the lesser of precision and strlen (s) characters
> +	     from the array (which need not be a nul-terminated string).
> +	     Negative precision is treated as if it were omitted.  */
> +	  size_t len = n < 0 ? strlen (s) : strnlen (s, n);
>  
> -	  /* Append the lesser of precision and strlen (s) characters.  */
> -	  size_t len = strlen (s);
> -	  if ((unsigned) n < len)
> -	    len = n;
> -
>   pp_append_text (pp, s, s + len);
> }
> break;
> @@ -1438,6 +1434,13 @@ test_pp_format ()
>ASSERT_PP_FORMAT_2 ("A 12345678", "%c %x", 'A', 0x12345678);
>ASSERT_PP_FORMAT_2 ("hello world 12345678", "%s %x", "hello world",
> 0x12345678);
> +
> +  /* Not nul-terminated.  */
> +  char arr[5] = { '1', '2', '3', '4', '5' };
> +  ASSERT_PP_FORMAT_2 ("123", "%.*s", 3, arr);
> +  ASSERT_PP_FORMAT_2 ("1234", "%.*s", -1, "1234");
> +  ASSERT_PP_FORMAT_2 ("12345", "%.*s", 7, "12345");
> +

The other examples in this selftest append a trailing argument with a
known bit pattern (0x12345678), to ensure that we're consuming
arguments correctly.

Please can you do the same for these tests.

Thanks
Dave


[PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Steven Munroe
This is part 3/3 of the contribution of PPC64LE support for the x86 SSE
intrinsics. This patch includes the testsuite/gcc.target tests for the
intrinsics included by xmmintrin.h.

For these tests I added -Wno-psabi to dg-options to suppress warnings
associated with the vector ABI change in GCC 5. These warnings are
associated with unions defined in m128-check.h (ported with minimal
change from i386). This removes some noise from make check.


[gcc/testsuite]

2017-08-16  Steven Munroe  

* gcc.target/powerpc/m128-check.h: New file.
* gcc.target/powerpc/sse-check.h: New file.
* gcc.target/powerpc/sse-movmskps-1.c: New file.
* gcc.target/powerpc/sse-movlps-2.c: New file.
* gcc.target/powerpc/sse-pavgw-1.c: New file.
* gcc.target/powerpc/sse-cvttss2si-1.c: New file.
* gcc.target/powerpc/sse-cvtpi32x2ps-1.c: New file.
* gcc.target/powerpc/sse-cvtss2si-1.c: New file.
* gcc.target/powerpc/sse-divss-1.c: New file.
* gcc.target/powerpc/sse-movhps-1.c: New file.
* gcc.target/powerpc/sse-cvtsi2ss-2.c: New file.
* gcc.target/powerpc/sse-subps-1.c: New file.
* gcc.target/powerpc/sse-minps-1.c: New file.
* gcc.target/powerpc/sse-pminub-1.c: New file.
* gcc.target/powerpc/sse-cvtpu16ps-1.c: New file.
* gcc.target/powerpc/sse-shufps-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-2.c: New file.
* gcc.target/powerpc/sse-maxps-1.c: New file.
* gcc.target/powerpc/sse-pmaxub-1.c: New file.
* gcc.target/powerpc/sse-movmskb-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-4.c: New file.
* gcc.target/powerpc/sse-unpcklps-1.c: New file.
* gcc.target/powerpc/sse-mulps-1.c: New file.
* gcc.target/powerpc/sse-rcpps-1.c: New file.
* gcc.target/powerpc/sse-pminsw-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-6.c: New file.
* gcc.target/powerpc/sse-subss-1.c: New file.
* gcc.target/powerpc/sse-movss-2.c: New file.
* gcc.target/powerpc/sse-pmaxsw-1.c: New file.
* gcc.target/powerpc/sse-minss-1.c: New file.
* gcc.target/powerpc/sse-movaps-2.c: New file.
* gcc.target/powerpc/sse-movlps-1.c: New file.
* gcc.target/powerpc/sse-maxss-1.c: New file.
* gcc.target/powerpc/sse-movhlps-1.c: New file.
* gcc.target/powerpc/sse-cvttss2si-2.c: New file.
* gcc.target/powerpc/sse-cvtpi8ps-1.c: New file.
* gcc.target/powerpc/sse-cvtpi32ps-1.c: New file.
* gcc.target/powerpc/sse-mulss-1.c: New file.
* gcc.target/powerpc/sse-cvtsi2ss-1.c: New file.
* gcc.target/powerpc/sse-cvtss2si-2.c: New file.
* gcc.target/powerpc/sse-movlhps-1.c: New file.
* gcc.target/powerpc/sse-movhps-2.c: New file.
* gcc.target/powerpc/sse-rsqrtps-1.c: New file.
* gcc.target/powerpc/sse-xorps-1.c: New file.
* gcc.target/powerpc/sse-cvtpspi8-1.c: New file.
* gcc.target/powerpc/sse-orps-1.c: New file.
* gcc.target/powerpc/sse-addps-1.c: New file.
* gcc.target/powerpc/sse-cvtpi16ps-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-3.c: New file.
* gcc.target/powerpc/sse-pmulhuw-1.c: New file.
* gcc.target/powerpc/sse-andps-1.c: New file.
* gcc.target/powerpc/sse-cmpss-1.c: New file.
* gcc.target/powerpc/sse-divps-1.c: New file.
* gcc.target/powerpc/sse-andnps-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-5.c: New file.
* gcc.target/powerpc/sse-movss-1.c: New file.
* gcc.target/powerpc/sse-sqrtps-1.c: New file.
* gcc.target/powerpc/sse-cvtpu8ps-1.c: New file.
* gcc.target/powerpc/sse-cvtpspi16-1.c: New file.
* gcc.target/powerpc/sse-movaps-1.c: New file.
* gcc.target/powerpc/sse-movss-3.c: New file.
* gcc.target/powerpc/sse-unpckhps-1.c: New file.
* gcc.target/powerpc/sse-addss-1.c: New file.
* gcc.target/powerpc/sse-psadbw-1.c: New file.


Index: gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c
===
--- gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c   (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c   (revision 0)
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+
+#ifndef CHECK_H
+#define CHECK_H "sse-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse_test_movmskps_1
+#endif
+
+#include <xmmintrin.h>
+
+static int
+__attribute__((noinline, unused))
+test (__m128 a)
+{
+  return _mm_movemask_ps (a); 
+}
+
+static void
+TEST (void)
+{
+  union128 u;
+  float s[4] = {-2134.3343, 1234.635654, 1.2234, -876.8976};
+  int d;
+  int e = 0;
+  int i;
+
+  u.x = _mm_loadu_ps 

[PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Steven Munroe
This is part 2/3 of the contribution of PPC64LE support for the x86 SSE
intrinsics. This patch includes the new (for PPC) xmmintrin.h and
associated config.gcc changes.

This submission implements all the SSE technology intrinsic functions
except those associated with directly accessing and updating the MXCSR
status and control register:

1) The features and layout of the MXCSR are specific to the Intel
architecture.
2) Not all the control and status bits of the MXCSR have equivalents in
the PowerISA's FPSCR.
3) Using the POSIX floating-point environment <fenv.h> API is a
better cross-platform solution.


./gcc/ChangeLog:

2017-08-16  Steven Munroe  

* config.gcc (powerpc*-*-*): Add xmmintrin.h and mm_malloc.h.
* config/rs6000/xmmintrin.h: New file.
* config/rs6000/x86intrin.h [__ALTIVEC__]: Include xmmintrin.h.



Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 250986)
+++ gcc/config.gcc  (working copy)
@@ -457,6 +457,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
+   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
	extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} paired.h"
Index: gcc/config/rs6000/x86intrin.h
===
--- gcc/config/rs6000/x86intrin.h   (revision 250986)
+++ gcc/config/rs6000/x86intrin.h   (working copy)
@@ -37,6 +37,8 @@
 
 #ifdef __ALTIVEC__
 #include <mmintrin.h>
+
+#include <xmmintrin.h>
 #endif /* __ALTIVEC__ */
 
 #include <bmiintrin.h>
Index: gcc/config/rs6000/xmmintrin.h
===
--- gcc/config/rs6000/xmmintrin.h   (revision 0)
+++ gcc/config/rs6000/xmmintrin.h   (revision 0)
@@ -0,0 +1,1815 @@
+/* Copyright (C) 2002-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 SSE (__m128) intrinsics, the PowerPC
+   VMX/VSX ISA is a good match for vector float SIMD operations.
+   However scalar float operations in vector (XMM) registers require
+   the POWER8 VSX ISA (2.07) level. Also there are important
+   differences for data format and placement of float scalars in the
+   vector register.  For the PowerISA, scalar floats in FPRs (the
+   leftmost 64 bits of the low 32 VSRs) are in double format, while
+   X86_64 SSE uses the rightmost 32 bits of the XMM.  These differences require
+   extra steps on POWER to match the SSE scalar float semantics.
+
+   Most SSE scalar float intrinsic operations can be performed more
+   efficiently as C language float scalar operations or optimized to
+   use vector SIMD operations.  We recommend this for new applications.
+
+   Another difference is the format and details of the X86_64 MXCSR vs
+   the PowerISA FPSCR / VSCR registers.  We recommend applications
+   replace direct access to the MXCSR with the more portable
+   Posix APIs.  */
+#warning "Please read comment above.  Use -DNO_WARN_X86_INTRINSICS to disable this warning."
+#endif
+
+#ifndef _XMMINTRIN_H_INCLUDED
+#define _XMMINTRIN_H_INCLUDED
+
+#include <altivec.h>
+#include <mm_malloc.h>
+
+/* We need type definitions from the MMX header file.  */

[PATCH] Move TYPE_BINFO

2017-08-16 Thread Nathan Sidwell
My excision of TYPE_METHODS frees up the TYPE_MAX_VALUES_RAW slot in 
RECORD_TYPEs.  This patch moves TYPE_BINFO to there.


That move frees up type_non_common::binfo, which I rename to 
type_non_common::lang_1 and adjust TYPE_LANG_SLOT_1 to allow any type 
slot to use it.


TYPE_LANG_SLOT_1 is used in a number of FEs already, and removing it 
looked hard.


I have a plan for the C++ FE's use of that slot in RECORD_TYPE -- so 
hands off!


Applied to trunk.

nathan
--
Nathan Sidwell
2017-08-16  Nathan Sidwell  

	* tree-core.h (tree_type_non_common): Rename binfo to lang_1.
	* tree.h (TYPE_BINFO): Use type_non_common.maxval.
	(TYPE_LANG_SLOT_1): Use type_non_common.lang_1, for any type.
	* tree.c (free_lang_data_in_type): Use else-if chain.  Always
	clear TYPE_LANG_1.  Remove obsolete member-function stripping.
	(find_decls_types_r): Comment about TYPE_MAX_VALUES_RAW.
	(verify_type): Adjust for TYPE_BINFO move.
	* lto-streamer-out.c (DFS::DFS_write_tree_body): No need to
	process TYPE_BINFO directly.
	(hash_tree): Likewise.
	* tree-streamer-in.c (lto_input_ts_type_non_common_tree_pointers):
	Likewise.
	* tree-streamer-out.c (write_ts_type_non_common_tree_pointers):
	Likewise.

	lto/
	* lto.c (mentions_vars_p_type): Use TYPE_LANG_SLOT_1.
	(compare_tree_sccs_1): No need to compare TYPE_BINFO directly.
	(lto_fixup_prevailing_decls): Use TYPE_LANG_SLOT_1.

Index: lto/lto.c
===
--- lto/lto.c	(revision 251119)
+++ lto/lto.c	(working copy)
@@ -651,7 +651,7 @@ mentions_vars_p_type (tree t)
   CHECK_VAR (TYPE_MAX_VALUE_RAW (t));
 
   /* Accessor is for derived node types only. */
-  CHECK_NO_VAR (t->type_non_common.binfo);
+  CHECK_NO_VAR (TYPE_LANG_SLOT_1 (t));
 
   CHECK_VAR (TYPE_CONTEXT (t));
   CHECK_NO_VAR (TYPE_CANONICAL (t));
@@ -1410,7 +1410,6 @@ compare_tree_sccs_1 (tree t1, tree t2, t
 	   f1 || f2;
 	   f1 = TREE_CHAIN (f1), f2 = TREE_CHAIN (f2))
 	compare_tree_edges (f1, f2);
-	  compare_tree_edges (TYPE_BINFO (t1), TYPE_BINFO (t2));
 	}
   else if (code == FUNCTION_TYPE
 	   || code == METHOD_TYPE)
@@ -2584,7 +2583,7 @@ lto_fixup_prevailing_decls (tree t)
 
   LTO_SET_PREVAIL (TYPE_MIN_VALUE_RAW (t));
   LTO_SET_PREVAIL (TYPE_MAX_VALUE_RAW (t));
-  LTO_NO_PREVAIL (t->type_non_common.binfo);
+  LTO_NO_PREVAIL (TYPE_LANG_SLOT_1 (t));
 
   LTO_SET_PREVAIL (TYPE_CONTEXT (t));
 
Index: lto-streamer-out.c
===
--- lto-streamer-out.c	(revision 251119)
+++ lto-streamer-out.c	(working copy)
@@ -837,8 +837,6 @@ DFS::DFS_write_tree_body (struct output_
   if (!POINTER_TYPE_P (expr))
 	DFS_follow_tree_edge (TYPE_MIN_VALUE_RAW (expr));
   DFS_follow_tree_edge (TYPE_MAX_VALUE_RAW (expr));
-  if (RECORD_OR_UNION_TYPE_P (expr))
-	DFS_follow_tree_edge (TYPE_BINFO (expr));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_LIST))
@@ -1273,8 +1271,6 @@ hash_tree (struct streamer_tree_cache_d
   if (!POINTER_TYPE_P (t))
 	visit (TYPE_MIN_VALUE_RAW (t));
   visit (TYPE_MAX_VALUE_RAW (t));
-  if (RECORD_OR_UNION_TYPE_P (t))
-	visit (TYPE_BINFO (t));
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_LIST))
Index: tree-core.h
===
--- tree-core.h	(revision 251119)
+++ tree-core.h	(working copy)
@@ -1552,7 +1552,7 @@ struct GTY(()) tree_type_non_common {
   tree values;
   tree minval;
   tree maxval;
-  tree binfo;
+  tree lang_1;
 };
 
 struct GTY (()) tree_binfo {
Index: tree-streamer-in.c
===
--- tree-streamer-in.c	(revision 251119)
+++ tree-streamer-in.c	(working copy)
@@ -841,8 +841,6 @@ lto_input_ts_type_non_common_tree_pointe
   if (!POINTER_TYPE_P (expr))
 TYPE_MIN_VALUE_RAW (expr) = stream_read_tree (ib, data_in);
   TYPE_MAX_VALUE_RAW (expr) = stream_read_tree (ib, data_in);
-  if (RECORD_OR_UNION_TYPE_P (expr))
-TYPE_BINFO (expr) = stream_read_tree (ib, data_in);
 }
 
 
Index: tree-streamer-out.c
===
--- tree-streamer-out.c	(revision 251119)
+++ tree-streamer-out.c	(working copy)
@@ -706,8 +706,6 @@ write_ts_type_non_common_tree_pointers (
   if (!POINTER_TYPE_P (expr))
 stream_write_tree (ob, TYPE_MIN_VALUE_RAW (expr), ref_p);
   stream_write_tree (ob, TYPE_MAX_VALUE_RAW (expr), ref_p);
-  if (RECORD_OR_UNION_TYPE_P (expr))
-stream_write_tree (ob, TYPE_BINFO (expr), ref_p);
 }
 
 
Index: tree.c
===
--- tree.c	(revision 251119)
+++ tree.c	(working copy)
@@ -4890,9 +4890,7 @@ free_lang_data_in_type (tree type)
 	 leading to false ODR violation errors when merging two
 	 instances of the same function signature compiled by
 	 different front ends.  */
-  tree p;
-
-  for (p = TYPE_ARG_TYPES (type); p; p = TREE_CHAIN (p))
+  for (tree p = 

[PATCH, rs6000] 1/3 Add x86 SSE <xmmintrin.h> intrinsics to GCC PPC64LE target

2017-08-16 Thread Steven Munroe
This is the third major contribution of x86 intrinsic equivalent
headers for PPC64LE.

X86 SSE technology was the second SIMD extension, which added wider
128-bit vector (XMM) registers and single precision float capability.
It also addressed missing MMX capabilities and provided transfer (move,
pack, unpack) operations between MMX and XMM registers. This was
embodied in the <xmmintrin.h> header (in part 2/3). The implementation
also provides the mm_malloc.h API to allow for correct 16-byte alignment
where the system malloc may only provide 8-byte alignment. PowerPC64LE
can assume PowerPC quadword (16-byte) alignment, but we provide this
header and API to ease the application porting process. The mm_malloc.h
header is implicitly included by xmmintrin.h.

In general the SSE (__m128) intrinsics are a better match to the
PowerISA VMX/VSX 128-bit vector facilities. This allows direct mapping
of the __m128 type to PowerPC __vector float and natural handling
of parameter passing, return values, and SIMD float operations.

However, while both ISAs support float scalars in vector registers,
X86_64 and PowerPC64LE use different formats (and bits within the vector
register) for float scalars. This requires extra PowerISA operations to
exactly match the X86 scalar float (intrinsics ending in *_ss)
semantics. The intent is to provide a functionally correct
implementation at some reduction in performance.

This patch just adds the mm_malloc.h header, which will be needed by
xmmintrin.h, and cleans up some noisy warnings from the previous MMX
commit.

Part 2 adds the xmmintrin.h include and associated config.gcc and
x86intrin.h changes.

Part 3 adds the associated DG test cases.


./gcc/ChangeLog:

2017-08-16  Steven Munroe  

* config/rs6000/mm_malloc.h: New file.


[gcc/testsuite]

2017-07-21  Steven Munroe  

* gcc.target/powerpc/mmx-packuswb-1.c [NO_WARN_X86_INTRINSICS]:
Define. Suppress warning during tests.


Index: gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c
===
--- gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c   (revision 250986)
+++ gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c   (working copy)
@@ -3,6 +3,8 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target p8vector_hw } */
 
+#define NO_WARN_X86_INTRINSICS 1
+
 #ifndef CHECK_H
 #define CHECK_H "mmx-check.h"
 #endif
Index: gcc/config/rs6000/mm_malloc.h
===
--- gcc/config/rs6000/mm_malloc.h   (revision 0)
+++ gcc/config/rs6000/mm_malloc.h   (revision 0)
@@ -0,0 +1,62 @@
+/* Copyright (C) 2004-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MM_MALLOC_H_INCLUDED
+#define _MM_MALLOC_H_INCLUDED
+
+#include <stdlib.h>
+
+/* We can't depend on <stdlib.h> since the prototype of posix_memalign
+   may not be visible.  */
+#ifndef __cplusplus
+extern int posix_memalign (void **, size_t, size_t);
+#else
+extern "C" int posix_memalign (void **, size_t, size_t) throw ();
+#endif
+
+static __inline void *
+_mm_malloc (size_t size, size_t alignment)
+{
+  /* PowerPC64 ELF V2 ABI requires quadword alignment. */
+  size_t vec_align = sizeof (__vector float);
+  /* Linux GLIBC malloc alignment is at least 2 X ptr size.  */
+  size_t malloc_align = (sizeof (void *) + sizeof (void *));
+  void *ptr;
+
+  if (alignment == malloc_align && alignment == vec_align)
+    return malloc (size);
+  if (alignment < vec_align)
+    alignment = vec_align;
+  if (posix_memalign (&ptr, alignment, size) == 0)
+    return ptr;
+  else
+    return NULL;
+}
+
+static __inline void
+_mm_free (void * ptr)
+{
+  free (ptr);
+}
+
+#endif /* _MM_MALLOC_H_INCLUDED */




[committed] diagnostic-show-locus.c: remove unused field from class colorizer

2017-08-16 Thread David Malcolm
Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r251128.

gcc/ChangeLog:
* diagnostic-show-locus.c (colorizer::m_caret): Remove unused
field.
---
 gcc/diagnostic-show-locus.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index b0e72e7..3512111 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -97,7 +97,6 @@ class colorizer
   diagnostic_context *m_context;
   diagnostic_t m_diagnostic_kind;
   int m_current_state;
-  const char *m_caret;
   const char *m_range1;
   const char *m_range2;
   const char *m_fixit_insert;
-- 
1.8.5.3



libgo patch committed: Signal/register improvements

2017-08-16 Thread Ian Lance Taylor
This patch by Tony Reix fixes dumpregs on i386 to use the right type,
implements dumpregs for PPC Linux/AIX, and retrieves the PC value on
AIX.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251006)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-480fdfa9dd416bd17115a94fa6021c4dd805fc39
+f02183eb66f5718769f3f6541dcc6744ae1771c0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-signal.c
===
--- libgo/runtime/go-signal.c   (revision 250873)
+++ libgo/runtime/go-signal.c   (working copy)
@@ -224,6 +224,9 @@ getSiginfo(siginfo_t *info, void *contex
   #ifdef __linux__
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
   #endif
+  #ifdef _AIX
+   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
+  #endif
 #endif
 
if (ret.sigpc == 0) {
@@ -282,19 +285,19 @@ dumpregs(siginfo_t *info __attribute__((
{
mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
 
-   runtime_printf("eax%X\n", m->gregs[REG_EAX]);
-   runtime_printf("ebx%X\n", m->gregs[REG_EBX]);
-   runtime_printf("ecx%X\n", m->gregs[REG_ECX]);
-   runtime_printf("edx%X\n", m->gregs[REG_EDX]);
-   runtime_printf("edi%X\n", m->gregs[REG_EDI]);
-   runtime_printf("esi%X\n", m->gregs[REG_ESI]);
-   runtime_printf("ebp%X\n", m->gregs[REG_EBP]);
-   runtime_printf("esp%X\n", m->gregs[REG_ESP]);
-   runtime_printf("eip%X\n", m->gregs[REG_EIP]);
-   runtime_printf("eflags %X\n", m->gregs[REG_EFL]);
-   runtime_printf("cs %X\n", m->gregs[REG_CS]);
-   runtime_printf("fs %X\n", m->gregs[REG_FS]);
-   runtime_printf("gs %X\n", m->gregs[REG_GS]);
+   runtime_printf("eax%x\n", m->gregs[REG_EAX]);
+   runtime_printf("ebx%x\n", m->gregs[REG_EBX]);
+   runtime_printf("ecx%x\n", m->gregs[REG_ECX]);
+   runtime_printf("edx%x\n", m->gregs[REG_EDX]);
+   runtime_printf("edi%x\n", m->gregs[REG_EDI]);
+   runtime_printf("esi%x\n", m->gregs[REG_ESI]);
+   runtime_printf("ebp%x\n", m->gregs[REG_EBP]);
+   runtime_printf("esp%x\n", m->gregs[REG_ESP]);
+   runtime_printf("eip%x\n", m->gregs[REG_EIP]);
+   runtime_printf("eflags %x\n", m->gregs[REG_EFL]);
+   runtime_printf("cs %x\n", m->gregs[REG_CS]);
+   runtime_printf("fs %x\n", m->gregs[REG_FS]);
+   runtime_printf("gs %x\n", m->gregs[REG_GS]);
  }
  #endif
 #endif
@@ -339,4 +342,37 @@ dumpregs(siginfo_t *info __attribute__((
  }
   #endif
 #endif
+
+#ifdef __PPC__
+  #ifdef __linux__
+ {
+   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
+   int i;
+
+   for (i = 0; i < 32; i++)
+   runtime_printf("r%d %X\n", i, m->regs->gpr[i]);
+   runtime_printf("pc  %X\n", m->regs->nip);
+   runtime_printf("msr %X\n", m->regs->msr);
+   runtime_printf("cr  %X\n", m->regs->ccr);
+   runtime_printf("lr  %X\n", m->regs->link);
+   runtime_printf("ctr %X\n", m->regs->ctr);
+   runtime_printf("xer %X\n", m->regs->xer);
+ }
+  #endif
+  #ifdef _AIX
+ {
+   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
+   int i;
+
+   for (i = 0; i < 32; i++)
+   runtime_printf("r%d %p\n", i, m->jmp_context.gpr[i]);
+   runtime_printf("pc  %p\n", m->jmp_context.iar);
+   runtime_printf("msr %p\n", m->jmp_context.msr);
+   runtime_printf("cr  %x\n", m->jmp_context.cr);
+   runtime_printf("lr  %p\n", m->jmp_context.lr);
+   runtime_printf("ctr %p\n", m->jmp_context.ctr);
+   runtime_printf("xer %x\n", m->jmp_context.xer);
+ }
+  #endif
+#endif
 }


[PATCH, testsuite]: Adapt c-c++-common/patchable_function_entry-*.c for alpha

2017-08-16 Thread Uros Bizjak
2017-08-16  Uros Bizjak  

* c-c++-common/patchable_function_entry-decl.c (dg-final): Adapt
scan-assembler-times for alpha*-*-*.
* c-c++-common/patchable_function_entry-default.c (dg-final): Ditto.
* c-c++-common/patchable_function_entry-definition.c (dg-final): Ditto.

Tested on alphaev68-linux-gnu and x86_64-linux-gnu.

Committed to mainline SVN.

Uros.
Index: c-c++-common/patchable_function_entry-decl.c
===
--- c-c++-common/patchable_function_entry-decl.c(revision 251122)
+++ c-c++-common/patchable_function_entry-decl.c(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
-/* { dg-final { scan-assembler-times "nop" 2 } } */
+/* { dg-final { scan-assembler-times "nop" 2 { target { ! alpha*-*-* } } } } */
+/* { dg-final { scan-assembler-times "bis" 2 { target alpha*-*-* } } } */
 
 extern int a;
 
Index: c-c++-common/patchable_function_entry-default.c
===
--- c-c++-common/patchable_function_entry-default.c (revision 251122)
+++ c-c++-common/patchable_function_entry-default.c (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
-/* { dg-final { scan-assembler-times "nop" 3 } } */
+/* { dg-final { scan-assembler-times "nop" 3 { target { ! alpha*-*-* } } } } */
+/* { dg-final { scan-assembler-times "bis" 3 { target alpha*-*-* } } } */
 
 extern int a;
 
Index: c-c++-common/patchable_function_entry-definition.c
===
--- c-c++-common/patchable_function_entry-definition.c  (revision 251122)
+++ c-c++-common/patchable_function_entry-definition.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
-/* { dg-final { scan-assembler-times "nop" 1 } } */
+/* { dg-final { scan-assembler-times "nop" 1 { target { ! alpha*-*-* } } } } */
+/* { dg-final { scan-assembler-times "bis" 1 { target alpha*-*-* } } } */
 
 extern int a;
 


Re: [PATCH] Further driver signal cleanup

2017-08-16 Thread Marek Polacek
On Wed, Aug 16, 2017 at 02:24:10PM -0400, Nathan Sidwell wrote:
> This, almost obvious, patch merges the older SIGPIPE conditional into the
> new switch I added.  I found the FALLTHROUGH marker needed to be outside the
> #if, which was a little annoying.

True, it needs to precede the case token.  Or you can use gcc_fallthrough ();
which can be inside #if/{}/...

Marek


[PATCH] Further driver signal cleanup

2017-08-16 Thread Nathan Sidwell
This, almost obvious, patch merges the older SIGPIPE conditional into 
the new switch I added.  I found the FALLTHROUGH marker needed to be 
outside the #if, which was a little annoying.


I changed the backtrace error message to also explicitly say it was a 
signal wot did it.


ok?

nathan
--
Nathan Sidwell
2017-08-16  Nathan Sidwell  

* gcc.c (execute): Fold SIGPIPE handling into switch
statement. Adjust internal error message.

Index: gcc.c
===
--- gcc.c   (revision 251119)
+++ gcc.c   (working copy)
@@ -3135,44 +3135,45 @@ execute (void)
int status = statuses[i];
 
if (WIFSIGNALED (status))
- {
+ switch (WTERMSIG (status))
+   {
+   case SIGINT:
+   case SIGQUIT:
+   case SIGKILL:
+   case SIGTERM:
+ /* The user (or environment) did something to the
+inferior.  Making this an ICE confuses the user into
+thinking there's a compiler bug.  Much more likely is
+the user or OOM killer nuked it.  */
+ fatal_error (input_location,
+  "%s signal terminated program %s",
+  strsignal (WTERMSIG (status)),
+  commands[i].prog);
+ break;
+
 #ifdef SIGPIPE
-   /* SIGPIPE is a special case.  It happens in -pipe mode
-  when the compiler dies before the preprocessor is done,
-  or the assembler dies before the compiler is done.
-  There's generally been an error already, and this is
-  just fallout.  So don't generate another error unless
-  we would otherwise have succeeded.  */
-   if (WTERMSIG (status) == SIGPIPE
-   && (signal_count || greatest_status >= MIN_FATAL_STATUS))
- {
-   signal_count++;
-   ret_code = -1;
- }
-   else
-#endif
- switch (WTERMSIG (status))
+   case SIGPIPE:
+ /* SIGPIPE is a special case.  It happens in -pipe mode
+when the compiler dies before the preprocessor is
+done, or the assembler dies before the compiler is
+done.  There's generally been an error already, and
+this is just fallout.  So don't generate another
+error unless we would otherwise have succeeded.  */
+ if (signal_count || greatest_status >= MIN_FATAL_STATUS)
{
-   case SIGINT:
-   case SIGQUIT:
-   case SIGKILL:
-   case SIGTERM:
- /* The user (or environment) did something to the
-inferior.  Making this an ICE confuses the user
-into thinking there's a compiler bug.  Much more
-likely is the user or OOM killer nuked it.  */
- fatal_error (input_location,
-  "%s signal terminated program %s",
-  strsignal (WTERMSIG (status)),
-  commands[i].prog);
+ signal_count++;
+ ret_code = -1;
  break;
-   default:
- /* The inferior failed to catch the signal.  */
- internal_error_no_backtrace ("%s (program %s)",
-  strsignal (WTERMSIG (status)),
-  commands[i].prog);
}
- }
+#endif
+ /* FALLTHROUGH */
+
+   default:
+ /* The inferior failed to catch the signal.  */
+ internal_error_no_backtrace ("%s signal terminated program %s",
+  strsignal (WTERMSIG (status)),
+  commands[i].prog);
+   }
else if (WIFEXITED (status)
 && WEXITSTATUS (status) >= MIN_FATAL_STATUS)
  {


Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>>  wrote:
 The first loop in the testcase regressed after my recent changes to
 dr_analyze_innermost.  Previously we would treat "i" as an iv even
 for bb analysis and end up with:

DR_BASE_ADDRESS: p or q
DR_OFFSET: 0
DR_INIT: 0 or 4
DR_STEP: 16

 We now always keep the step as 0 instead, so for an int "i" we'd have:

DR_BASE_ADDRESS: p or q
DR_OFFSET: (intptr_t) i
DR_INIT: 0 or 4
DR_STEP: 0

 This is also what we'd like to have for the unsigned "i", but the
 problem is that strip_constant_offset thinks that the "i + 1" in
 "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
 The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
 name that holds "(intptr_t) (i + 1)", meaning that the accesses no
 longer seem to be related to the [i] ones.
>>>
>>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>>> I made changes in scev to better fold type conversion by various sources
>>> of information, for example, vrp, niters, undefined overflow behavior etc.
>>> In theory these information should be available for other optimizers without
>>> querying scev.  For the mentioned test, vrp should compute accurate range
>>> information for "i" so that we can fold (intptr_t) (i + 1) without
>>> worrying about
>>> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
>>> could be more expensive (especially in case of (T)(i + j)), or because the
>>> CST part is in bigger precision after conversion.
>>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>>> curious if it's possible to use aff_tree to implement strip_constant_offset
>>> here since aggressive folding is wanted.  After all, using additional chrec
>>> looks a little heavy wrt the simple test.
>>
>> Yeah, using aff_tree does work here when the bounds are constant.
>> It doesn't look like it works for things like:
>>
>> double p[1000];
>> double q[1000];
>>
>> void
>> f4 (unsigned int n)
>> {
>>   for (unsigned int i = 0; i < n; i += 4)
>> {
>>   double a = q[i] + p[i];
>>   double b = q[i + 1] + p[i + 1];
>>   q[i] = a;
>>   q[i + 1] = b;
>> }
>> }
>>
>> though, where the bounds on the global arrays guarantee that [i + 1] can't
>> overflow, even though "n" is unconstrained.  The patch as posted handles
>> this case too.
> BTW is this a missed optimization in value range analysis?  The range
> information for i should flow in a way like: array boundary -> niters
> -> scev/vrp.
> I think that's what niters/scev do in analysis.

Yeah, maybe :-)  It looks like the problem is that when SLP runs,
the previous VRP pass came before loop header copying, so the (single)
header has to cope with n == 0 case.  Thus we get:

  Visiting statement:
  i_15 = ASSERT_EXPR ;
  Intersecting
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 0]
  to
[0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  Intersecting
[0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 1000]
  to
[0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  Found new range for i_15: [0, 0]

  Visiting statement:
  _3 = i_15 + 1;
  Match-and-simplified i_15 + 1 to 1
  Intersecting
[1, 1]
  and
[0, +INF]
  to
[1, 1]
  Found new range for _3: [1, 1]

(where _3 is the index we care about), followed by:

  Visiting statement:
  i_15 = ASSERT_EXPR ;
  Intersecting
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 4]
  to
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  Intersecting
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 1000]
  to
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  Found new range for i_15: [0, n_9(D) + 4294967295]

  Visiting statement:
  _3 = i_15 + 1;
  Intersecting
[0, +INF]
  and
[0, +INF]
  to
[0, +INF]
  Found new range for _3: [0, +INF]

I guess in this case it would be better to intersect the i_15 ranges
to [0, 1000] rather than [0, n_9(D) + 4294967295].

It does work if another VRP pass runs after CH.  But even then,
is it a good idea to rely on the range info being kept up-to-date
all the way through to SLP?  A lot happens inbetween.

FWIW, the old simple_iv check that I removed for bb data-ref analysis
relies on SCEV analysis too, so I don't think this is more expensive
than what we had before.

Thanks,
Richard


Re: [42/77] Use scalar_int_mode in simplify_shift_const_1

2017-08-16 Thread Jeff Law
On 07/13/2017 02:53 AM, Richard Sandiford wrote:
> This patch makes simplify_shift_const_1 use scalar_int_modes
> for all code that is specific to scalars rather than vectors.
> This includes situations in which the new shift mode is different
> from the original one, since the function never changes the mode
> of vector shifts.  That in turn makes it more natural to test for
> equal modes in simplify_shift_const_1 rather than try_widen_shift_mode
> (which only applies to scalars).
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * combine.c (try_widen_shift_mode): Move check for equal modes to...
>   (simplify_shift_const_1): ...here.  Use scalar_int_mode for
>   shift_unit_mode and for modes involved in scalar shifts.
OK.
Jeff


Re: [41/77] Split scalar integer handling out of force_to_mode

2017-08-16 Thread Jeff Law
On 07/13/2017 02:53 AM, Richard Sandiford wrote:
> force_to_mode exits partway through for modes that aren't scalar
> integers.  This patch splits the remainder of the function out
> into a subroutine, force_int_to_mode, so that the modes from that
> point on can have type scalar_int_mode.
> 
> The patch also makes sure that xmode is kept up-to-date with x
> and uses xmode instead of GET_MODE (x) throughout.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * combine.c (force_int_to_mode): New function, split out from...
>   (force_to_mode): ...here.  Keep xmode up-to-date and use it
>   instead of GET_MODE (x).
> 
OK.
jeff



Re: [40/77] Use scalar_int_mode for extraction_insn fields

2017-08-16 Thread Jeff Law
On 07/13/2017 02:52 AM, Richard Sandiford wrote:
> insv, extv and extzv modify or read a field in a register or
> memory.  The field always has a scalar integer mode, while the
> register or memory either has a scalar integer mode or BLKmode.
> The mode of the bit position is also a scalar integer.
> 
> This patch uses the type system to make that explicit.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * optabs-query.h (extraction_insn::struct_mode): Change type to
>   opt_scalar_int_mode and update comment.
>   (extraction_insn::field_mode): Change type to scalar_int_mode.
>   (extraction_insn::pos_mode): Likewise.
>   * combine.c (make_extraction): Update accordingly.
>   * optabs-query.c (get_traditional_extraction_insn): Likewise.
>   (get_optab_extraction_insn): Likewise.
>   * recog.c (simplify_while_replacing): Likewise.
>   * expmed.c (narrow_bit_field_mem): Chane the type of the mode
>   parameter to opt_scalar_int_mode.
s/Chane/Change/ in the ChangeLog.

OK.
jeff


Re: [39/77] Two changes to the get_best_mode interface

2017-08-16 Thread Jeff Law
On 07/13/2017 02:52 AM, Richard Sandiford wrote:
> get_best_mode always returns a scalar_int_mode on success,
> so this patch makes that explicit in the type system.  Also,
> the "largest_mode" argument is used simply to provide a maximum
> size, and in practice that size is always a compile-time constant,
> even when the concept of variable-sized modes is added later.
> The patch therefore passes the size directly.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * machmode.h (bit_field_mode_iterator::next_mode): Take a pointer
>   to a scalar_int_mode instead of a machine_mode.
>   (bit_field_mode_iterator::m_mode): Change type to opt_scalar_int_mode.
>   (get_best_mode): Return a boolean and use a pointer argument to store
>   the selected mode.  Replace the limit mode parameter with a bit limit.
>   * expmed.c (adjust_bit_field_mem_for_reg): Use scalar_int_mode
>   for the values returned by bit_field_mode_iterator::next_mode.
>   (store_bit_field): Update call to get_best_mode.
>   (store_fixed_bit_field): Likewise.
>   (extract_fixed_bit_field): Likewise.
>   * expr.c (optimize_bitfield_assignment_op): Likewise.
>   * fold-const.c (optimize_bit_field_compare): Likewise.
>   (fold_truth_andor_1): Likewise.
>   * stor-layout.c (bit_field_mode_iterator::next_mode): As above.
>   Update for new type of m_mode.
>   (get_best_mode): As above.
> 
OK.
jeff


Re: [36/77] Use scalar_int_mode in the RTL iv routines

2017-08-16 Thread Jeff Law
On 07/13/2017 02:51 AM, Richard Sandiford wrote:
> This patch changes the iv modes in rtx_iv from machine_mode
> to scalar_int_mode.  It also passes the mode of the iv down
> to subroutines; this avoids the previous situation in which
> the mode information was sometimes lost and had to be added
> by the caller on return.
> 
> Some routines already took a mode argument, but the patch
> tries to standardise on passing it immediately before the
> argument it describes.
> 
> gcc/
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * cfgloop.h (rtx_iv): Change type of extend_mode and mode to
>   scalar_int_mode.
>   (niter_desc): Likewise mode.
>   (iv_analyze): Add a mode parameter.
>   (biv_p): Likewise.
>   (iv_analyze_expr): Pass the mode parameter before the rtx it describes
>   and change its type to scalar_int_mode.
>   * loop-iv.c: Update commentary at head of file.
>   (iv_constant): Pass the mode parameter before the rtx it describes
>   and change its type to scalar_int_mode.  Remove VOIDmode handling.
>   (iv_subreg): Change the type of the mode parameter to scalar_int_mode.
>   (iv_extend): Likewise.
>   (shorten_into_mode): Likewise.
>   (iv_add): Use scalar_int_mode.
>   (iv_mult): Likewise.
>   (iv_shift): Likewise.
>   (canonicalize_iv_subregs): Likewise.
>   (get_biv_step_1): Pass the outer_mode parameter before the rtx
>   it describes and change its mode to scalar_int_mode.   Also change
>   the type of the returned inner_mode to scalar_int_mode.
>   (get_biv_step): Likewise, turning outer_mode from a pointer
>   into a direct parameter.  Update call to get_biv_step_1.
>   (iv_analyze_biv): Add an outer_mode parameter.  Update calls to
>   iv_constant and get_biv_step.
>   (iv_analyze_expr): Pass the mode parameter before the rtx it describes
>   and change its type to scalar_int_mode.  Don't initialise iv->mode
>   to VOIDmode and remove later checks for its still being VOIDmode.
>   Update calls to iv_analyze_op and iv_analyze_expr.  Check
>   is_a  when changing the mode under consideration.
>   (iv_analyze_def): Ignore registers that don't have a scalar_int_mode.
>   Update call to iv_analyze_expr.
>   (iv_analyze_op): Add a mode parameter.  Reject subregs whose
>   inner register is not also a scalar_int_mode.  Update call to
>   iv_analyze_biv.
>   (iv_analyze): Add a mode parameter.  Update call to iv_analyze_op.
>   (biv_p): Add a mode parameter.  Update call to iv_analyze_biv.
>   (iv_number_of_iterations): Use is_a  instead of
>   separate mode class checks.  Update calls to iv_analyze.  Remove
>   fix-up of VOIDmodes after iv_analyze_biv.
>   * loop-unroll.c (analyze_iv_to_split_insn): Reject registers that
>   don't have a scalar_int_mode.  Update call to biv_p.
OK.
jeff


[PATCH] use strnlen in pretty printer for "%.*s" (PR 81859)

2017-08-16 Thread Martin Sebor

Bug 81859 points out that my fix for bug 81586 wasn't quite
right (or complete):  the argument of a %.*s directive need
not be a nul-terminated string when the precision is less
than the initialized size of the array the argument points
to.  The attached tweak uses strnlen to avoid reading past
the end of a non-nul terminated array.

The patch has been tested on x86_64-linux and by running
self tests under Valgrind.  I'll go ahead and commit it as
obvious sometime later today if there are no objections in
the meantime.

Martin

PR c/81859 - [8 Regression] valgrind error from warn_about_normalization

gcc/ChangeLog:

	PR c/81859
	* pretty-print.c (pp_format): Use strnlen in %.*s to avoid reading
	past the end of an array.
	(test_pp_format): Add test cases.

Index: gcc/pretty-print.c
===
--- gcc/pretty-print.c	(revision 251100)
+++ gcc/pretty-print.c	(working copy)
@@ -668,15 +668,11 @@ pp_format (pretty_printer *pp, text_info *text)
 
 	s = va_arg (*text->args_ptr, const char *);
 
-	/* Negative precision is treated as if it were omitted.  */
-	if (n < 0)
-	  n = INT_MAX;
+	/* Append the lesser of precision and strlen (s) characters
+	   from the array (which need not be a nul-terminated string).
+	   Negative precision is treated as if it were omitted.  */
+	size_t len = n < 0 ? strlen (s) : strnlen (s, n);
 
-	/* Append the lesser of precision and strlen (s) characters.  */
-	size_t len = strlen (s);
-	if ((unsigned) n < len)
-	  len = n;
-
 	pp_append_text (pp, s, s + len);
 	  }
 	  break;
@@ -1438,6 +1434,13 @@ test_pp_format ()
   ASSERT_PP_FORMAT_2 ("A 12345678", "%c %x", 'A', 0x12345678);
   ASSERT_PP_FORMAT_2 ("hello world 12345678", "%s %x", "hello world",
 		  0x12345678);
+
+  /* Not nul-terminated.  */
+  char arr[5] = { '1', '2', '3', '4', '5' };
+  ASSERT_PP_FORMAT_2 ("123", "%.*s", 3, arr);
+  ASSERT_PP_FORMAT_2 ("1234", "%.*s", -1, "1234");
+  ASSERT_PP_FORMAT_2 ("12345", "%.*s", 7, "12345");
+
   /* We can't test for %p; the pointer is printed in an implementation-defined
  manner.  */
   ASSERT_PP_FORMAT_2 ("normal colored normal 12345678",


Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Bin.Cheng
On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
 wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>  wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: 0
>>>DR_INIT: 0 or 4
>>>DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: (intptr_t) i
>>>DR_INIT: 0 or 4
>>>DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>> I made changes in scev to better fold type conversion by various sources
>> of information, for example, vrp, niters, undefined overflow behavior etc.
>> In theory these information should be available for other optimizers without
>> querying scev.  For the mentioned test, vrp should compute accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) without
>> worrying about
>> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or because the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>> curious if it's possible to use aff_tree to implement strip_constant_offset
>> here since aggressive folding is wanted.  After all, using additional chrec
>> looks a little heavy wrt the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
> double p[1000];
> double q[1000];
>
> void
> f4 (unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i += 4)
> {
>   double a = q[i] + p[i];
>   double b = q[i + 1] + p[i + 1];
>   q[i] = a;
>   q[i + 1] = b;
> }
> }
>
> though, where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained.  The patch as posted handles
> this case too.
BTW is this a missed optimization in value range analysis?  The range
information for i should flow in a way like: array boundary -> niters
-> scev/vrp.
I think that's what niters/scev do in analysis.

Thanks,
bin
>
> Thanks,
> Richard
>
>>
>> Thanks,
>> bin
>>>
>>> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
>>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>>> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
>>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>>> references, with the distance between their start addresses being
>>> the difference between their DR_INITs.  For this second type, the
>>> exact form of DR_OFFSET doesn't really matter, and the more it is
>>> able to canonicalise the non-constant offset, the better.
>>>
>>> The patch fixes the PR by adding another offset/init pair to the
>>> innermost loop behaviour for this second group.  The new pair use chrecs
>>> rather than generic exprs for the offset part, with the chrec being
>>> analysed in the innermost loop for which the offset isn't invariant.
>>> This allows us to vectorise the second function in the testcase
>>> as well, which wasn't possible before the earlier patch.
>>>
>>> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>> PR tree-optimization/81635
>>> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>>> chrec_init.
>>> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>>> (dr_equal_offsets_p): Delete.
>>> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>>> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>>> (split_constant_offset): Handle POLYNOMIAL_CHREC.
>>> (dr_analyze_innermost): Initialize dr_chrec_offset and 
>>> dr_chrec_init.
>>> (operator ==): Use dr_chrec_offsets_equal_p and compare the
>>> DR_CHREC_INITs.
>>> (prune_runtime_alias_test_list): Likewise.
>>> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and 
>>> compare
>>> the DR_CHREC_INITs.
>>> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>>> (analyze_offset_scev): New function.

Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>  wrote:
>> The first loop in the testcase regressed after my recent changes to
>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>> for bb analysis and end up with:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: 0
>>DR_INIT: 0 or 4
>>DR_STEP: 16
>>
>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: (intptr_t) i
>>DR_INIT: 0 or 4
>>DR_STEP: 0
>>
>> This is also what we'd like to have for the unsigned "i", but the
>> problem is that strip_constant_offset thinks that the "i + 1" in
>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>> longer seem to be related to the [i] ones.
>
> Didn't read the change in detail, so sorry if I mis-understood the issue.
> I made changes in scev to better fold type conversion by various sources
> of information, for example, vrp, niters, undefined overflow behavior etc.
> In theory these information should be available for other optimizers without
> querying scev.  For the mentioned test, vrp should compute accurate range
> information for "i" so that we can fold (intptr_t) (i + 1) without worrying about
> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
> could be more expensive (especially in case of (T)(i + j)), or because the
> CST part is in bigger precision after conversion.
> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
> an interface, we changed tree-affine and made it do aggressive fold.  I am
> curious if it's possible to use aff_tree to implement strip_constant_offset
> here since aggressive folding is wanted.  After all, using additional chrec
> looks a little heavy wrt the simple test.

Yeah, using aff_tree does work here when the bounds are constant.
It doesn't look like it works for things like:

double p[1000];
double q[1000];

void
f4 (unsigned int n)
{
  for (unsigned int i = 0; i < n; i += 4)
{
  double a = q[i] + p[i];
  double b = q[i + 1] + p[i + 1];
  q[i] = a;
  q[i + 1] = b;
}
}

though, where the bounds on the global arrays guarantee that [i + 1] can't
overflow, even though "n" is unconstrained.  The patch as posted handles
this case too.

Thanks,
Richard

>
> Thanks,
> bin
>>
>> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>> references, with the distance between their start addresses being
>> the difference between their DR_INITs.  For this second type, the
>> exact form of DR_OFFSET doesn't really matter, and the more it is
>> able to canonicalise the non-constant offset, the better.
>>
>> The patch fixes the PR by adding another offset/init pair to the
>> innermost loop behaviour for this second group.  The new pair use chrecs
>> rather than generic exprs for the offset part, with the chrec being
>> analysed in the innermost loop for which the offset isn't invariant.
>> This allows us to vectorise the second function in the testcase
>> as well, which wasn't possible before the earlier patch.
>>
>> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>> PR tree-optimization/81635
>> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>> chrec_init.
>> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>> (dr_equal_offsets_p): Delete.
>> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>> (split_constant_offset): Handle POLYNOMIAL_CHREC.
>> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>> (operator ==): Use dr_chrec_offsets_equal_p and compare the
>> DR_CHREC_INITs.
>> (prune_runtime_alias_test_list): Likewise.
>> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>> the DR_CHREC_INITs.
>> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>> (analyze_offset_scev): New function.
>> (compute_offset_chrecs): Likewise.
>> (dr_chrec_offsets_equal_p): Likewise.
>> (dr_chrec_offsets_compare): Likewise.
>> * tree-loop-distribution.c (compute_alias_check_pairs): Use
>> dr_chrec_offsets_compare.
>> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>> dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>> (dr_group_sort_cmp): Likewise.
>> 

[PATCH, i386]: Allow memory operands for btr/bts and btc (PR target/46091)

2017-08-16 Thread Uros Bizjak
Hello!

Attached patch allows memory operands for btr/bts and btc
instructions.  The previous comment was wrong: these instructions do
not enforce atomic operation on a memory operand without the "lock"
prefix.  In their RMW form with an immediate operand they are not
slower than the corresponding logic instructions.

(When a variable count operand is used with the RMW form, these
instructions are considerably slower, since they can address the full
32-bit address range from their base operand.)

2017-08-16  Uros Bizjak  

PR target/46091
* config/i386/i386.md (*anddi_1_btr): Change predicates of
operand 0 and operand 1 to nonimmediate_operand.  Add "m" constraint.
Add ix86_binary_operator_ok to insn constraint.
(*iordi_1_bts): Ditto.
(*xordi_1_btc): Ditto.
(*btsq): Change predicate of operand 0 to nonimmediate_operand.
Update corresponding peephole2 pattern.
(*btrq): Ditto.
(*btcq): Ditto.

testsuite/ChangeLog:

2017-08-16  Uros Bizjak  

PR target/46091
* gcc.target/i386/pr46091-1.c: Update scan-assembler-times.
(testm): New test function.
* gcc.target/i386/pr46091-2.c: Ditto.
* gcc.target/i386/pr46091-3.c: Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 059a51832de..2ad3ad7d216 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8268,12 +8268,13 @@
(set_attr "mode" "SI,DI,DI,SI")])
 
 (define_insn_and_split "*anddi_1_btr"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
(and:DI
-(match_operand:DI 1 "register_operand" "%0")
+(match_operand:DI 1 "nonimmediate_operand" "%0")
 (match_operand:DI 2 "const_int_operand" "n")))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_USE_BT
+   && ix86_binary_operator_ok (AND, DImode, operands)
&& IN_RANGE (exact_log2 (~INTVAL (operands[2])), 31, 63)"
   "#"
   "&& reload_completed"
@@ -8813,12 +8814,13 @@
(set_attr "mode" "")])
 
 (define_insn_and_split "*iordi_1_bts"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
(ior:DI
-(match_operand:DI 1 "register_operand" "%0")
+(match_operand:DI 1 "nonimmediate_operand" "%0")
 (match_operand:DI 2 "const_int_operand" "n")))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_USE_BT
+   && ix86_binary_operator_ok (IOR, DImode, operands)
&& IN_RANGE (exact_log2 (INTVAL (operands[2])), 31, 63)"
   "#"
   "&& reload_completed"
@@ -8834,12 +8836,13 @@
(set_attr "mode" "DI")])
 
 (define_insn_and_split "*xordi_1_btc"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
(xor:DI
-(match_operand:DI 1 "register_operand" "%0")
+(match_operand:DI 1 "nonimmediate_operand" "%0")
 (match_operand:DI 2 "const_int_operand" "n")))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_USE_BT
+   && ix86_binary_operator_ok (XOR, DImode, operands)
&& IN_RANGE (exact_log2 (INTVAL (operands[2])), 31, 63)"
   "#"
   "&& reload_completed"
@@ -10996,10 +10999,10 @@
 ;; Bit set / bit test instructions
 
 ;; %%% bts, btr, btc, bt.
-;; In general these instructions are *slow* when applied to memory,
-;; since they enforce atomic operation.  When applied to registers,
-;; it depends on the cpu implementation.  They're never faster than
-;; the corresponding and/ior/xor operations, so with 32-bit there's
+;; In general these instructions are *slow* with variable operand
+;; when applied to memory.  When applied to registers, it depends
+;; on the cpu implementation.  They're never faster than the
+;; corresponding and/ior/xor operations, so with 32-bit there's
 ;; no point.  But in 64-bit, we can't hold the relevant immediates
 ;; within the instruction itself, so operating on bits in the high
 ;; 32-bits of a register becomes easier.
@@ -11009,7 +11012,7 @@
 ;; negdf respectively, so they can never be disabled entirely.
 
 (define_insn "*btsq"
-  [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+r")
+  [(set (zero_extract:DI (match_operand:DI 0 "nonimmediate_operand" "+rm")
 (const_int 1)
 (match_operand 1 "const_0_to_63_operand" "J"))
(const_int 1))
@@ -11022,7 +11025,7 @@
(set_attr "mode" "DI")])
 
 (define_insn "*btrq"
-  [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+r")
+  [(set (zero_extract:DI (match_operand:DI 0 "nonimmediate_operand" "+rm")
 (const_int 1)
 (match_operand 1 "const_0_to_63_operand" "J"))
(const_int 0))
@@ -11035,7 +11038,7 @@
(set_attr "mode" "DI")])
 
 (define_insn "*btcq"
-  [(set (zero_extract:DI (match_operand:DI 0 

Re: c-family PATCH to improve -Wtautological-compare (PR c/81783)

2017-08-16 Thread Marek Polacek
On Wed, Aug 16, 2017 at 11:07:36AM -0400, David Malcolm wrote:
> On Wed, 2017-08-16 at 16:29 +0200, Marek Polacek wrote:
> > This patch improves -Wtautological-compare so that it also detects
> > bitwise comparisons involving & and | that are always true or false,
> > e.g.
> > 
> >   if ((a & 16) == 10)
> > return 1;
> > 
> > can never be true.  Note that e.g. "(a & 9) == 8" is *not* always
> > false
> > or true.
> > 
> > I think it's pretty straightforward with one snag: we shouldn't warn
> > if
> > the constant part of the bitwise operation comes from a macro, but
> > currently
> > that's not possible, so I XFAILed this in the new test.
> 
> Maybe I'm missing something here, but why shouldn't it warn when the
> constant comes from a macro?

Just my past experience.  Sometimes you can't really control the macro
and then you get annoying warnings.

E.g. I had to tweak the warning that warns about if (i == i) to not warn about
  
  #define N 2
  if (a[N] == a[2]) {}

because that gave a bogus warning during bootstrap, if I recall correctly.
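A minimal sketch of that bootstrap pattern (hypothetical names, not the actual GCC code): after macro expansion, a[N] == a[2] compares an element with itself, so the comparison is deliberate and trivially true, and warning on it would be noise.

```c
/* Sketch of the false-positive case described above: N is a macro,
   so a[N] == a[2] only becomes a self-comparison after expansion.  */

#define N 2

static int
same_element_p (void)
{
  static const int a[4] = { 1, 2, 3, 4 };
  return a[N] == a[2];  /* trivially true: both index element 2 */
}
```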

> At the end of your testcase you have this example:
> 
> #define N 0x10
>   if ((a & N) == 10) /* { dg-bogus "bitwise comparison always evaluates to 
> false" "" { xfail *-*-* } } */
>  return 1;
>   if ((a | N) == 10) /* { dg-bogus "bitwise comparison always evaluates to 
> false" "" { xfail *-*-* } } */
>return 1;
> 
> That code looks bogus to me (and if the defn of "N" is further away,
> it's harder to spot that it's wrong): shouldn't we warn about it?

I'm glad you think so.  More than happy to make it an expected warning.

> > This has found one issue in the GCC codebase and it's a genuine bug:
> > .  
> 
> In this example GOVD_WRITTEN is from an enum, not a macro, but if
> GOVD_WRITTEN had been a macro, shouldn't we still issue a warning?

I feel like we should, but some might feel otherwise.

Thanks,

Marek


Re: c-family PATCH to improve -Wtautological-compare (PR c/81783)

2017-08-16 Thread Eric Gallager
On 8/16/17, David Malcolm  wrote:
> On Wed, 2017-08-16 at 16:29 +0200, Marek Polacek wrote:
>> This patch improves -Wtautological-compare so that it also detects
>> bitwise comparisons involving & and | that are always true or false,
>> e.g.
>>
>>   if ((a & 16) == 10)
>> return 1;
>>
>> can never be true.  Note that e.g. "(a & 9) == 8" is *not* always
>> false
>> or true.
>>
>> I think it's pretty straightforward with one snag: we shouldn't warn
>> if
>> the constant part of the bitwise operation comes from a macro, but
>> currently
>> that's not possible, so I XFAILed this in the new test.
>
> Maybe I'm missing something here, but why shouldn't it warn when the
> constant comes from a macro?
>
> At the end of your testcase you have this example:
>
> #define N 0x10
>   if ((a & N) == 10) /* { dg-bogus "bitwise comparison always evaluates to
> false" "" { xfail *-*-* } } */
>  return 1;
>   if ((a | N) == 10) /* { dg-bogus "bitwise comparison always evaluates to
> false" "" { xfail *-*-* } } */
>return 1;
>
> That code looks bogus to me (and if the defn of "N" is further away,
> it's harder to spot that it's wrong): shouldn't we warn about it?
>

What about:

#ifdef SOME_PREPROCESSOR_CONDITIONAL
# define N 0x10
#else
# define N 0x11
#endif

or

#define N __LINE__

or

#define N __COUNTER__

or something else like that?
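A tiny runnable illustration of this concern (hypothetical usage, not from the patch): a macro like __LINE__ expands to a different constant at each use, so whether a masked comparison against such a macro is tautological depends on the expansion site, not on the macro's spelling.

```c
/* Hypothetical example: a macro whose value changes per expansion.
   Whether ((a & N) == 10) is tautological then depends on where N
   expands, which a purely token-based check cannot see.  */

static int
line_constants_differ (void)
{
  int first = __LINE__;
  int second = __LINE__;  /* one line later: a different constant */
  return first != second;
}
```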

>>
>> This has found one issue in the GCC codebase and it's a genuine bug:
>> .
>
> In this example GOVD_WRITTEN is from an enum, not a macro, but if
> GOVD_WRITTEN had been a macro, shouldn't we still issue a warning?
>
> Hope this is constructive
> Dave
>
>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>
>> 2017-08-16  Marek Polacek  
>>
>>  PR c/81783
>>  * c-warn.c (warn_tautological_bitwise_comparison): New
>> function.
>>  (warn_tautological_cmp): Call it.
>>
>>  * doc/invoke.texi: Update -Wtautological-compare documentation.
>>
>>  * c-c++-common/Wtautological-compare-5.c: New test.
>>
>> diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
>> index 9c3073444cf..0749d16a50f 100644
>> --- gcc/c-family/c-warn.c
>> +++ gcc/c-family/c-warn.c
>> @@ -321,6 +321,59 @@ find_array_ref_with_const_idx_r (tree *expr_p,
>> int *, void *)
>>return NULL_TREE;
>>  }
>>
>> +/* Subroutine of warn_tautological_cmp.  Warn about bitwise comparisons
>> +   that always evaluate to true or false.  LOC is the location of the
>> +   ==/!= comparison specified by CODE; LHS and RHS are the usual operands
>> +   of this comparison.  */
>> +
>> +static void
>> +warn_tautological_bitwise_comparison (location_t loc, tree_code
>> code,
>> +  tree lhs, tree rhs)
>> +{
>> +  if (code != EQ_EXPR && code != NE_EXPR)
>> +return;
>> +
>> +  /* Extract the operands from e.g. (x & 8) == 4.  */
>> +  tree bitop;
>> +  tree cst;
>> +  if ((TREE_CODE (lhs) == BIT_AND_EXPR
>> +   || TREE_CODE (lhs) == BIT_IOR_EXPR)
>> +  && TREE_CODE (rhs) == INTEGER_CST)
>> +bitop = lhs, cst = rhs;
>> +  else if ((TREE_CODE (rhs) == BIT_AND_EXPR
>> +|| TREE_CODE (rhs) == BIT_IOR_EXPR)
>> +   && TREE_CODE (lhs) == INTEGER_CST)
>> +bitop = rhs, cst = lhs;
>> +  else
>> +return;
>> +
>> +  tree bitopcst;
>> +  if (TREE_CODE (TREE_OPERAND (bitop, 0)) == INTEGER_CST)
>> +bitopcst = TREE_OPERAND (bitop, 0);
>> +  else if (TREE_CODE (TREE_OPERAND (bitop, 1)) == INTEGER_CST)
>> +bitopcst = TREE_OPERAND (bitop, 1);
>> +  else
>> +return;
>> +
>> +  wide_int res;
>> +  if (TREE_CODE (bitop) == BIT_AND_EXPR)
>> +res = wi::bit_and (bitopcst, cst);
>> +  else
>> +res = wi::bit_or (bitopcst, cst);
>> +
>> +  /* For BIT_AND only warn if (CST2 & CST1) != CST1, and
>> + for BIT_OR only if (CST2 | CST1) != CST1.  */
>> +  if (res == cst)
>> +return;
>> +
>> +  if (code == EQ_EXPR)
>> +warning_at (loc, OPT_Wtautological_compare,
>> +"bitwise comparison always evaluates to false");
>> +  else
>> +warning_at (loc, OPT_Wtautological_compare,
>> +"bitwise comparison always evaluates to true");
>> +}
>> +
>>  /* Warn if a self-comparison always evaluates to true or false.  LOC
>> is the location of the comparison with code CODE, LHS and RHS are
>> operands of the comparison.  */
>> @@ -337,6 +390,8 @@ warn_tautological_cmp (location_t loc, enum
>> tree_code code, tree lhs, tree rhs)
>>|| from_macro_expansion_at (EXPR_LOCATION (rhs)))
>>  return;
>>
>> +  warn_tautological_bitwise_comparison (loc, code, lhs, rhs);
>> +
>>/* We do not warn for constants because they are typical of macro
>>   expansions that test for features, sizeof, and similar.  */
>>if (CONSTANT_CLASS_P (fold_for_warn (lhs))
>> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
>> index ec29f1d629e..72a16a19711 100644
>> --- gcc/doc/invoke.texi
>> +++ gcc/doc/invoke.texi
>> @@ -5484,6 

Re: [AArch64, PATCH] Improve Neon store of zero

2017-08-16 Thread Jackson Woodruff

Hi Richard,

I have changed the condition as you suggest below. OK for trunk?

Jackson.

On 08/11/2017 02:56 PM, Richard Earnshaw (lists) wrote:


On 10/08/17 14:12, Jackson Woodruff wrote:

Hi all,

This patch changes patterns in aarch64-simd.md to replace

     movi    v0.4s, 0
     str     q0, [x0, 16]

With:

 stp xzr, xzr, [x0, 16]

When we are storing zeros to vectors like this:

 void f(uint32x4_t *p) {
   uint32x4_t x = { 0, 0, 0, 0};
   p[1] = x;
 }

Bootstrapped and regtested on aarch64 with no regressions.
OK for trunk?

Jackson

gcc/

2017-08-09  Jackson Woodruff  

 * aarch64-simd.md (mov): No longer force zero
 immediate into register.
 (*aarch64_simd_mov): Add new case for stp
 using zero immediate.


gcc/testsuite

2017-08-09  Jackson Woodruff  

 * gcc.target/aarch64/simd/neon_str_zero.c: New.


patchfile


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
74de9b8c89dd5e4e3d87504594c969de0e0128ce..0149a742d34ae4fd5b3fd705b03c845f94aa1d59
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -23,7 +23,10 @@
(match_operand:VALL_F16 1 "general_operand" ""))]
"TARGET_SIMD"
"
-if (GET_CODE (operands[0]) == MEM)
+if (GET_CODE (operands[0]) == MEM
+   && !(aarch64_simd_imm_zero (operands[1], mode)
+&& aarch64_legitimate_address_p (mode, operands[0],
+ PARALLEL, 1)))
operands[1] = force_reg (mode, operands[1]);
"
  )
@@ -94,63 +97,70 @@
  
  (define_insn "*aarch64_simd_mov"

[(set (match_operand:VD 0 "nonimmediate_operand"
-   "=w, m,  w, ?r, ?w, ?r, w")
+   "=w, m,  m,  w, ?r, ?w, ?r, w")
(match_operand:VD 1 "general_operand"
-   "m,  w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
"TARGET_SIMD
-   && (register_operand (operands[0], mode)
-   || register_operand (operands[1], mode))"
+   && ((register_operand (operands[0], mode)
+   || register_operand (operands[1], mode))
+  || (memory_operand (operands[0], mode)
+ && immediate_operand (operands[1], mode)))"

Allowing any immediate here seems too lax - it allows any immediate
value which then could cause reload operations to be inserted (that in
turn might cause register pressure calculations to be incorrect).
Wouldn't it be better to use something like aarch64_simd_reg_or_zero?
Similarly below.

R.


  {
 switch (which_alternative)
   {
   case 0: return "ldr\\t%d0, %1";
- case 1: return "str\\t%d1, %0";
- case 2: return "mov\t%0., %1.";
- case 3: return "umov\t%0, %1.d[0]";
- case 4: return "fmov\t%d0, %1";
- case 5: return "mov\t%0, %1";
- case 6:
+ case 1: return "str\\txzr, %0";
+ case 2: return "str\\t%d1, %0";
+ case 3: return "mov\t%0., %1.";
+ case 4: return "umov\t%0, %1.d[0]";
+ case 5: return "fmov\t%d0, %1";
+ case 6: return "mov\t%0, %1";
+ case 7:
return aarch64_output_simd_mov_immediate (operands[1],
  mode, 64);
   default: gcc_unreachable ();
   }
  }
-  [(set_attr "type" "neon_load1_1reg, neon_store1_1reg,\
+  [(set_attr "type" "neon_load1_1reg, neon_stp, neon_store1_1reg,\
 neon_logic, neon_to_gp, f_mcr,\
 mov_reg, neon_move")]
  )
  
  (define_insn "*aarch64_simd_mov"

[(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, m,  w, ?r, ?w, ?r, w")
+   "=w, Ump,  m,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
"TARGET_SIMD
-   && (register_operand (operands[0], mode)
-   || register_operand (operands[1], mode))"
+   && ((register_operand (operands[0], mode)
+   || register_operand (operands[1], mode))
+   || (memory_operand (operands[0], mode)
+  && immediate_operand (operands[1], mode)))"
  {
switch (which_alternative)
  {
  case 0:
return "ldr\\t%q0, %1";
  case 1:
-   return "str\\t%q1, %0";
+   return "stp\\txzr, xzr, %0";
  case 2:
-   return "mov\t%0., %1.";
+   return "str\\t%q1, %0";
  case 3:
+   return "mov\t%0., %1.";
  case 4:
  case 5:
-   return "#";
  case 6:
+   return "#";
+case 7:
return aarch64_output_simd_mov_immediate (operands[1], mode, 128);
  default:
gcc_unreachable ();
  }
  }
[(set_attr "type" "neon_load1_1reg, neon_store1_1reg,\
- neon_logic, multiple, multiple, multiple,\
- neon_move")
-   (set_attr "length" "4,4,4,8,8,8,4")]
+neon_stp, neon_logic, multiple, multiple,\
+multiple, neon_move")
+   (set_attr 

Re: c-family PATCH to improve -Wtautological-compare (PR c/81783)

2017-08-16 Thread David Malcolm
On Wed, 2017-08-16 at 16:29 +0200, Marek Polacek wrote:
> This patch improves -Wtautological-compare so that it also detects
> bitwise comparisons involving & and | that are always true or false,
> e.g.
> 
>   if ((a & 16) == 10)
> return 1;
> 
> can never be true.  Note that e.g. "(a & 9) == 8" is *not* always
> false
> or true.
> 
> I think it's pretty straightforward with one snag: we shouldn't warn
> if
> the constant part of the bitwise operation comes from a macro, but
> currently
> that's not possible, so I XFAILed this in the new test.

Maybe I'm missing something here, but why shouldn't it warn when the
constant comes from a macro?

At the end of your testcase you have this example:

#define N 0x10
  if ((a & N) == 10) /* { dg-bogus "bitwise comparison always evaluates to 
false" "" { xfail *-*-* } } */
 return 1;
  if ((a | N) == 10) /* { dg-bogus "bitwise comparison always evaluates to 
false" "" { xfail *-*-* } } */
   return 1;

That code looks bogus to me (and if the defn of "N" is further away,
it's harder to spot that it's wrong): shouldn't we warn about it?

> 
> This has found one issue in the GCC codebase and it's a genuine bug:
> .  

In this example GOVD_WRITTEN is from an enum, not a macro, but if
GOVD_WRITTEN had been a macro, shouldn't we still issue a warning?

Hope this is constructive
Dave

> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-08-16  Marek Polacek  
> 
>   PR c/81783
>   * c-warn.c (warn_tautological_bitwise_comparison): New
> function.
>   (warn_tautological_cmp): Call it.
> 
>   * doc/invoke.texi: Update -Wtautological-compare documentation.
> 
>   * c-c++-common/Wtautological-compare-5.c: New test.
> 
> diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> index 9c3073444cf..0749d16a50f 100644
> --- gcc/c-family/c-warn.c
> +++ gcc/c-family/c-warn.c
> @@ -321,6 +321,59 @@ find_array_ref_with_const_idx_r (tree *expr_p,
> int *, void *)
>return NULL_TREE;
>  }
>  
> +/* Subroutine of warn_tautological_cmp.  Warn about bitwise comparisons
> +   that always evaluate to true or false.  LOC is the location of the
> +   ==/!= comparison specified by CODE; LHS and RHS are the usual operands
> +   of this comparison.  */
> +
> +static void
> +warn_tautological_bitwise_comparison (location_t loc, tree_code
> code,
> +   tree lhs, tree rhs)
> +{
> +  if (code != EQ_EXPR && code != NE_EXPR)
> +return;
> +
> +  /* Extract the operands from e.g. (x & 8) == 4.  */
> +  tree bitop;
> +  tree cst;
> +  if ((TREE_CODE (lhs) == BIT_AND_EXPR
> +   || TREE_CODE (lhs) == BIT_IOR_EXPR)
> +  && TREE_CODE (rhs) == INTEGER_CST)
> +bitop = lhs, cst = rhs;
> +  else if ((TREE_CODE (rhs) == BIT_AND_EXPR
> + || TREE_CODE (rhs) == BIT_IOR_EXPR)
> +&& TREE_CODE (lhs) == INTEGER_CST)
> +bitop = rhs, cst = lhs;
> +  else
> +return;
> +
> +  tree bitopcst;
> +  if (TREE_CODE (TREE_OPERAND (bitop, 0)) == INTEGER_CST)
> +bitopcst = TREE_OPERAND (bitop, 0);
> +  else if (TREE_CODE (TREE_OPERAND (bitop, 1)) == INTEGER_CST)
> +bitopcst = TREE_OPERAND (bitop, 1);
> +  else
> +return;
> +
> +  wide_int res;
> +  if (TREE_CODE (bitop) == BIT_AND_EXPR)
> +res = wi::bit_and (bitopcst, cst);
> +  else
> +res = wi::bit_or (bitopcst, cst);
> +
> +  /* For BIT_AND only warn if (CST2 & CST1) != CST1, and
> + for BIT_OR only if (CST2 | CST1) != CST1.  */
> +  if (res == cst)
> +return;
> +
> +  if (code == EQ_EXPR)
> +warning_at (loc, OPT_Wtautological_compare,
> + "bitwise comparison always evaluates to false");
> +  else
> +warning_at (loc, OPT_Wtautological_compare,
> + "bitwise comparison always evaluates to true");
> +}
> +
>  /* Warn if a self-comparison always evaluates to true or false.  LOC
> is the location of the comparison with code CODE, LHS and RHS are
> operands of the comparison.  */
> @@ -337,6 +390,8 @@ warn_tautological_cmp (location_t loc, enum
> tree_code code, tree lhs, tree rhs)
>|| from_macro_expansion_at (EXPR_LOCATION (rhs)))
>  return;
>  
> +  warn_tautological_bitwise_comparison (loc, code, lhs, rhs);
> +
>/* We do not warn for constants because they are typical of macro
>   expansions that test for features, sizeof, and similar.  */
>if (CONSTANT_CLASS_P (fold_for_warn (lhs))
> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> index ec29f1d629e..72a16a19711 100644
> --- gcc/doc/invoke.texi
> +++ gcc/doc/invoke.texi
> @@ -5484,6 +5484,14 @@ int i = 1;
>  @dots{}
>  if (i > i) @{ @dots{} @}
>  @end smallexample
> +
> +This warning also warns about bitwise comparisons that always
> evaluate
> +to true or false, for instance:
> +@smallexample
> +if ((a & 16) == 10) @{ @dots{} @}
> +@end smallexample
> +will always be false.
> +
>  This warning is enabled by @option{-Wall}.
>  
>  @item 

Re: Limit SH strncmp inline expansion (PR target/78460)

2017-08-16 Thread Oleg Endo
On Tue, 2017-08-15 at 23:44 +, Joseph Myers wrote:
> 
> > This is an older issue.  Please also add a reference to PR 67712 in
> > your commit.  Can you also apply it to GCC 6 branch please?

> I can't reproduce the problem with GCC 6 branch; the glibc testsuite 
> builds fine without out-of-memory issues, as does the test included
> in the  patch, despite the GCC code in question being essentially
> unchanged.  That's why the bug appears to me as a regression in GCC
> 7.

OK, thanks for the investigation!

Cheers,
Oleg


c-family PATCH to improve -Wtautological-compare (PR c/81783)

2017-08-16 Thread Marek Polacek
This patch improves -Wtautological-compare so that it also detects
bitwise comparisons involving & and | that are always true or false, e.g.

  if ((a & 16) == 10)
return 1;

can never be true.  Note that e.g. "(a & 9) == 8" is *not* always false
or true.
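A worked check of the two examples above (a sketch, assuming plain int operands): a & 16 can only evaluate to 0 or 16, so it can never equal 10, whereas a & 9 ranges over {0, 1, 8, 9}, so (a & 9) == 8 holds exactly when a == 8 modulo the other bits.

```c
/* Exhaustive check that (a & 16) == 10 is tautologically false,
   and that (a & 9) == 8 is live (true for some a, false for others).  */

static int
always_false_p (void)
{
  int a;
  for (a = 0; a < 256; a++)
    if ((a & 16) == 10)
      return 0;  /* never reached: a & 16 is 0 or 16 */
  return 1;
}

static int
live_comparison_p (void)
{
  /* (a & 9) == 8 is true for a == 8 but false for a == 0.  */
  return ((8 & 9) == 8) && ((0 & 9) != 8);
}
```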

I think it's pretty straightforward with one snag: we shouldn't warn if
the constant part of the bitwise operation comes from a macro, but currently
that's not possible, so I XFAILed this in the new test.

This has found one issue in the GCC codebase and it's a genuine bug:
.  

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-08-16  Marek Polacek  

PR c/81783
* c-warn.c (warn_tautological_bitwise_comparison): New function.
(warn_tautological_cmp): Call it.

* doc/invoke.texi: Update -Wtautological-compare documentation.

* c-c++-common/Wtautological-compare-5.c: New test.

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index 9c3073444cf..0749d16a50f 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -321,6 +321,59 @@ find_array_ref_with_const_idx_r (tree *expr_p, int *, void 
*)
   return NULL_TREE;
 }
 
+/* Subroutine of warn_tautological_cmp.  Warn about bitwise comparisons
+   that always evaluate to true or false.  LOC is the location of the
+   ==/!= comparison specified by CODE; LHS and RHS are the usual operands
+   of this comparison.  */
+
+static void
+warn_tautological_bitwise_comparison (location_t loc, tree_code code,
+ tree lhs, tree rhs)
+{
+  if (code != EQ_EXPR && code != NE_EXPR)
+return;
+
+  /* Extract the operands from e.g. (x & 8) == 4.  */
+  tree bitop;
+  tree cst;
+  if ((TREE_CODE (lhs) == BIT_AND_EXPR
+   || TREE_CODE (lhs) == BIT_IOR_EXPR)
+  && TREE_CODE (rhs) == INTEGER_CST)
+bitop = lhs, cst = rhs;
+  else if ((TREE_CODE (rhs) == BIT_AND_EXPR
+   || TREE_CODE (rhs) == BIT_IOR_EXPR)
+  && TREE_CODE (lhs) == INTEGER_CST)
+bitop = rhs, cst = lhs;
+  else
+return;
+
+  tree bitopcst;
+  if (TREE_CODE (TREE_OPERAND (bitop, 0)) == INTEGER_CST)
+bitopcst = TREE_OPERAND (bitop, 0);
+  else if (TREE_CODE (TREE_OPERAND (bitop, 1)) == INTEGER_CST)
+bitopcst = TREE_OPERAND (bitop, 1);
+  else
+return;
+
+  wide_int res;
+  if (TREE_CODE (bitop) == BIT_AND_EXPR)
+res = wi::bit_and (bitopcst, cst);
+  else
+res = wi::bit_or (bitopcst, cst);
+
+  /* For BIT_AND only warn if (CST2 & CST1) != CST1, and
+ for BIT_OR only if (CST2 | CST1) != CST1.  */
+  if (res == cst)
+return;
+
+  if (code == EQ_EXPR)
+warning_at (loc, OPT_Wtautological_compare,
+   "bitwise comparison always evaluates to false");
+  else
+warning_at (loc, OPT_Wtautological_compare,
+   "bitwise comparison always evaluates to true");
+}
+
 /* Warn if a self-comparison always evaluates to true or false.  LOC
is the location of the comparison with code CODE, LHS and RHS are
operands of the comparison.  */
@@ -337,6 +390,8 @@ warn_tautological_cmp (location_t loc, enum tree_code code, 
tree lhs, tree rhs)
   || from_macro_expansion_at (EXPR_LOCATION (rhs)))
 return;
 
+  warn_tautological_bitwise_comparison (loc, code, lhs, rhs);
+
   /* We do not warn for constants because they are typical of macro
  expansions that test for features, sizeof, and similar.  */
   if (CONSTANT_CLASS_P (fold_for_warn (lhs))
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index ec29f1d629e..72a16a19711 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -5484,6 +5484,14 @@ int i = 1;
 @dots{}
 if (i > i) @{ @dots{} @}
 @end smallexample
+
+This warning also warns about bitwise comparisons that always evaluate
+to true or false, for instance:
+@smallexample
+if ((a & 16) == 10) @{ @dots{} @}
+@end smallexample
+will always be false.
+
 This warning is enabled by @option{-Wall}.
 
 @item -Wtrampolines
diff --git gcc/testsuite/c-c++-common/Wtautological-compare-5.c 
gcc/testsuite/c-c++-common/Wtautological-compare-5.c
index e69de29bb2d..4664bfdeae6 100644
--- gcc/testsuite/c-c++-common/Wtautological-compare-5.c
+++ gcc/testsuite/c-c++-common/Wtautological-compare-5.c
@@ -0,0 +1,106 @@
+/* PR c/81783 */
+/* { dg-do compile } */
+/* { dg-options "-Wtautological-compare" } */
+
+enum E { FOO = 128 };
+
+int
+f (int a)
+{
+  if ((a & 16) == 10) /* { dg-warning "bitwise comparison always evaluates to 
false" } */
+return 1;
+  if ((16 & a) == 10) /* { dg-warning "bitwise comparison always evaluates to 
false" } */
+return 1;
+  if (10 == (a & 16)) /* { dg-warning "bitwise comparison always evaluates to 
false" } */
+return 1;
+  if (10 == (16 & a)) /* { dg-warning "bitwise comparison always evaluates to 
false" } */
+return 1;
+
+  if ((a & 16) != 10) /* { dg-warning "bitwise comparison always evaluates to 
true" } */
+return 1;
+  if ((16 & a) != 10) /* { dg-warning 

Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Bin.Cheng
On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
 wrote:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

Didn't read the change in detail, so sorry if I misunderstood the issue.
I made changes in scev to fold type conversions better using various sources
of information, for example, vrp, niters, undefined overflow behavior, etc.
In theory this information should be available to other optimizers without
querying scev.  For the mentioned test, vrp should compute an accurate range
for "i" so that we can fold (intptr_t) (i + 1) without worrying about
overflow.  Note we don't do it in generic folding because (intptr_t) i + 1
could be more expensive (especially in the case of (T)(i + j)), or because the
CST part is in bigger precision after conversion.
But such folding is wanted in several places, e.g., IVOPTs.  To provide such
an interface, we changed tree-affine and made it fold aggressively.  I am
curious whether it's possible to use aff_tree to implement
strip_constant_offset here, since aggressive folding is wanted.  After all,
using an additional chrec looks a little heavyweight for this simple test.
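A small runnable sketch of the hazard under discussion (illustrative code, not from the patch): for unsigned i, the conversion and the addition cannot be reordered once i + 1 can wrap, which is exactly why the "+ 1" cannot be peeled off (intptr_t) (i + 1) without range information.

```c
#include <stdint.h>

/* (intptr_t) (i + 1): the unsigned addition wraps first.
   (intptr_t) i + 1: widen first, then add in the wider type.
   The two differ once i + 1 wraps.  */

static intptr_t
add_then_convert (unsigned int i)
{
  return (intptr_t) (i + 1);  /* wraps to 0 at i == UINT_MAX */
}

static intptr_t
convert_then_add (unsigned int i)
{
  return (intptr_t) i + 1;    /* no wrap in the wider type (on LP64) */
}
```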

Thanks,
bin
>
> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
> references, with the distance between their start addresses being
> the difference between their DR_INITs.  For this second type, the
> exact form of DR_OFFSET doesn't really matter, and the more it is
> able to canonicalise the non-constant offset, the better.
>
> The patch fixes the PR by adding another offset/init pair to the
> innermost loop behaviour for this second group.  The new pair uses chrecs
> rather than generic exprs for the offset part, with the chrec being
> analysed in the innermost loop for which the offset isn't invariant.
> This allows us to vectorise the second function in the testcase
> as well, which wasn't possible before the earlier patch.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> PR tree-optimization/81635
> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
> chrec_init.
> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
> (dr_equal_offsets_p): Delete.
> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
> (split_constant_offset): Handle POLYNOMIAL_CHREC.
> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
> (operator ==): Use dr_chrec_offsets_equal_p and compare the
> DR_CHREC_INITs.
> (prune_runtime_alias_test_list): Likewise.
> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
> the DR_CHREC_INITs.
> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
> (analyze_offset_scev): New function.
> (compute_offset_chrecs): Likewise.
> (dr_chrec_offsets_equal_p): Likewise.
> (dr_chrec_offsets_compare): Likewise.
> * tree-loop-distribution.c (compute_alias_check_pairs): Use
> dr_chrec_offsets_compare.
> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
> dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
> (dr_group_sort_cmp): Likewise.
> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
> (vect_no_alias_p): Likewise.
> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
> compare the DR_CHREC_INITs.
> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
> of DR_INIT.
>
> gcc/testsuite/
> PR tree-optimization/81635
> * gcc.dg/vect/bb-slp-pr81635.c: New test.
>


Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
Richard Sandiford  writes:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.
>
> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
> references, with the distance between their start addresses being
> the difference between their DR_INITs.  For this second type, the
> exact form of DR_OFFSET doesn't really matter, and the more it is
> able to canonicalise the non-constant offset, the better.
>
> The patch fixes the PR by adding another offset/init pair to the
> innermost loop behaviour for this second group.  The new pair uses chrecs
> rather than generic exprs for the offset part, with the chrec being
> analysed in the innermost loop for which the offset isn't invariant.
> This allows us to vectorise the second function in the testcase
> as well, which wasn't possible before the earlier patch.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
>   PR tree-optimization/81635
>   * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>   chrec_init.
>   (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>   (dr_equal_offsets_p): Delete.
>   (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>   * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>   (split_constant_offset): Handle POLYNOMIAL_CHREC.
>   (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>   (operator ==): Use dr_chrec_offsets_equal_p and compare the
>   DR_CHREC_INITs.
>   (prune_runtime_alias_test_list): Likewise.
>   (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>   the DR_CHREC_INITs.
>   (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>   (analyze_offset_scev): New function.
>   (compute_offset_chrecs): Likewise.
>   (dr_chrec_offsets_equal_p): Likewise.
>   (dr_chrec_offsets_compare): Likewise.
>   * tree-loop-distribution.c (compute_alias_check_pairs): Use
>   dr_chrec_offsets_compare.
>   * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>   dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>   (dr_group_sort_cmp): Likewise.
>   (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
>   (vect_no_alias_p): Likewise.
>   (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
>   compare the DR_CHREC_INITs.
>   (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
>   * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
>   of DR_INIT.
>
> gcc/testsuite/
>   PR tree-optimization/81635
>   * gcc.dg/vect/bb-slp-pr81635.c: New test.

Sorry, forgot I was going to convert this to a context diff...

Index: gcc/tree-data-ref.h
===
*** gcc/tree-data-ref.h 2017-08-04 11:42:45.938134820 +0100
--- gcc/tree-data-ref.h 2017-08-16 14:34:56.611554296 +0100
*** struct innermost_loop_behavior
*** 52,57 
--- 52,78 
tree init;
tree step;
  
+   /* The scalar evolution of OFFSET + INIT, split into non-constant and
+  constant terms in the same way as OFFSET and INIT.  For example,
+  if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be:
+ 
+  chrec_offset  {x, +, 3}_1
+  chrec_init4
+ 
+  If OFFSET + INIT cannot be represented as a scalar evolution,
+  the fields are simply a copy of OFFSET and INIT.
+ 
+  This split is useful when analyzing the relationship between two
+  data references with the same base.  OFFSET and INIT are generic
+  expressions that can be used to build related data references,
+  while CHREC_OFFSET and CHREC_INIT are our best attempt at
+  canonicalizing the value of OFFSET + INIT.
+ 
+  These fields are only initialized lazily, by a call to
+  compute_offset_chrecs.  */
+   tree chrec_offset;
+   tree chrec_init;
+ 
/* BASE_ADDRESS is known to be misaligned by 

PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
The first loop in the testcase regressed after my recent changes to
dr_analyze_innermost.  Previously we would treat "i" as an iv even
for bb analysis and end up with:

   DR_BASE_ADDRESS: p or q
   DR_OFFSET: 0
   DR_INIT: 0 or 4
   DR_STEP: 16

We now always keep the step as 0 instead, so for an int "i" we'd have:

   DR_BASE_ADDRESS: p or q
   DR_OFFSET: (intptr_t) i
   DR_INIT: 0 or 4
   DR_STEP: 0

This is also what we'd like to have for the unsigned "i", but the
problem is that strip_constant_offset thinks that the "i + 1" in
"(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
name that holds "(intptr_t) (i + 1)", meaning that the accesses no
longer seem to be related to the [i] ones.
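A hypothetical sketch of the kind of accesses being described (the actual testcase is gcc.dg/vect/bb-slp-pr81635.c and is not reproduced here): with an unsigned induction variable, the [i] and [i + 1] references share a base, but once the (intptr_t) (i + 1) offset cannot be split, the pairs no longer look related.

```c
/* Illustrative access pattern: p[i] and p[i + 1] share DR_BASE_ADDRESS,
   with DR_INITs 0 and 4 (bytes) when the "+ 1" can be peeled off.  */

static void
copy_pairs (int *p, const int *q, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; i += 4)
    {
      p[i] = q[i];          /* DR_INIT 0 */
      p[i + 1] = q[i + 1];  /* DR_INIT 4 */
    }
}
```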

There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
expects DR_OFFSET and DR_INIT to be generic expressions whose sum
gives the initial offset from DR_BASE_ADDRESS.  The other type uses
the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
references, with the distance between their start addresses being
the difference between their DR_INITs.  For this second type, the
exact form of DR_OFFSET doesn't really matter, and the more it is
able to canonicalise the non-constant offset, the better.

The patch fixes the PR by adding another offset/init pair to the
innermost loop behaviour for this second group.  The new pair use chrecs
rather than generic exprs for the offset part, with the chrec being
analysed in the innermost loop for which the offset isn't invariant.
This allows us to vectorise the second function in the testcase
as well, which wasn't possible before the earlier patch.
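The offset/init split can also be checked numerically, under the usual reading of a chrec {base, +, step}_1 as base + step * n after n iterations of loop 1 (the evaluator below is a standalone model, not GCC code):

```c
#include <assert.h>

/* {x + 4, +, 3}_1 splits into a non-constant part {x, +, 3}_1
   (chrec_offset) and a constant part 4 (chrec_init); the sum of the
   two parts must equal the original chrec at every iteration.  */
static long chrec_eval (long base, long step, long n)
{
  return base + step * n;
}
```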

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/81635
* tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
chrec_init.
(DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
(dr_equal_offsets_p): Delete.
(dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
* tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
(split_constant_offset): Handle POLYNOMIAL_CHREC.
(dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
(operator ==): Use dr_chrec_offsets_equal_p and compare the
DR_CHREC_INITs.
(prune_runtime_alias_test_list): Likewise.
(comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
the DR_CHREC_INITs.
(dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
(analyze_offset_scev): New function.
(compute_offset_chrecs): Likewise.
(dr_chrec_offsets_equal_p): Likewise.
(dr_chrec_offsets_compare): Likewise.
* tree-loop-distribution.c (compute_alias_check_pairs): Use
dr_chrec_offsets_compare.
* tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
(dr_group_sort_cmp): Likewise.
(vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
(vect_no_alias_p): Likewise.
(vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
compare the DR_CHREC_INITs.
(vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
* tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
of DR_INIT.

gcc/testsuite/
PR tree-optimization/81635
* gcc.dg/vect/bb-slp-pr81635.c: New test.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h 2017-08-04 11:42:45.938134820 +0100
+++ gcc/tree-data-ref.h 2017-08-16 14:34:56.611554296 +0100
@@ -52,6 +52,27 @@ struct innermost_loop_behavior
   tree init;
   tree step;
 
+  /* The scalar evolution of OFFSET + INIT, split into non-constant and
+ constant terms in the same way as OFFSET and INIT.  For example,
+ if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be:
+
+ chrec_offset  {x, +, 3}_1
+ chrec_init    4
+
+ If OFFSET + INIT cannot be represented as a scalar evolution,
+ the fields are simply a copy of OFFSET and INIT.
+
+ This split is useful when analyzing the relationship between two
+ data references with the same base.  OFFSET and INIT are generic
+ expressions that can be used to build related data references,
+ while CHREC_OFFSET and CHREC_INIT are our best attempt at
+ canonicalizing the value of OFFSET + INIT.
+
+ These fields are only initialized lazily, by a call to
+ compute_offset_chrecs.  */
+  tree chrec_offset;
+  tree chrec_init;
+
   /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
  from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
  if we had:
@@ -187,6 +208,8 @@ #define DR_IS_CONDITIONAL_IN_STMT(DR) (D
 #define DR_BASE_ADDRESS(DR)        (DR)->innermost.base_address
 #define DR_OFFSET(DR)

Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space

2017-08-16 Thread Georg-Johann Lay

On 28.07.2017 09:34, Richard Biener wrote:

On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay  wrote:

On 27.07.2017 14:34, Richard Biener wrote:


On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay  wrote:


For some targets, the best place to put read-only lookup tables as
generated by -ftree-switch-conversion is not the generic address space
but some target specific address space.

This is the case for AVR, where for most devices .rodata must be
located in RAM.

Part #1 adds a new, optional target hook that queries the back-end
about the desired address space.  There is currently only one user:
tree-switch-conversion.c::build_one_array() which re-builds value_type
and array_type if the address space returned by the backend is not
the generic one.

Part #2 is the AVR part that implements the new hook and adds some
sugar around it.



Given that switch-conversion just creates a constant initializer doesn't
AVR
benefit from handling those uniformly (at RTL expansion?).  Not sure but
I think it goes through the regular constant pool handling.

Richard.



avr doesn't use constant pools because they are not profitable for
several reasons.

Moreover, it's not sufficient to just patch the pool's section, you'd
also have to patch *all* accesses to also use the exact same address
space so that they use the correct instruction (LPM instead of LD).


Sure.  So then not handle it in constant pool handling but in expanding
the initializers of global vars.  I think the entry for this is
varasm.c:assemble_variable - that sets DECL_RTL for the variable.


That cannot work, and here is why; just for completeness:

cgraphunit.c::symbol_table::compile()

runs

  ...
  expand_all_functions ();
  output_variables (); // runs assemble_variable
  ...

So any patching of DECL_RTL will result in wrong code: The address
space of the decl won't match the address space used by the code
accessing decl.

Johann
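For reference, the transformation -ftree-switch-conversion performs can be sketched at the source level (hypothetical values; the pass builds the table as an artificial read-only initializer, and that table is the object whose address space the proposed hook would let the back end choose):

```c
#include <assert.h>

/* Before: a dense switch over small constants.  */
static int via_switch (int i)
{
  switch (i)
    {
    case 0: return 10;
    case 1: return 13;
    case 2: return 16;
    case 3: return 19;
    default: return -1;
    }
}

/* After (what the pass builds internally): a bounds check plus a load
   from a read-only lookup table.  On AVR, accesses to this table must
   use the matching address space (LPM instead of LD), which is why
   patching only the section is not enough.  */
static const int tab[4] = { 10, 13, 16, 19 };

static int via_table (int i)
{
  return (i >= 0 && i <= 3) ? tab[i] : -1;
}
```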


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-08-16 Thread Uros Bizjak
On Wed, Aug 16, 2017 at 12:55 PM, Richard Biener
 wrote:
> On Wed, Aug 16, 2017 at 12:51 PM, Uros Bizjak  wrote:
>> On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak  wrote:
>>> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener
>>>  wrote:
 On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak  wrote:
> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener
>  wrote:
>
>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports 
>> ("sse4.1"))
>> as the sse4.1 sequence is just a single instruction.  The interesting 
>> part
>> of the story will be to make sure we can emit that even if ! 
>> TARGET_ROUND ...
>>
>> Uros, any idea how to accomplish this?  Or is the idea of a "local" ifunc
>> better?  Note the ABI boundary will be expensive but I guess the 
>> conditional
>> sequence as well (and it will disturb RA even if predicted to have SSE 
>> 4.1).
>
> TARGET_ROUND is just:
>
> /* SSE4.1 defines round instructions */
> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
>
> I don't remember the history around the #define, once upon a time
> probably made sense, but nowadays it looks that it can be simply
> substituted with TARGET_SSE4_1.

 Sure but we want the backend to use a TARGET_ROUND guarded define_insn
 when TARGET_ROUND is false but inside a runtime conditional ensuring that
 TARGET_ROUND is satisfied.  With doing this with ifuncs we'd mark the 
 function
 with a proper target attribute but within a function?
>>>
>>> How about something intrinsic headers are using?
>>
>> (... somehow managed to press send too early ...)
>>
>> There we use GCC_push_options and GCC_target pragmas. Maybe we also
>> need corresponding __ROUND__ define defined by the compiler.
>
> Those don't work inside a function.  Remember I want to change the expander
> of ceil () to
>
>  if (__builtin_cpu_supports ("sse4.1"))
>ceil_for_sse4.1 ();
>  else
>ceil ();
>
> from the x86 target code that expands ceil for ! TARGET_ROUND.  I suppose
> we could simply use a separate pattern for SSE 4.1 roundsd here (does it
> have to be an unspec?  I suppose so to prevent it from being generated by
> other means and to prevent code motion out of the conditional?)
>
> Or forgo with the idea to use inline conditional code and emit an ifunc
> dispatcher, a function with the sse4.1 instruction, and a call to the 
> dispatcher
> ourselves.

Hm ...

Maybe in this case an example from libatomic, how cmpxchg16 is handled
comes handy.

Uros.
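At the user level, the dispatch Richard describes can be sketched like this (a hedged illustration only — the thread is about doing the equivalent inside the expander, not in source, and the function names here are made up; the target attribute is what lets the SSE4.1 intrinsic be used without -msse4.1):

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>

/* Fast path: a single SSE4.1 roundsd.  The target attribute enables
   the ISA for just this function.  */
__attribute__((target("sse4.1")))
static double ceil_sse41 (double x)
{
  __m128d v = _mm_set_sd (x);
  return _mm_cvtsd_f64 (_mm_ceil_sd (v, v));
}
#endif

/* Portable fallback, standing in for the libm call.  Only valid for
   values that fit in long long, which is enough for a sketch.  */
static double ceil_fallback (double x)
{
  long long t = (long long) x;   /* truncates toward zero */
  if ((double) t < x)
    t += 1;
  return (double) t;
}

static double my_ceil (double x)
{
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports ("sse4.1"))
    return ceil_sse41 (x);
#endif
  return ceil_fallback (x);
}
```

The open question in the thread is exactly the ABI/RA cost this source-level version hides: the branch and the call boundary around the single-instruction path.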


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-08-16 Thread Richard Biener
On Wed, Aug 16, 2017 at 12:51 PM, Uros Bizjak  wrote:
> On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak  wrote:
>> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener
>>  wrote:
>>> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak  wrote:
 On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener
  wrote:

> So I'd try the "easy" way of expanding if (__builtin_cpu_supports 
> ("sse4.1"))
> as the sse4.1 sequence is just a single instruction.  The interesting part
> of the story will be to make sure we can emit that even if ! TARGET_ROUND 
> ...
>
> Uros, any idea how to accomplish this?  Or is the idea of a "local" ifunc
> better?  Note the ABI boundary will be expensive but I guess the 
> conditional
> sequence as well (and it will disturb RA even if predicted to have SSE 
> 4.1).

 TARGET_ROUND is just:

 /* SSE4.1 defines round instructions */
 #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
 #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)

 I don't remember the history around the #define, once upon a time
 probably made sense, but nowadays it looks that it can be simply
 substituted with TARGET_SSE4_1.
>>>
>>> Sure but we want the backend to use a TARGET_ROUND guarded define_insn
>>> when TARGET_ROUND is false but inside a runtime conditional ensuring that
>>> TARGET_ROUND is satisfied.  With doing this with ifuncs we'd mark the 
>>> function
>>> with a proper target attribute but within a function?
>>
>> How about something intrinsic headers are using?
>
> (... somehow managed to press send too early ...)
>
> There we use GCC_push_options and GCC_target pragmas. Maybe we also
> need corresponding __ROUND__ define defined by the compiler.

Those don't work inside a function.  Remember I want to change the expander
of ceil () to

 if (__builtin_cpu_supports ("sse4.1"))
   ceil_for_sse4.1 ();
 else
   ceil ();

from the x86 target code that expands ceil for ! TARGET_ROUND.  I suppose
we could simply use a separate pattern for SSE 4.1 roundsd here (does it
have to be an unspec?  I suppose so to prevent it from being generated by
other means and to prevent code motion out of the conditional?)

Or forgo with the idea to use inline conditional code and emit an ifunc
dispatcher, a function with the sse4.1 instruction, and a call to the dispatcher
ourselves.

Richard.

> Uros.


Re: [PATCH] Fix middle-end/81737

2017-08-16 Thread Richard Biener
On Wed, Aug 16, 2017 at 12:33 PM, Marek Polacek  wrote:
> On Mon, Aug 14, 2017 at 10:22:09AM +0200, Richard Biener wrote:
>> On Tue, Aug 8, 2017 at 7:23 PM, Marek Polacek  wrote:
>> > On Mon, Aug 07, 2017 at 04:07:49PM +0200, Richard Biener wrote:
>> >> On August 7, 2017 11:09:59 AM GMT+02:00, Marek Polacek 
>> >>  wrote:
>> >> >On Mon, Aug 07, 2017 at 10:58:09AM +0200, Jakub Jelinek wrote:
>> >> >> On Mon, Aug 07, 2017 at 10:47:51AM +0200, Marek Polacek wrote:
>> >> >> > In my recent change I failed to check whether the type domain
>> >> >> > of a type is non-NULL and this goof causes crashing on this
>> >> >> > testcase.
>> >> >> >
>> >> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>> >> >> >
>> >> >> > 2017-08-07  Marek Polacek  
>> >> >> >
>> >> >> >  PR middle-end/81737
>> >> >> >  * fold-const.c (fold_indirect_ref_1): Check type_domain.
>> >> >> >
>> >> >> >  * gcc.dg/pr81737.c: New test.
>> >> >>
>> >> >> The old code was assuming size_zero_node if type_domain is NULL
>> >> >> or TYPE_MIN_VALUE is NULL, which is reasonable for C/C++, but indeed
>> >> >might
>> >> >> be wrong perhaps for Fortran or Ada.
>> >>
>> >> It's how the middle-end defines it, so please restore this behavior 
>> >> (sorry for missing this in the review).  IIRC there are at least one 
>> >> other 'copy' of the folding somewhere.
>> >
>> > Sure, this patch should do it:
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>> >
>> > 2017-08-08  Marek Polacek  
>> >
>> > PR middle-end/81695
>> > * fold-const.c (fold_indirect_ref_1): Restore original behavior
>> > regarding size_zero_node.
>> >
>> > diff --git gcc/fold-const.c gcc/fold-const.c
>> > index 5a118ca50a1..2c47d1d16a4 100644
>> > --- gcc/fold-const.c
>> > +++ gcc/fold-const.c
>> > @@ -14109,22 +14109,21 @@ fold_indirect_ref_1 (location_t loc, tree type, 
>> > tree op0)
>> >&& type == TREE_TYPE (op00type))
>> > {
>> >   tree type_domain = TYPE_DOMAIN (op00type);
>> > - tree min;
>> > + tree min = size_zero_node;
>> >   if (type_domain != NULL_TREE
>> > - && (min = TYPE_MIN_VALUE (type_domain))
>> > + && TYPE_MIN_VALUE (type_domain)
>> >   && TREE_CODE (min) == INTEGER_CST)
>> > +   min = TYPE_MIN_VALUE (type_domain);
>>
>> I think this is wrong for non-INTEGER_CST TYPE_MIN_VALUE.
>
> Ah, right, another try:
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2017-08-16  Marek Polacek  
>
> PR middle-end/81695
> * fold-const.c (fold_indirect_ref_1): Restore original behavior
> regarding size_zero_node.
>
> diff --git gcc/fold-const.c gcc/fold-const.c
> index 5a118ca50a1..0a05da0b49f 100644
> --- gcc/fold-const.c
> +++ gcc/fold-const.c
> @@ -14109,22 +14109,19 @@ fold_indirect_ref_1 (location_t loc, tree type, 
> tree op0)
>&& type == TREE_TYPE (op00type))
> {
>   tree type_domain = TYPE_DOMAIN (op00type);
> - tree min;
> - if (type_domain != NULL_TREE
> - && (min = TYPE_MIN_VALUE (type_domain))
> - && TREE_CODE (min) == INTEGER_CST)
> + tree min = size_zero_node;
> + if (type_domain && TYPE_MIN_VALUE (type_domain))
> +   min = TYPE_MIN_VALUE (type_domain);
> + offset_int off = wi::to_offset (op01);
> + offset_int el_sz = wi::to_offset (TYPE_SIZE_UNIT (type));
> + offset_int remainder;
> + off = wi::divmod_trunc (off, el_sz, SIGNED, &remainder);
> + if (remainder == 0)

Ok with && TREE_CODE (min) == INTEGER_CST here.

Richard.

> {
> - offset_int off = wi::to_offset (op01);
> - offset_int el_sz = wi::to_offset (TYPE_SIZE_UNIT (type));
> - offset_int remainder;
> - off = wi::divmod_trunc (off, el_sz, SIGNED, &remainder);
> - if (remainder == 0)
> -   {
> - off = off + wi::to_offset (min);
> - op01 = wide_int_to_tree (sizetype, off);
> - return build4_loc (loc, ARRAY_REF, type, op00, op01,
> -NULL_TREE, NULL_TREE);
> -   }
> + off = off + wi::to_offset (min);
> + op01 = wide_int_to_tree (sizetype, off);
> + return build4_loc (loc, ARRAY_REF, type, op00, op01,
> +NULL_TREE, NULL_TREE);
> }
> }
> }
>
> Marek
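The arithmetic in the patched fold_indirect_ref_1 can be modeled as follows (a standalone sketch, not GCC code: offset_int/wi:: are replaced by plain long, and `min` stands for TYPE_MIN_VALUE of the array's index domain, defaulting to size_zero_node when the domain has no lower bound):

```c
#include <assert.h>

/* Model of folding *(T *)(&arr + byte_off) into arr[idx]: succeed
   only when byte_off is a whole number of elements, and bias the
   index by the array's lower bound.  C's % truncates toward zero,
   matching wi::divmod_trunc with SIGNED.  */
static int fold_index (long byte_off, long el_sz, long min, long *idx)
{
  long rem = byte_off % el_sz;     /* the divmod_trunc remainder */
  if (rem != 0)
    return 0;                      /* not a whole element: no folding */
  *idx = byte_off / el_sz + min;   /* off + wi::to_offset (min) */
  return 1;
}
```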


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-08-16 Thread Uros Bizjak
On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak  wrote:
> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener
>  wrote:
>> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak  wrote:
>>> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener
>>>  wrote:
>>>
 So I'd try the "easy" way of expanding if (__builtin_cpu_supports 
 ("sse4.1"))
 as the sse4.1 sequence is just a single instruction.  The interesting part
 of the story will be to make sure we can emit that even if ! TARGET_ROUND 
 ...

 Uros, any idea how to accomplish this?  Or is the idea of a "local" ifunc
 better?  Note the ABI boundary will be expensive but I guess the 
 conditional
 sequence as well (and it will disturb RA even if predicted to have SSE 
 4.1).
>>>
>>> TARGET_ROUND is just:
>>>
>>> /* SSE4.1 defines round instructions */
>>> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
>>> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
>>>
>>> I don't remember the history around the #define, once upon a time
>>> probably made sense, but nowadays it looks that it can be simply
>>> substituted with TARGET_SSE4_1.
>>
>> Sure but we want the backend to use a TARGET_ROUND guarded define_insn
>> when TARGET_ROUND is false but inside a runtime conditional ensuring that
>> TARGET_ROUND is satisfied.  With doing this with ifuncs we'd mark the 
>> function
>> with a proper target attribute but within a function?
>
> How about something intrinsic headers are using?

(... somehow managed to press send too early ...)

There we use GCC_push_options and GCC_target pragmas. Maybe we also
need corresponding __ROUND__ define defined by the compiler.

Uros.


Re: [PATCH PR81832]Skip copying loop header if inner loop is distributed

2017-08-16 Thread Richard Biener
On Wed, Aug 16, 2017 at 11:22 AM, Bin.Cheng  wrote:
> On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng  wrote:
 Hi,
 This patch fixes PR81832.  Root cause for the ICE is:
   1) Loop has distributed inner loop.
  2) The guarding function call IFN_LOOP_DIST_ALIAS happens to be in loop's 
 header.
  3) IFN_LOOP_DIST_ALIAS (in loop's header) is duplicated by pass_ch_vect 
 thus
  not eliminated.

 Given pass_ch_vect copies loop header to enable more vectorization, we 
 should
 skip loop in this case because distributed inner loop means this loop can 
 not
 be vectorized anyway.  One point to mention is that the name 
 inner_loop_distributed_p
 is a little misleading.  The name indicates that each basic block is 
 checked,
 but the patch only checks loop's header for simplicity/efficiency's 
 purpose.
 Any comment?
>>>
>>> My comment would be to question pass_ch_vect placement -- what was the
>>> reason to place it so late?
>>>
>>> I also see GRAPHITE runs inbetween loop distribution and vectorization --
>>> what prevents GRAPHITE from messing up things here?  Or autopar?
>>>
>>> The patch itself shows should_duplicate_loop_header_p should
>>> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>>>
>>> So can you please adjust should_duplicate_loop_header_p instead and/or
>>> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
>>> sure if that default is so good.
>>
>> I think the default itself is OK: we only use IFNs for libm functions
>> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
>> is useful as the non-errno setting version of exp10), so the target must
>> have something that it thinks is worth open-coding.  Also, we currently
>> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
>> handling is consistent with that.
>>
>> Maybe there are some IFNs that are worth special-casing as expensive,
>> but IMO doing that to solve this problem would be a bit hacky.  It seems
>> like "inexpensive" should be more of a cost thing than a correctness thing.
> Hi all,
> This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
> in function should_duplicate_loop_header_p.  As for gimple_inexpensive_call_p,
> I think it's natural to consider functions like IFN_LOOP_VECTORIZED and
> IFN_LOOP_DIST_ALIAS as expensive because they are only meant to indicate
> temporary arrangement of optimizations and are never used in code generation.
> I will send a standalone patch for that.  Another thing is this patch
> doesn't check
> IFN_LOOP_VECTORIZED because it's impossible to have it with current order
> of passes.
> Bootstrap and test ongoing.  Further comments?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-08-15  Bin Cheng  
>
> PR tree-optimization/81832
> * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
> copy loop header which has IFN_LOOP_DIST_ALIAS call.
>
> gcc/testsuite
> 2017-08-15  Bin Cheng  
>
> PR tree-optimization/81832
> * gcc.dg/tree-ssa/pr81832.c: New test.
>
>>
>> Thanks,
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
 Bootstrap and test on x86_64.

 Thanks,
 bin
 2017-08-15  Bin Cheng  

 PR tree-optimization/81832
 * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
 (pass_ch_vect::process_loop_p): Call above function.

 gcc/testsuite
 2017-08-15  Bin Cheng  

 PR tree-optimization/81832
 * gcc.dg/tree-ssa/pr81832.c: New test.


Re: Add option for whether ceil etc. can raise "inexact", adjust x86 conditions

2017-08-16 Thread Richard Biener
On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak  wrote:
> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener
>  wrote:
>
>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1"))
>> as the sse4.1 sequence is just a single instruction.  The interesting part
>> of the story will be to make sure we can emit that even if ! TARGET_ROUND ...
>>
>> Uros, any idea how to accomplish this?  Or is the idea of a "local" ifunc
>> better?  Note the ABI boundary will be expensive but I guess the conditional
>> sequence as well (and it will disturb RA even if predicted to have SSE 4.1).
>
> TARGET_ROUND is just:
>
> /* SSE4.1 defines round instructions */
> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
>
> I don't remember the history around the #define, once upon a time
> probably made sense, but nowadays it looks that it can be simply
> substituted with TARGET_SSE4_1.

Sure but we want the backend to use a TARGET_ROUND guarded define_insn
when TARGET_ROUND is false but inside a runtime conditional ensuring that
TARGET_ROUND is satisfied.  With doing this with ifuncs we'd mark the function
with a proper target attribute but within a function?

Richard.

> Uros.


Re: [PATCH] Fix middle-end/81737

2017-08-16 Thread Marek Polacek
On Mon, Aug 14, 2017 at 10:22:09AM +0200, Richard Biener wrote:
> On Tue, Aug 8, 2017 at 7:23 PM, Marek Polacek  wrote:
> > On Mon, Aug 07, 2017 at 04:07:49PM +0200, Richard Biener wrote:
> >> On August 7, 2017 11:09:59 AM GMT+02:00, Marek Polacek 
> >>  wrote:
> >> >On Mon, Aug 07, 2017 at 10:58:09AM +0200, Jakub Jelinek wrote:
> >> >> On Mon, Aug 07, 2017 at 10:47:51AM +0200, Marek Polacek wrote:
> >> >> > In my recent change I failed to check whether the type domain
> >> >> > of a type is non-NULL and this goof causes crashing on this
> >> >> > testcase.
> >> >> >
> >> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >> >> >
> >> >> > 2017-08-07  Marek Polacek  
> >> >> >
> >> >> >  PR middle-end/81737
> >> >> >  * fold-const.c (fold_indirect_ref_1): Check type_domain.
> >> >> >
> >> >> >  * gcc.dg/pr81737.c: New test.
> >> >>
> >> >> The old code was assuming size_zero_node if type_domain is NULL
> >> >> or TYPE_MIN_VALUE is NULL, which is reasonable for C/C++, but indeed
> >> >might
> >> >> be wrong perhaps for Fortran or Ada.
> >>
> >> It's how the middle-end defines it, so please restore this behavior (sorry 
> >> for missing this in the review).  IIRC there are at least one other 'copy' 
> >> of the folding somewhere.
> >
> > Sure, this patch should do it:
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2017-08-08  Marek Polacek  
> >
> > PR middle-end/81695
> > * fold-const.c (fold_indirect_ref_1): Restore original behavior
> > regarding size_zero_node.
> >
> > diff --git gcc/fold-const.c gcc/fold-const.c
> > index 5a118ca50a1..2c47d1d16a4 100644
> > --- gcc/fold-const.c
> > +++ gcc/fold-const.c
> > @@ -14109,22 +14109,21 @@ fold_indirect_ref_1 (location_t loc, tree type, 
> > tree op0)
> >&& type == TREE_TYPE (op00type))
> > {
> >   tree type_domain = TYPE_DOMAIN (op00type);
> > - tree min;
> > + tree min = size_zero_node;
> >   if (type_domain != NULL_TREE
> > - && (min = TYPE_MIN_VALUE (type_domain))
> > + && TYPE_MIN_VALUE (type_domain)
> >   && TREE_CODE (min) == INTEGER_CST)
> > +   min = TYPE_MIN_VALUE (type_domain);
> 
> I think this is wrong for non-INTEGER_CST TYPE_MIN_VALUE.

Ah, right, another try:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-08-16  Marek Polacek  

PR middle-end/81695
* fold-const.c (fold_indirect_ref_1): Restore original behavior
regarding size_zero_node.

diff --git gcc/fold-const.c gcc/fold-const.c
index 5a118ca50a1..0a05da0b49f 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -14109,22 +14109,19 @@ fold_indirect_ref_1 (location_t loc, tree type, tree 
op0)
   && type == TREE_TYPE (op00type))
{
  tree type_domain = TYPE_DOMAIN (op00type);
- tree min;
- if (type_domain != NULL_TREE
- && (min = TYPE_MIN_VALUE (type_domain))
- && TREE_CODE (min) == INTEGER_CST)
+ tree min = size_zero_node;
+ if (type_domain && TYPE_MIN_VALUE (type_domain))
+   min = TYPE_MIN_VALUE (type_domain);
+ offset_int off = wi::to_offset (op01);
+ offset_int el_sz = wi::to_offset (TYPE_SIZE_UNIT (type));
+ offset_int remainder;
+ off = wi::divmod_trunc (off, el_sz, SIGNED, &remainder);
+ if (remainder == 0)
{
- offset_int off = wi::to_offset (op01);
- offset_int el_sz = wi::to_offset (TYPE_SIZE_UNIT (type));
- offset_int remainder;
- off = wi::divmod_trunc (off, el_sz, SIGNED, &remainder);
- if (remainder == 0)
-   {
- off = off + wi::to_offset (min);
- op01 = wide_int_to_tree (sizetype, off);
- return build4_loc (loc, ARRAY_REF, type, op00, op01,
-NULL_TREE, NULL_TREE);
-   }
+ off = off + wi::to_offset (min);
+ op01 = wide_int_to_tree (sizetype, off);
+ return build4_loc (loc, ARRAY_REF, type, op00, op01,
+NULL_TREE, NULL_TREE);
}
}
}

Marek


Re: [PATCH PR81832]Skip copying loop header if inner loop is distributed

2017-08-16 Thread Bin.Cheng
On Wed, Aug 16, 2017 at 10:31 AM, Richard Sandiford
 wrote:
> "Bin.Cheng"  writes:
>> On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng  wrote:
> Hi,
> This patch fixes PR81832.  Root cause for the ICE is:
>   1) Loop has distributed inner loop.
>   2) The guarding function call IFN_LOOP_DIST_ALIAS happens to be in 
> loop's header.
>   3) IFN_LOOP_DIST_ALIAS (in loop's header) is duplicated by pass_ch_vect 
> thus
>  not eliminated.
>
> Given pass_ch_vect copies loop header to enable more vectorization, we 
> should
> skip loop in this case because distributed inner loop means this loop can 
> not
> be vectorized anyway.  One point to mention is that the name 
> inner_loop_distributed_p
> is a little misleading.  The name indicates that each basic block is 
> checked,
> but the patch only checks loop's header for simplicity/efficiency's 
> purpose.
> Any comment?

 My comment would be to question pass_ch_vect placement -- what was the
 reason to place it so late?

 I also see GRAPHITE runs inbetween loop distribution and vectorization --
 what prevents GRAPHITE from messing up things here?  Or autopar?

 The patch itself shows should_duplicate_loop_header_p should
 handle this IFN specially (somehow all IFNs are considered "inexpensive").

 So can you please adjust should_duplicate_loop_header_p instead and/or
 gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
 sure if that default is so good.
>>>
>>> I think the default itself is OK: we only use IFNs for libm functions
>>> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
>>> is useful as the non-errno setting version of exp10), so the target must
>>> have something that it thinks is worth open-coding.  Also, we currently
>>> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
>>> handling is consistent with that.
>>>
>>> Maybe there are some IFNs that are worth special-casing as expensive,
>>> but IMO doing that to solve this problem would be a bit hacky.  It seems
>>> like "inexpensive" should be more of a cost thing than a correctness thing.
>> Hi all,
>
>> This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
>> in function should_duplicate_loop_header_p.
>
> Thanks.
>
>> As for gimple_inexpensive_call_p, I think it's natural to consider
>> functions like IFN_LOOP_VECTORIZED and IFN_LOOP_DIST_ALIAS as
>> expensive because they are only meant to indicate temporary
>> arrangement of optimizations and are never used in code generation.
>> I will send a standalone patch for that.
>
> Is that enough to consider them expensive though?  To me, "expensive"
Not sure.  On the other hand, "expensive" is a measurement of the cost of
generated code.  For the internal function calls in discussion, maybe we should
not ask the question in the first place.  Even though these function calls are
expanded to constants, IMHO we can't simply consider them cheap either, because
there are high-level side effects along with expanding, i.e., undoing loop
versioning.  Such high-level transformation is not (and should not be) covered
by gimple_inexpensive_call_p.

Thanks,
bin
> should mean that they cost a lot in terms of size or speed (whichever
> is most important in context).  Both functions are really cheap in
> that sense, since they eventually expand to constants.
>
> Thanks,
> Richard
>
>> Another thing is this patch doesn't check IFN_LOOP_VECTORIZED because
>> it's impossible to have it with current order of passes.  Bootstrap
>> and test ongoing.  Further comments?
>>
>> Thanks,
>> bin
>> 2017-08-15  Bin Cheng  
>>
>> PR tree-optimization/81832
>> * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
>> copy loop header which has IFN_LOOP_DIST_ALIAS call.
>>
>> gcc/testsuite
>> 2017-08-15  Bin Cheng  
>>
>> PR tree-optimization/81832
>> * gcc.dg/tree-ssa/pr81832.c: New test.
>>
>>>
>>> Thanks,
>>> Richard
>>>
 Thanks,
 Richard.

> Bootstrap and test on x86_64.
>
> Thanks,
> bin
> 2017-08-15  Bin Cheng  
>
> PR tree-optimization/81832
> * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
> (pass_ch_vect::process_loop_p): Call above function.
>
> gcc/testsuite
> 2017-08-15  Bin Cheng  
>
> PR tree-optimization/81832
> * gcc.dg/tree-ssa/pr81832.c: New test.
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
>> new file mode 100644
>> index 000..893124e
>> --- /dev/null
>> +++ 

Re: [PATCH PR81832]Skip copying loop header if inner loop is distributed

2017-08-16 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng  wrote:
 Hi,
 This patch fixes PR81832.  Root cause for the ICE is:
   1) Loop has distributed inner loop.
  2) The guarding function call IFN_LOOP_DIST_ALIAS happens to be in loop's 
 header.
  3) IFN_LOOP_DIST_ALIAS (in loop's header) is duplicated by pass_ch_vect 
 thus
  not eliminated.

 Given pass_ch_vect copies loop header to enable more vectorization, we 
 should
 skip loop in this case because distributed inner loop means this loop can 
 not
 be vectorized anyway.  One point to mention is that the name 
 inner_loop_distributed_p
 is a little misleading.  The name indicates that each basic block is 
 checked,
 but the patch only checks loop's header for simplicity/efficiency's 
 purpose.
 Any comment?
>>>
>>> My comment would be to question pass_ch_vect placement -- what was the
>>> reason to place it so late?
>>>
>>> I also see GRAPHITE runs inbetween loop distribution and vectorization --
>>> what prevents GRAPHITE from messing up things here?  Or autopar?
>>>
>>> The patch itself shows should_duplicate_loop_header_p should
>>> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>>>
>>> So can you please adjust should_duplicate_loop_header_p instead and/or
>>> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
>>> sure if that default is so good.
>>
>> I think the default itself is OK: we only use IFNs for libm functions
>> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
>> is useful as the non-errno setting version of exp10), so the target must
>> have something that it thinks is worth open-coding.  Also, we currently
>> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
>> handling is consistent with that.
>>
>> Maybe there are some IFNs that are worth special-casing as expensive,
>> but IMO doing that to solve this problem would be a bit hacky.  It seems
>> like "inexpensive" should be more of a cost thing than a correctness thing.
> Hi all,

> This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
> in function should_duplicate_loop_header_p.

Thanks.

> As for gimple_inexpensive_call_p, I think it's natural to consider
> functions like IFN_LOOP_VECTORIZED and IFN_LOOP_DIST_ALIAS as
> expensive, because they are only meant to mark a temporary
> arrangement of optimizations and are never used in code generation.
> I will send a standalone patch for that.

Is that enough to consider them expensive though?  To me, "expensive"
should mean that they cost a lot in terms of size or speed (whichever
is most important in context).  Both functions are really cheap in
that sense, since they eventually expand to constants.
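For context, both IFNs follow the same loop-versioning pattern.  A rough sketch of the GIMPLE shape (illustrative pseudocode, not an actual dump; the SSA names and operands are made up):

```
  /* Emitted by loop distribution to version the loop.  Once the later
     pass makes its decision, the call is folded to the constant 0 or 1
     and the dead branch is removed.  */
  _1 = .LOOP_DIST_ALIAS (ldist_alias_id_3, runtime_alias_check_5);
  if (_1 != 0)
    goto distributed_copy;
  else
    goto original_copy;
```

So the call itself never survives into generated code; what remains is just the chosen loop copy, which is why treating the call as costly sits oddly with a size/speed notion of "expensive".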

Thanks,
Richard

> Another thing: this patch doesn't check IFN_LOOP_VECTORIZED, because
> that call cannot appear here with the current order of passes.  Bootstrap
> and test ongoing.  Further comments?
>
> Thanks,
> bin
> 2017-08-15  Bin Cheng  
>
> PR tree-optimization/81832
> * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
> copy loop header which has IFN_LOOP_DIST_ALIAS call.
>
> gcc/testsuite
> 2017-08-15  Bin Cheng  
>
> PR tree-optimization/81832
> * gcc.dg/tree-ssa/pr81832.c: New test.
>
>>
>> Thanks,
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
 Bootstrap and test on x86_64.

 Thanks,
 bin
 2017-08-15  Bin Cheng  

 PR tree-optimization/81832
 * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
 (pass_ch_vect::process_loop_p): Call above function.

 gcc/testsuite
 2017-08-15  Bin Cheng  

 PR tree-optimization/81832
 * gcc.dg/tree-ssa/pr81832.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
> new file mode 100644
> index 000..893124e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b, *c;
> +void d(void)
> +{
> +  int **e;
> +  for(;;)
> +    for(int f = 1; f <= 6; f++)
> +      {
> +        b = 0;
> +        if(a)
> +g:
> +          while(a++);
> +        if (**e);
> +        else
> +          {
> +            *c = a;
> +            goto g;
> +          }
> +      }
> +}
> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
> index 14cc6d8d..6bb0220 100644
> --- a/gcc/tree-ssa-loop-ch.c
> +++ b/gcc/tree-ssa-loop-ch.c
> @@ -119,7 +119,10 @@ should_duplicate_loop_header_p (basic_block header, struct