RE: [Patch x86_64]: fix order of cost table initialization for -march=znver1.

2016-03-08 Thread Kumar, Venkataramanan
Hi Maintainers,

> -Original Message-
> From: Kumar, Venkataramanan
> Sent: Tuesday, March 08, 2016 7:27 PM
> To: Uros Bizjak (ubiz...@gmail.com); gcc-patches@gcc.gnu.org
> Cc: Richard Beiner (richard.guent...@gmail.com); Kumar, Venkataramanan
> Subject: RE: [Patch x86_64]: fix order of cost table initialization for -
> march=znver1.
> 
> Hi Uros,
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Kumar, Venkataramanan
> > Sent: Tuesday, March 08, 2016 7:21 PM
> > To: Uros Bizjak (ubiz...@gmail.com); gcc-patches@gcc.gnu.org
> > Cc: Richard Beiner (richard.guent...@gmail.com)
> > Subject: [Patch x86_64]: fix order of cost table initialization for -
> > march=znver1.
> >
> > Hi Uros,
> >
> > While debugging GCC to see whether the cost of DImode multiplication is set
> > correctly for the znver1 target, I found that the order of cost table
> > insertion is wrong for znver1; it only worked because btver2 has the same
> > multiply cost.
> >
> > The patch corrects the mistake I made.
> >
> > 2016-03-08  Venkataramanan Kumar  
> >
> > *  config/i386/i386.c (processor_target_table): Fix cost table
> > initialization order for znver1.
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index
> > 8a026ae..3d67c65 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -2662,9 +2662,9 @@ static const struct ptt
> > processor_target_table[PROCESSOR_max] =
> >{"bdver2", _cost, 16, 10, 16, 7, 11},
> >{"bdver3", _cost, 16, 10, 16, 7, 11},
> >{"bdver4", _cost, 16, 10, 16, 7, 11},
> > -  {"znver1", _cost, 16, 10, 16, 7, 11},
> >{"btver1", _cost, 16, 10, 16, 7, 11},
> > -  {"btver2", _cost, 16, 10, 16, 7, 11}
> > +  {"btver2", _cost, 16, 10, 16, 7, 11},  {"znver1",
> > + _cost, 16, 10, 16, 7, 11},
> >  };
> >
> > It passes normal bootstrap and bootstrap with BOOT_CFLAGS="-O2 -g -
> > march=znver1 -mno-clzero -mno-sha " on avx2 target.
> >
> > Is it ok for trunk?
> 
> Please find the correct patch below.
> 
> Change Log
> 2016-03-08  Venkataramanan Kumar  
> 
>  *  config/i386/i386.c (processor_target_table): Fix cost table
>  initialization order for znver1.
> 
> 
> snip
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 8a026ae..234327a 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2662,9 +2662,9 @@ static const struct ptt
> processor_target_table[PROCESSOR_max] =
>{"bdver2", _cost, 16, 10, 16, 7, 11},
>{"bdver3", _cost, 16, 10, 16, 7, 11},
>{"bdver4", _cost, 16, 10, 16, 7, 11},
> -  {"znver1", _cost, 16, 10, 16, 7, 11},
>{"btver1", _cost, 16, 10, 16, 7, 11},
> -  {"btver2", _cost, 16, 10, 16, 7, 11}
> +  {"btver2", _cost, 16, 10, 16, 7, 11},
> +  {"znver1", _cost, 16, 10, 16, 7, 11}
>  };
> snip
> 
> Ok for trunk?

Committed the patch to trunk as an obvious fix.

https://gcc.gnu.org/viewcvs/gcc?view=revision=234076

regards,
Venkat.
> 
> >
> > Regards,
> > Venkat.
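The failure mode fixed above can be sketched as follows, assuming (as the fix implies) that processor_target_table is indexed by the processor enum, so row i must describe enum value i. Under that assumption the old ordering made a znver1 lookup land on the btver2 row, which went unnoticed because the multiply costs matched. The enum, struct and function names below are simplified, hypothetical stand-ins rather than GCC's real definitions; tagging each row with its enum value lets a compile-time check catch this kind of misordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified stand-in for the processor enum; the order
// mirrors the corrected table (btver1, btver2, then znver1).
enum processor_type {
  PROCESSOR_BDVER4,
  PROCESSOR_BTVER1,
  PROCESSOR_BTVER2,
  PROCESSOR_ZNVER1,
  PROCESSOR_max
};

// Each row records which enum value it is meant for.
struct ptt_entry {
  processor_type id;
  const char *name;
};

constexpr ptt_entry target_table[PROCESSOR_max] = {
  {PROCESSOR_BDVER4, "bdver4"},
  {PROCESSOR_BTVER1, "btver1"},
  {PROCESSOR_BTVER2, "btver2"},
  {PROCESSOR_ZNVER1, "znver1"},
};

// True if row i describes enum value i for every i >= start.
constexpr bool table_in_enum_order(std::size_t start = 0) {
  return start == static_cast<std::size_t>(PROCESSOR_max)
         || (target_table[start].id == static_cast<processor_type>(start)
             && table_in_enum_order(start + 1));
}

// Moving the znver1 row back before btver1 would make this fail to compile.
static_assert(table_in_enum_order(), "target_table is out of enum order");

// Lookup by enum value, as a user of the tuning table would do.
inline const char *tune_name(processor_type p) {
  return target_table[p].name;
}
```

Storing the enum value in each row costs a little space, but it turns a silent mis-tuning into a build failure.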



[PATCH, libstdc++] Add missing free-standing headers to install rule

2016-03-08 Thread Bernd Edlinger
Hi,

when the free-standing libstdc++-headers are installed, the C++ header
file  does not always compile, because it includes  and this
includes under certain conditions (__cplusplus >= 201103L &&
ATOMIC_INT_LOCK_FREE > 1) the header file 
but that fails to compile because it needs  which is not installed.
This condition depends on the target, and for instance an arm-eabi
eCos compiler fails to compile  with -mcpu=cortex-a9 and the
default C++ standard option, while it is OK with ARMv4 CPUs.

Therefore this patch adds move.h and concept_check.h to the installed headers,
unconditionally.

I've verified that the  header compiles on an eCos cross compiler.

Bootstrapped and regression-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

2016-03-08  Bernd Edlinger

	* include/Makefile.am (install-freestanding-headers): Add
	concept_check.h and move.h to the installed headers.
	* include/Makefile.in: Regenerated.

Index: libstdc++-v3/include/Makefile.am
===
--- libstdc++-v3/include/Makefile.am	(revision 234060)
+++ libstdc++-v3/include/Makefile.am	(working copy)
@@ -1331,7 +1331,7 @@
 # libsupc++, so only the others and the sub-includes are copied here.
 install-freestanding-headers:
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/bits
-	for file in c++0x_warning.h atomic_base.h; do \
+	for file in c++0x_warning.h atomic_base.h concept_check.h move.h; do \
 	  $(INSTALL_DATA) ${glibcxx_srcdir}/include/bits/$${file} $(DESTDIR)${gxx_include_dir}/bits; done
 	$(mkinstalldirs) $(DESTDIR)${host_installdir}
 	for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \
Index: libstdc++-v3/include/Makefile.in
===
--- libstdc++-v3/include/Makefile.in	(revision 234060)
+++ libstdc++-v3/include/Makefile.in	(working copy)
@@ -1753,7 +1753,7 @@
 # libsupc++, so only the others and the sub-includes are copied here.
 install-freestanding-headers:
 	$(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/bits
-	for file in c++0x_warning.h atomic_base.h; do \
+	for file in c++0x_warning.h atomic_base.h concept_check.h move.h; do \
 	  $(INSTALL_DATA) ${glibcxx_srcdir}/include/bits/$${file} $(DESTDIR)${gxx_include_dir}/bits; done
 	$(mkinstalldirs) $(DESTDIR)${host_installdir}
 	for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \


Re: [C++] Add -fnull-this-pointer

2016-03-08 Thread Jan Hubicka
Hi,
so I am not sure what the consensus is. Here are my two cents.
I added the code after looking into multiple-inheritance code, where
we often end up with

if (this != NULL)
  foo = this + offset;
else
  foo = NULL;

which is redundant.  I built LibreOffice with LTO, with and without this change,
and got the following:

    text     data     bss       dec
61207687  3461668  480368  65149723  trunk
60849535  3461612  480496  64791643  disabling special code for THIS
60599827  3461612  480496  64541935  -fno-delete-null-pointer-checks

I.e. about a 0.6% code size increase due to this analysis ;)
This reproduces across other LibreOffice libraries.
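As a quick check of the table above (all figures copied from it): dec is the sum of text, data and bss, and the "about 0.6%" figure matches the growth of the text section between the trunk build and the build with the special code for THIS disabled. The struct and helper names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// One row of the size(1) output above; all values in bytes.
struct size_row { long text, data, bss; };

constexpr size_row trunk_build  = {61207687, 3461668, 480368};
constexpr size_row no_this_code = {60849535, 3461612, 480496};
constexpr size_row no_null_del  = {60599827, 3461612, 480496};

// size(1) prints dec = text + data + bss.
constexpr long dec_bytes(size_row r) { return r.text + r.data + r.bss; }

// Relative growth of `after` over `before`, in percent.
inline double growth_percent(long after, long before) {
  return 100.0 * static_cast<double>(after - before)
         / static_cast<double>(before);
}
```

Measured on dec instead of text, the same comparison gives about 0.55%.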

While it is not the goal of this optimization to increase the size of the
LibreOffice binary, I would say that the new code seems surprisingly active
on this codebase (more than I would expect for a one-liner of this type).
I think it makes sense to aim at making this the default and convincing
users to clean up their codebases.
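The redundant test quoted at the top of this message comes from the derived-to-base conversion under multiple inheritance: converting an arbitrary pointer to its second base must map a null pointer to null, whereas inside a member function `this` may be assumed non-null (calling a member function through a null pointer is undefined behavior). The classes below are purely illustrative.

```cpp
#include <cassert>

struct A { int a = 1; };
struct B { int b = 2; };

// B is C's second base, so a C* and the pointer to its B subobject
// usually differ by a fixed offset.
struct C : A, B {
  // `this` is assumed non-null here, so the conversion can be a plain
  // pointer adjustment with no null test.
  B *as_b() { return this; }
};

// For an arbitrary pointer the conversion must preserve null, so the
// compiler has to emit the equivalent of
//   p ? (B *)((char *)p + offset_of_B) : nullptr
inline B *to_b(C *p) { return p; }
```

Code that calls methods on a null object relies on the first form behaving like the second, which is what the new analysis no longer guarantees.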

Given that we have already had a sanitizer feature reporting this for a few
releases (I believe; I did not double-check), having the flag disabled for this
release and enabled for the next only delays the trouble by a year.

I don't see much extra pain in maintaining this flag, and we may document it
as a transitional option that will eventually be obsoleted.

If I imagine being the maintainer of a package that breaks now, I think I would
find an explicit flag in the Makefile a decade later more informative than
finding -fno-delete-null-pointer-checks there and no longer being sure whether
my codebase also breaks the rule in some other way.

So if we reach consensus that we want the flag, I will be happy to take care of
the patch (there was a suggestion for a better name that I can't find right now).
If we decide to go without a flag, I am happy with the current status, where the
situation is documented in changes.html and the porting document and there is a
well-defined workaround.
Honza


Re: [PATCH] Fix PR67278

2016-03-08 Thread Andreas Schwab
Richard Biener  writes:

> Index: gcc/testsuite/gcc.dg/simd-7.c
> ===
> --- gcc/testsuite/gcc.dg/simd-7.c (revision 0)
> +++ gcc/testsuite/gcc.dg/simd-7.c (working copy)
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +
> +#if __SIZEOF_LONG_DOUBLE__ == 16 || __SIZEOF_LONG_DOUBLE__ == 8
> +typedef long double a __attribute__((vector_size (16)));
> +
> +a __attribute__((noinline))
> +sum (a first, a second)
> +{
> +return first + second;
> +}
> +
> +a
> +foo (a x, a y, a z)
> +{
> +  return sum (x, y) + z;
> +}
> +#endif

On powerpc -m32:

FAIL: gcc.dg/simd-7.c (test for excess errors)
Excess errors:
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/simd-7.c:8:1: warning: GCC vector 
returned by reference: non-standard ABI extension with no compatibility 
guarantee
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/simd-7.c:7:1: warning: GCC vector 
passed by reference: non-standard ABI extension with no compatibility guarantee

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


C++ PATCH to unary fold-expression semantics

2016-03-08 Thread Jason Merrill
At Jacksonville the committee adjusted the semantics of C++17 
fold-expressions so that empty expansions are ill-formed for more 
operators.  Since this is a new feature in GCC 6, and the patch is 
trivial, let's update it accordingly.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit d51e35e5c273c0706806777112e77ba395f0070d
Author: Jason Merrill 
Date:   Fri Mar 4 22:27:42 2016 -0500

	P0036R0: Unary Folds and Empty Parameter Packs

	* pt.c (expand_empty_fold): Remove special cases for *,+,&,|.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 515537b..978 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10629,10 +10629,6 @@ gen_elem_of_pack_expansion_instantiation (tree pattern,
sequence, the value of the expression is as follows; the program is
ill-formed if the operator is not listed in this table.
 
-   *	1
-   +	0
-   &	-1
-   |	0
&&	true
||	false
,	void()  */
@@ -10644,14 +10640,6 @@ expand_empty_fold (tree t, tsubst_flags_t complain)
   if (!FOLD_EXPR_MODIFY_P (t))
 switch (code)
   {
-  case MULT_EXPR:
-	return integer_one_node;
-  case PLUS_EXPR:
-	return integer_zero_node;
-  case BIT_AND_EXPR:
-	return integer_minus_one_node;
-  case BIT_IOR_EXPR:
-	return integer_zero_node;
   case TRUTH_ANDIF_EXPR:
 	return boolean_true_node;
   case TRUTH_ORIF_EXPR:
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold1.C b/gcc/testsuite/g++.dg/cpp1z/fold1.C
index 3c33651..510d61a 100644
--- a/gcc/testsuite/g++.dg/cpp1z/fold1.C
+++ b/gcc/testsuite/g++.dg/cpp1z/fold1.C
@@ -22,11 +22,11 @@ MAKE_FNS (add, +);
 MAKE_FNS (sub, -);
 
 int main() {
-  assert(unary_left_add() == 0);
+  // assert(unary_left_add() == 0);
   assert(unary_left_add(1) == 1);
   assert(unary_left_add(1, 2, 3) == 6);
 
-  assert(unary_right_add() == 0);
+  // assert(unary_right_add() == 0);
   assert(unary_right_add(1) == 1);
   assert(unary_right_add(1, 2, 3) == 6);
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold3.C b/gcc/testsuite/g++.dg/cpp1z/fold3.C
index 307818f..58d41e6 100644
--- a/gcc/testsuite/g++.dg/cpp1z/fold3.C
+++ b/gcc/testsuite/g++.dg/cpp1z/fold3.C
@@ -47,16 +47,16 @@ MAKE_FN (dot_star, .*);
 MAKE_FN (arrow_star, ->*);
 
 int main() {
-  static_assert(add() == int(), "");
-  static_assert(mul() == 1, "");
-  static_assert(bor() == int(), "");
-  static_assert(band() == -1, "");
   static_assert(land() == true, "");
   static_assert(lor() == false, "");
   comma(); // No value to theck
 
   // These are all errors, but the error is emitted at the point
   // of instantiation (line 10).
+  add();			// { dg-message "required from here" }
+  mul();			// { dg-message "required from here" }
+  bor();			// { dg-message "required from here" }
+  band();			// { dg-message "required from here" }
   sub();			// { dg-message "required from here" }
   div();			// { dg-message "required from here" }
   mod();			// { dg-message "required from here" }
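After P0036R0, a unary fold over an empty pack is well-formed only for &&, || and the comma operator, which is what the trimmed table in expand_empty_fold now reflects. A minimal sketch (function names are illustrative, not from the testsuite; needs a C++17 compiler): code that relied on an empty `(args + ...)` yielding 0 can spell the identity element explicitly with a binary fold.

```cpp
#include <cassert>

// Unary right fold: sum_unary() with no arguments is now ill-formed,
// so every call must pass at least one argument.
template <typename... Ts>
constexpr auto sum_unary(Ts... args) { return (args + ...); }

// Binary right fold with an explicit identity element: an empty pack
// simply yields 0.
template <typename... Ts>
constexpr int sum_binary(Ts... args) { return (args + ... + 0); }

// && and || keep their empty-pack values: true and false respectively.
template <typename... Ts>
constexpr bool all_true(Ts... args) { return (args && ...); }

template <typename... Ts>
constexpr bool any_true(Ts... args) { return (args || ...); }
```

The binary form also documents the intended result for the empty case at the call site instead of relying on a per-operator default.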


C++ PATCH to remove -fconcepts from -std=c++1z

2016-03-08 Thread Jason Merrill
At the Jacksonville C++ meeting last week the committee decided not to 
integrate the Concepts TS into the C++17 working paper, so I'm removing 
it from -std=c++1z.  This patch also adds diagnostics to guide people 
toward adding -fconcepts for code that needs it.  Concepts will probably 
be in C++20 instead.
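The parser.c hunks in this patch appear to rely on `requires` being lexed as an ordinary identifier when -fconcepts is off, so the parser compares the identifier against ridpointers[RID_REQUIRES] to give a targeted hint instead of an unrelated parse error. A toy model of that idea, using a hypothetical two-kind token type rather than GCC's cp_token:

```cpp
#include <cassert>
#include <string>

enum class tok_kind { kw_requires, name };

struct token {
  tok_kind kind;
  std::string spelling;
};

// Toy lexer: "requires" is a keyword only when concepts are enabled;
// otherwise it comes back as a plain identifier.
inline token lex_word(const std::string &word, bool flag_concepts) {
  if (flag_concepts && word == "requires")
    return {tok_kind::kw_requires, word};
  return {tok_kind::name, word};
}

// Returns the hint to emit, or an empty string if none applies.
inline std::string requires_hint(const token &tok, bool flag_concepts) {
  if (!flag_concepts && tok.kind == tok_kind::name
      && tok.spelling == "requires")
    return "'requires' only available with -fconcepts";
  return "";
}
```

Checking the spelling only when the keyword is disabled keeps the normal keyword path unchanged.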


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 4d03a13cd253a97292a5bf7105c627393cfce081
Author: Jason Merrill 
Date:   Fri Mar 4 22:21:40 2016 -0500

	Remove Concepts from -std=c++1z.

gcc/c-family/
	* c-opts.c (set_std_cxx1z): Don't enable concepts.
gcc/testsuite/
	* lib/g++-dg.exp (g++-dg-runtest): Handle "concepts" in std list.
	* lib/target-supports.exp (check_effective_target_concepts): New.
gcc/cp/
	* parser.c (cp_parser_diagnose_invalid_type_name): Give helpful
	diagnostic for use of "concept".
	(cp_parser_requires_clause_opt): And "requires".
	(cp_parser_type_parameter, cp_parser_late_return_type_opt)
	(cp_parser_explicit_template_declaration): Adjust.
	* Make-lang.in (check-c++-all): Add "concepts" to std list.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index c2783f7..fec58bc 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -1566,8 +1566,6 @@ set_std_cxx1z (int iso)
   /* C++11 includes the C99 standard library.  */
   flag_isoc94 = 1;
   flag_isoc99 = 1;
-  /* Enable concepts by default. */
-  flag_concepts = 1;
   flag_isoc11 = 1;
   cxx_dialect = cxx1z;
   lang_hooks.name = "GNU C++14"; /* Pretend C++14 till standarization.  */
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 2286c64..8770f6f 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -152,7 +152,7 @@ check-c++1z:
 
 # Run the testsuite in all standard conformance levels.
 check-c++-all:
-	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) --stds=98,11,14,1z" check-g++
+	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) --stds=98,11,14,1z,concepts" check-g++
 
 # Run the testsuite with garbage collection at every opportunity.
 check-g++-strict-gc:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 535052f..726d5fc 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3172,6 +3172,8 @@ cp_parser_diagnose_invalid_type_name (cp_parser *parser, tree id,
 	   && !strcmp (IDENTIFIER_POINTER (id), "thread_local"))
 	inform (location, "C++11 % only available with "
 		"-std=c++11 or -std=gnu++11");
+  else if (!flag_concepts && id == ridpointers[(int)RID_CONCEPT])
+	inform (location, "% only available with -fconcepts");
   else if (processing_template_decl && current_class_type
 	   && TYPE_BINFO (current_class_type))
 	{
@@ -14668,13 +14670,10 @@ cp_parser_type_parameter (cp_parser* parser, bool *is_parameter_pack)
 	cp_parser_require (parser, CPP_GREATER, RT_GREATER);
 
 // If template requirements are present, parse them.
-	if (flag_concepts)
-  {
-tree reqs = get_shorthand_constraints (current_template_parms);
-if (tree r = cp_parser_requires_clause_opt (parser))
-  reqs = conjoin_constraints (reqs, make_predicate_constraint (r));
-TEMPLATE_PARMS_CONSTRAINTS (current_template_parms) = reqs;
-  }
+	tree reqs = get_shorthand_constraints (current_template_parms);
+	if (tree r = cp_parser_requires_clause_opt (parser))
+	  reqs = conjoin_constraints (reqs, make_predicate_constraint (r));
+	TEMPLATE_PARMS_CONSTRAINTS (current_template_parms) = reqs;
 
 	/* Look for the `class' or 'typename' keywords.  */
 	cp_parser_type_parameter_key (parser);
@@ -19745,6 +19744,8 @@ cp_parser_late_return_type_opt (cp_parser* parser, cp_declarator *declarator,
   /* A late-specified return type is indicated by an initial '->'. */
   if (token->type != CPP_DEREF
   && token->keyword != RID_REQUIRES
+  && !(token->type == CPP_NAME
+	   && token->u.value == ridpointers[RID_REQUIRES])
   && !(declare_simd_p || cilk_simd_fn_vector_p || oacc_routine_p))
 return NULL_TREE;
 
@@ -24216,8 +24217,20 @@ cp_parser_requires_clause (cp_parser *parser)
 static tree
 cp_parser_requires_clause_opt (cp_parser *parser)
 {
-  if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_REQUIRES))
-return NULL_TREE;
+  cp_token *tok = cp_lexer_peek_token (parser->lexer);
+  if (tok->keyword != RID_REQUIRES)
+{
+  if (!flag_concepts && tok->type == CPP_NAME
+	  && tok->u.value == ridpointers[RID_REQUIRES])
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"% only available with -fconcepts");
+	  /* Parse and discard the requires-clause.  */
+	  cp_lexer_consume_token (parser->lexer);
+	  cp_parser_requires_clause (parser);
+	}
+  return NULL_TREE;
+}
   cp_lexer_consume_token (parser->lexer);
   return cp_parser_requires_clause (parser);
 }
@@ -25608,13 +25621,10 @@ cp_parser_explicit_template_declaration (cp_parser* parser, bool member_p)
   

Re: [PATCH] Fix PR67278, x86 target part

2016-03-08 Thread Andreas Schwab
Richard Biener  writes:

> Index: gcc/testsuite/gcc.dg/simd-8.c
> ===
> --- gcc/testsuite/gcc.dg/simd-8.c (revision 0)
> +++ gcc/testsuite/gcc.dg/simd-8.c (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +
> +#if __SIZEOF_LONG_DOUBLE__ == 16 || __SIZEOF_LONG_DOUBLE__ == 8
> +typedef long double a __attribute__((vector_size (32)));
> +
> +a __attribute__((noinline))
> +sum (a first, a second)
> +{
> +return first + second;
> +}
> +
> +a
> +foo (a x, a y, a z)
> +{
> +  return sum (x, y) + z;
> +}
> +#else
> +int main() {}
> +#endif

On powerpc:

FAIL: gcc.dg/simd-8.c (test for excess errors)
Excess errors:
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/simd-8.c:8:1: warning: GCC vector 
returned by reference: non-standard ABI extension with no compatibility 
guarantee
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/simd-8.c:7:1: warning: GCC vector 
passed by reference: non-standard ABI extension with no compatibility guarantee

Andreas.



Re: [PR69634] fix debug_insn-inconsistent REG_N_CALLS_CROSSED

2016-03-08 Thread Andreas Schwab
Alexandre Oliva  writes:

> diff --git a/gcc/testsuite/gcc.dg/pr69634.c b/gcc/testsuite/gcc.dg/pr69634.c
> new file mode 100644
> index 000..837bd57
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr69634.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-dce -fschedule-insns -fno-tree-vrp 
> -fcompare-debug" } */
> +/* { dg-additional-options "-Wno-psabi -mno-sse" { target i?86-*-* 
> x86_64-*-* } } */
> +/* { dg-additional-options "-m32" { target x86_64-*-* } } */
> +
> +typedef unsigned short u16;
> +typedef short v16u16 __attribute__ ((vector_size (16)));
> +typedef unsigned v16u32 __attribute__ ((vector_size (16)));
> +typedef unsigned long long v16u64 __attribute__ ((vector_size (16)));
> +
> +u16
> +foo(u16 u16_1, v16u16 v16u16_0, v16u32 v16u64_0, v16u16 v16u16_1, v16u32 
> v16u32_1, v16u64 v16u64_1)

On powerpc -m32:

FAIL: gcc.dg/pr69634.c (test for excess errors)
Excess errors:
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/pr69634.c:11:1: warning: GCC 
vector passed by reference: non-standard ABI extension with no compatibility 
guarantee

Andreas.



Re: [AArch64] Emit square root using the Newton series

2016-03-08 Thread Evandro Menezes

On 03/08/16 16:08, Evandro Menezes wrote:

On 02/16/16 14:56, Evandro Menezes wrote:

On 12/08/15 15:35, Evandro Menezes wrote:

Emit square root using the Newton series

2015-12-03  Evandro Menezes

gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn
definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function.
* config/aarch64/aarch64.md (sqrt2): New expansion and insn
definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing 
x^-1/2 using the Newton series by adding support for x^1/2 as well.


Is it OK at this point of stage 3?

Thank you,



James,

As I was saying, this patch results in some validation errors in the
CPU2000 benchmarks when using DF.  Although the algorithm proved to be
pretty solid on a vast set of random values, I'm puzzled why some
benchmarks fail to validate with this Newton series for square root
when they pass with the Newton series for reciprocal square root.


Since I had no problems with the same algorithm on x86-64, I wonder
whether the initial estimate matters: AArch64 offers just 8 bits,
whereas x86-64 offers 11 bits.  Then again, the algorithm iterated one
fewer time on x86-64 than on AArch64.


Since the initial estimate seems sufficient for CPU2000 to validate
when using SF, I'm leaning towards restricting the Newton series for
square root to SF only.


Your thoughts on the matter are appreciated,


Add choices for the reciprocal square root approximation

Allow a target to prefer such operation depending on the FP precision.

gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
* config/aarch64/aarch64.c
(use_rsqrt_p): New argument for the mode.
(aarch64_builtin_reciprocal): Devise mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.


Emit square root using the Newton series

gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros.
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_sqrt): Declare new function.
* config/aarch64/aarch64.c
(aarch64_emit_approx_sqrt): Define new function.
* config/aarch64/aarch64.md
(sqrt*2): New expansion and insn definitions.
* config/aarch64/aarch64-simd.md (sqrt*2): Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-recip-sqrt): Expand option description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.


This patch, which depends on 
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html, leverages the 
reciprocal square root approximation to emit a faster square root 
approximation.


However, I have encountered precision issues with DF: some benchmarks
in the SPECfp CPU2000 suite fail to validate.  Perhaps the initial
estimate, with just 8 bits, is not good enough for the series to
converge on the workloads of such benchmarks; or perhaps denormals,
known to occur in some of these benchmarks, cause errors.  This was the
motivation, in the previous related patch, for splitting the tuning
flags into one for DF and another for SF.
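The convergence question can be illustrated with a small numeric model. Assumptions, all hypothetical: the hardware estimate is modelled as the exact 1/sqrt(x) rounded to a given number of significant bits; each Newton step y' = y(3 - xy^2)/2 roughly squares the relative error; and sqrt(x) is then taken as x * rsqrt(x). This is a sketch of the general technique, not the code in the patch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of a hardware estimate: 1/sqrt(x) rounded to `bits`
// significant bits (relative error at most 2^-bits).
inline double rsqrt_estimate(double x, int bits) {
  int e2;
  double m = std::frexp(1.0 / std::sqrt(x), &e2);  // m in [0.5, 1)
  double scale = std::ldexp(1.0, bits);            // 2^bits
  return std::ldexp(std::round(m * scale) / scale, e2);
}

// Newton-Raphson for 1/sqrt(x): y' = y * (3 - x*y*y) / 2.
// If y = (1 + e) / sqrt(x), the new relative error is about -1.5 * e^2.
inline double rsqrt_newton(double x, int steps, int est_bits = 8) {
  double y = rsqrt_estimate(x, est_bits);
  for (int i = 0; i < steps; ++i)
    y = y * (1.5 - 0.5 * x * y * y);
  return y;
}

// sqrt(x) = x * (1/sqrt(x)); x == 0 is special-cased because
// 1/sqrt(0) is infinite and 0 * inf would be NaN.
inline double sqrt_newton(double x, int steps, int est_bits = 8) {
  if (x == 0.0)
    return x;
  return x * rsqrt_newton(x, steps, est_bits);
}
```

With an 8-bit estimate the relative error in this model shrinks roughly from 2^-8 to 2^-15, then 2^-30, then below double-precision rounding, so two steps suffice for SF's 24-bit significand while DF's 53 bits need a third.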


Again, now with the patch attached, your feedback is appreciated.

Thank you,

--
Evandro Menezes

From 4f61f722f744339650a48aa034906dd685110ae2 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Tue, 8 Mar 2016 15:06:03 -0600
Subject: [PATCH] Emit square root using the Newton series

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros.
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_sqrt): Declare new function.
	* config/aarch64/aarch64.c
	(aarch64_emit_approx_sqrt): Define new function.
	* config/aarch64/aarch64.md
	(sqrt*2): New expansion and insn definitions.
	* config/aarch64/aarch64-simd.md (sqrt*2): Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-recip-sqrt): Expand option description.
	* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 



Re: [PATCH] Fix ICE with vector types in X % -Y pattern (PR middle-end/70050)

2016-03-08 Thread Andreas Schwab
Marek Polacek  writes:

> diff --git gcc/testsuite/gcc.dg/pr70050.c gcc/testsuite/gcc.dg/pr70050.c
> index e69de29..610456f 100644
> --- gcc/testsuite/gcc.dg/pr70050.c
> +++ gcc/testsuite/gcc.dg/pr70050.c
> @@ -0,0 +1,11 @@
> +/* PR middle-end/70025 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-psabi" } */
> +
> +typedef int v8si __attribute__ ((vector_size (32)));
> +
> +v8si
> +foo (v8si v)

On powerpc:

FAIL: gcc.dg/pr70050.c (test for excess errors)
Excess errors:
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/pr70050.c:9:1: warning: GCC vector 
returned by reference: non-standard ABI extension with no compatibility 
guarantee
/daten/gcc/gcc-20160307/gcc/testsuite/gcc.dg/pr70050.c:8:1: warning: GCC vector 
passed by reference: non-standard ABI extension with no compatibility guarantee

Andreas.



Re: [AArch64] Emit square root using the Newton series

2016-03-08 Thread Evandro Menezes

On 02/16/16 14:56, Evandro Menezes wrote:

On 12/08/15 15:35, Evandro Menezes wrote:

Emit square root using the Newton series

2015-12-03  Evandro Menezes

gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn
definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function.
* config/aarch64/aarch64.md (sqrt2): New expansion and insn
definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing 
x^-1/2 using the Newton series by adding support for x^1/2 as well.


Is it OK at this point of stage 3?

Thank you,



James,

As I was saying, this patch results in some validation errors in the
CPU2000 benchmarks when using DF.  Although the algorithm proved to be
pretty solid on a vast set of random values, I'm puzzled why some
benchmarks fail to validate with this Newton series for square root
when they pass with the Newton series for reciprocal square root.


Since I had no problems with the same algorithm on x86-64, I wonder
whether the initial estimate matters: AArch64 offers just 8 bits,
whereas x86-64 offers 11 bits.  Then again, the algorithm iterated one
fewer time on x86-64 than on AArch64.


Since the initial estimate seems sufficient for CPU2000 to validate
when using SF, I'm leaning towards restricting the Newton series for
square root to SF only.


Your thoughts on the matter are appreciated,


Add choices for the reciprocal square root approximation

Allow a target to prefer such operation depending on the FP precision.

gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
* config/aarch64/aarch64.c
(use_rsqrt_p): New argument for the mode.
(aarch64_builtin_reciprocal): Devise mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.


Now that the patch is attached, feedback is appreciated.

Thank you,


--
Evandro Menezes

From 0bb413550e854c81cc5ab180a3afdd43cd4faf0b Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH] Add choices for the reciprocal square root approximation

Allow a target to prefer such operation depending on the FP precision.

gcc/
	* config/aarch64/aarch64-protos.h
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
	* config/aarch64/aarch64.c
	(use_rsqrt_p): New argument for the mode.
	(aarch64_builtin_reciprocal): Devise mode from builtin.
	(aarch64_optab_supported_p): New argument for the mode.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 ++-
 gcc/config/aarch64/aarch64.c| 23 +++
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index acf2062..ee3505c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_RSQRT \
+  (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..57d9588 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,6 @@
  AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrtf", APPROX_RSQRT_SF)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 801f95a..39a1a47 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7464,12 +7464,16 @@ aarch64_memory_move_cost (machine_mode mode 

Re: [C++ PATCH] Further fix for constexpr SAVE_EXPR loop handling (PR c++/70135)

2016-03-08 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH] Require type compatible bases in DDR initialization (PR tree-optimization/70127)

2016-03-08 Thread Richard Biener
On March 8, 2016 7:20:50 PM GMT+01:00, Jakub Jelinek  wrote:
>On Tue, Mar 08, 2016 at 07:11:45PM +0100, Richard Biener wrote:
>> On March 8, 2016 7:04:37 PM GMT+01:00, Jakub Jelinek
> wrote:
>> I believe the safest fix is to re-instantiate the compatibility check
>by refactoring operand_equal_p to perform it on the full ref (but not
>recursions where it would be redundant and maybe too conservative).
>> I've noticed this as well when doing the last operand_equal_p
>surgery, esp. The incomplete and bogus half-way type checking done at
>its top.
>
>Even say for INTEGER_CST vs. INTEGER_CST?  I thought we intentionally
>ignore
>the type there.

Yes, the equality of those is tested before the type checks. That'll make a 
refactoring a little tricky I guess.

Richard.

>Anyway, I can try to cook up some patch with a new OEP_ flag and gather
>some
>statistics on how often such a change would affect things (both in the
>amount of 0 -> 1 returns and 1 -> 0 returns).
>
>   Jakub




[PATCH][wwwdocs] GCC 6 supports musl libc on Linux

2016-03-08 Thread Szabolcs Nagy
I'd like to mention musl libc support in the GCC 6 release notes.

(Added under a Linux section since only Linux targets are supported now.)

Is it ok to commit?
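The triplet rule in the proposed text is a shell-style glob match. The small C illustration below uses POSIX fnmatch. GCC's actual selection happens in its build configuration, not in code like this, and the helper name is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Illustration only: true when a target triplet matches the
   *-linux-musl* pattern quoted in the release notes.  */
static bool
defaults_to_musl (const char *triplet)
{
  return fnmatch ("*-linux-musl*", triplet, 0) == 0;
}
```
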

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.68
diff -u -r1.68 changes.html
--- htdocs/gcc-6/changes.html	3 Mar 2016 10:09:18 -	1.68
+++ htdocs/gcc-6/changes.html	8 Mar 2016 13:57:35 -
@@ -639,6 +639,16 @@
 
 
 
+Linux
+  
+Support for the http://www.musl-libc.org;>musl C library
+was added for the AArch64, ARM, MicroBlaze, MIPS, MIPS64, PowerPC,
+PowerPC64, SH, i386, x32 and x86_64 targets.  It can be selected using the
+new -mmusl option in case musl is not the default libc.  GCC
+defaults to musl libc if it is built with a target triplet matching the
+*-linux-musl* pattern.
+  
+
 RTEMS
   
 The RTEMS thread model implementation changed.  Mutexes now


Re: [PATCH] Require type compatible bases in DDR initialization (PR tree-optimization/70127)

2016-03-08 Thread Jakub Jelinek
On Tue, Mar 08, 2016 at 07:11:45PM +0100, Richard Biener wrote:
> On March 8, 2016 7:04:37 PM GMT+01:00, Jakub Jelinek  wrote:
> I believe the safest fix is to re-instantiate the compatibility check by 
> refactoring operand_equal_p to perform it on the full ref (but not recursions 
> where it would be redundant and maybe too conservative).
> I've noticed this as well when doing the last operand_equal_p surgery, esp. 
> The incomplete and bogus half-way type checking done at its top.

Even say for INTEGER_CST vs. INTEGER_CST?  I thought we intentionally ignore
the type there.

Anyway, I can try to cook up some patch with a new OEP_ flag and gather some
statistics on how often such a change would affect things (both in the
amount of 0 -> 1 returns and 1 -> 0 returns).

Jakub


[C++ PATCH] Further fix for constexpr SAVE_EXPR loop handling (PR c++/70135)

2016-03-08 Thread Jakub Jelinek
Hi!

The following testcases show that the recent cxx_eval_loop_expr
fix wasn't sufficient in certain cases.  It works well if the
SAVE_EXPR contains the increment of an iterator that is initialized to some
constant before the loop.  However, with nested loops we might look up
the SAVE_EXPR again (usually by processing the LOOP_EXPR again), and then we
can get stale values.

Fixed by forgetting the saved SAVE_EXPR values at the end of the loop too,
not just in between iterations.
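Below is a minimal C model of the control-flow change, assuming one cached SAVE_EXPR per iteration. `struct ctx`, `eval_loop_old`, `eval_loop_new` and `three_iters` are hypothetical names, not GCC internals. With the old `while (true)` shape, the break skips the cleanup, so the value cached in the last iteration survives the loop. With the `do ... while` shape, that value is forgotten, so an outer loop that evaluates the LOOP_EXPR again starts from a clean cache.

```c
#include <stdbool.h>

#define MAX_IDS 8

/* Stand-in for constexpr_ctx::values: cached[id] is true while a
   SAVE_EXPR with that id has a remembered value.  */
struct ctx
{
  bool cached[MAX_IDS];
  int value[MAX_IDS];
};

/* One evaluation of the loop body.  It reports through *saved which
   id it cached and returns true when the loop exits (break/return).  */
typedef bool (*body_fn) (struct ctx *, int iter, int *saved);

/* Old shape: the exit check comes before the cleanup.  */
static void
eval_loop_old (struct ctx *c, body_fn body)
{
  for (int iter = 0;; iter++)
    {
      int saved = -1;
      if (body (c, iter, &saved))
        break;                      /* leaves this iteration's value */
      if (saved >= 0)
        c->cached[saved] = false;
    }
}

/* New shape: forget after every iteration, including the last.  */
static void
eval_loop_new (struct ctx *c, body_fn body)
{
  bool done;
  int iter = 0;
  do
    {
      int saved = -1;
      done = body (c, iter++, &saved);
      if (saved >= 0)
        c->cached[saved] = false;
    }
  while (!done);
}

/* Example body: caches SAVE_EXPR 0 and exits after three iterations.  */
static bool
three_iters (struct ctx *c, int iter, int *saved)
{
  c->cached[0] = true;
  c->value[0] = iter;
  *saved = 0;
  return iter == 2;
}
```
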

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-03-08  Jakub Jelinek  

PR c++/70135
* constexpr.c (cxx_eval_loop_expr): Forget saved values of SAVE_EXPRs
even after the last iteration of the loop.

* g++.dg/cpp1y/constexpr-loop4.C: New test.
* g++.dg/ubsan/pr70135.C: New test.

--- gcc/cp/constexpr.c.jj   2016-03-08 09:01:48.0 +0100
+++ gcc/cp/constexpr.c  2016-03-08 16:36:27.192053214 +0100
@@ -3165,21 +3165,21 @@ cxx_eval_loop_expr (const constexpr_ctx
   constexpr_ctx new_ctx = *ctx;
 
   tree body = TREE_OPERAND (t, 0);
-  while (true)
+  do
 {
   hash_set save_exprs;
   new_ctx.save_exprs = _exprs;
 
   cxx_eval_statement_list (_ctx, body,
   non_constant_p, overflow_p, jump_target);
-  if (returns (jump_target) || breaks (jump_target) || *non_constant_p)
-   break;
 
   /* Forget saved values of SAVE_EXPRs.  */
   for (hash_set::iterator iter = save_exprs.begin();
   iter != save_exprs.end(); ++iter)
new_ctx.values->remove (*iter);
 }
+  while (!returns (jump_target) && !breaks (jump_target) && !*non_constant_p);
+
   if (breaks (jump_target))
 *jump_target = NULL_TREE;
 
--- gcc/testsuite/g++.dg/cpp1y/constexpr-loop4.C.jj 2016-03-08 16:35:01.352224129 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-loop4.C    2016-03-08 16:34:32.0 +0100
@@ -0,0 +1,27 @@
+// { dg-do compile { target c++14 } }
+
+struct A
+{
+  int i;
+};
+
+constexpr bool f()
+{
+  A ar[5] = { 6, 7, 8, 9, 10 };
+  A *ap = ar;
+  int i = 0, j = 0;
+  for (j = 0; j < 2; j++)
+{
+  do
+   *ap++ = A{i};
+  while (++i < j * 2 + 2);
+}
+  return (ar[0].i == 0
+ && ar[1].i == 1
+ && ar[2].i == 2
+ && ar[3].i == 3
+ && ar[4].i == 10);
+}
+
+#define SA(X) static_assert((X),#X)
+SA(f());
--- gcc/testsuite/g++.dg/ubsan/pr70135.C.jj 2016-03-08 16:16:06.863701979 +0100
+++ gcc/testsuite/g++.dg/ubsan/pr70135.C    2016-03-08 16:17:26.850610633 +0100
@@ -0,0 +1,36 @@
+// PR c++/70135
+// { dg-do run }
+// { dg-options "-fsanitize=bounds -std=c++14" }
+
+template 
+struct S {
+  static constexpr bool c[] {b...};
+  static constexpr auto foo ()
+  {
+unsigned long n = 0;
+for (unsigned long i = 0; i < sizeof (c); i++)
+  if (!c[i])
+   ++n;
+return n;
+  }
+  static constexpr auto n = foo () + 1;
+  static constexpr auto bar ()
+  {
+int h = 0;
+for (int g = 0, i = 0; g < n; ++g)
+  {
+   while (i < sizeof...(b) && c[i++])
+ ++h;
+   h += 64;
+  }
+return h;
+  }
+};
+
+int
+main ()
+{
+  S  s;
+  constexpr auto c = s.bar ();
+  static_assert (s.bar () == 4 * 64 + 5);
+}

Jakub


Re: [PATCH] Require type compatible bases in DDR initialization (PR tree-optimization/70127)

2016-03-08 Thread Richard Biener
On March 8, 2016 7:04:37 PM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>Honza has removed types_compatible_p call from operand_equal_p of
>*MEM_REF.
>This breaks tree-data-ref.c stuff, as the access fns are built with the
>assumption that they refer to the same stuff (array indices or field
>bit
>offsets) between all DRs whose bases are operand_equal_p, but now that
>it ensures just that they have the same size, one could have a type of
>single element array of records, the other the record itself, or one
>could
>be a record with a single field, another that field's type, and we
>suddenly
>can have DR_ACCESS_FN (, 0) in one DR being array index and bitfield
>offset
>in another one etc.
>
>To me the safest fix looks to be just to revert to what 5.x used to do
>here
>for initialize_data_dependence_relation, i.e. require compatible types,
>perhaps later we could carefully lift up that restriction to either
>not creating access fns for single element arrays (but watch for
>flexible
>array-like arrays) or fields over whole structure side, or compare
>types
>more losely, or compare also sizes of refs themselves, ...

I believe the safest fix is to reinstate the compatibility check by 
refactoring operand_equal_p to perform it on the full ref (but not on recursions 
where it would be redundant and maybe too conservative).
I've noticed this as well when doing the last operand_equal_p surgery, esp. the 
incomplete and bogus half-way type checking done at its top.

Richard.

>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>2016-03-08  Jakub Jelinek  
>
>   PR tree-optimization/70127
>   * tree-data-ref.c (initialize_data_dependence_relation): Return
>   chrec_dont_know if types of DR_BASE_OBJECT aren't compatible.
>
>   * gcc.c-torture/execute/pr70127.c: New test.
>
>--- gcc/tree-data-ref.c.jj 2016-01-27 12:38:52.0 +0100
>+++ gcc/tree-data-ref.c    2016-03-08 12:01:11.865034378 +0100
>@@ -1539,7 +1539,13 @@ initialize_data_dependence_relation (str
> 
>   /* If the references do not access the same object, we do not know
>  whether they alias or not.  */
>-  if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0))
>+  if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0)
>+  /* operand_equal_p checked just type sizes; with single element
>+   array types and/or fields that have the same size as containing
>+   struct/union those could match, even when DR_ACCESS_FN count
>+   different things between the two DRs.  See PR70127.  */
>+  || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
>+TREE_TYPE (DR_BASE_OBJECT (b
> {
>   DDR_ARE_DEPENDENT (res) = chrec_dont_know;
>   return res;
>--- gcc/testsuite/gcc.c-torture/execute/pr70127.c.jj   2016-03-08 12:11:11.890835632 +0100
>+++ gcc/testsuite/gcc.c-torture/execute/pr70127.c  2016-03-08 12:10:58.0 +0100
>@@ -0,0 +1,23 @@
>+/* PR tree-optimization/70127 */
>+
>+struct S { int f; signed int g : 2; } a[1], c = {5, 1}, d;
>+short b;
>+
>+__attribute__((noinline, noclone)) void
>+foo (int x)
>+{
>+  if (x != 1)
>+__builtin_abort ();
>+}
>+
>+int
>+main ()
>+{
>+  while (b++ <= 0)
>+{
>+  struct S e = {1, 1};
>+  d = e = a[0] = c;
>+}
>+  foo (a[0].g);
>+  return 0;
>+}
>
>   Jakub




[PATCH] Require type compatible bases in DDR initialization (PR tree-optimization/70127)

2016-03-08 Thread Jakub Jelinek
Hi!

Honza has removed the types_compatible_p call from operand_equal_p for *MEM_REF.
This breaks the tree-data-ref.c code, because the access fns are built on the
assumption that they refer to the same things (array indices or field bit
offsets) in all DRs whose bases are operand_equal_p.  Now that operand_equal_p
only ensures that the types have the same size, one base could have a single
element array of records type and the other the record itself, or one could be
a record with a single field and the other that field's type.  We can then
suddenly have DR_ACCESS_FN (, 0) being an array index in one DR and a bitfield
offset in another one, etc.
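To make the size-only problem concrete, here is a small C sketch. The helper names are hypothetical, and the equal sizes assume a common ABI without extra padding. Each pair below has identical sizes, yet index 0 of the array and the first field of the record are different kinds of access. A check that compares only sizes cannot tell them apart, while a type-compatibility check can.

```c
#include <stddef.h>

/* The record from the pr70127.c testcase.  */
struct S { int f; signed int g : 2; };

/* Hypothetical record with a single int field.  */
struct W { int x; };

/* Size-only comparison, standing in for what the base check now sees.  */
static int
same_size (size_t a, size_t b)
{
  return a == b;
}

/* Single element array of records vs. the record itself.  */
static int
array_vs_record (void)
{
  return same_size (sizeof (struct S[1]), sizeof (struct S));
}

/* Record with a single field vs. that field's type.  */
static int
record_vs_field (void)
{
  return same_size (sizeof (struct W), sizeof (int));
}
```
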

To me the safest fix looks to be simply reverting to what 5.x used to do here
for initialize_data_dependence_relation, i.e. requiring compatible types.
Perhaps later we could carefully lift that restriction, either by
not creating access fns for single element arrays (but watch for flexible
array-like arrays) or for fields covering the whole structure, or by comparing
types more loosely, or by also comparing the sizes of the refs themselves, ...

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-03-08  Jakub Jelinek  

PR tree-optimization/70127
* tree-data-ref.c (initialize_data_dependence_relation): Return
chrec_dont_know if types of DR_BASE_OBJECT aren't compatible.

* gcc.c-torture/execute/pr70127.c: New test.

--- gcc/tree-data-ref.c.jj  2016-01-27 12:38:52.0 +0100
+++ gcc/tree-data-ref.c 2016-03-08 12:01:11.865034378 +0100
@@ -1539,7 +1539,13 @@ initialize_data_dependence_relation (str
 
   /* If the references do not access the same object, we do not know
  whether they alias or not.  */
-  if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0))
+  if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0)
+  /* operand_equal_p checked just type sizes; with single element
+array types and/or fields that have the same size as containing
+struct/union those could match, even when DR_ACCESS_FN count
+different things between the two DRs.  See PR70127.  */
+  || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)),
+ TREE_TYPE (DR_BASE_OBJECT (b
 {
   DDR_ARE_DEPENDENT (res) = chrec_dont_know;
   return res;
--- gcc/testsuite/gcc.c-torture/execute/pr70127.c.jj    2016-03-08 12:11:11.890835632 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr70127.c   2016-03-08 12:10:58.0 +0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/70127 */
+
+struct S { int f; signed int g : 2; } a[1], c = {5, 1}, d;
+short b;
+
+__attribute__((noinline, noclone)) void
+foo (int x)
+{
+  if (x != 1)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  while (b++ <= 0)
+{
+  struct S e = {1, 1};
+  d = e = a[0] = c;
+}
+  foo (a[0].g);
+  return 0;
+}

Jakub


[committed] Spelling fixes - becuase -> because

2016-03-08 Thread Jakub Jelinek
Hi!

I've noticed a bunch of comment typos, fixed thusly, committed as obvious
after bootstrap/regtest on x86_64-linux and i686-linux.

2016-03-08  Jakub Jelinek  

* ipa-polymorphic-call.c (walk_ssa_copies): Fix spelling
- becuase -> because.
* ipa-reference.c (ignore_module_statics): Likewise.
* cgraph.c (cgraph_node::get_body): Likewise.
* ipa-inline.c (early_inliner): Likewise.
* ipa-devirt.c (types_same_for_odr): Likewise.
* tree-streamer-out.c (pack_ts_type_common_value_fields): Likewise.
* config/i386/i386.h (ACCUMULATE_OUTGOING_ARGS): Likewise.
cp/
* decl.c (duplicate_decls): Fix spelling - becuase -> because.
lto/
* lto-symtab.h (lto_symtab_prevail_decl): Fix spelling
- becuase -> because.

--- gcc/ipa-polymorphic-call.c.jj   2016-01-14 10:49:05.490172201 +0100
+++ gcc/ipa-polymorphic-call.c  2016-03-08 14:43:46.213418101 +0100
@@ -812,7 +812,7 @@ walk_ssa_copies (tree op, hash_set
   ptr = ptr.foo;
 This pattern is implicitly produced for casts to non-primary
 bases.  When doing context analysis, we do not really care
-about the case pointer is NULL, becuase the call will be
+about the case pointer is NULL, because the call will be
 undefined anyway.  */
   if (gimple_code (SSA_NAME_DEF_STMT (op)) == GIMPLE_PHI)
{
--- gcc/ipa-reference.c.jj  2016-01-04 14:55:53.100469813 +0100
+++ gcc/ipa-reference.c 2016-03-08 14:43:50.715356221 +0100
@@ -104,7 +104,7 @@ static splay_tree reference_vars_to_cons
static we are considering.  This is added to the local info when asm
code is found that clobbers all memory.  */
 static bitmap all_module_statics;
-/* Set of all statics that should be ignored becuase they are touched by
+/* Set of all statics that should be ignored because they are touched by
-fno-ipa-reference code.  */
 static bitmap ignore_module_statics;
 
--- gcc/config/i386/i386.h.jj   2016-03-08 09:01:50.870475507 +0100
+++ gcc/config/i386/i386.h  2016-03-08 14:44:05.113158321 +0100
@@ -1621,7 +1621,7 @@ enum reg_class
function prologue should increase the stack frame size by this amount.  
 
In 32bit mode enabling argument accumulation results in about 5% code size
-   growth becuase move instructions are less compact than push.  In 64bit
+   growth because move instructions are less compact than push.  In 64bit
mode the difference is less drastic but visible.  
 
FIXME: Unlike earlier implementations, the size of unwind info seems to
--- gcc/cp/decl.c.jj    2016-03-04 23:11:13.096811005 +0100
+++ gcc/cp/decl.c   2016-03-08 14:43:58.017255855 +0100
@@ -2646,7 +2646,7 @@ duplicate_decls (tree newdecl, tree oldd
 
  Before releasing the node, be sore to remove function from symbol
  table that might have been inserted there to record comdat group.
- Be sure to however do not free DECL_STRUCT_FUNCTION becuase this
+ Be sure to however do not free DECL_STRUCT_FUNCTION because this
  structure is shared in between newdecl and oldecl.  */
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
 DECL_STRUCT_FUNCTION (newdecl) = NULL;
--- gcc/cgraph.c.jj 2016-02-12 00:50:55.850885110 +0100
+++ gcc/cgraph.c    2016-03-08 14:43:34.382580717 +0100
@@ -3356,7 +3356,7 @@ cgraph_node::get_body (void)
   updated = get_untransformed_body ();
 
   /* Getting transformed body makes no sense for inline clones;
- we should never use this on real clones becuase they are materialized
+ we should never use this on real clones because they are materialized
  early.
  TODO: Materializing clones here will likely lead to smaller LTRANS
  footprint. */
--- gcc/ipa-inline.c.jj 2016-02-22 15:18:35.547649249 +0100
+++ gcc/ipa-inline.c    2016-03-08 14:43:43.021461974 +0100
@@ -2688,7 +2688,7 @@ early_inliner (function *fun)
   /* If some always_inline functions was inlined, apply the changes.
 This way we will not account always inline into growth limits and
 moreover we will inline calls from always inlines that we skipped
-previously becuase of conditional above.  */
+previously because of conditional above.  */
   if (inlined)
{
  timevar_push (TV_INTEGRATION);
--- gcc/ipa-devirt.c.jj 2016-02-25 17:04:17.553699186 +0100
+++ gcc/ipa-devirt.c    2016-03-08 14:43:38.740520817 +0100
@@ -393,7 +393,7 @@ odr_vtable_hasher::hash (const odr_type_
 
When STRICT is true, we compare types by their names for purposes of
ODR violation warnings.  When strict is false, we consider variants
-   equivalent, becuase it is all that matters for devirtualization machinery.
+   equivalent, because it is all that matters for devirtualization machinery.
 */
 
 bool
--- gcc/tree-streamer-out.c.jj  2016-01-25 12:10:59.006252536 +0100
+++ gcc/tree-streamer-out.c 2016-03-08 14:43:54.384305790 +0100
@@ -325,7 +325,7 @@ 

[PATCH, rs6000] Add support for xxpermr and vpermr instructions

2016-03-08 Thread Kelvin Nilsen


This patch adds support for two new Power9 instructions, xxpermr and 
vpermr, providing more efficient vector permutation operations on 
little-endian configurations. These new instructions are described in 
the Power ISA 3.0 document.  Selection of the new instructions is 
conditioned upon TARGET_P9_VECTOR and !VECTOR_ELT_ORDER_BIG.
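Below is a scalar sketch of why one vpermr can replace the selector inversion plus vperm. It follows my reading of the Power ISA 3.0 descriptions and is not code from the patch. Both instructions index the 32-byte concatenation a||b using the low five bits of each selector byte. vperm takes byte index, while vpermr takes byte 31 - index. Since (~s & 31) == 31 - (s & 31), vpermr with sel behaves like vperm with ~sel, which is exactly what the VNAND/VNOR step in the pre-P9 path produces. The operand swap in that path deals with little-endian element numbering and is not modelled here.

```c
#include <stdint.h>
#include <string.h>

/* Build the 32-byte source a||b used by both permutes.  */
static void
concat (const uint8_t a[16], const uint8_t b[16], uint8_t src[32])
{
  memcpy (src, a, 16);
  memcpy (src + 16, b, 16);
}

/* vperm model: out[i] = src[index], index = low 5 bits of sel[i].  */
static void
vperm_model (const uint8_t a[16], const uint8_t b[16],
             const uint8_t sel[16], uint8_t out[16])
{
  uint8_t src[32];
  concat (a, b, src);
  for (int i = 0; i < 16; i++)
    out[i] = src[sel[i] & 0x1f];
}

/* vpermr model: same index, but counted from the right end.  */
static void
vpermr_model (const uint8_t a[16], const uint8_t b[16],
              const uint8_t sel[16], uint8_t out[16])
{
  uint8_t src[32];
  concat (a, b, src);
  for (int i = 0; i < 16; i++)
    out[i] = src[31 - (sel[i] & 0x1f)];
}

/* The inversion the pre-P9 path emits (VNAND or VNOR of sel with itself).  */
static void
invert_selector (const uint8_t sel[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = (uint8_t) ~sel[i];
}
```
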


The patch has been bootstrapped and tested on powerpc64le-unknown-linux-gnu
and powerpc64-unknown-linux-gnu with no regressions.  Is this OK for GCC
7 when stage 1 opens?


Thanks.


--
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain
gcc/ChangeLog:

2016-03-07  Kelvin Nilsen  

* config/rs6000/rs6000.c (rs6000_expand_vector_set): If
!BYTES_BIG_ENDIAN and TARGET_P9_VECTOR, expand using template that
translates into new xxpermr or vpermr instructions.
(altivec_expand_vec_perm_le): If TARGET_P9_VECTOR, expand using
template that translates into new xxpermr or vpermr instructions.
* config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec constant.
(*altivec_vpermr__internal): New insn.

gcc/testsuite/ChangeLog:

2016-03-07  Kelvin Nilsen  

* gcc.target/powerpc/p9-permute.c: Generalize test to run on
big-endian Power9 in addition to little-endian Power9.
* gcc.target/powerpc/p9-vpermr.c: New test.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 233539)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6553,19 +6553,27 @@ rs6000_expand_vector_set (rtx target, rtx val, int
UNSPEC_VPERM);
   else 
 {
-  /* Invert selector.  We prefer to generate VNAND on P8 so
- that future fusion opportunities can kick in, but must
- generate VNOR elsewhere.  */
-  rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
-  rtx iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
-  rtx tmp = gen_reg_rtx (V16QImode);
-  emit_insn (gen_rtx_SET (tmp, iorx));
-
-  /* Permute with operands reversed and adjusted selector.  */
-  x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
- UNSPEC_VPERM);
+  if (TARGET_P9_VECTOR)
+   x = gen_rtx_UNSPEC (mode,
+   gen_rtvec (3, target, reg, 
+  force_reg (V16QImode, x)),
+   UNSPEC_VPERMR);
+  else
+   {
+ /* Invert selector.  We prefer to generate VNAND on P8 so
+that future fusion opportunities can kick in, but must
+generate VNOR elsewhere.  */
+ rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
+ rtx iorx = (TARGET_P8_VECTOR
+ ? gen_rtx_IOR (V16QImode, notx, notx)
+ : gen_rtx_AND (V16QImode, notx, notx));
+ rtx tmp = gen_reg_rtx (V16QImode);
+ emit_insn (gen_rtx_SET (tmp, iorx));
+ 
+ /* Permute with operands reversed and adjusted selector.  */
+ x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
+ UNSPEC_VPERM);
+   }
 }
 
   emit_insn (gen_rtx_SET (target, x));
@@ -33421,18 +33429,26 @@ altivec_expand_vec_perm_le (rtx operands[4])
   if (!REG_P (target))
 tmp = gen_reg_rtx (mode);
 
-  /* Invert the selector with a VNAND if available, else a VNOR.
- The VNAND is preferred for future fusion opportunities.  */
-  notx = gen_rtx_NOT (V16QImode, sel);
-  iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
-  emit_insn (gen_rtx_SET (norreg, iorx));
+  if (TARGET_P9_VECTOR)
+{
+  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel), 
+  UNSPEC_VPERMR);
+}
+  else
+{
+  /* Invert the selector with a VNAND if available, else a VNOR.
+The VNAND is preferred for future fusion opportunities.  */
+  notx = gen_rtx_NOT (V16QImode, sel);
+  iorx = (TARGET_P8_VECTOR
+ ? gen_rtx_IOR (V16QImode, notx, notx)
+ : gen_rtx_AND (V16QImode, notx, notx));
+  emit_insn (gen_rtx_SET (norreg, iorx));
+  
+  /* Permute with operands reversed and adjusted selector.  */
+  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg),
+  UNSPEC_VPERM);
+}
 
-  /* Permute with operands reversed and adjusted selector.  */
-  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg),
-  UNSPEC_VPERM);
-
   /* Copy into target, possibly by way of a register.  */
   if (!REG_P (target))
 {
Index: gcc/config/rs6000/altivec.md

Re: [RFC][ARM,AArch64] Adding crypto Advsimd intrinsics tests

2016-03-08 Thread Christophe Lyon
To illustrate what I mean, in fact we already have similar cases:

On 7 March 2016 at 10:12, Christophe Lyon  wrote:
> Hi,
>
> While preparing the cleanup of neon-testgen.ml, I'm adding the missing
> tests to gcc.target/aarch64/advsimd-intrinsics.
>
> All the *_p64 and *_p128 are currently missing, and I am wondering
> what's the best option. I can think of:
> 1- Update existing tests using #ifdef __ARM_FEATURE_CRYPTO
This is somewhat what we currently have in vfma.c/vfms.c (but there, in
fact, the test is empty if the #ifdef is false)

> 2- Update existing tests without #ifdef, but adding effective_target
> arm_crypto_ok
That's what we have with vqrdmlah.c, testing arm_v8_1a_neon_hw

> 3- Create dedicated tests, either grouping alll p64/p128 in one single
> source, or splitting them in as many source files as there are
> intrinsics.
That's almost what we have in vcvt_f16.c, which also uses effective
target arm_neon_fp16_hw.


> 1- means that we would test different things depending on how GCC is
> configured (--with-fpu)
> 2- means that we would not be able to test the subset which does not
> require crypto if for some reason we cannot force the right effective
> target
> 3- might be a bit more confusing as several places cover the same intrinsics.
>
> Thoughts?
>
> Thanks,
>
> Christophe.


Re: [PATCH 2/2][GCC][ARM] Fix testcases after introduction of Cortex-R8

2016-03-08 Thread Kyrill Tkachov

Hi Andre,

On 08/03/16 11:05, Andre Vieira (lists) wrote:

On 03/03/16 11:28, Kyrill Tkachov wrote:

Hi Andre,

On 02/03/16 12:21, Andre Vieira (lists) wrote:

Hi,

Tests used to check for "r8" which will not work because cortex-r8
string is now included in the assembly. Fixed by checking for "[^\-]r8".

Is this Ok?

Cheers,
Andre

gcc/testsuite/ChangeLog:

2016-03-02  Andre Vieira  

   * gcc.target/arm/pr45701-1.c: Change assembler scan to not
   trigger for cortex-r8, when scanning for register r8.
   * gcc.target/arm/pr45701-2.c: Likewise.

Ok.
Thanks,
Kyrill


Thomas committed on my behalf at revision r234040.

I had to rebase arm-tune.md and invoke.texi; these were all obvious changes.


I'm seeing a DejaGNU error while testing RUNTESTFLAGS="arm.exp=pr45701-*.c":
ERROR: (DejaGnu) proc "^-" does not exist.
The error code is NONE
The info on the error is:
invalid command name "^-"
while executing
"::tcl_unknown ^-"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::tcl_unknown $args"

That's due to the scan-assembler-not test:
/* { dg-final { scan-assembler-not "[^\-]r8" } } */

The '[' and ']' need to be escaped by a backslash.
Can you please post a patch to add the escapes.
Sorry for missing this in the original review...
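For illustration, the escaped directive might look roughly like this. It is a sketch of the requested fix, not the committed follow-up. Inside Tcl double quotes, `\[` and `\]` suppress command substitution and `\-` reduces to a plain `-`. The regexp that scan-assembler-not receives is therefore `[^-]r8`, instead of Tcl trying to run a command named `^-`.

```c
/* { dg-final { scan-assembler-not "\[^\-\]r8" } } */
```
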

Kyrill


Cheers,
Andre





[AArch64] Fix dependency of gcc-plugin.h

2016-03-08 Thread Christophe Lyon
Hi,

Our bug report https://bugs.linaro.org/show_bug.cgi?id=2123
complains about aarch64's missing plugin dependency.

AFAICT, the problem is present on trunk too, and the small attached
patch fixes it.
OK?

Thanks,

Christophe.
2016-03-08  Christophe Lyon  

* config/aarch64/t-aarch64 (OPTIONS_H_EXTRA): Add
aarch64-fusion-pairs.def and aarch64-tuning-flags.def.
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index e2c942b..778e15c 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -20,7 +20,9 @@
 
 TM_H += $(srcdir)/config/aarch64/aarch64-cores.def
 OPTIONS_H_EXTRA += $(srcdir)/config/aarch64/aarch64-cores.def \
-  $(srcdir)/config/aarch64/aarch64-arches.def
+  $(srcdir)/config/aarch64/aarch64-arches.def \
+  $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
+  $(srcdir)/config/aarch64/aarch64-tuning-flags.def
 
 $(srcdir)/config/aarch64/aarch64-tune.md: $(srcdir)/config/aarch64/gentune.sh \
$(srcdir)/config/aarch64/aarch64-cores.def


[PATCH] libcc1: rerun configure when gcc/BASE-VER changes

2016-03-08 Thread Andreas Schwab
This is needed to get gcc_version updated.

Andreas.

* configure.ac (CONFIG_STATUS_DEPENDENCIES): Substitute.
* configure: Regenerate.
* Makefile.in: Regenerate.

diff --git a/libcc1/configure.ac b/libcc1/configure.ac
index 6c97afd..e2e3fda 100644
--- a/libcc1/configure.ac
+++ b/libcc1/configure.ac
@@ -50,6 +50,7 @@ AC_CHECK_DECLS([basename])
 
 gcc_version=`cat $srcdir/../gcc/BASE-VER`
 AC_SUBST(gcc_version)
+AC_SUBST([CONFIG_STATUS_DEPENDENCIES], ['$(top_srcdir)/../gcc/BASE-VER'])
 
 ACX_PROG_CC_WARNING_OPTS([-W -Wall], [WARN_FLAGS])
 AC_SUBST(WARN_FLAGS)
-- 
2.7.2


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


RE: [Patch x86_64]: fix order of cost table initialization for -march=znver1.

2016-03-08 Thread Kumar, Venkataramanan
Hi Uros,

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Kumar, Venkataramanan
> Sent: Tuesday, March 08, 2016 7:21 PM
> To: Uros Bizjak (ubiz...@gmail.com); gcc-patches@gcc.gnu.org
> Cc: Richard Beiner (richard.guent...@gmail.com)
> Subject: [Patch x86_64]: fix order of cost table initialization for -
> march=znver1.
> 
> Hi Uros,
> 
> While debugging GCC to see if cost of multiplication for DI mode is set
> correctly for znver1 target.
>  I found that the order of cost table insertion is wrong for znver1 and it
> worked because btver2 had same cost for multiply .
> 
> The patch corrects the mistake I made.
> 
> 2016-03-08  Venkataramanan Kumar  
> 
> *  config/i386/i386.c (processor_target_table): Fix cost table
> initialization order for znver1.
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index
> 8a026ae..3d67c65 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2662,9 +2662,9 @@ static const struct ptt
> processor_target_table[PROCESSOR_max] =
>{"bdver2", _cost, 16, 10, 16, 7, 11},
>{"bdver3", _cost, 16, 10, 16, 7, 11},
>{"bdver4", _cost, 16, 10, 16, 7, 11},
> -  {"znver1", _cost, 16, 10, 16, 7, 11},
>{"btver1", _cost, 16, 10, 16, 7, 11},
> -  {"btver2", _cost, 16, 10, 16, 7, 11}
> +  {"btver2", _cost, 16, 10, 16, 7, 11},  {"znver1",
> + _cost, 16, 10, 16, 7, 11},
>  };
> 
> It passes normal bootstrap and bootstrap with BOOT_CFLAGS="-O2 -g -
> march=znver1 -mno-clzero -mno-sha " on avx2 target.
> 
> Is it ok for trunk?

Please find the correct patch below.

Change Log
2016-03-08  Venkataramanan Kumar  
 
 *  config/i386/i386.c (processor_target_table): Fix cost table
 initialization order for znver1.


snip
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a026ae..234327a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2662,9 +2662,9 @@ static const struct ptt 
processor_target_table[PROCESSOR_max] =
   {"bdver2", _cost, 16, 10, 16, 7, 11},
   {"bdver3", _cost, 16, 10, 16, 7, 11},
   {"bdver4", _cost, 16, 10, 16, 7, 11},
-  {"znver1", _cost, 16, 10, 16, 7, 11},
   {"btver1", _cost, 16, 10, 16, 7, 11},
-  {"btver2", _cost, 16, 10, 16, 7, 11}
+  {"btver2", _cost, 16, 10, 16, 7, 11},
+  {"znver1", _cost, 16, 10, 16, 7, 11}
 };
snip

Ok for trunk? 

> 
> Regards,
> Venkat.



[Patch x86_64]: fix order of cost table initialization for -march=znver1.

2016-03-08 Thread Kumar, Venkataramanan
Hi Uros,

While debugging GCC to check whether the cost of multiplication in DI mode is
set correctly for the znver1 target, I found that the order of the cost table
entries is wrong for znver1.  It only worked because btver2 has the same
multiply cost.

The patch corrects the mistake I made.
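Here is a trimmed-down C sketch of why the entry order matters. The enum and tables are hypothetical: only the processor names are kept, and the enum is assumed to end with znver1, as the fix implies. The table is indexed positionally by the processor enum. Listing znver1 too early therefore makes the ZNVER1 slot hold btver2's entry, and btver2's identical multiply cost hid the mistake. In C, array designators tie each entry to its enumerator, so a misplaced line cannot shift the others.

```c
#include <string.h>

/* Hypothetical subset of the processor enum, in table order.  */
enum processor { PROC_BDVER4, PROC_BTVER1, PROC_BTVER2, PROC_ZNVER1, PROC_MAX };

/* Before the fix: "znver1" listed too early, so later slots shift.  */
static const char *const table_before[PROC_MAX] = {
  "bdver4", "znver1", "btver1", "btver2"
};

/* After the fix: positions follow the enum.  */
static const char *const table_after[PROC_MAX] = {
  "bdver4", "btver1", "btver2", "znver1"
};

/* C designators make the order of the lines irrelevant.  */
static const char *const table_designated[PROC_MAX] = {
  [PROC_BDVER4] = "bdver4",
  [PROC_ZNVER1] = "znver1",
  [PROC_BTVER1] = "btver1",
  [PROC_BTVER2] = "btver2",
};

/* Nonzero if the slot for processor P holds the entry named NAME.  */
static int
slot_is (const char *const *table, enum processor p, const char *name)
{
  return strcmp (table[p], name) == 0;
}
```
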

2016-03-08  Venkataramanan Kumar  

*  config/i386/i386.c (processor_target_table): Fix cost table 
initialization order for znver1.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8a026ae..3d67c65 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2662,9 +2662,9 @@ static const struct ptt 
processor_target_table[PROCESSOR_max] =
   {"bdver2", _cost, 16, 10, 16, 7, 11},
   {"bdver3", _cost, 16, 10, 16, 7, 11},
   {"bdver4", _cost, 16, 10, 16, 7, 11},
-  {"znver1", _cost, 16, 10, 16, 7, 11},
   {"btver1", _cost, 16, 10, 16, 7, 11},
-  {"btver2", _cost, 16, 10, 16, 7, 11}
+  {"btver2", _cost, 16, 10, 16, 7, 11},
+  {"znver1", _cost, 16, 10, 16, 7, 11},
 };

It passes a normal bootstrap and a bootstrap with
BOOT_CFLAGS="-O2 -g -march=znver1 -mno-clzero -mno-sha" on an AVX2 target.

Is it ok for trunk?

Regards,
Venkat.



[PATCH][obvious] Fix typo in tree-ssa-math-opts.c

2016-03-08 Thread Kyrill Tkachov

Committed as obvious.

Thanks,
Kyrill

2016-03-08  Kyrylo Tkachov  

* tree-ssa-math-opts.c: Fix typo in comment.
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 2215b4dc709213730a92b533f8774464a36efaf4..4626022b8b81c74e72d808d63d4c4ed4e7ea963a 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -42,7 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 
First of all, with some experiments it was found out that the
transformation is not always useful if there are only two divisions
-   hy the same divisor.  This is probably because modern processors
+   by the same divisor.  This is probably because modern processors
can pipeline the divisions; on older, in-order processors it should
still be effective to optimize two divisions by the same number.
We make this a param, and it shall be called N in the remainder of


[PATCH 1/2] S/390: Define macros for rounding mode constants

2016-03-08 Thread Andreas Krebbel
This patch replaces a few magic numbers used for floating point
rounding modes with macros.  This is mostly a NoOp change apart from:

fixuns_truncdddi2, fixuns_trunctddi2, fixuns_truncsi2: Replace 5
with DFP_RND_TOWARD_0 (which is 9).

Both 5 and 9 represent rounding toward 0; the difference is that with
5 the new DFP quantum exception is enabled as well.  This exception
isn't part of IEEE 754, and we do not have an interface to enable and
test it anyway, so we do not intend to enable it.  So far this should
not have had any noticeable effect, since the quantum exception was
not observable through the POSIX functions.

One pattern ("fix_truncdi2") already uses rounding mode 9
correctly.
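The relationship between the quantum-exception variants (1-7) and the modes that suppress it can be sketched as a lookup. The enumerator values are copied from the DFP_RND_* constants this patch adds; the helper function itself is hypothetical and not part of the patch. It shows, for example, that 5 and 9 round the same way and differ only in whether the quantum exception can occur.

```c
#include <assert.h>

/* Values copied from the DFP_RND_* constants added to s390.md.  */
enum dfp_rnd {
  DFP_RND_CURRENT = 0,
  DFP_RND_NEAREST_TIE_AWAY_FROM_0_QUANTEXC = 1,
  DFP_RND_CURRENT_QUANTEXC = 2,
  DFP_RND_PREP_FOR_SHORT_PREC_QUANTEXC = 3,
  DFP_RND_NEAREST_TIE_TO_EVEN_QUANTEXC = 4,
  DFP_RND_TOWARD_0_QUANTEXC = 5,
  DFP_RND_TOWARD_INF_QUANTEXC = 6,
  DFP_RND_TOWARD_MINF_QUANTEXC = 7,
  DFP_RND_NEAREST_TIE_TO_EVEN = 8,
  DFP_RND_TOWARD_0 = 9,
  DFP_RND_TOWARD_INF = 10,
  DFP_RND_TOWARD_MINF = 11,
  DFP_RND_NEAREST_TIE_AWAY_FROM_0 = 12,
  DFP_RND_NEAREST_TIE_TO_0 = 13,
  DFP_RND_AWAY_FROM_0 = 14,
  DFP_RND_PREP_FOR_SHORT_PREC = 15
};

/* Hypothetical helper: return the mode with the same rounding direction
   but with the quantum exception suppressed.  Modes 1-7 are mapped to
   their counterparts; every other mode is returned unchanged.  */
static enum dfp_rnd dfp_rnd_suppress_quantexc (enum dfp_rnd m)
{
  assert (m <= DFP_RND_PREP_FOR_SHORT_PREC);
  switch (m)
    {
    case DFP_RND_NEAREST_TIE_AWAY_FROM_0_QUANTEXC:
      return DFP_RND_NEAREST_TIE_AWAY_FROM_0;
    case DFP_RND_CURRENT_QUANTEXC:
      return DFP_RND_CURRENT;
    case DFP_RND_PREP_FOR_SHORT_PREC_QUANTEXC:
      return DFP_RND_PREP_FOR_SHORT_PREC;
    case DFP_RND_NEAREST_TIE_TO_EVEN_QUANTEXC:
      return DFP_RND_NEAREST_TIE_TO_EVEN;
    case DFP_RND_TOWARD_0_QUANTEXC:
      return DFP_RND_TOWARD_0;
    case DFP_RND_TOWARD_INF_QUANTEXC:
      return DFP_RND_TOWARD_INF;
    case DFP_RND_TOWARD_MINF_QUANTEXC:
      return DFP_RND_TOWARD_MINF;
    default:
      return m;
    }
}
```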

gcc/ChangeLog:

2016-03-08  Andreas Krebbel  

* config/s390/s390.md (BFP_RND_*, DFP_RND_*): Add new constant
definitions for BFP and DFP rounding modes.
("fixuns_truncdddi2", "fixuns_trunctddi2")
("fixuns_trunc2", "fixuns_truncsi2")
("fix_trunc2", "fix_truncdi2")
("fix_trunctf2"): Use the new constants instead of magic
numbers.
---
 gcc/config/s390/s390.md | 63 +++--
 1 file changed, 50 insertions(+), 13 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 99974f9..185a3f8 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -338,6 +338,39 @@
(VR31_REGNUM 53)
   ])
 
+; Rounding modes for binary floating point numbers
+(define_constants
+  [(BFP_RND_CURRENT 0)
+   (BFP_RND_NEAREST_TIE_AWAY_FROM_0 1)
+   (BFP_RND_PREP_FOR_SHORT_PREC 3)
+   (BFP_RND_NEAREST_TIE_TO_EVEN 4)
+   (BFP_RND_TOWARD_0 5)
+   (BFP_RND_TOWARD_INF  6)
+   (BFP_RND_TOWARD_MINF 7)])
+
+; Rounding modes for decimal floating point numbers
+; 1-7 were introduced with the floating point extension facility
+; available with z196
+; With these rounding modes (1-7) a quantum exception might occur
+; which is suppressed for the other modes.
+(define_constants
+  [(DFP_RND_CURRENT  0)
+   (DFP_RND_NEAREST_TIE_AWAY_FROM_0_QUANTEXC 1)
+   (DFP_RND_CURRENT_QUANTEXC 2)
+   (DFP_RND_PREP_FOR_SHORT_PREC_QUANTEXC 3)
+   (DFP_RND_NEAREST_TIE_TO_EVEN_QUANTEXC 4)
+   (DFP_RND_TOWARD_0_QUANTEXC 5)
+   (DFP_RND_TOWARD_INF_QUANTEXC  6)
+   (DFP_RND_TOWARD_MINF_QUANTEXC 7)
+   (DFP_RND_NEAREST_TIE_TO_EVEN  8)
+   (DFP_RND_TOWARD_0 9)
+   (DFP_RND_TOWARD_INF  10)
+   (DFP_RND_TOWARD_MINF 11)
+   (DFP_RND_NEAREST_TIE_AWAY_FROM_0 12)
+   (DFP_RND_NEAREST_TIE_TO_0 13)
+   (DFP_RND_AWAY_FROM_0 14)
+   (DFP_RND_PREP_FOR_SHORT_PREC 15)])
+
 ;;
 ;; PFPO GPR0 argument format
 ;;
@@ -4482,7 +4515,7 @@
   [(parallel
 [(set (match_operand:DI 0 "register_operand" "")
  (unsigned_fix:DI (match_operand:DD 1 "register_operand" "")))
- (unspec:DI [(const_int 5)] UNSPEC_ROUND)
+ (unspec:DI [(const_int DFP_RND_TOWARD_0)] UNSPEC_ROUND)
  (clobber (reg:CC CC_REGNUM))])]
 
   "TARGET_HARD_DFP"
@@ -4507,11 +4540,13 @@
LT, NULL_RTX, VOIDmode, 0, label1);
   emit_insn (gen_subtd3 (temp, temp,
const_double_from_real_value (sub, TDmode)));
-  emit_insn (gen_fix_trunctddi2_dfp (operands[0], temp, GEN_INT (11)));
+  emit_insn (gen_fix_trunctddi2_dfp (operands[0], temp,
+GEN_INT (DFP_RND_TOWARD_MINF)));
   emit_jump (label2);
 
   emit_label (label1);
-  emit_insn (gen_fix_truncdddi2_dfp (operands[0], operands[1], GEN_INT (9)));
+  emit_insn (gen_fix_truncdddi2_dfp (operands[0], operands[1],
+GEN_INT (DFP_RND_TOWARD_0)));
   emit_label (label2);
   DONE;
 }
@@ -4521,7 +4556,7 @@
   [(parallel
 [(set (match_operand:DI 0 "register_operand" "")
  (unsigned_fix:DI (match_operand:TD 1 "register_operand" "")))
- (unspec:DI [(const_int 5)] UNSPEC_ROUND)
+ (unspec:DI [(const_int DFP_RND_TOWARD_0)] UNSPEC_ROUND)
  (clobber (reg:CC CC_REGNUM))])]
 
   "TARGET_HARD_DFP"
@@ -4542,11 +4577,13 @@
LT, NULL_RTX, VOIDmode, 0, label1);
   emit_insn (gen_subtd3 (temp, operands[1],
const_double_from_real_value (sub, TDmode)));
-  emit_insn (gen_fix_trunctddi2_dfp (operands[0], temp, GEN_INT (11)));
+  emit_insn (gen_fix_trunctddi2_dfp (operands[0], temp,
+GEN_INT (DFP_RND_TOWARD_MINF)));
   emit_jump (label2);
 
   emit_label (label1);
-  emit_insn (gen_fix_trunctddi2_dfp (operands[0], operands[1], GEN_INT (9)));
+  emit_insn (gen_fix_trunctddi2_dfp (operands[0], operands[1],
+GEN_INT (DFP_RND_TOWARD_0)));
   emit_label (label2);
   DONE;

[PATCH 0/2] Fix TD->SD conversion issue

2016-03-08 Thread Andreas Krebbel
This fixes a rounding mode issue recently noticed when working on libdfp.

Please see the patch descriptions for more details.

Andreas Krebbel (2):
  S/390: Define macros for rounding mode constants
  S/390: Fix rounding for _Decimal128 to _Decimal32 conversion

 gcc/config/s390/s390.md   | 80 ---
 gcc/testsuite/gcc.target/s390/dfp-1.c | 23 ++
 2 files changed, 87 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/dfp-1.c

-- 
1.9.1



[PATCH 2/2] S/390: Fix rounding for _Decimal128 to _Decimal32 conversion

2016-03-08 Thread Andreas Krebbel
We do not have a direct conversion instruction from 128-bit DFP to
32-bit DFP, so the conversion has to be done in two steps.  The first
step (TD->DD) needs to use the "prepare for shorter precision"
rounding mode in order to produce a correctly rounded result.
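The double-rounding problem can be sketched on plain integer coefficients. The digit counts here are illustrative only (the real formats carry 34, 16 and 7 digits), and the "prepare for shorter precision" rule is implemented as we understand it: truncate, and if anything nonzero was discarded while the kept last digit is 0 or 5, bump that digit by one. That way the second rounding step only sees a trailing 0 or 5 when the first step was exact. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 10^n for small n; 10^19 still fits in uint64_t.  */
static uint64_t pow10_u64 (int n)
{
  assert (n >= 0 && n <= 19);
  uint64_t p = 1;
  while (n-- > 0)
    p *= 10;
  return p;
}

/* Drop the DROP lowest decimal digits of C, rounding to nearest with
   ties to even.  */
static uint64_t round_half_even (uint64_t c, int drop)
{
  if (drop == 0)
    return c;
  uint64_t p = pow10_u64 (drop);
  uint64_t q = c / p, r = c % p, half = p / 2;
  if (r > half || (r == half && (q & 1)))
    q++;
  return q;
}

/* Drop DROP digits with the assumed "prepare for shorter precision"
   rule: truncate, then turn a kept trailing 0 or 5 into 1 or 6 if the
   result was inexact.  */
static uint64_t round_prep_short (uint64_t c, int drop)
{
  if (drop == 0)
    return c;
  uint64_t p = pow10_u64 (drop);
  uint64_t q = c / p, r = c % p;
  if (r != 0 && (q % 10 == 0 || q % 10 == 5))
    q++;
  return q;
}
```

For the coefficient 12345634999 rounded to 7 digits, direct rounding gives 1234563.  Two round-half-even steps (11 to 8 to 7 digits) give 1234564, because the 8th digit first becomes a 5 and the tie then rounds up.  Using round_prep_short for the first step restores 1234563.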

2016-03-08  Andreas Krebbel  

* config/s390/s390.md ("trunctddd2"): Turn former define_insn into
define_expand.
("*trunctddd2"): New pattern definition.
("trunctdsd2"): Set prep_for_short_prec rounding mode for the
TD->DD truncation.

gcc/testsuite/ChangeLog:

2016-03-08  Andreas Krebbel  

* gcc.target/s390/dfp-1.c: New test.
---
 gcc/config/s390/s390.md   | 17 ++---
 gcc/testsuite/gcc.target/s390/dfp-1.c | 23 +++
 2 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/dfp-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 185a3f8..5a9f1c8 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -4847,12 +4847,22 @@
 ; trunctddd2 and truncddsd2 instruction pattern(s).
 ;
 
-(define_insn "trunctddd2"
+
+(define_expand "trunctddd2"
+  [(parallel
+[(set (match_operand:DD 0 "register_operand" "")
+ (float_truncate:DD (match_operand:TD 1 "register_operand" "")))
+ (unspec:DI [(const_int DFP_RND_CURRENT)] UNSPEC_ROUND)
+ (clobber (scratch:TD))])]
+  "TARGET_HARD_DFP")
+
+(define_insn "*trunctddd2"
   [(set (match_operand:DD 0 "register_operand" "=f")
(float_truncate:DD (match_operand:TD 1 "register_operand" "f")))
-   (clobber (match_scratch:TD 2 "=f"))]
+   (unspec:DI [(match_operand:DI 2 "const_mask_operand" "I")] UNSPEC_ROUND)
+   (clobber (match_scratch:TD 3 "=f"))]
   "TARGET_HARD_DFP"
-  "ldxtr\t%2,0,%1,0\;ldr\t%0,%2"
+  "ldxtr\t%3,%2,%1,0\;ldr\t%0,%3"
   [(set_attr "length"  "6")
(set_attr "type" "ftruncdd")])
 
@@ -4868,6 +4878,7 @@
   [(parallel
 [(set (match_dup 3)
  (float_truncate:DD (match_operand:TD 1 "register_operand" "")))
+ (unspec:DI [(const_int DFP_RND_PREP_FOR_SHORT_PREC)] UNSPEC_ROUND)
  (clobber (match_scratch:TD 2 ""))])
(set (match_operand:SD 0 "register_operand" "")
(float_truncate:SD (match_dup 3)))]
diff --git a/gcc/testsuite/gcc.target/s390/dfp-1.c b/gcc/testsuite/gcc.target/s390/dfp-1.c
new file mode 100644
index 000..109d9fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/dfp-1.c
@@ -0,0 +1,23 @@
+/* We do not have a direct conversion instruction from 128 bit DFP to
+   32 bit DFP so this needs to be done in two steps.  The first needs
+   to be done with the "prepare for shorter precision rounding mode"
+   in order to produce a correct result.  Otherwise the 8th digit of
+   the number will change from 4 to 5 in the first rounding step which
+   then will turn the last digit of the 32 bit DFP number (the 3) into
+   a 4.  Although with direct rounding it would stay a 3.  */
+
+/* { dg-do run } */
+/* { dg-options "-O3 -march=z10 -mzarch" } */
+
+_Decimal32 __attribute__((noinline))
+foo (_Decimal128 a)
+{
+  return (_Decimal32)a;
+}
+
+int
+main ()
+{
+if (foo (1.234563499DL) != 1.234563DF)
+__builtin_abort ();
+}
-- 
1.9.1



[PATCH] S/390: Rename shift_count_or_setmem_operand to setmem_operand

2016-03-08 Thread Andreas Krebbel
The shift_count_or_setmem_operand predicate is now used only for
setmem patterns.  Rename it, together with the related
functions.

2016-03-08  Andreas Krebbel  

* config/s390/constraints.md: Adjust comment.
("Y"): Adjust comment.  Rename s390_decompose_shift_count to
s390_decompose_addrstyle_without_index.
* config/s390/predicates.md (shift_count_or_setmem_operand):
Rename to setmem_operand.
* config/s390/s390-protos.h
(s390_decompose_shift_count): Rename to
s390_decompose_addrstyle_without_index.
* config/s390/s390.c (s390_decompose_shift_count)
(s390_mem_constraint, print_shift_count_operand)
(print_operand_address, print_operand): Rename
s390_decompose_shift_count to
s390_decompose_addrstyle_without_index and rename
print_shift_count_operand to print_addrstyle_operand throughout the
file.
* config/s390/s390.md ("setmem_long_", "*setmem_long")
("*setmem_long_and", "*setmem_long_31z", "*setmem_long_and_31z"):
Rename shift_count_or_setmem_operand to setmem_operand.
* config/s390/vx-builtins.md ("vec_insert")
("vec_promote"): Replace shift_count_or_setmem_operand with
nonmemory_operand.
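The new name describes what the helper does: split a "base + offset" operand that has no index register into its base register and offset, with either output allowed to be NULL (as in the "Y" constraint's match_test). A simplified sketch follows, using a hypothetical miniature expression type rather than GCC's rtx; it omits the SUBREG handling and other checks the real s390 code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the accepted shapes: REG, CONST_INT, or
   (PLUS REG CONST_INT).  Not GCC's rtx.  */
enum kind { K_REG, K_CONST_INT, K_PLUS };

struct expr
{
  enum kind kind;
  int regno;              /* used by K_REG */
  long value;             /* used by K_CONST_INT */
  const struct expr *op0; /* used by K_PLUS */
  const struct expr *op1; /* used by K_PLUS */
};

/* Split OP into a base register number (-1 if none) and an offset.
   BASE and OFFSET may be NULL when only validity matters.  */
static bool decompose_base_offset (const struct expr *op, int *base,
                                   long *offset)
{
  int b = -1;
  long off = 0;

  assert (op != NULL);
  switch (op->kind)
    {
    case K_CONST_INT:
      off = op->value;
      break;
    case K_REG:
      b = op->regno;
      break;
    case K_PLUS:
      if (op->op0->kind != K_REG || op->op1->kind != K_CONST_INT)
        return false;
      b = op->op0->regno;
      off = op->op1->value;
      break;
    default:
      return false;
    }

  if (base)
    *base = b;
  if (offset)
    *offset = off;
  return true;
}
```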
---
 gcc/config/s390/constraints.md | 10 +-
 gcc/config/s390/predicates.md  |  4 ++--
 gcc/config/s390/s390-protos.h  |  3 ++-
 gcc/config/s390/s390.c | 22 +-
 gcc/config/s390/s390.md| 12 ++--
 gcc/config/s390/vx-builtins.md |  4 ++--
 6 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/gcc/config/s390/constraints.md b/gcc/config/s390/constraints.md
index 60a7edf..7857700 100644
--- a/gcc/config/s390/constraints.md
+++ b/gcc/config/s390/constraints.md
@@ -79,7 +79,7 @@
 ;; does *not* refer to a literal pool entry.
 ;;U -- Pointer with short displacement. (deprecated - use ZQZR)
 ;;W -- Pointer with long displacement. (deprecated - use ZSZT)
-;;Y -- Shift count operand.
+;;Y -- Address style operand without index.
 ;;ZQ -- Pointer without index register and with short displacement.
 ;;ZR -- Pointer with index register and short displacement.
 ;;ZS -- Pointer without index register but with long displacement.
@@ -189,12 +189,12 @@
 
 
 (define_address_constraint "Y"
-  "Shift count operand"
+  "Address style operand without index register"
 
-;; Simply check for the basic form of a shift count.  Reload will
-;; take care of making sure we have a proper base register.
+;; Simply check for base + offset style operands.  Reload will take
+;; care of making sure we have a proper base register.
 
-  (match_test "s390_decompose_shift_count (op, NULL, NULL)"  ))
+  (match_test "s390_decompose_addrstyle_without_index (op, NULL, NULL)"  ))
 
 
 ;;N -- Multiple letter constraint followed by 4 parameter letters.
diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index fefefb3..e66f4a4 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -87,7 +87,7 @@
 
 ;; Return true if OP is a valid operand as scalar shift count or setmem.
 
-(define_predicate "shift_count_or_setmem_operand"
+(define_predicate "setmem_operand"
   (match_code "reg, subreg, plus, const_int")
 {
   HOST_WIDE_INT offset;
@@ -98,7 +98,7 @@
 return false;
 
   /* Extract base register and offset.  */
-  if (!s390_decompose_shift_count (op, , ))
+  if (!s390_decompose_addrstyle_without_index (op, , ))
 return false;
 
   /* Don't allow any non-base hard registers.  Doing so without
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 792eaa7..2ccf0bb 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -139,7 +139,8 @@ extern rtx_insn *s390_load_got (void);
 extern rtx s390_get_thread_pointer (void);
 extern void s390_emit_tpf_eh_return (rtx);
 extern bool s390_legitimate_address_without_index_p (rtx);
-extern bool s390_decompose_shift_count (rtx, rtx *, HOST_WIDE_INT *);
+extern bool s390_decompose_addrstyle_without_index (rtx, rtx *,
+   HOST_WIDE_INT *);
 extern int s390_branch_condition_mask (rtx);
 extern int s390_compare_and_branch_condition_mask (rtx);
 extern bool s390_extzv_shift_ok (int, int, unsigned HOST_WIDE_INT);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 8924367..4f219be 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2982,13 +2982,16 @@ s390_decompose_address (rtx addr, struct s390_address *out)
   return true;
 }
 
-/* Decompose a RTL expression OP for a shift count into its components,
-   and return the base register in BASE and the offset in OFFSET.
+/* Decompose a RTL expression OP for an address style operand into its
+   components, and return the base register in BASE and the offset in
+   OFFSET.  While OP looks like an address it is 

Re: [PATCH 2/2][GCC][ARM] Fix testcases after introduction of Cortex-R8

2016-03-08 Thread Andre Vieira (lists)
On 03/03/16 11:28, Kyrill Tkachov wrote:
> Hi Andre,
> 
> On 02/03/16 12:21, Andre Vieira (lists) wrote:
>> Hi,
>>
>> Tests used to check for "r8", which no longer works because the cortex-r8
>> string is now included in the assembly. Fixed by checking for "[^\-]r8".
>>
>> Is this Ok?
>>
>> Cheers,
>> Andre
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-03-02  Andre Vieira  
>>
>>   * gcc.target/arm/pr45701-1.c: Change assembler scan to not
>>   trigger for cortex-r8, when scanning for register r8.
>>   * gcc.target/arm/pr45701-2.c: Likewise.
> 
> Ok.
> Thanks,
> Kyrill
> 
Thomas committed this on my behalf as revision r234040.

I had to rebase arm-tune.md and invoke.texi; these were all obvious changes.

Cheers,
Andre
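The effect of the revised scan pattern can be checked with a small sketch. DejaGnu's scan-assembler uses Tcl regular expressions, where "[^\-]" means "any character except '-'". The sketch below uses POSIX regcomp/regexec instead, where the equivalent bracket expression is written "[^-]". The assembly lines are hypothetical examples, not output from the actual tests.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the POSIX extended regex PATTERN;
   a pattern that fails to compile counts as no match.  */
static bool line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool hit = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return hit;
}

/* The old pattern fires on the CPU name; the new one only on r8 not
   preceded by '-'.  */
static void check_r8_patterns (void)
{
  assert (line_matches ("r8", "\t.cpu cortex-r8"));
  assert (!line_matches ("[^-]r8", "\t.cpu cortex-r8"));
  assert (line_matches ("[^-]r8", "\tpush\t{r4, r8, lr}"));
}
```

Note that "[^-]r8" needs some character before "r8", so it would not match an "r8" at the very start of a line; that is fine for register operands, which are always preceded by other text.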