Re: [PATCH] Use two source permute for vector initialization (PR 85692, take 2)

2018-05-10 Thread Allan Sandfeld Jensen
On Thursday, 10 May 2018 09:57:22 CEST Jakub Jelinek wrote:
> On Wed, May 09, 2018 at 04:53:19PM +0200, Allan Sandfeld Jensen wrote:
> > > > @@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
> > > >    elem_type = TREE_TYPE (type);
> > > >    elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> > > > 
> > > > -  vec_perm_builder sel (nelts, nelts, 1);
> > > > -  orig = NULL;
> > > > +  vec_perm_builder sel (nelts, 2, nelts);
> > > 
> > > Why this change?  I admit the vec_perm_builder arguments are confusing,
> > > but I think the second times the third is the number of indices being
> > > pushed into the vector, so I think (nelts, nelts, 1) is right.
> > 
> > I had the impression it was what was selected from. In any case, I changed
> > it because without the change I get a crash when vec_perm_indices is
> > created later with a possible nparms of 2.
> 
> The documentation is apparently in vector-builder.h:
>    This class is a wrapper around auto_vec for building vectors of T.
>    It aims to encode each vector as npatterns interleaved patterns,
>    where each pattern represents a sequence:
> 
>      { BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, BASE1 + STEP*3, ... }
> 
>    The first three elements in each pattern provide enough information
>    to derive the other elements.  If all patterns have a STEP of zero,
>    we only need to encode the first two elements in each pattern.
>    If BASE1 is also equal to BASE0 for all patterns, we only need to
>    encode the first element in each pattern.  The number of encoded
>    elements per pattern is given by nelts_per_pattern.
> 
>    The class can be used in two ways:
> 
>    1. It can be used to build a full image of the vector, which is then
>       canonicalized by finalize ().  In this case npatterns is initially
>       the number of elements in the vector and nelts_per_pattern is
>       initially 1.
> 
>    2. It can be used to build a vector that already has a known encoding.
>       This is preferred since it is more efficient and copes with
>       variable-length vectors.  finalize () then canonicalizes the encoding
>       to a simpler form if possible.
> 
> As the vector is constant width and we are building the full image of the
> vector, the right arguments are (nelts, nelts, 1) as per 1. above, and the
> finalization can perhaps change it to something more compact.
> 
> > > (and sorry for missing your patch first, the PR wasn't ASSIGNED and
> > > there
> > > was no link to gcc-patches for it).
> > 
> > It is okay. You are welcome to take it over. I am not a regular gcc
> > contributor and thus not well-versed in the details, only the basic logic
> > of how things work.
> 
> Ok, here is my version of the patch.  Bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk?
> 
Looks good to me if that counts for anything.

'Allan
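
For readers following the (nelts, nelts, 1) discussion, the encoding described in the quoted vector-builder.h comment can be sketched in plain C. This is only an illustration and not GCC code: expand_encoding is a hypothetical helper, and it assumes the encoded values are stored interleaved, so that encoded element I belongs to pattern I % npatterns. Under that reading, the full-image case is simply npatterns == number of elements with nelts_per_pattern == 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): expand an encoded vector into its
   full list of elements.  ENCODED holds npatterns * nelts_per_pattern
   values, interleaved so that element I belongs to pattern I % npatterns.  */
static void
expand_encoding (const long *encoded, size_t npatterns,
                 size_t nelts_per_pattern, long *out, size_t full_nelts)
{
  assert (npatterns > 0);
  assert (nelts_per_pattern >= 1 && nelts_per_pattern <= 3);

  size_t nencoded = npatterns * nelts_per_pattern;
  for (size_t i = 0; i < full_nelts; ++i)
    {
      if (i < nencoded)
        {
          /* Explicitly encoded element.  */
          out[i] = encoded[i];
          continue;
        }

      size_t pattern = i % npatterns;   /* which pattern I belongs to */
      size_t count = i / npatterns;     /* position within that pattern */
      /* Last encoded element of this pattern.  */
      long last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];

      if (nelts_per_pattern < 3)
        {
          /* STEP is zero: every later element repeats the last one.  */
          out[i] = last;
          continue;
        }

      /* Three encoded elements: BASE0, BASE1, BASE1 + STEP.  */
      long base1 = encoded[npatterns + pattern];
      long step = last - base1;
      out[i] = last + (long) (count - 2) * step;
    }
}
```

With two patterns of three elements each, the encoded values { 0, 8, 1, 9, 2, 10 } expand to the interleaved series 0, 8, 1, 9, 2, 10, 3, 11, and so on.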




[Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Allan Sandfeld Jensen
I have tried to fix PR85692 that I opened.

2018-05-08  Allan Sandfeld Jensen

PR tree-optimization/85692
* tree-ssa-forwprop.c (simplify_vector_constructor): Detect
  two source permute operations as well.
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 58ec6b47a5b..fbee8064160 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2004,7 +2004,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
-  tree op, op2, orig, type, elem_type;
+  tree op, op2, orig1, orig2, type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
@@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  vec_perm_builder sel (nelts, nelts, 1);
-  orig = NULL;
+  vec_perm_builder sel (nelts, 2, nelts);
+  orig1 = NULL;
+  orig2 = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
@@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op1 = gimple_assign_rhs1 (def_stmt);
   ref = TREE_OPERAND (op1, 0);
-  if (orig)
+  if (orig1)
 	{
-	  if (ref != orig)
-	return false;
+	  if (ref == orig1 || orig2)
+	{
+	  if (ref != orig1 && ref != orig2)
+	return false;
+	}
+	  else
+	{
+	  if (TREE_CODE (ref) != SSA_NAME)
+		return false;
+	  if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+		  || ! useless_type_conversion_p (TREE_TYPE (op1),
+		  TREE_TYPE (TREE_TYPE (ref))))
+		return false;
+	  if (TREE_TYPE (orig1) != TREE_TYPE (ref))
+		return false;
+	  orig2 = ref;
+	  maybe_ident = false;
+	  }
 	}
   else
 	{
@@ -2076,12 +2093,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	  || ! useless_type_conversion_p (TREE_TYPE (op1),
 	  TREE_TYPE (TREE_TYPE (ref))))
 	return false;
-	  orig = ref;
+	  orig1 = ref;
 	}
   unsigned int elt;
   if (maybe_ne (bit_field_size (op1), elem_size)
 	  || !constant_multiple_p (bit_field_offset (op1), elem_size, &elt))
 	return false;
+  if (orig2 && ref == orig2)
+	elt += nelts;
   if (elt != i)
 	maybe_ident = false;
   sel.quick_push (elt);
@@ -2089,14 +2108,17 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (i < nelts)
 return false;
 
-  if (! VECTOR_TYPE_P (TREE_TYPE (orig))
+  if (! VECTOR_TYPE_P (TREE_TYPE (orig1))
   || maybe_ne (TYPE_VECTOR_SUBPARTS (type),
-		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
+		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig1))))
 return false;
 
+  if (!orig2)
+orig2 = orig1;
+
   tree tem;
   if (conv_code != ERROR_MARK
-  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig),
+  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig1),
 	   &tem, &conv_code)
 	  || conv_code == CALL_EXPR))
 return false;
@@ -2104,16 +2126,16 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (maybe_ident)
 {
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_from_tree (gsi, orig);
+	gimple_assign_set_rhs_from_tree (gsi, orig1);
   else
-	gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
+	gimple_assign_set_rhs_with_ops (gsi, conv_code, orig1,
 	NULL_TREE, NULL_TREE);
 }
   else
 {
   tree mask_type;
 
-  vec_perm_indices indices (sel, 1, nelts);
+  vec_perm_indices indices (sel, 2, nelts);
   if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
   mask_type
@@ -2125,15 +2147,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op2 = vec_perm_indices_to_tree (mask_type, indices);
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
+	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig1, orig2, op2);
   else
 	{
 	  gimple *perm
-	= gimple_build_assign (make_ssa_name (TREE_TYPE (orig)),
-   VEC_PERM_EXPR, orig, orig, op2);
-	  orig = gimple_assign_lhs (perm);
+	= gimple_build_assign (make_ssa_name (TREE_TYPE (orig1)),
+   VEC_PERM_EXPR, orig1, orig2, op2);
 	  gsi_insert_before (gsi, perm, GSI_SAME_STMT);
-	  gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
+	  gimple_assign_set_rhs_with_ops (gsi, conv_code, gimple_assign_lhs (perm),
 	  NULL_TREE, NULL_TREE);
 	}
 }


Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Allan Sandfeld Jensen
On Tuesday, 8 May 2018 12:42:33 CEST Richard Biener wrote:
> On Tue, May 8, 2018 at 12:37 PM, Allan Sandfeld Jensen <li...@carewolf.com> wrote:
> > I have tried to fix PR85692 that I opened.
> 
> Please add a testcase as well.  It also helps if you briefly describe
> what the patch does in your mail.
> 
Okay. I have updated the patch with a test-case based on my motivating 
examples. The patch extends the matching of a vector construction so that it 
recognizes not only a single-source permute instruction, but also a 
two-source permute instruction.
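
To make the two-source case concrete, here is a small plain-C model of how the selector is formed. It is a sketch under assumptions, not the GCC implementation: build_selector and vec_perm are hypothetical helpers, and the only thing taken from the patch is the convention that a lane taken from the second source has nelts added to its index, so the selector indexes into the concatenation of both inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: build a two-source selector.  FROM_SECOND[i] is
   nonzero when result element I comes from the second source, and LANE[i]
   is the lane it is taken from.  Lanes of the second source are offset by
   NELTS, mirroring the "elt += nelts" step in the patch.  */
static void
build_selector (const int *from_second, const unsigned *lane,
                size_t nelts, unsigned *sel)
{
  for (size_t i = 0; i < nelts; ++i)
    sel[i] = lane[i] + (from_second[i] ? (unsigned) nelts : 0u);
}

/* Hypothetical helper: apply a two-source permute, where selector values
   0..NELTS-1 pick from A and NELTS..2*NELTS-1 pick from B.  */
static void
vec_perm (const float *a, const float *b, const unsigned *sel,
          size_t nelts, float *out)
{
  for (size_t i = 0; i < nelts; ++i)
    {
      assert (sel[i] < 2 * nelts);
      out[i] = sel[i] < nelts ? a[sel[i]] : b[sel[i] - nelts];
    }
}
```

For the two functions in the test case, {a[0], b[0], a[1], b[1]} corresponds to the selector {0, 4, 1, 5} and {a[0], b[1], a[2], b[3]} to {0, 5, 2, 7}.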
commit 15c0f6a933d60b085416a59221851b604b955958
Author: Allan Sandfeld Jensen <allan.jen...@qt.io>
Date:   Tue May 8 13:16:18 2018 +0200

Try two source permute for vector construction

simplify_vector_constructor() was detecting when vector construction could
be implemented as a single source permute, but was not detecting when
it could be implemented as a double source permute. This patch adds the
    second case.

2018-05-08 Allan Sandfeld Jensen <allan.jen...@qt.io>

gcc/

PR tree-optimization/85692
* tree-ssa-forwprop.c (simplify_vector_constructor): Try two
source permute as well.

gcc/testsuite

* gcc.target/i386/pr85692.c: Test two simple constructions are
detected as permute instructions.

diff --git a/gcc/testsuite/gcc.target/i386/pr85692.c b/gcc/testsuite/gcc.target/i386/pr85692.c
new file mode 100644
index 00000000000..322c1050161
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85692.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -msse4.1" } */
+/* { dg-final { scan-assembler "unpcklps" } } */
+/* { dg-final { scan-assembler "blendps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+/* { dg-final { scan-assembler-not "unpckhps" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+v4sf unpcklps(v4sf a, v4sf b)
+{
+return v4sf{a[0],b[0],a[1],b[1]};
+}
+
+v4sf blendps(v4sf a, v4sf b)
+{
+return v4sf{a[0],b[1],a[2],b[3]};
+}
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 58ec6b47a5b..fbee8064160 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2004,7 +2004,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
-  tree op, op2, orig, type, elem_type;
+  tree op, op2, orig1, orig2, type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
@@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  vec_perm_builder sel (nelts, nelts, 1);
-  orig = NULL;
+  vec_perm_builder sel (nelts, 2, nelts);
+  orig1 = NULL;
+  orig2 = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
@@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op1 = gimple_assign_rhs1 (def_stmt);
   ref = TREE_OPERAND (op1, 0);
-  if (orig)
+  if (orig1)
 	{
-	  if (ref != orig)
-	return false;
+	  if (ref == orig1 || orig2)
+	{
+	  if (ref != orig1 && ref != orig2)
+	return false;
+	}
+	  else
+	{
+	  if (TREE_CODE (ref) != SSA_NAME)
+		return false;
+	  if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+		  || ! useless_type_conversion_p (TREE_TYPE (op1),
+		  TREE_TYPE (TREE_TYPE (ref))))
+		return false;
+	  if (TREE_TYPE (orig1) != TREE_TYPE (ref))
+		return false;
+	  orig2 = ref;
+	  maybe_ident = false;
+	  }
 	}
   else
 	{
@@ -2076,12 +2093,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	  || ! useless_type_conversion_p (TREE_TYPE (op1),
 	  TREE_TYPE (TREE_TYPE (ref))))
 	return false;
-	  orig = ref;
+	  orig1 = ref;
 	}
   unsigned int elt;
   if (maybe_ne (bit_field_size (op1), elem_size)
 	  || !constant_multiple_p (bit_field_offset (op1), elem_size, &elt))
 	return false;
+  if (orig2 && ref == orig2)
+	elt += nelts;
   if (elt != i)
 	maybe_ident = false;
   sel.quick_push (elt);
@@ -2089,14 +2108,17 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (i < nelts)
 return false;
 
-  if (! VECTOR_TYPE_P (TREE_TYPE (orig))
+  if (! VECTOR_TYPE_P (TREE_TYPE (orig1))
   || maybe_ne (TYPE_VECTOR_SUBPARTS (type),
-		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
+		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig1))))
 return false;
 
+  if (!orig2)
+orig2 = orig1;
+
   tree tem;
   if (conv_code != ERROR_MARK
-  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig),
+  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig1),
 	   &tem, &conv_code)
 	  || conv_code

[Patch] Request for comments: short int builtins

2018-05-20 Thread Allan Sandfeld Jensen
A little over a year back we had a regression in a point release of gcc 
because the builtin __builtin_clzs got removed from i386, in part because it 
was wrongly named for a target specific builtin, but we were using it in Qt 
since it existed in multiple compilers. I got the patch removing it partially 
reverted and the problem solved, but in the meantime I had worked on a patch 
to make it a generic builtin instead. I have rebased it and added it below. 
Should I clean it up further, finish the other builtins, add tests and propose 
it, or is this not something we want?

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 5365befd351..74a84e653e4 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -232,6 +232,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONG, BT_INT, BT_LONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_ULONG, BT_INT, BT_ULONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONGLONG, BT_INT, BT_LONGLONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_ULONGLONG, BT_INT, BT_ULONGLONG)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT16, BT_INT, BT_INT16)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_UINT16, BT_INT, BT_UINT16)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_INTMAX, BT_INT, BT_INTMAX)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_UINTMAX, BT_INT, BT_UINTMAX)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_PTR, BT_INT, BT_PTR)
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a2bf8c7d38..a8ad00e8016 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10689,14 +10689,17 @@ is_inexpensive_builtin (tree decl)
   case BUILT_IN_CLZIMAX:
   case BUILT_IN_CLZL:
   case BUILT_IN_CLZLL:
+  case BUILT_IN_CLZS:
   case BUILT_IN_CTZ:
   case BUILT_IN_CTZIMAX:
   case BUILT_IN_CTZL:
   case BUILT_IN_CTZLL:
+  case BUILT_IN_CTZS:
   case BUILT_IN_FFS:
   case BUILT_IN_FFSIMAX:
   case BUILT_IN_FFSL:
   case BUILT_IN_FFSLL:
+  case BUILT_IN_FFSS:
   case BUILT_IN_IMAXABS:
   case BUILT_IN_FINITE:
   case BUILT_IN_FINITEF:
@@ -10734,10 +10737,12 @@ is_inexpensive_builtin (tree decl)
   case BUILT_IN_POPCOUNTL:
   case BUILT_IN_POPCOUNTLL:
   case BUILT_IN_POPCOUNTIMAX:
+  case BUILT_IN_POPCOUNTS:
   case BUILT_IN_POPCOUNT:
   case BUILT_IN_PARITYL:
   case BUILT_IN_PARITYLL:
   case BUILT_IN_PARITYIMAX:
+  case BUILT_IN_PARITYS:
   case BUILT_IN_PARITY:
   case BUILT_IN_LABS:
   case BUILT_IN_LLABS:
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 449d08d682f..618ee798767 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -848,15 +848,18 @@ DEF_GCC_BUILTIN(BUILT_IN_CLZ, "clz", BT_FN_INT_UINT, ATTR_CONST_NOTHROW_
 DEF_GCC_BUILTIN(BUILT_IN_CLZIMAX, "clzimax", BT_FN_INT_UINTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLZL, "clzl", BT_FN_INT_ULONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLZLL, "clzll", BT_FN_INT_ULONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CLZS, "clzs", BT_FN_INT_UINT16, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CONSTANT_P, "constant_p", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZ, "ctz", BT_FN_INT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZIMAX, "ctzimax", BT_FN_INT_UINTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZL, "ctzl", BT_FN_INT_ULONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZLL, "ctzll", BT_FN_INT_ULONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CTZS, "ctzs", BT_FN_INT_UINT16, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSB, "clrsb", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBIMAX, "clrsbimax", BT_FN_INT_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBL, "clrsbl", BT_FN_INT_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBLL, "clrsbll", BT_FN_INT_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CLRSBS, "clrsbs", BT_FN_INT_INT16, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_DCGETTEXT, "dcgettext", BT_FN_STRING_CONST_STRING_CONST_STRING_INT, ATTR_FORMAT_ARG_2)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_DGETTEXT, "dgettext", BT_FN_STRING_CONST_STRING_CONST_STRING, ATTR_FORMAT_ARG_2)
 DEF_GCC_BUILTIN(BUILT_IN_DWARF_CFA, "dwarf_cfa", BT_FN_PTR, ATTR_NULL)
@@ -878,6 +881,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_FFS, "ffs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_L
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSIMAX, "ffsimax", BT_FN_INT_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSL, "ffsl", BT_FN_INT_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSLL, "ffsll", BT_FN_INT_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSS, "ffss", BT_FN_INT_INT16, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FORK, "fork", BT_FN_PID, 

Re: [Patch] Request for comments: short int builtins

2018-05-20 Thread Allan Sandfeld Jensen
On Sunday, 20 May 2018 15:07:59 CEST Richard Biener wrote:
> On May 20, 2018 11:01:54 AM GMT+02:00, Allan Sandfeld Jensen <li...@carewolf.com> wrote:
> >A little over a year back we had a regression in a point release of gcc
> >because the builtin __builtin_clzs got removed from i386, in part
> >because it was wrongly named for a target specific builtin, but we were
> >using it in Qt since it existed in multiple compilers. I got the patch
> >removing it partially reverted and the problem solved, but in the
> >meantime I had worked on a patch to make it a generic builtin instead.
> >I have rebased it and added it below. Should I clean it up further,
> >finish the other builtins, add tests and propose it, or is this not
> >something we want?
> 
> Can't users simply do clz((unsigned short) s) - 16? GCC should be able to
> handle this for instruction selection with the addition of some folding
> patterns using the corresponding internal function.
> 
Of course, but we already have the builtin for i386, and a version of the 
builtin for all integer types except short for all platforms. Note that the 
patch also adds short versions for all the general integer builtins, not just 
clzs, and they are not all that trivial to synthesize (without knowing the 
trick, which gcc does).


'Allan
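
As a reference point for the exchange above, the suggested workaround can be sketched with the existing int-sized builtins. This is a minimal sketch, not part of the proposed patch: the names clz16, ctz16 and popcount16 are hypothetical, and it assumes unsigned short is 16 bits wide. As with __builtin_clz and __builtin_ctz themselves, a zero argument is not supported.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrappers, assuming a 16-bit unsigned short.  */

/* Count leading zeros of a 16-bit value: count them in the promoted
   unsigned int and subtract the extra high bits added by promotion.  */
static int
clz16 (unsigned short x)
{
  assert (x != 0);   /* __builtin_clz is undefined for zero */
  return __builtin_clz (x) - (int) (sizeof (unsigned int) * CHAR_BIT - 16);
}

/* Count trailing zeros: promotion only adds high bits, so no adjustment
   is needed for a nonzero value.  */
static int
ctz16 (unsigned short x)
{
  assert (x != 0);   /* __builtin_ctz is undefined for zero */
  return __builtin_ctz (x);
}

/* Population count: zero extension adds no set bits.  */
static int
popcount16 (unsigned short x)
{
  return __builtin_popcount (x);
}
```

The adjustment differs per builtin (a subtraction for clz, nothing for ctz and popcount), which is the kind of detail a dedicated short variant would hide from the user.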





[Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Allan Sandfeld Jensen
The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-08-01 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c: Correct default specs for -r
---
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)

From 638966e6c7e072ca46c6af0664fbd57bedbfff80 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 1 Aug 2018 18:07:05 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-07-29 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c: Correct default specs for -r
---
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6047d82065a..7da30bd9d99 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -518,7 +518,7 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Link Options,,Options for Linking}.
 @gccoptlist{@var{object-file-name}  -fuse-ld=@var{linker}  -l@var{library} @gol
 -nostartfiles  -nodefaultlibs  -nolibc  -nostdlib @gol
--pie  -pthread  -rdynamic @gol
+-pie  -pthread  -r  -rdynamic @gol
 -s  -static -static-pie -static-libgcc  -static-libstdc++ @gol
 -static-libasan  -static-libtsan  -static-liblsan  -static-libubsan @gol
 -shared  -shared-libgcc  -symbolic @gol
@@ -12444,6 +12444,11 @@ x86 Cygwin and MinGW targets.  On some targets this option also sets
 flags for the preprocessor, so it should be used consistently for both
 compilation and linking.
 
+@item -r
+@opindex r
+Produce a relocatable object as output. This is also known as partial
+linking.
+
 @item -rdynamic
 @opindex rdynamic
 Pass the flag @option{-export-dynamic} to the ELF linker, on targets
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 780d4859ef3..858a5600c14 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -675,7 +675,7 @@ proper position among the other output files.  */
 
 /* config.h can define LIB_SPEC to override the default libraries.  */
 #ifndef LIB_SPEC
-#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
+#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
 #endif
 
 /* When using -fsplit-stack we need to wrap pthread_create, in order
@@ -797,7 +797,7 @@ proper position among the other output files.  */
 /* config.h can define STARTFILE_SPEC to override the default crt0 files.  */
 #ifndef STARTFILE_SPEC
 #define STARTFILE_SPEC  \
-  "%{!shared:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s}}}"
+  "%{!shared|!r:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s}}}"
 #endif
 
 /* config.h can define ENDFILE_SPEC to override the default crtn files.  */
@@ -936,7 +936,7 @@ proper position among the other output files.  */
 #else
 #define LD_PIE_SPEC ""
 #endif
-#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
+#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC "} "
 #endif
 
 #ifndef LINK_BUILDID_SPEC
-- 
2.17.0



[PATCH][x86] Match movss and movsd "blend" instructions

2018-08-01 Thread Allan Sandfeld Jensen
Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.
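
The shape of permutation that the new method accepts can be stated as a small standalone predicate. This is a sketch that mirrors the index test in the patch below, not the patch itself: is_movs_perm is a hypothetical name, and the mode, target and one-operand checks of the real function are left out.

```c
#include <stdbool.h>

/* Hypothetical predicate: PERM is a two-operand selector over NELT lanes,
   with values 0..NELT-1 naming the first operand and NELT..2*NELT-1 the
   second.  It matches when lane 0 is the first lane of one operand and
   every other lane I is lane I of the other operand.  */
static bool
is_movs_perm (const unsigned *perm, unsigned nelt)
{
  if (nelt == 0)
    return false;

  /* Lane 0 must be the first lane of either operand.  */
  if (perm[0] != 0 && perm[0] != nelt)
    return false;

  /* If perm[0] == nelt the rest must be I (first operand);
     if perm[0] == 0 the rest must be I + nelt (second operand).  */
  for (unsigned i = 1; i < nelt; ++i)
    if (perm[i] != i + nelt - perm[0])
      return false;

  return true;
}
```

Under this test {4, 1, 2, 3} and {0, 5, 6, 7} qualify for four lanes and {2, 1} for two lanes, while a general blend such as {0, 5, 2, 7} does not.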

2018-07-29 Allan Sandfeld Jensen 

gcc/config/i386

* i386.c (expand_vec_perm_movs): New method matching movs
patterns.
* i386.c (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h   |  2 +-
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/xmmintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-movs.c | 21 +++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-movs.c
From e96b3aa9017ad0d19238c923146196405cc4e5af Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 9 May 2018 12:35:14 +0200
Subject: [PATCH] Match movss and movsd blends

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-07-29 Allan Sandfeld Jensen 

gcc/config/i386

* i386.c (expand_vec_perm_movs): New method matching movs
patterns.
* i386.c (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h   |  2 +-
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/xmmintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-movs.c | 21 +++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-movs.c

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..1efd943bac4 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..2337ef5ea08 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46143,6 +46143,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+;
+  else
+return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+return false;
+  for (i = 1; i < nelt; ++i) {
+{
+  if (d->perm[i] != i + nelt - d->perm[0])
+return false;
+}
+  }
+
+  if (d->testing_p)
+return true;
+
+  if (d->perm[0] == nelt)
+x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46885,6 +46925,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..699f681e054 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,7 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return __extension__ (__m128)(__v4sf){__B[0],__A[1],__A[2],__A[3]};
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-movs.c b/gcc/testsuite/gcc.target/i386/sse2-movs.c
new file mode 100644
index 00000000000..79f486cfa82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-movs.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final {

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen
On Thursday, 2 August 2018 23:46:37 CEST Jakub Jelinek wrote:
> On Thu, Aug 02, 2018 at 10:50:58PM +0200, Allan Sandfeld Jensen wrote:
> > Here is the version with __builtin_shuffle. It might be more predictable
> > at -O0, but it is also uglier.
> 
> I don't find anything ugly in it, except the formatting glitches (missing
> space before (, overlong line) and the useless __extension__.
> Improving code generated for __builtin_shuffle is desirable too.
> 

__extension__ is needed when using the {...} initialization; otherwise 
-std=c89 will produce warnings about standards.  The line is a bit long, but I 
thought it looked better like this rather than adding any emergency line 
breaks. Is there a hard limit?

> > --- a/gcc/config/i386/xmmintrin.h
> > +++ b/gcc/config/i386/xmmintrin.h
> > @@ -1011,7 +1011,8 @@ _mm_storer_ps (float *__P, __m128 __A)
> >  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_move_ss (__m128 __A, __m128 __B)
> >  {
> > -  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
> > +  return __extension__ (__m128) __builtin_shuffle((__v4sf)__A, (__v4sf)__B,
> > +    (__attribute__((__vector_size__ (16))) int){4, 1, 2, 3});
> And obviously use __v4si here instead of __attribute__((__vector_size__
> (16))) int.
> 
__v4si is declared in emmintrin.h, so I couldn't use it here unless I moved 
the definition. I tried changing as little as possible to not trigger bike 
shedding.

'Allan





Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen
On Thursday, 2 August 2018 23:15:28 CEST Marc Glisse wrote:
> On Thu, 2 Aug 2018, Allan Sandfeld Jensen wrote:
> > I forgot. One of the things that makes using __builtin_shuffle ugly is
> > that __v4si, which is needed for the shuffle argument in _mm_move_ss, is
> > declared in emmintrin.h, but _mm_move_ss is in xmmintrin.h.
> 
> __v4si is some internal detail, I don't see much issue with moving it to
> xmmintrin.h if you want to use it there.
> 
> > In general the gcc __builtin_shuffle syntax with the argument being a
> > vector is kind of awkward. At least for declaring intrinsics, the
> > clang style, where the permutation is passed as extra arguments, is
> > easier to deal with:
> > __builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
> > vs
> > __builtin_shuffle(a, b, 4, 0, 1, 2)
> 
> __builtin_shufflevector IIRC
> 
> >> The question is what users expect and get when they use -O0 with
> >> intrinsics?
> > 
> > Here is the version with __builtin_shuffle. It might be more predictable
> > at -O0, but it is also uglier.
> 
> I am not convinced -O0 is very important.
> 
Me neither, and in any case I would argue that the logic that recognizes the 
vector construction patterns is not an optimization but instruction matching.

> If you start extending your approach to _mm_add_sd and others, while one
> instruction is easy enough to recognize, if we put several in a row, they
> will be partially simplified and may become harder to recognize.
> { x*(y+v[0]-z), v[1] } requires that you notice that the upper part of
> this vector is v[1], i.e. the upper part of a vector whose lower part
> appears somewhere in the arbitrarily complex expression for the lower
> part of the result. And you then have to propagate the fact that you are
> doing vector operations all the way back to v[0].
> 
> I don't have a strong opinion on what the best approach is.

Yes, I am not sure all of those could be done exhaustively with the existing 
logic, and it might also be of dubious value as in almost all cases the ps 
instructions have the same latency and bandwidth as the ss instructions, so 
developers should probably use _ps versions as they are scheduled better by 
the compiler (or at least better by gcc).
It was just an idea, and I haven't tried it at this point.

'Allan





Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen
On Wednesday, 1 August 2018 18:51:41 CEST Marc Glisse wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> >  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
> > 
> > __artificial__))
> > 
> >  _mm_move_sd (__m128d __A, __m128d __B)
> >  {
> > 
> > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > 
> >  }
> 
> If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> wonder if we should be explicit and use __builtin_shuffle instead of
> relying on some forwprop pass to transform it. Maybe not, just asking. And
> the answer need not even be the same for _mm_move_sd and _mm_move_ss.

I forgot. One of the things that makes using __builtin_shuffle ugly is that 
__v4si, which is needed for the shuffle argument in _mm_move_ss, is declared 
in emmintrin.h, but _mm_move_ss is in xmmintrin.h.

In general the gcc __builtin_shuffle syntax with the argument being a vector 
is kind of awkward. At least for declaring intrinsics, the clang style, where 
the permutation is passed as extra arguments, is easier to deal with:
__builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
 vs
 __builtin_shuffle(a, b, 4, 0, 1, 2)
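
For comparison, the two spellings can be put side by side in a compilable form. This is a sketch under assumptions rather than the header code: the typedefs and the names move_ss and move_sd are local stand-ins for the real intrinsics, the GCC branch uses __builtin_shuffle with a vector mask, and the clang branch uses __builtin_shufflevector with the indices as extra constant arguments. The selectors {4, 1, 2, 3} and {2, 1} are the ones from the patches in this thread.

```c
/* Local stand-ins for the types used in the x86 intrinsic headers.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Lane 0 from B, lanes 1..3 from A (the movss blend).  */
static v4sf
move_ss (v4sf a, v4sf b)
{
#if defined (__clang__)
  /* clang style: indices are passed as extra constant arguments.  */
  return __builtin_shufflevector (a, b, 4, 1, 2, 3);
#else
  /* GCC style: the selector is itself an integer vector.  */
  return __builtin_shuffle (a, b, (v4si) {4, 1, 2, 3});
#endif
}

/* Lane 0 from B, lane 1 from A (the movsd blend).  */
static v2df
move_sd (v2df a, v2df b)
{
#if defined (__clang__)
  return __builtin_shufflevector (a, b, 2, 1);
#else
  return __builtin_shuffle (a, b, (v2di) {2, 1});
#endif
}
```

In both spellings indices 0..N-1 refer to the first operand and N..2N-1 to the second, so the difference is purely in how the selector is written.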







Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen
On Thursday, 2 August 2018 11:18:41 CEST Richard Biener wrote:
> On Thu, Aug 2, 2018 at 11:12 AM Allan Sandfeld Jensen wrote:
> > On Mittwoch, 1. August 2018 18:51:41 CEST Marc Glisse wrote:
> > > On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > > >  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > >  _mm_move_sd (__m128d __A, __m128d __B)
> > > >  {
> > > > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > > > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > > >  }
> > > 
> > > If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> > > wonder if we should be explicit and use __builtin_shuffle instead of
> > > relying on some forwprop pass to transform it. Maybe not, just asking.
> > > And
> > > the answer need not even be the same for _mm_move_sd and _mm_move_ss.
> > 
> > I wrote it this way because this pattern could later also be used for the
> > other _ss intrinsics, such as _mm_add_ss, where a __builtin_shuffle could
> > not. To match the other intrinsics, the logic that tries to match vector
> > construction just needs to be extended to try merge patterns even if one
> > of the subexpressions is not simple.
> 
> The question is what users expect and get when they use -O0 with intrinsics?
> 
> Richard.
> 
Here is the version with __builtin_shuffle. It might behave more as expected 
at -O0, but it is also uglier.

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..6501638f619 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d) __builtin_shuffle((__v2df)__A, (__v2df)__B, (__v2di){2, 1});
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..2337ef5ea08 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46143,6 +46143,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+    return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+    ;
+  else
+    return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+    return false;
+  for (i = 1; i < nelt; ++i) {
+    {
+      if (d->perm[i] != i + nelt - d->perm[0])
+        return false;
+    }
+  }
+
+  if (d->testing_p)
+    return true;
+
+  if (d->perm[0] == nelt)
+    x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+    x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46885,6 +46925,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+    return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..45b99ff87d5 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,8 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return __extension__ (__m128) __builtin_shuffle((__v4sf)__A, (__v4sf)__B,
+  (__attribute__((__vector_size__ (16))) int){4, 1, 2, 3});
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */


Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-03 Thread Allan Sandfeld Jensen
On Wednesday, 1 August 2018 18:32:30 CEST Joseph Myers wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > gcc/
> > 
> > * gcc.c: Correct default specs for -r
> 
> I don't follow why your changes (which would need describing for each
> individual spec changed) are corrections.
> 
> >  /* config.h can define LIB_SPEC to override the default libraries.  */
> >  #ifndef LIB_SPEC
> > -#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> > +#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> >  #endif
> 
> '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> following specs are used unless both -shared and -r are specified, which
> seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> expand to nothing if either -shared or -r is passed and to what follows if
> neither is passed.
> 
> And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> as it's generally overridden by targets - and normally for targets using
> ELF shared libraries, for example, -lc *does* have to be used when linking
> with -shared.
> 
> I think you're changing the wrong place for this.  If you want -r to be
> usable with GCC without using -nostdlib (which is an interesting
> question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
Okay, so like this?
From 9de68f2ef0b77a0c0bcf0d83232e7fc34b006406 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 1 Aug 2018 18:07:05 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-07-29 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c (LINK_COMMAND_SPEC): Handle -r like -nostdlib
* config/darwin.h (LINK_COMMAND_SPEC): Handle -r like -nostdlib
---
 gcc/config/darwin.h | 8 
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 980ad9b4057..6ad657a4958 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -178,20 +178,20 @@ extern GTY(()) int darwin_ms_struct;
"%X %{s} %{t} %{Z} %{u*} \
 %{e*} %{r} \
 %{o*}%{!o:-o a.out} \
-%{!nostdlib:%{!nostartfiles:%S}} \
+%{!nostdlib:%{!r:%{!nostartfiles:%S}}} \
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
-%{!nostdlib:%{!nodefaultlibs:\
+%{!nostdlib:%{!r:%{!nodefaultlibs:\
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
   " DARWIN_EXPORT_DYNAMIC " %

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Allan Sandfeld Jensen
On Wednesday, 1 August 2018 18:32:30 CEST Joseph Myers wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > gcc/
> > 
> > * gcc.c: Correct default specs for -r
> 
> I don't follow why your changes (which would need describing for each
> individual spec changed) are corrections.
> 
> >  /* config.h can define LIB_SPEC to override the default libraries.  */
> >  #ifndef LIB_SPEC
> > -#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> > +#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> >  #endif
> 
> '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> following specs are used unless both -shared and -r are specified, which
> seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> expand to nothing if either -shared or -r is passed and to what follows if
> neither is passed.
> 
> And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> as it's generally overridden by targets - and normally for targets using
> ELF shared libraries, for example, -lc *does* have to be used when linking
> with -shared.
> 
> I think you're changing the wrong place for this.  If you want -r to be
> usable with GCC without using -nostdlib (which is an interesting
> question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
Ok, thanks for the information, I will investigate that. 
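
To make the precedence point above concrete, here is a small sketch in plain C that models the two spec conditions as boolean functions. It only models the logic described in the review; the function names are made up for illustration and nothing here is part of the GCC driver.

```c
#include <stdbool.h>

/* Models the condition "!shared|!r": '!' binds more tightly than '|',
   so the guarded text is used when -shared is absent OR -r is absent.  */
static bool
libs_used_not_shared_or_not_r (bool shared, bool r)
{
  return !shared || !r;
}

/* Models the suggested "shared|r:;" form: expand to nothing if either
   option is given, and to the guarded text only if neither is.  */
static bool
libs_used_neither_shared_nor_r (bool shared, bool r)
{
  return !(shared || r);
}
```

With -r alone, the first form still pulls in the default libraries, while the second does not; the two only agree when neither or both options are given.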

> > -#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
> > +#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC "} "
> 
> What's this "-ar" option you're handling here?

Dead code from a previous, more ambitious version of the patch. I will remove it.

'Allan





Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen
On Wednesday, 1 August 2018 18:51:41 CEST Marc Glisse wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> >  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_move_sd (__m128d __A, __m128d __B)
> >  {
> > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> >  }
> 
> If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> wonder if we should be explicit and use __builtin_shuffle instead of
> relying on some forwprop pass to transform it. Maybe not, just asking. And
> the answer need not even be the same for _mm_move_sd and _mm_move_ss.

I wrote it this way because this pattern could later also be used for the 
other _ss intrinsics, such as _mm_add_ss, where a __builtin_shuffle could not. 
To match the other intrinsics, the logic that tries to match vector 
construction just needs to be extended to try merge patterns even if one of 
the subexpressions is not simple.
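
As an illustration of that argument, here is a minimal sketch using the GNU C vector extensions (also accepted by clang). The typedefs and the names move_sd_sketch and add_ss_sketch are assumptions made for this example, not the actual header contents. The first function has the shape of the proposed _mm_move_sd; the second shows how the same constructor form could express an _mm_add_ss-like operation, where one element is a non-trivial subexpression and a plain shuffle of the inputs would not be enough.

```c
/* Illustrative vector types; the real headers define __v2df and __v4sf.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Same shape as the proposed _mm_move_sd: low element from b,
   high element from a.  */
static v2df
move_sd_sketch (v2df a, v2df b)
{
  return (v2df) { b[0], a[1] };
}

/* Constructor form for an _mm_add_ss-like operation: only element 0 is
   computed, elements 1..3 are passed through from a.  */
static v4sf
add_ss_sketch (v4sf a, v4sf b)
{
  return (v4sf) { a[0] + b[0], a[1], a[2], a[3] };
}
```

Whether the compiler turns the second form into a single addss depends on the vector-construction matching mentioned above, so this only shows the source-level pattern.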

'Allan




Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-11 Thread Allan Sandfeld Jensen
Updated:

Match movss and movsd "blend" instructions

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-08-11 Allan Sandfeld Jensen 

gcc/config/i386

* i386.c (expand_vec_perm_movs): New method matching movs
patterns.
* i386.c (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..6501638f619 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d) __builtin_shuffle((__v2df)__A, (__v2df)__B, (__v2di){2, 1});
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7554fd1f659..485850096e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46145,6 +46145,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+    return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+    ;
+  else
+    return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+    return false;
+  for (i = 1; i < nelt; ++i) {
+    {
+      if (d->perm[i] != i + nelt - d->perm[0])
+        return false;
+    }
+  }
+
+  if (d->testing_p)
+    return true;
+
+  if (d->perm[0] == nelt)
+    x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+    x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46887,6 +46927,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+    return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..f770570295c 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,10 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return (__m128) __builtin_shuffle ((__v4sf)__A, (__v4sf)__B,
+                                     __extension__
+                                     (__attribute__((__vector_size__ (16))) int)
+                                     {4,1,2,3});
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */


Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-11 Thread Allan Sandfeld Jensen
On Saturday, 11 August 2018 11:18:39 CEST Jakub Jelinek wrote:
> On Sat, Aug 11, 2018 at 10:59:26AM +0200, Allan Sandfeld Jensen wrote:
> > +/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
> > +   using movss or movsd.  */
> > +static bool
> > +expand_vec_perm_movs (struct expand_vec_perm_d *d)
> > +{
> > +  machine_mode vmode = d->vmode;
> > +  unsigned i, nelt = d->nelt;
> > +  rtx x;
> > +
> > +  if (d->one_operand_p)
> > +    return false;
> > +
> > +  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
> > +    ;
> > +  else
> > +    return false;
> > +
> > +  /* Only the first element is changed. */
> 
> Two spaces after .
> 
> > +  if (d->perm[0] != nelt && d->perm[0] != 0)
> > +    return false;
> > +  for (i = 1; i < nelt; ++i) {
> > +    {
> > +      if (d->perm[i] != i + nelt - d->perm[0])
> > +        return false;
> > +    }
> > +  }
> 
> Extraneous {}s (both pairs, the outer ones even badly indented).
> 
> Otherwise LGTM.
> 
Updated:

Note that, as an infrequent contributor, I don't have commit access, so I need 
whoever reviews this to also commit it.

'Allan
From e33241e5ddc7fa57c4ba7893669af7f7e636125e Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Sat, 11 Aug 2018 11:52:21 +0200
Subject: [PATCH] Match movss and movsd "blend" instructions

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-08-11 Allan Sandfeld Jensen 

gcc/config/i386

* i386.c (expand_vec_perm_movs): New method matching movs
patterns.
* i386.c (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h |  2 +-
 gcc/config/i386/i386.c  | 41 +
 gcc/config/i386/xmmintrin.h |  5 -
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..6501638f619 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d) __builtin_shuffle((__v2df)__A, (__v2df)__B, (__v2di){2, 1});
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7554fd1f659..15a3caa94c3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46145,6 +46145,43 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+    return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+    ;
+  else
+    return false;
+
+  /* Only the first element is changed.  */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+    return false;
+  for (i = 1; i < nelt; ++i)
+    if (d->perm[i] != i + nelt - d->perm[0])
+      return false;
+
+  if (d->testing_p)
+    return true;
+
+  if (d->perm[0] == nelt)
+    x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+    x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46887,6 +46924,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+    return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..f770570295c 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,10 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return (__m128) __builtin_shuffle ((__v4sf)__A, (__v4sf)__B,
+                                     __extension__
+                                     (__attribute__((__vector_size__ (16))) int)
+                                     {4,1,2,3});
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */
-- 
2.17.1
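
The permutation test in expand_vec_perm_movs can be read in isolation. The sketch below restates just that check as a standalone C function over a plain index array; is_movs_perm is a made-up name for this example, and the RTL generation and target checks from the patch are deliberately left out.

```c
#include <stdbool.h>

/* Standalone restatement of the selector check in the patch: returns
   true when element 0 comes from one operand and every other element
   i comes, in place, from the other operand.  perm uses the usual
   two-operand numbering: 0..nelt-1 is op0, nelt..2*nelt-1 is op1.  */
static bool
is_movs_perm (const unsigned *perm, unsigned nelt)
{
  /* Element 0 must be the first element of op0 or of op1.  */
  if (perm[0] != nelt && perm[0] != 0)
    return false;

  /* If perm[0] == nelt the rest must be i (from op0);
     if perm[0] == 0 the rest must be i + nelt (from op1).  */
  for (unsigned i = 1; i < nelt; ++i)
    if (perm[i] != i + nelt - perm[0])
      return false;

  return true;
}
```

For example, { 4, 1, 2, 3 } and { 0, 5, 6, 7 } are accepted for four elements, while the identity { 0, 1, 2, 3 } is rejected because nothing is blended.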



Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-11 Thread Allan Sandfeld Jensen
On Friday, 3 August 2018 13:56:12 CEST Allan Sandfeld Jensen wrote:
> On Wednesday, 1 August 2018 18:32:30 CEST Joseph Myers wrote:
> > On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > > gcc/
> > > 
> > > * gcc.c: Correct default specs for -r
> > 
> > I don't follow why your changes (which would need describing for each
> > individual spec changed) are corrections.
> > 
> > >  /* config.h can define LIB_SPEC to override the default libraries.  */
> > >  #ifndef LIB_SPEC
> > > -#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> > > +#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> > >  #endif
> > 
> > '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> > following specs are used unless both -shared and -r are specified, which
> > seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> > expand to nothing if either -shared or -r is passed and to what follows if
> > neither is passed.
> > 
> > And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> > as it's generally overridden by targets - and normally for targets using
> > ELF shared libraries, for example, -lc *does* have to be used when linking
> > with -shared.
> > 
> > I think you're changing the wrong place for this.  If you want -r to be
> > usable with GCC without using -nostdlib (which is an interesting
> > question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> > overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
> Okay, so like this?


Any further comments or corrections to the updated patch in the parent message?

'Allan




Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-26 Thread Allan Sandfeld Jensen
On Tuesday, 21 August 2018 00:38:58 CEST Joseph Myers wrote:
> On Fri, 3 Aug 2018, Allan Sandfeld Jensen wrote:
> > > I think you're changing the wrong place for this.  If you want -r to be
> > > usable with GCC without using -nostdlib (which is an interesting
> > > question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> > > overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> > 
> > Okay, so like this?
> 
> Could you confirm if this has passed a bootstrap and testsuite run, with
> no testsuite regressions compared to GCC without the patch applied?  I
> think it looks mostly OK (modulo ChangeLog rewrites and a missing second
> space after '.' in the manual change) but I'd like to make sure it's
> passed the usual testing before preparing it for commit.

I didn't think of running the tests since it only affects command-line 
options, but it did bootstrap and behaved as expected for my specific use case.

I will update and run the tests when I have time.




Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-26 Thread Allan Sandfeld Jensen
On Thursday, 23 August 2018 23:24:02 CEST Joseph Myers wrote:
> On Thu, 23 Aug 2018, Iain Sandoe wrote:
> > Joseph: As a side-comment, is there a reason that we don’t exclude
> > gomp/itm/fortran/gcov from the link for -nostdlib / -nodefaultlib?
> > 
> > If we are relying on the lib self-specs for this, then we’re not
> > succeeding since the one we build at the moment don’t include those
> > clauses.
> 
> Well, fortran/gfortranspec.c for example has
> 
>     case OPT_nostdlib:
>     case OPT_nodefaultlibs:
>     case OPT_c:
>     case OPT_S:
>     case OPT_fsyntax_only:
>     case OPT_E:
>       /* These options disable linking entirely or linking of the
>          standard libraries.  */
>       library = 0;
>       break;
> 
> and only uses libgfortran.spec if (library).  So it's certainly meant to
> avoid linking with libgfortran or its dependencies if -nostdlib.

Patch updated. I specifically edited a number of the existing tests that used 
both -r and -nostdlib and removed -nostdlib, so the patch is exercised by 
existing tests. The patch bootstrapped, and I didn't notice any relevant 
failures when running the test suite (though I could have missed something; I 
am never comfortable reading that output).

'Allan

From 07ed41a9afd107c5d45feb1ead7a74ca735a1bb2 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Sun, 26 Aug 2018 20:02:54 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-08-26 Allan Sandfeld Jensen 

gcc/doc/
* invoke.texi: Document -r.

gcc/
* gcc.c (LINK_COMMAND_SPEC): Handle -r like -nostdlib.
* config/darwin.h (LINK_COMMAND_SPEC): Handle -r like -nostdlib.
* cp/g++spec.c (lang_specific_driver): Handle -r like -nostdlib.
* fortran/gfortranspec.c (lang_specific_driver): Handle -r like -nostdlib.
* go/gospec.c (lang_specific_driver): Handle -r like -nostdlib.

gcc/testsuite/
* g++.dg/ipa/pr64059.C: Removed now redundant -nostdlib.
* g++.dg/lto/20081109-1_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/20090302_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/pr45621_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/pr60567_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/pr62026.C: Removed now redundant -nostdlib.
* gcc.dg/lto/pr45736_0.c: Removed now redundant -nostdlib.
* gcc.dg/lto/pr52634_0.c: Removed now redundant -nostdlib.
* gfortran.dg/lto/20091016-1_0.f90: Removed now redundant -nostdlib.
* gfortran.dg/lto/pr79108_0.f90: Removed now redundant -nostdlib.

---
 gcc/config/darwin.h| 8 
 gcc/cp/g++spec.c   | 1 +
 gcc/doc/invoke.texi| 7 ++-
 gcc/fortran/gfortranspec.c | 1 +
 gcc/gcc.c  | 6 +++---
 gcc/go/gospec.c| 1 +
 gcc/testsuite/g++.dg/ipa/pr64059.C | 2 +-
 gcc/testsuite/g++.dg/lto/20081109-1_0.C| 2 +-
 gcc/testsuite/g++.dg/lto/20090302_0.C  | 2 +-
 gcc/testsuite/g++.dg/lto/pr45621_0.C   | 2 +-
 gcc/testsuite/g++.dg/lto/pr60567_0.C   | 2 +-
 gcc/testsuite/g++.dg/lto/pr62026.C | 2 +-
 gcc/testsuite/gcc.dg/lto/pr45736_0.c   | 2 +-
 gcc/testsuite/gcc.dg/lto/pr52634_0.c   | 2 +-
 gcc/testsuite/gfortran.dg/lto/20091016-1_0.f90 | 2 +-
 gcc/testsuite/gfortran.dg/lto/pr79108_0.f90| 2 +-
 16 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index cd6d6521658..87f610259c0 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -180,20 +180,20 @@ extern GTY(()) int darwin_ms_struct;
"%X %{s} %{t} %{Z} %{u*} \
 %{e*} %{r} \
 %{o*}%{!o:-o a.out} \
-%{!nostdlib:%{!nostartfiles:%S}} \
+%{!nostdlib:%{!r:%{!nostartfiles:%S}}} \
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
-%{!nostdlib:%{!nodefaultlibs:\
+%{!nostdlib:%{!r:%{!nodefaultlibs:\
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
   " DARWIN_EXPORT_DYNAMIC " %

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-09-01 Thread Allan Sandfeld Jensen
On Monday, 27 August 2018 15:37:15 CEST Joseph Myers wrote:
> On Sun, 26 Aug 2018, Allan Sandfeld Jensen wrote:
> > Patch updated. I specifically edited a number of the existing tests that
> > used both -r and -nostdlib and removed -nostdlib so the patch is
> > exercised by existing tests. The patch bootstrapped, I didn't notice any
> > relevant failures when running the test suite (though I could have missed
> > something, I am never comfortable reading that output).
> 
> Note that Iain's comments also included that the patch is incomplete
> because of more specs in gcc.c (VTABLE_VERIFICATION_SPEC,
> SANITIZER_EARLY_SPEC, SANITIZER_SPEC) that needs corresponding updates to
> handle -r like -nostdlib.

Okay, I can add that, or whoever commits the patch can add it. We can also 
improve the feature if we discover more places that need updating. Do you 
want me to post a version updated with these two places? 

'Allan




Re: [Patch][GCC] Document and fix -r (partial linking)

2018-09-01 Thread Allan Sandfeld Jensen
On Monday, 27 August 2018 15:37:15 CEST Joseph Myers wrote:
> On Sun, 26 Aug 2018, Allan Sandfeld Jensen wrote:
> > Patch updated. I specifically edited a number of the existing tests that
> > used both -r and -nostdlib and removed -nostdlib so the patch is
> > exercised by existing tests. The patch bootstrapped, I didn't notice any
> > relevant failures when running the test suite (though I could have missed
> > something, I am never comfortable reading that output).
> 
> Note that Iain's comments also included that the patch is incomplete
> because of more specs in gcc.c (VTABLE_VERIFICATION_SPEC,
> SANITIZER_EARLY_SPEC, SANITIZER_SPEC) that needs corresponding updates to
> handle -r like -nostdlib.

Updated (but tests not rerun)

From 1d164bced7979c94767c260174e3c486d4fc8c5d Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Sat, 1 Sep 2018 12:59:14 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-09-01 Allan Sandfeld Jensen 

gcc/doc/
* invoke.texi: Document -r.

gcc/
* gcc.c (LINK_COMMAND_SPEC): Handle -r like -nostdlib.
(VTABLE_VERIFICATION_SPEC): Ditto
(SANITIZER_EARLY_SPEC): Ditto
(SANITIZER_SPEC): Ditto
* config/darwin.h (LINK_COMMAND_SPEC): Ditto
* cp/g++spec.c (lang_specific_driver): Ditto
* fortran/gfortranspec.c (lang_specific_driver): Ditto
* go/gospec.c (lang_specific_driver): Ditto

gcc/testsuite/
* g++.dg/ipa/pr64059.C: Removed now redundant -nostdlib.
* g++.dg/lto/20081109-1_0.C: Ditto
* g++.dg/lto/20090302_0.C: Ditto
* g++.dg/lto/pr45621_0.C: Ditto
* g++.dg/lto/pr60567_0.C: Ditto
* g++.dg/lto/pr62026.C: Ditto
* gcc.dg/lto/pr45736_0.c: Ditto
* gcc.dg/lto/pr52634_0.c: Ditto
* gfortran.dg/lto/20091016-1_0.f90: Ditto
* gfortran.dg/lto/pr79108_0.f90: Ditto
---
 gcc/config/darwin.h|  8 
 gcc/cp/g++spec.c   |  1 +
 gcc/doc/invoke.texi|  7 ++-
 gcc/fortran/gfortranspec.c |  1 +
 gcc/gcc.c  | 18 +-
 gcc/go/gospec.c|  1 +
 gcc/testsuite/g++.dg/ipa/pr64059.C |  2 +-
 gcc/testsuite/g++.dg/lto/20081109-1_0.C|  2 +-
 gcc/testsuite/g++.dg/lto/20090302_0.C  |  2 +-
 gcc/testsuite/g++.dg/lto/pr45621_0.C   |  2 +-
 gcc/testsuite/g++.dg/lto/pr60567_0.C   |  2 +-
 gcc/testsuite/g++.dg/lto/pr62026.C |  2 +-
 gcc/testsuite/gcc.dg/lto/pr45736_0.c   |  2 +-
 gcc/testsuite/gcc.dg/lto/pr52634_0.c   |  2 +-
 gcc/testsuite/gfortran.dg/lto/20091016-1_0.f90 |  2 +-
 gcc/testsuite/gfortran.dg/lto/pr79108_0.f90|  2 +-
 16 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index cd6d6521658..87f610259c0 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -180,20 +180,20 @@ extern GTY(()) int darwin_ms_struct;
"%X %{s} %{t} %{Z} %{u*} \
 %{e*} %{r} \
 %{o*}%{!o:-o a.out} \
-%{!nostdlib:%{!nostartfiles:%S}} \
+%{!nostdlib:%{!r:%{!nostartfiles:%S}}} \
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
-%{!nostdlib:%{!nodefaultlibs:\
+%{!nostdlib:%{!r:%{!nodefaultlibs:\
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
   " DARWIN_EXPORT_DYNAMIC " %

[Patch] Make x86-64 a generic architecture as documented

2019-07-13 Thread Allan Sandfeld Jensen
Hello

Changing -march=x86-64 behaviour to match documentation and common usage.

Question: Is this also good for -m32 -march=x86-64 or will generic then mean 
something else?

Thanks
Allan

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 364466b6b6f..90f9cbc3c35 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-07-13  Allan Sandfeld Jensen 
+
+   * gcc/common/config/i386/i386-common.c (processor_alias_table): Change
+   x86-64 architecture to use generic tuning and scheduling instead of K8.
+
 2019-07-11  Jakub Jelinek  

PR target/91124
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index a394f874fe4..19ace226190 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1673,7 +1673,7 @@ const pta processor_alias_table[] =
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR},
   {"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR},
-  {"x86-64", PROCESSOR_K8, CPU_K8,
+  {"x86-64", PROCESSOR_GENERIC, CPU_GENERIC,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR},
   {"eden-x2", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},





Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Allan Sandfeld Jensen
On Tuesday, 23 July 2019 10:30:07 CEST Martin Liška wrote:
> Hi.
> 
> As we as openSUSE started using -flto, I see it very handy to have
> an option value that will automatically detect number of cores
> that can be used for parallel LTRANS phase.
> 
> Thoughts?
> 
That's really nice. 

How much extra work would it be to make it support a posix make jobserver? 

As far as I understand, you would need to guess a partition size first (as 
your patch here does), but then only start each job when given a token from 
the jobserver FD.

With that the integration to existing build infrastructure would be optimal.
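
The token handshake described above can be sketched in a few lines of POSIX C, assuming the GNU make convention that the jobserver is a pipe preloaded with one byte per available job slot: a client reads one byte before starting an extra job and writes the same byte back when the job finishes. The helper names are made up for this example, and a real client would also need to handle EINTR and the implicit slot every process already owns.

```c
#include <unistd.h>

/* Hypothetical helper: block until one job token can be read from the
   jobserver pipe.  Returns 0 and stores the token byte on success,
   -1 on error or end of file.  */
static int
jobserver_acquire (int read_fd, char *token)
{
  return read (read_fd, token, 1) == 1 ? 0 : -1;
}

/* Hypothetical helper: hand a token back once the job has finished.
   The same byte that was read should be written back.  */
static int
jobserver_release (int write_fd, char token)
{
  return write (write_fd, &token, 1) == 1 ? 0 : -1;
}
```

A partitioned LTRANS run would then call jobserver_acquire before launching each additional partition and jobserver_release as each one completes.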

Cheers
'Allan




Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Allan Sandfeld Jensen
On Wednesday, 24 July 2019 08:45:21 CEST Martin Liška wrote:
> On 7/24/19 12:11 AM, Allan Sandfeld Jensen wrote:
> > On Tuesday, 23 July 2019 10:30:07 CEST Martin Liška wrote:
> >> Hi.
> >> 
> >> As we as openSUSE started using -flto, I see it very handy to have
> >> an option value that will automatically detect number of cores
> >> that can be used for parallel LTRANS phase.
> >> 
> >> Thoughts?
> > 
> > That's really nice.
> > 
> > How much extra work would it be to make it support a posix make jobserver?
> 
> We do support it via -flto=jobserver:
> 
Good to know :)

> 
> Problem is that nowadays you have much more common make systems like ninja,
> meson and others that probably do not support that.
> 
There are patches to enable it in ninja, and I know some Linux distros apply 
the patches by default. Though that is more the listening side, so it probably 
requires launching ninja from make if you want to be able to pass it on to gcc.

'Allan




Re: issue with behavior change of gcc -r between gcc-8 and gcc-9

2020-04-02 Thread Allan Sandfeld Jensen
On Wednesday, 1 April 2020 19:48:11 CEST Olivier Hainque wrote:
> 
> -r 's business was to arrange for the linker not to
> complain because the closure is incomplete, leaving us
> with complete control of the closure.
> 
> It doesn't seem to me there was a really strong motivation
> to suddenly have -r influence the closure the way it now does.
> 
> Would it be possible to revert to the previous behavior
> and document it ?
> 
> Or maybe allow it to be controllable by the target ports ?
> 
> Or provide something to bring back the flexibility we had
> if we really believe the default should change ? (I'm not
> convinced)

-r is used for relinking. The idea behind the change was to make it directly 
suitable for that. It takes object files and relinks them into a new object 
file. It gives the caller complete control.

It sounds like you are missing some way to add startfiles? A reverse of 
-nostartfiles?

But hopefully you can just use the linker directly? Unless you have 
LTO-enabled object files, you don't need the compiler to link.

`Allan



