Re: Rb_tree constructor optimization

2018-05-14 Thread François Dumont

On 11/05/2018 14:16, Jonathan Wakely wrote:

On 06/05/18 16:06 +0200, François Dumont wrote:

Here is the rework of this patch.


On 02/05/2018 13:49, Jonathan Wakely wrote:

There's no changelog entry with the patch, so to recap, the changes
are:

- noexcept specifications are automatically deduced instead of being
 stated explicitly.


I simplified this part, it is not deduced anymore except for Debug 
mode relying on Normal mode qualifications.




- The allocator-extended move constructors get correct exception
 specifications in Debug Mode (consistent with normal mode). This
 should be done anyway, independent of any other changes.

I kept this untouch.


- We avoid a `if (__x._M_root() != nullptr)` branch in the case where
 the allocator-extended move constructor is used with an always-equal
 allocator, right?

Kept too.


Compilation time for containers is already **much** slower since the
allocator-extended constructors were added for GCC 5.x, despite very
few people ever using those constructors. This patch seems to require
even more work from the compiler to parse constructors nobody uses.

I restore direct qualifications.

diff --git a/libstdc++-v3/include/bits/stl_map.h 
b/libstdc++-v3/include/bits/stl_map.h

index a4a026e..2b8fd27 100644
--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -240,8 +240,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER

  /// Allocator-extended move constructor.
  map(map&& __m, const allocator_type& __a)
- noexcept(is_nothrow_copy_constructible<_Compare>::value
-   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+    _Rep_type(std::move(__m._M_t), declval<_Pair_alloc_type>())) )


All these calls to declval need to be qualified to avoid ADL.

It seems strange to have a mix of real values and declval expressions,
rather than:

_Rep_type(std::declval<_Rep_type>(), 
std::declval<_Pair_alloc_type>())) )

I add std:: qualitification and generalized usage.




-  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a);
+    private:
+  _Rb_tree(_Rb_tree&& __x, _Node_allocator&& __a, true_type)
+ 
noexcept(is_nothrow_move_constructible<_Rb_tree_impl<_Compare>>::value)


This is wrong.


As stated previously I removed this but based on this remark I also 
change the _Rb_tree_impl() qualification so that it doesn't depend 
anymore on allocator default constructor qualification.


I've added a test for that too.

This can use std::is_nothrow_constructible


Indeed.

+   "noexcept move constructor with allocator" );
+
+struct ExceptLess


Please name this "ThrowingLess" instead of "ExceptLess", I think
that's more descriptive.
I prefered "not_noexcept" prefix cause those operators are not 
throwing anything neither.


Tested under Linux x86_64, ok to commit ?

    * include/bits/stl_tree.h
    (_Rb_tree_impl()):
    Remove is_nothrow_default_constructible<_Node_allocator> from 
noexcept

    qualification.
    (_Rb_tree_impl(_Rb_tree_impl&&, _Node_allocator&&)): New.
    (_Rb_tree(_Rb_tree&&, _Node_allocator&&, std::true_type)): New, 
use latter.

    (_Rb_tree(_Rb_tree&&, _Node_allocator&&, std::false_type)): New.
    (_Rb_tree(_Rb_tree&&, _Node_allocator&&)): Adapt, use latters.
    * include/debug/map.h
    (map(map&&, const_allocator_type&)): Add noexcept qualitication.
    * include/debug/multimap.h
    (multimap(multimap&&, const_allocator_type&)): Add noexcept 
qualification.

    * include/debug/set.h
    (set(set&&, const_allocator_type&)): Add noexcept qualitication.
    * include/debug/multiset.h
    (multiset(multiset&&, const_allocator_type&)): Add noexcept 
qualification.

    * testsuite/23_containers/map/cons/noexcept_default_construct.cc:
    Add checks.
    * testsuite/23_containers/map/cons/noexcept_move_construct.cc:
    Add checks.
    * 
testsuite/23_containers/multimap/cons/noexcept_default_construct.cc:

    Add checks.
    * testsuite/23_containers/multimap/cons/noexcept_move_construct.cc:
    Add checks.
    * 
testsuite/23_containers/multiset/cons/noexcept_default_construct.cc:

    Add checks.
    * testsuite/23_containers/multiset/cons/noexcept_move_construct.cc:
    Add checks.
    * testsuite/23_containers/set/cons/noexcept_default_construct.cc:
    Add checks.
    * testsuite/23_containers/set/cons/noexcept_move_construct.cc:
    Add checks.

François



diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h

index d0a8448..9e59259 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -698,8 +698,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  _Rb_tree_impl()
   _GLIBCXX_NOEXCEPT_IF(
- is_nothrow_default_constructible<_Node_allocator>::value
-    && is_nothrow_default_constructible<_Base_key_compare>::value )
+    // is_nothrow_default_constructible<_Node_allocator>::value &&
+ is_nothrow_default_constructible<_Base_key_compare>::value )
  : _Node_allocator()
  { }


This 

Re: PING^1: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-05-14 Thread Martin Sebor

On 05/14/2018 01:10 PM, H.J. Lu wrote:

On Mon, May 14, 2018 at 10:40 AM, H.J. Lu  wrote:


$ cat c.i
struct B { int i; };
struct C { struct B b; } __attribute__ ((packed));

long* g8 (struct C *p) { return p; }
$ gcc -O2 -S c.i -Wno-incompatible-pointer-types
c.i: In function ‘g8’:
c.i:4:33: warning: taking value of packed 'struct C *' may result in an
unaligned pointer value [-Waddress-of-packed-member]


 ^
That should read "taking address" (not value) but...


The value of 'struct C *' is an address. There is no address taken here.


...to help explain the problem I would suggest to mention the expected
and actual alignment in the warning message.  E.g.,

  storing the address of a packed 'struct C' in 'struct C *' increases the
alignment of the pointer from 1 to 4.


I will take a look.


(IIUC, the source type and destination type need not be the same so
including both should be helpful in those cases.)

Adding a note pointing to the declaration of either the struct or
the member would help users find it if it's a header far removed
from the point of use.


I will see what I can do.


How about this

[hjl@gnu-skx-1 pr51628]$ cat n9.i
struct B { int i; };
struct C { struct B b; } __attribute__ ((packed));

long* g8 (struct C *p) { return p; }
[hjl@gnu-skx-1 pr51628]$
/export/build/gnu/gcc-test/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-test/build-x86_64-linux/gcc/ -O2 -S n9.i
n9.i: In function ‘g8’:
n9.i:4:33: warning: returning ‘struct C *’ from a function with
incompatible return type ‘long int *’ [-Wincompatible-pointer-types]
 long* g8 (struct C *p) { return p; }
 ^
n9.i:4:33: warning: taking value of packed ‘struct C *’ increases the
alignment of the pointer from 1 to 8 [-Waddress-of-packed-member]
n9.i:2:8: note: defined here
 struct C { struct B b; } __attribute__ ((packed));


Mentioning the alignments looks good.

I still find the "taking value" phrasing odd.  I think we would
describe what's going on as "converting a pointer to a packed C
to a pointer to C (with an alignment of 8)" so I'd suggest to
use the term converting instead.

I also think mentioning both the source and the destination types
is useful irrespective of -Wincompatible-pointer-types because
the latter is often suppressed using a cast, as in:

  struct __attribute__ ((packed)) A { int i; };
  struct B {
struct A a;
  } b;

  long *p = (long*)   // -Waddress-of-packed-member
  int *q = (int*)   // missing warning

If the types above were obfuscated by macros, typedefs, or in
C++ template parameters, it could be difficult to figure out
what the type of the member is because neither it nor the name
of the member appears in the message.

Martin


[PATCH] restore -Wstringop-overflow for checked strcpy/strcat (PR 85259)

2018-05-14 Thread Martin Sebor

r256683 committed to GCC 8 to avoiding duplicate instances of
-Wstringop-overflow warnings on some targets has the unintended
side-effect of suppressing even singleton instances of the warning
in cases such as 'strcat (strcpy (buf, "hello "), "world!")' when
_FORTIFY_SOURCE is defined.

The attached patch restores the warning for the trunk.

Since this is a regression in a security feature and the warning
isn't prone to false positives (I don't think I've seen any when
_FORTIFY_SOURCE is defined), I'd also like the fix considered for
the 8 branch.

Thanks
Martin
PR tree-optimization/85259 - Missing -Wstringop-overflow= since r256683

gcc/ChangeLog:

	PR tree-optimization/85259
	* builtins.c (compute_objsize): Handle constant offsets.
	* gimple-ssa-warn-restrict.c (maybe_diag_offset_bounds): Return
	true iff a warning has been issued.
	* gimple.h (gimple_nonartificial_location): New function.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Call
	gimple_nonartificial_location and handle -Wno-system-headers.
	(handle_builtin_stxncpy): Same.

gcc/testsuite/ChangeLog:

	PR tree-optimization/85259
	* gcc.dg/Wstringop-overflow-5.c: New test.
	* gcc.dg/Wstringop-overflow-6.c: New test.
Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 259298)
+++ gcc/builtins.c	(working copy)
@@ -3329,10 +3329,29 @@ compute_objsize (tree dest, int ostype)
 	{
 	  /* compute_builtin_object_size fails for addresses with
 	 non-constant offsets.  Try to determine the range of
-	 such an offset here and use it to adjus the constant
+	 such an offset here and use it to adjust the constant
 	 size.  */
 	  tree off = gimple_assign_rhs2 (stmt);
-	  if (TREE_CODE (off) == SSA_NAME
+	  if (TREE_CODE (off) == INTEGER_CST)
+	{
+	  if (tree size = compute_objsize (dest, ostype))
+		{
+		  wide_int wioff = wi::to_wide (off);
+		  wide_int wisiz = wi::to_wide (size);
+
+		  /* Ignore negative offsets for now.  For others,
+		 use the lower bound as the most optimistic
+		 estimate of the (remaining) size.  */
+		  if (wi::sign_mask (wioff))
+		;
+		  else if (wi::ltu_p (wioff, wisiz))
+		return wide_int_to_tree (TREE_TYPE (size),
+	 wi::sub (wisiz, wioff));
+		  else
+		return size_zero_node;
+		}
+	}
+	  else if (TREE_CODE (off) == SSA_NAME
 	  && INTEGRAL_TYPE_P (TREE_TYPE (off)))
 	{
 	  wide_int min, max;
Index: gcc/gimple-ssa-warn-restrict.c
===
--- gcc/gimple-ssa-warn-restrict.c	(revision 259298)
+++ gcc/gimple-ssa-warn-restrict.c	(working copy)
@@ -1603,8 +1603,6 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
 
   loc = expansion_point_location_if_in_system_header (loc);
 
-  tree type;
-
   char rangestr[2][64];
   if (ooboff[0] == ooboff[1]
   || (ooboff[0] != ref.offrange[0]
@@ -1615,6 +1613,8 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
 	 (long long) ooboff[0].to_shwi (),
 	 (long long) ooboff[1].to_shwi ());
 
+  bool warned = false;
+
   if (oobref == error_mark_node)
 {
   if (ref.sizrange[0] == ref.sizrange[1])
@@ -1624,6 +1624,8 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
 		 (long long) ref.sizrange[0].to_shwi (),
 		 (long long) ref.sizrange[1].to_shwi ());
 
+  tree type;
+
   if (DECL_P (ref.base)
 	  && TREE_CODE (type = TREE_TYPE (ref.base)) == ARRAY_TYPE)
 	{
@@ -1631,19 +1633,22 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
 			  "%G%qD pointer overflow between offset %s "
 			  "and size %s accessing array %qD with type %qT",
 			  call, func, rangestr[0], rangestr[1], ref.base, type))
-	inform (DECL_SOURCE_LOCATION (ref.base),
-		"array %qD declared here", ref.base);
+	{
+	  inform (DECL_SOURCE_LOCATION (ref.base),
+		  "array %qD declared here", ref.base);
+	  warned = true;
+	}
 	  else
-	warning_at (loc, OPT_Warray_bounds,
-			"%G%qD pointer overflow between offset %s "
-			"and size %s",
-			call, func, rangestr[0], rangestr[1]);
+	warned = warning_at (loc, OPT_Warray_bounds,
+ "%G%qD pointer overflow between offset %s "
+ "and size %s",
+ call, func, rangestr[0], rangestr[1]);
 	}
   else
-	warning_at (loc, OPT_Warray_bounds,
-		"%G%qD pointer overflow between offset %s "
-		"and size %s",
-		call, func, rangestr[0], rangestr[1]);
+	warned = warning_at (loc, OPT_Warray_bounds,
+			 "%G%qD pointer overflow between offset %s "
+			 "and size %s",
+			 call, func, rangestr[0], rangestr[1]);
 }
   else if (oobref == ref.base)
 {
@@ -1674,22 +1679,26 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
   "of object %qD with type %qT"),
 			 call, func, rangestr[0],
 			 ref.base, TREE_TYPE (ref.base)))
-	inform (DECL_SOURCE_LOCATION (ref.base),
-		"%qD declared here", ref.base);
+	{
+	  inform (DECL_SOURCE_LOCATION (ref.base),
+		  "%qD 

Re: [PATCH] improve -Wrestrict for struct members (PR 85753)

2018-05-14 Thread Jeff Law
On 05/11/2018 05:09 PM, Martin Sebor wrote:
> The attached patch extends -Wrestrict to constrain valid offset
> ranges into objects of struct types to the bounds of the object
> type, the same way the warning already handles arrays.  This
> makes it possible to detect overlapping accesses in cases like
> the second call to memcpy below:
> 
>   char a[16];
> 
>   struct { char a[16]; } x;
> 
>   void f (int i, int j)
>   {
> memcpy ([i], [j], 9);   // -Wrestrict (good)
> 
> memcpy ([i], [j], 9);   // missing -Wrestrict
>   }
> 
> These is no way to copy 9 bytes within a 16 byte array without
> either overlap or without accessing memory outside the bounaries
> of the object.
> 
> This is for GCC 9.
> 
> Thanks
> Martin
> 
> gcc-85753.diff
> 
> 
> PR tree-optimization/85753 - missing -Wrestrict on memcpy into a member array
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/85753
>   * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Handle
>   RECORD_TYPE in addition to ARRAY_TYPE.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/85753
>   * gcc.dg/Wrestrict-10.c: Adjust.
>   * gcc.dg/Wrestrict-16.c: New test.
OK.
jeff


Re: [PATCH 1/2] Introduce prefetch-minimum stride option

2018-05-14 Thread Luis Machado

Hi,

Here's an updated version of the patch (now reverted) that addresses the 
previous bootstrap problem (signedness and long long/int conversion).


I've checked that it bootstraps properly on both aarch64-linux and 
x86_64-linux and that tests look sane.


James, would you please give this one a try to see if you can still 
reproduce PR85682? I couldn't reproduce it in multiple attempts.


Thanks,
Luis

On 01/22/2018 11:46 AM, Luis Machado wrote:

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?



>From e0207950a6d7793cdaceaa71fc5ada05a93dc1b3 Mon Sep 17 00:00:00 2001
From: Luis Machado 
Date: Thu, 21 Dec 2017 15:23:59 -0200
Subject: [PATCH 1/2] Introduce prefetch-minimum stride option

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux and
x86_64-linux.

Ok?

2018-05-14  Luis Machado  

	Introduce option to limit software prefetching to known constant
	strides above a specific threshold with the goal of preventing
	conflicts with a hardware prefetcher.

	gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	: New const int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
	minimum_stride field.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
	: Set to 3.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_MINIMUM_STRIDE.
	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
	stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c| 13 -
 gcc/doc/invoke.texi | 15 +++
 gcc/params.def  |  9 +
 gcc/params.h|  2 ++
 gcc/tree-ssa-loop-prefetch.c| 17 +
 6 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index cda2895..5d3b9d7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+ hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a2003fe..5215deb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			

Re: [PATCH] consider offsets when handling nonstrings in -Wstringop-truncation (PR 85643)

2018-05-14 Thread Jeff Law
On 05/11/2018 10:21 AM, Martin Sebor wrote:
> The attached tweak suppresses some -Wstringop-truncation false
> positives when a pointer into an array declared nonstring is
> passed to a function that expects a string argument such as in:
> 
>   char a[8] __attribute__ ((nonstring));
>   strncpy (a + 1, s, sizeof a - 1);
> 
> I'd like to commit this fix to both trunk and 8-branch.
> 
> Thanks
> Martin
> 
> gcc-85643.diff
> 
> 
> PR middle-end/85643 - attribute nonstring fails to squash 
> -Wstringop-truncation warning
> 
> gcc/ChangeLog:
> 
>   PR middle-end/85643
>   * calls.c (get_attr_nonstring_decl): Handle MEM_REF.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/85643
>   * c-c++-common/attr-nonstring-7.c: New test.
OK for the trunk.  It's borderline for the release branch, but I'll ack
it there too after a week of soak time on the trunk.

jeff


Re: [PATCH] DWARF: Use DW_OP_addrx and DW_OP_constx for DWARF5.

2018-05-14 Thread Jeff Law
On 05/14/2018 07:35 AM, Mark Wielaard wrote:
> For older DWARF and -gsplit-dwarf we want to emit DW_OP_GNU_addr_index
> and DW_OP_GNU_const_index, but for DWARF5 we should use DW_OP_addrx
> and DW_OP_constx.
> 
> gcc/ChangeLog:
> 
>   * dwarf2out.c (dwarf_OP): Handle DW_OP_addrx and DW_OP_constx.
>   (size_of_loc_descr): Likewise.
>   (output_loc_operands): Likewise.
>   (output_loc_operands_raw): Likewise.
>   (dw_addr_op): Use dwarf_OP () for DW_OP_constx and DW_OP_addrx.
>   (resolve_addr_in_expr): Handle DW_OP_addrx and DW_OP_constx.
>   (hash_loc_operands): Likewise.
>   (compare_loc_operands): Likewise.
OK.
jeff


Re: [PATCH] DWARF calculate the number of indexed addresses.

2018-05-14 Thread Jeff Law
On 05/14/2018 07:36 AM, Mark Wielaard wrote:
> The length in the .debug_addr unit header was calculated using the number
> of elements in the addr_index_table. This is wrong because the entries in
> the table are refcounted and only those with a refcount > 0 are actually
> put in the index. Add a helper function count_index_addrs to get the
> correct number of addresses in the index.
> 
> gcc/ChangeLog:
> 
>   * dwarf2out.c (count_index_addrs): New function.
>   (dwarf2out_finish): Use count_index_addrs to calculate addrs_length.
OK.
jeff


Re: [PR63185][RFC] Improve DSE with branches

2018-05-14 Thread Jeff Law
On 05/13/2018 09:37 PM, Kugan Vivekanandarajah wrote:
> Hi,
> 
> Attached patch handles PR63185 when we reach PHI with temp != NULLL.
> We could see the PHI and if there isn't any uses for PHI that is
> interesting, we could ignore that ?
> 
> Bootstrapped and regression tested on x86_64-linux-gnu.
> Is this OK?
> 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2018-05-14  Kugan Vivekanandarajah  
> 
> * tree-ssa-dse.c (phi_dosent_define_nor_use_p): New.
> (dse_classify_store): Use phi_dosent_define_nor_use_p.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-05-14  Kugan Vivekanandarajah  
> 
> * gcc.dg/tree-ssa/ssa-dse-33.c: New test.
> 
So in this case the PHI post-dominates the store.


;;   basic block 2, loop depth 0
;;pred:   ENTRY
  # .MEM_3 = VDEF <.MEM_2(D)>
  p_4 = malloc (1024);
  # .MEM_5 = VDEF <.MEM_3>
  memset (p_4, 8, 1024);
  if (n_6(D) != 0)
goto ; [INV]
  else
goto ; [INV]
;;succ:   3
;;4

;;   basic block 3, loop depth 0
;;pred:   2
  # .MEM_7 = VDEF <.MEM_5>
  g ();
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   2
;;3
  # .MEM_1 = PHI <.MEM_5(2), .MEM_7(3)>
  # VUSE <.MEM_1>
  return;
;;succ:   EXIT

}

So the overall processing in this case is something like this:

We call into dse_classify_store for the memset.  The first immediate use
is the call to g(), which does not do an aliased read.  So we set TEMP
to g() and continue processing immediate uses.

We then process the PHI as an immediate use.  We fail to walk through
the PHI because we've already walked through the call to g() (as
indicated by g() being stored in TEMP).

I think we can get the effect we want by realizing that the PHI is a
sink for the statement in TEMP.  That in turn allows the existing
mechanisms to walk through the PHI to work.  We would have to somehow
verify that we hadn't cleared anything in LIVE_BYTES though.

Jeff



Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-05-14 Thread Luis Machado

On 05/11/2018 06:46 AM, Kyrill Tkachov wrote:

Hi Luis,

On 10/05/18 11:31, Luis Machado wrote:


On 05/09/2018 10:44 AM, Kyrill Tkachov wrote:


On 09/05/18 13:30, Luis Machado wrote:

Hi Kyrill,

On 05/08/2018 11:09 AM, Kyrill Tkachov wrote:

Hi Luis,

On 07/05/18 15:28, Luis Machado wrote:

Hi,

On 02/08/2018 10:45 AM, Luis Machado wrote:

Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:
Thanks for the feedback Kyrill. I've adjusted the v2 patch 
based on your

suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.

Since this is ARM-specific and fairly specific, i wonder if it 
would be

reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage 
increases

the risk of regressions being introduced and these regressions
can come bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of 
the gcc-bugs list)
for examples of the ways that things can go wrong with any of 
the myriad of GCC components

and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a 
one-liner fix for
PR 84164 but it has expanded into a 3-patch series with a midend 
component and

target-specific changes for 2 ports.

These issues are very hard to catch during review and normal 
testing, and can sometimes take months of deep testing by
fuzzing and massive codebase rebuilds to expose, so the closer 
the commit is to a release
the higher the risk is that an obscure edge case will be 
unnoticed and unfixed in the release.


So the priority at this stage is to minimise the risk of 
destabilising the codebase,
as opposed to taking in new features and desirable performance 
improvements (like your patch!)


That is the rationale for delaying committing such changes until 
the start
of GCC 9 development. But again, this is up to the aarch64 
maintainers.
I'm sure the patch will be a perfectly fine and desirable commit 
for GCC 9.

This is just my perspective as maintainer of the arm port.


Thanks. Your explanation makes the situation pretty clear and it 
sounds very reasonable. I'll put the patch on hold until 
development is open again.


Regards,
Luis


With GCC 9 development open, i take it this patch is worth 
considering again?




Yes, I believe the latest version is at:
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00239.html ?

+(define_insn "*ashift_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (ashift:GPI (sign_extract:GPI (match_operand:GPI 1 
"register_operand" "r")
+  (match_operand 2 
"aarch64_simd_shift_imm_offset_" "n")
+  (match_operand 3 
"aarch64_simd_shift_imm_" "n"))

+ (match_operand 4 "aarch64_simd_shift_imm_" "n")))]
+  ""
+  "sbfiz\\t%0, %1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+


Indeed.



Can you give a bit more information about what are the values for 
operands 2,3 and 4 in your example testcases?


For sbfiz32 we have 3, 0 and 19 respectively. For sbfiz64 we have 6, 
0 and 38.


I'm trying to understand why the value of operand 3 (the bit 
position the sign-extract starts from) doesn't get validated

in any way and doesn't play any role in the output...


This may be an oversight. It seems operand 3 will always be 0 in 
this particular case i'm covering. It starts from 0, gets shifted x 
bits to the left and then y < x bits to the right). The operation is 
essentially an ashift of the bitfield followed by a sign-extension 
of the msb of the bitfield being extracted.


Having a non-zero operand 3 from RTL means the shift amount won't 
translate directly to operand 3 of sbfiz (the position). Then we'd 
need to do a calculation where we take into account operand 3 from RTL.


I'm wondering when such a RTL pattern, with a non-zero operand 3, 
would be generated though.


I think it's best to enforce that operand 3 is a zero. Maybe just 
match const_int 0 here directly.

Better safe than sorry with these things.


Indeed. I've updated the original patch with that change now.

Bootstrapped and regtested on aarch64-linux.



I think you might also want to enforce that the sum of operands 2 and 3 
doesn't exceed the width of the mode
(see other patterns in aarch64.md that generate bfx-style instructions 
for similar restrictions).


Updated now in the attached patch. Everything still sane with bootstrap 
and testsuite on aarch64-linux.




Otherwise the patch looks good to me (it will still need approval from a 
maintainer).


For the future I think there's also an opportunity to match the SImode 
version of this pattern that zero-extends to DImode,
making use of the implicit zero-extension that writing to a W-dreg 
provides. But that would be a separate patch.


Indeed. I'll have a TODO for 

Re: [RFC] Improve tree DSE

2018-05-14 Thread Jeff Law
On 05/01/2018 05:16 PM, Kugan Vivekanandarajah wrote:
>> I'm also struggling a little bit to see much value in handling this
>> case.  In the included testcases we've got a memset in a loop where the
>> args do not vary across the loop iterations and there are no reads from
>> the memory location within the loop. How realistic is that?
> 
> I was looking into another case from an application but that was not
> handled partly due to limitations of alias analysis and thought that
> this could be handled. If you think that this is not going to happen
> often in practice, I agree that this is not worth the trouble.
I'm not sure, that's why I asked :-)

If you've seen this happen in your larger application or it's a
prerequisite for addressing a missed DSE in your larger application,
then that's probably enough justification to move forward with the
simpler solution Richi suggested.


Jeff


Re: Improve std::rotate usages

2018-05-14 Thread François Dumont

Any feedback regarding this patch ?


On 02/05/2018 07:26, François Dumont wrote:

Hi

    std::rotate already returns the expected iterator so there is no 
need for calls to std::advance/std::distance.


Tested under Linux x86_64, ok to commit ?

François





C++ PATCH to improve C++ function type variant handling

2018-05-14 Thread Jason Merrill
I haven't been able to create a testcase to demonstrate a bug, but
I've been uneasy for a while about how we blithely set
TYPE_HAS_LATE_RETURN_TYPE on function types without regard to how they
might be shared.

This patch handles that flag like TYPE_RAISES_EXCEPTIONS and the
ref-qualifier flags, and factors out that handling into a
build_cp_fntype_variant function that the existing
build_exception_variant and build_ref_qualified_type functions become
wrappers for.

I then went through and changed most of the existing uses of those
latter functions to use either the new function or
cxx_copy_lang_qualifiers.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8f13581eefeacd7af96b672b00115a4eec4e2ef7
Author: Jason Merrill 
Date:   Mon Apr 2 17:42:14 2018 -0400

Handle TYPE_HAS_LATE_RETURN_TYPE like ref-qualifier and eh spec.

* tree.c (build_cp_fntype_variant): New.
(build_ref_qualified_type, build_exception_variant)
(strip_typedefs, cxx_copy_lang_qualifiers): Use it.
(cxx_type_hash_eq, cp_check_qualified_type): Check
TYPE_HAS_LATE_RETURN_TYPE.
(cp_build_type_attribute_variant): Check cxx_type_hash_eq.
(cp_build_qualified_type_real): No need to preserve C++ qualifiers.
* class.c (build_clone): Use cxx_copy_lang_qualifiers.
(adjust_clone_args): Likewise.
* decl.c (grokfndecl): Add late_return_type_p parameter.  Use
build_cp_fntype_variant.
(grokdeclarator): Pass late_return_type_p to grokfndecl.
(check_function_type): Use cxx_copy_lang_qualifiers.
(static_fn_type): Use cxx_copy_lang_qualifiers.
* decl2.c (build_memfn_type, maybe_retrofit_in_chrg)
(cp_reconstruct_complex_type, coerce_new_type, coerce_delete_type)
(change_return_type): Use cxx_copy_lang_qualifiers.
* mangle.c (write_type): Use cxx_copy_lang_qualifiers.
* parser.c (cp_parser_lambda_declarator_opt): Represent an explicit
return type on the declarator like a normal trailing return type.
* pt.c (tsubst_function_type): Use build_cp_fntype_variant.
(copy_default_args_to_explicit_spec): Use cxx_copy_lang_qualifiers.
* typeck.c (merge_types): Use build_cp_fntype_variant.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 4616d8d3036..83ac78fc8b9 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4469,13 +4469,8 @@ build_clone (tree fn, tree name)
  type.  */
   if (DECL_HAS_IN_CHARGE_PARM_P (clone))
 {
-  tree basetype;
-  tree parmtypes;
-  tree exceptions;
-
-  exceptions = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (clone));
-  basetype = TYPE_METHOD_BASETYPE (TREE_TYPE (clone));
-  parmtypes = TYPE_ARG_TYPES (TREE_TYPE (clone));
+  tree basetype = TYPE_METHOD_BASETYPE (TREE_TYPE (clone));
+  tree parmtypes = TYPE_ARG_TYPES (TREE_TYPE (clone));
   /* Skip the `this' parameter.  */
   parmtypes = TREE_CHAIN (parmtypes);
   /* Skip the in-charge parameter.  */
@@ -4494,12 +4489,11 @@ build_clone (tree fn, tree name)
 	= build_method_type_directly (basetype,
   TREE_TYPE (TREE_TYPE (clone)),
   parmtypes);
-  if (exceptions)
-	TREE_TYPE (clone) = build_exception_variant (TREE_TYPE (clone),
-		 exceptions);
   TREE_TYPE (clone)
 	= cp_build_type_attribute_variant (TREE_TYPE (clone),
 	   TYPE_ATTRIBUTES (TREE_TYPE (fn)));
+  TREE_TYPE (clone)
+	= cxx_copy_lang_qualifiers (TREE_TYPE (clone), TREE_TYPE (fn));
 }
 
   /* Copy the function parameters.  */
@@ -4687,11 +4681,6 @@ adjust_clone_args (tree decl)
 	{
 	  /* A default parameter has been added. Adjust the
 		 clone's parameters.  */
-	  tree exceptions = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (clone));
-	  tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (clone));
-	  tree basetype = TYPE_METHOD_BASETYPE (TREE_TYPE (clone));
-	  tree type;
-
 	  clone_parms = orig_decl_parms;
 
 	  if (DECL_HAS_VTT_PARM_P (clone))
@@ -4701,13 +4690,15 @@ adjust_clone_args (tree decl)
 	   clone_parms);
 		  TREE_TYPE (clone_parms) = TREE_TYPE (orig_clone_parms);
 		}
-	  type = build_method_type_directly (basetype,
-		 TREE_TYPE (TREE_TYPE (clone)),
-		 clone_parms);
-	  if (exceptions)
-		type = build_exception_variant (type, exceptions);
-	  if (attrs)
+
+	  tree basetype = TYPE_METHOD_BASETYPE (TREE_TYPE (clone));
+	  tree type
+		= build_method_type_directly (basetype,
+	  TREE_TYPE (TREE_TYPE (clone)),
+	  clone_parms);
+	  if (tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (clone)))
 		type = cp_build_type_attribute_variant (type, attrs);
+	  type = cxx_copy_lang_qualifiers (type, TREE_TYPE (clone));
 	  TREE_TYPE (clone) = type;
 
 	  clone_parms = NULL_TREE;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 353bc6abde3..881b796e05f 

[PATCH] rs6000: Fix -mreadonly-in-sdata documentation

2018-05-14 Thread Segher Boessenkool
For some reason I made both an @item and an @itemx for
-mreadonly-in-sdata.  This fixes it.  Committing.


Segher


2018-05-14  Segher Boessenkool  

* doc/invoke.texi (RS/6000 and PowerPC Options): Delete @itemx for
-mreadonly-in-sdata.

---
 gcc/doc/invoke.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ecb1550..1297aef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -24032,7 +24032,6 @@ in the @code{.data} section, and all uninitialized data 
in the
 @code{.bss} section.
 
 @item -mreadonly-in-sdata
-@itemx -mreadonly-in-sdata
 @opindex mreadonly-in-sdata
 @opindex mno-readonly-in-sdata
 Put read-only objects in the @code{.sdata} section as well.  This is the
-- 
1.8.3.1



Re: PING^1: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-05-14 Thread H.J. Lu
On Mon, May 14, 2018 at 10:40 AM, H.J. Lu  wrote:

 $ cat c.i
 struct B { int i; };
 struct C { struct B b; } __attribute__ ((packed));

 long* g8 (struct C *p) { return p; }
 $ gcc -O2 -S c.i -Wno-incompatible-pointer-types
 c.i: In function ‘g8’:
 c.i:4:33: warning: taking value of packed 'struct C *' may result in an
 unaligned pointer value [-Waddress-of-packed-member]
>>
>>  ^
>> That should read "taking address" (not value) but...
>
> The value of 'struct C *' is an address. There is no address taken here.
>
>> ...to help explain the problem I would suggest to mention the expected
>> and actual alignment in the warning message.  E.g.,
>>
>>   storing the address of a packed 'struct C' in 'struct C *' increases the
>> alignment of the pointer from 1 to 4.
>
> I will take a look.
>
>> (IIUC, the source type and destination type need not be the same so
>> including both should be helpful in those cases.)
>>
>> Adding a note pointing to the declaration of either the struct or
>> the member would help users find it if it's a header far removed
>> from the point of use.
>
> I will see what I can do.

How about this

[hjl@gnu-skx-1 pr51628]$ cat n9.i
struct B { int i; };
struct C { struct B b; } __attribute__ ((packed));

long* g8 (struct C *p) { return p; }
[hjl@gnu-skx-1 pr51628]$
/export/build/gnu/gcc-test/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-test/build-x86_64-linux/gcc/ -O2 -S n9.i
n9.i: In function ‘g8’:
n9.i:4:33: warning: returning ‘struct C *’ from a function with
incompatible return type ‘long int *’ [-Wincompatible-pointer-types]
 long* g8 (struct C *p) { return p; }
 ^
n9.i:4:33: warning: taking value of packed ‘struct C *’ increases the
alignment of the pointer from 1 to 8 [-Waddress-of-packed-member]
n9.i:2:8: note: defined here
 struct C { struct B b; } __attribute__ ((packed));
^
[hjl@gnu-skx-1 pr51628]$


-- 
H.J.


[PATCH] PR libstdc++/81256 fix exception handling in basic_filebuf::close

2018-05-14 Thread Jonathan Wakely

As explained in the PR, we were incorrectly swallowing exceptions from
basic_filebuf::close(). They should propagate from that function, but
still be swallowed in the destructor.

PR libstdc++/81256
* include/bits/fstream.tcc (basic_filebuf::close): Do not swallow
exceptions from _M_terminate_output().
* include/std/fstream (basic_filebuf::~basic_filebuf): Swallow any
exceptions from close().
* testsuite/27_io/basic_filebuf/close/81256.cc: New.

Tested powerp64le-linux, committed to trunk.


commit adcfbaa7023dc8d2d5cfa0b6971a99d0ed33a1f5
Author: Jonathan Wakely 
Date:   Mon May 14 17:13:05 2018 +0100

PR libstdc++/81256 fix exception handling in basic_filebuf::close

PR libstdc++/81256
* include/bits/fstream.tcc (basic_filebuf::close): Do not swallow
exceptions from _M_terminate_output().
* include/std/fstream (basic_filebuf::~basic_filebuf): Swallow any
exceptions from close().
* testsuite/27_io/basic_filebuf/close/81256.cc: New.

diff --git a/libstdc++-v3/include/bits/fstream.tcc 
b/libstdc++-v3/include/bits/fstream.tcc
index f23ff7af4eb..08cf189ee06 100644
--- a/libstdc++-v3/include/bits/fstream.tcc
+++ b/libstdc++-v3/include/bits/fstream.tcc
@@ -239,13 +239,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (!_M_terminate_output())
  __testfail = true;
  }
-   __catch(__cxxabiv1::__forced_unwind&)
+   __catch(...)
  {
_M_file.close();
__throw_exception_again;
  }
-   __catch(...)
- { __testfail = true; }
   }
 
   if (!_M_file.close())
diff --git a/libstdc++-v3/include/std/fstream b/libstdc++-v3/include/std/fstream
index 3a5895d68b0..05661d9d58f 100644
--- a/libstdc++-v3/include/std/fstream
+++ b/libstdc++-v3/include/std/fstream
@@ -244,7 +244,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   virtual
   ~basic_filebuf()
-  { this->close(); }
+  {
+   __try
+ { this->close(); }
+   __catch(...)
+ { }
+  }
 
 #if __cplusplus >= 201103L
   basic_filebuf& operator=(const basic_filebuf&) = delete;
diff --git a/libstdc++-v3/testsuite/27_io/basic_filebuf/close/81256.cc 
b/libstdc++-v3/testsuite/27_io/basic_filebuf/close/81256.cc
new file mode 100644
index 000..aef7364526b
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/close/81256.cc
@@ -0,0 +1,109 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+#include 
+
+struct E : std::runtime_error
+{
+  E() : runtime_error("") { }
+};
+
+struct Cvt : std::codecvt
+{
+  explicit Cvt(size_t refs) : codecvt(refs) { }
+
+  mutable int exceptions_thrown = 0;
+
+private:
+  int
+  do_encoding() const noexcept override
+  { return -1; }
+
+  bool
+  do_always_noconv() const noexcept override
+  { return false; }
+
+  result
+  do_unshift(state_type&, char*, char*, char*&) const override
+  {
+++exceptions_thrown;
+throw E();
+  }
+};
+
+struct filebuf : std::basic_filebuf
+{
+  explicit filebuf(Cvt* c)
+  {
+std::locale loc(std::locale::classic(), c);
+imbue(loc);
+  }
+};
+
+void
+test01()
+{
+  // This facet needs to still be valid when ~basic_filebuf runs:
+  Cvt conv{1};
+  {
+filebuf fb();
+fb.open("output.txt", std::wios::out);
+fb.sputn(L"x", 1);
+
+bool caught = false;
+try
+{
+  /* [filebuf.members] p7: If one of these calls throws an exception,
+   * the exception is caught and rethrown after closing the file.  */
+  fb.close();
+}
+catch (const E&)
+{
+  caught = true;
+}
+VERIFY( conv.exceptions_thrown == 1 );
+VERIFY( caught );
+  }
+  VERIFY( conv.exceptions_thrown == 1 );
+}
+
+void
+test02()
+{
+  // This facet needs to still be valid when ~basic_filebuf runs:
+  Cvt conv{1};
+  {
+filebuf fb();
+fb.open("output.txt", std::wios::out);
+fb.sputn(L"x", 1);
+/* [filebuf.cons] p5: If an exception occurs during the destruction
+ * of the object, including the call to close(), the exception is
+ * caught but not rethrown.  */
+  }
+  VERIFY( 

Re: [PATCH] use string length to relax -Wstringop-overflow for nonstrings (PR 85623)

2018-05-14 Thread Martin Sebor

On 05/14/2018 10:43 AM, Franz Sirl wrote:

Am 2018-05-10 um 21:26 schrieb Martin Sebor:

GCC 8.1 warns for unbounded (and some bounded) string comparisons
involving arrays declared attribute nonstring (i.e., char arrays
that need not be nul-terminated).  For instance:

   extern __attribute__((nonstring)) char a[4];

   int f (void)
   {
 return strncmp (a, "123", sizeof a);
   }

   warning: ‘strcmp’ argument 1 declared attribute ‘nonstring’

Note that the warning refers to strcmp even though the call in
the source is to strncmp, because prior passes transform one to
the other.

The warning above is unnecessary (for strcmp) and incorrect for
strncmp because the call reads exactly four bytes from the non-
string array a regardless of the bound and so there is no risk
that it will read past the end of the array.

The attached change enhances the warning to use the length of
the string argument to suppress some of these needless warnings
for both bounded and unbounded string comparison functions.
When the length of the string is unknown, the warning uses its
size (when possible) as the upper bound on the number of accessed
bytes.  The change adds no new warnings.

I'm looking for approval to commit it to both trunk and 8-branch.


Hi Martin,

this patch is a nice improvement and makes "nonstring" a lot more
useable. But shouldn't the attribute also handle these cases (similar to
PR 85602):

#include 

typedef struct {
char segname[16] __attribute__((__nonstring__));
char sectname[16] __attribute__((__nonstring__));
} MachO_header;

const char *get_seg_sect(MachO_header *hdr)
{
static char segname_sectname[sizeof(hdr->segname) +
sizeof(hdr->sectname) + 2];

memset(segname_sectname, 0, sizeof(segname_sectname));
strncpy(segname_sectname, hdr->segname, sizeof(hdr->segname));
strcat(segname_sectname, "@");
strncat(segname_sectname, hdr->sectname, sizeof(hdr->sectname));

return segname_sectname;
}

$ gcc-8 -c -O2 -W -Wall test-macho.c
In file included from /usr/include/string.h:630,
 from test-macho.c:1:
test-macho.c: In function 'get_seg_sect':
test-macho.c:14:48: warning: argument to 'sizeof' in '__builtin_strncpy'
call is the same expression as the source; did you mean to use the size
of the destination? [-Wsizeof-pointer-memaccess]
  strncpy(segname_sectname, hdr->segname, sizeof(hdr->segname));
^
test-macho.c:16:49: warning: argument to 'sizeof' in '__builtin_strncat'
call is the same expression as the source; did you mean to use the size
of the destination? [-Wsizeof-pointer-memaccess]
  strncat(segname_sectname, hdr->sectname, sizeof(hdr->sectname));
 ^

As you can see __attribute__((__nonstring__)) doesn't silence the warning.


Thanks for the heads up.  I've added my comments to the bug.
I will see if I can enhance the warning to detect the attribute
and suppress the warning for the use case above.  I think I'd
prefer to do that separately from this bug fix since the two
affect different warnings and will likely need changes to
different areas of the compiler (front-end vs middle-end).

Martin


Re: PING^1: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-05-14 Thread H.J. Lu
On Mon, May 14, 2018 at 9:18 AM, Martin Sebor  wrote:
> On 05/14/2018 07:44 AM, H.J. Lu wrote:
>>
>> On Wed, Apr 25, 2018 at 7:54 PM, H.J. Lu  wrote:
>>>
>>> When address of packed member of struct or union is taken, it may result
>>> in an unaligned pointer value.  This patch adds
>>> -Waddress-of-packed-member
>>> to check alignment at pointer assignment and warn unaligned address as
>>> well as unaligned pointer:
>
>
> This isn't a complete review, just some high level observations
> and suggestions for things I noticed.
>
>>> $ cat x.i
>>> struct pair_t
>>> {
>>>   char c;
>>>   int i;
>>> } __attribute__ ((packed));
>>>
>>> extern struct pair_t p;
>>> int *addr = 
>>> $ gcc -O2 -S x.i
>>> x.i:8:13:  warning: taking address of packed member of 'struct pair_t'
>>> may result in an unaligned pointer value [-Waddress-of-packed-member]
>>>  int *addr = 
>>>  ^
>>> $ cat c.i
>>> struct B { int i; };
>>> struct C { struct B b; } __attribute__ ((packed));
>>>
>>> long* g8 (struct C *p) { return p; }
>>> $ gcc -O2 -S c.i -Wno-incompatible-pointer-types
>>> c.i: In function ‘g8’:
>>> c.i:4:33: warning: taking value of packed 'struct C *' may result in an
>>> unaligned pointer value [-Waddress-of-packed-member]
>
>  ^
> That should read "taking address" (not value) but...

The value of 'struct C *' is an address. There is no address taken here.

> ...to help explain the problem I would suggest to mention the expected
> and actual alignment in the warning message.  E.g.,
>
>   storing the address of a packed 'struct C' in 'struct C *' increases the
> alignment of the pointer from 1 to 4.

I will take a look.

> (IIUC, the source type and destination type need not be the same so
> including both should be helpful in those cases.)
>
> Adding a note pointing to the declaration of either the struct or
> the member would help users find it if it's a header far removed
> from the point of use.

I will see what I can do.

> Here's a question: Should the following trigger the warning (it
> doesn't, either with your patch or with Clang, even though the
> pointer is misaligned).
>
>   struct __attribute__ ((packed)) A { int i; } a;
>   struct B: A { int j; } b;
>
>   int B::*pbi = ::i;

Can you show the misaligned pointer in the generated code?

>>>  long* g8 (struct C *p) { return p; }
>>>  ^
>>> $
>>>
>>> This warning is enabled by default.
>
>
> Can you please explain the reasoning behind this decision?
>
> Is it because the warning is meant to have no false positives
> (and the GCC diagnostic guidelines suggest that such warnings
> be on by default), or because Clang enables it by default?

I don't have strong opinion here.

> I ask because the phrasing of the warning "may result" suggests

"may result" is used since 4-byte alignment is also 1-byte alignment.

> it's not free of false positives.  I'm probably missing something,
> and so...
>
> ...I would suggest to expand the documentation a bit and add
> an example showing when the warning triggers, when it doesn't.
> The added text says that taking the address of a packed member
> "usually results in an unaligned pointer value."  When does it
> not result in one?  Would a warning in such cases be a false
> positive?

Since any alignment is 1-byte alignment, a 1-byte aligned address
may be 8-byte aligned.

>
>>>  Since read_encoded_value_with_base
>>> in unwind-pe.h has
>>>
>>>   union unaligned
>>> {
>>>   void *ptr;
>>>   unsigned u2 __attribute__ ((mode (HI)));
>>>   unsigned u4 __attribute__ ((mode (SI)));
>>>   unsigned u8 __attribute__ ((mode (DI)));
>>>   signed s2 __attribute__ ((mode (HI)));
>>>   signed s4 __attribute__ ((mode (SI)));
>>>   signed s8 __attribute__ ((mode (DI)));
>>> } __attribute__((__packed__));
>>>   _Unwind_Internal_Ptr result;
>>>
>>> and GCC warns:
>>>
>>> gcc/libgcc/unwind-pe.h:210:37: warning: taking address of packed member
>>> of 'union unaligned' may result in an unaligned pointer value
>>> [-Waddress-of-packed-member]
>>> result = (_Unwind_Internal_Ptr) u->ptr;
>>> ^
>>> we need to add GCC pragma to ignore -Waddress-of-packed-member.
>>>
>>> OK for trunk?
>>>
>>> H.J.
>>> 
>>> gcc/c/
>>>
>>> PR c/51628
>>> * doc/invoke.texi: Document -Wno-address-of-packed-member.
>>>
>>> gcc/c-family/
>>>
>>> PR c/51628
>>> * c-common.h (warn_for_address_of_packed_member): New.
>>> * c-warn.c (check_address_of_packed_member): New function.
>>> (warn_for_address_of_packed_member): Likewise.
>>> * c.opt: Add -Wno-address-of-packed-member.
>>>
>>> gcc/c/
>>>
>>> PR c/51628
>>> * c-typeck.c (warn_for_pointer_of_packed_member): New function.
>>> (convert_for_assignment): Call warn_for_address_of_packed_member
>>> and warn_for_pointer_of_packed_member.
>
>
> The comment in 

Re: [PATCH] gcc/configure.ac: add --disable-systemtap switch

2018-05-14 Thread Joseph Myers
Any new configure options need to be documented in install.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] use string length to relax -Wstringop-overflow for nonstrings (PR 85623)

2018-05-14 Thread Franz Sirl

Am 2018-05-10 um 21:26 schrieb Martin Sebor:

GCC 8.1 warns for unbounded (and some bounded) string comparisons
involving arrays declared attribute nonstring (i.e., char arrays
that need not be nul-terminated).  For instance:

   extern __attribute__((nonstring)) char a[4];

   int f (void)
   {
     return strncmp (a, "123", sizeof a);
   }

   warning: ‘strcmp’ argument 1 declared attribute ‘nonstring’

Note that the warning refers to strcmp even though the call in
the source is to strncmp, because prior passes transform one to
the other.

The warning above is unnecessary (for strcmp) and incorrect for
strncmp because the call reads exactly four bytes from the non-
string array a regardless of the bound and so there is no risk
that it will read past the end of the array.

The attached change enhances the warning to use the length of
the string argument to suppress some of these needless warnings
for both bounded and unbounded string comparison functions.
When the length of the string is unknown, the warning uses its
size (when possible) as the upper bound on the number of accessed
bytes.  The change adds no new warnings.

I'm looking for approval to commit it to both trunk and 8-branch.


Hi Martin,

this patch is a nice improvement and makes "nonstring" a lot more 
useable. But shouldn't the attribute also handle these cases (similar to 
PR 85602):


#include 

typedef struct {
char segname[16] __attribute__((__nonstring__));
char sectname[16] __attribute__((__nonstring__));
} MachO_header;

const char *get_seg_sect(MachO_header *hdr)
{
static char segname_sectname[sizeof(hdr->segname) + 
sizeof(hdr->sectname) + 2];


memset(segname_sectname, 0, sizeof(segname_sectname));
strncpy(segname_sectname, hdr->segname, sizeof(hdr->segname));
strcat(segname_sectname, "@");
strncat(segname_sectname, hdr->sectname, sizeof(hdr->sectname));

return segname_sectname;
}

$ gcc-8 -c -O2 -W -Wall test-macho.c
In file included from /usr/include/string.h:630,
 from test-macho.c:1:
test-macho.c: In function 'get_seg_sect':
test-macho.c:14:48: warning: argument to 'sizeof' in '__builtin_strncpy' 
call is the same expression as the source; did you mean to use the size 
of the destination? [-Wsizeof-pointer-memaccess]

  strncpy(segname_sectname, hdr->segname, sizeof(hdr->segname));
^
test-macho.c:16:49: warning: argument to 'sizeof' in '__builtin_strncat' 
call is the same expression as the source; did you mean to use the size 
of the destination? [-Wsizeof-pointer-memaccess]

  strncat(segname_sectname, hdr->sectname, sizeof(hdr->sectname));
 ^

As you can see __attribute__((__nonstring__)) doesn't silence the warning.

Franz.


Re: [PATCH, aarch64] Patch to update pipeline descriptions in thunderx2t99.md

2018-05-14 Thread Kyrill Tkachov

Hi Steve,

On 09/05/18 23:37, Steve Ellcey wrote:

On Fri, 2018-05-04 at 14:05 -0700, Andrew Pinski wrote:
>
> >(thunderx2t99_loadpair): Fix cpu unit ordering.
> I think the original ordering was correct.  The address calculation
> happens before the actual load.
> thunderx2t99_asimd_load1_ldp would have a similar issue.
>
> Thanks,
> Andrew

OK, I checked into that and undid the change to thunderx2t99_loadpair
and fixed thunderx2t99_asimd_load1_ldp to match it.  Everything else is
the same.



This looks ok to me. The accuracy of the model, of course, is up to you.
You'll need approval from a maintainer though.

Thanks,
Kyrill


Steve Ellcey
sell...@cavium.com


2018-05-09  Steve Ellcey  

* config/aarch64/thunderx2t99.md (thunderx2t99_ls_both): Delete.
(thunderx2t99_multiple): Delete psuedo-units from used cpus.
Add untyped.
(thunderx2t99_alu_shift): Remove alu_shift_reg, alus_shift_reg.
Change logics_shift_reg to logics_shift_imm.
(thunderx2t99_fp_loadpair_basic): Delete.
(thunderx2t99_fp_storepair_basic): Delete.
(thunderx2t99_asimd_int): Add neon_sub and neon_sub_q types.
(thunderx2t99_asimd_polynomial): Delete.
(thunderx2t99_asimd_fp_simple): Add neon_fp_mul_s_scalar_q
and neon_fp_mul_d_scalar_q.
(thunderx2t99_asimd_fp_conv): Add *int_to_fp* types.
(thunderx2t99_asimd_misc): Delete neon_dup and neon_dup_q.
(thunderx2t99_asimd_recip_step): Add missing *sqrt* types.
(thunderx2t99_asimd_lut): Add missing tbl types.
(thunderx2t99_asimd_ext): Delete.
(thunderx2t99_asimd_load1_1_mult): Delete.
(thunderx2t99_asimd_load1_2_mult): Delete.
(thunderx2t99_asimd_load1_ldp): New.
(thunderx2t99_asimd_load1): New.
(thunderx2t99_asimd_load2): Add missing *load2* types.
(thunderx2t99_asimd_load3): New.
(thunderx2t99_asimd_load4): New.
(thunderx2t99_asimd_store1_1_mult): Delete.
(thunderx2t99_asimd_store1_2_mult): Delete.
(thunderx2t99_asimd_store2_mult): Delete.
(thunderx2t99_asimd_store2_onelane): Delete.
(thunderx2t99_asimd_store_stp): New.
(thunderx2t99_asimd_store1): New.
(thunderx2t99_asimd_store2): New.
(thunderx2t99_asimd_store3): New.
(thunderx2t99_asimd_store4): New.




Re: [wwwdocs] Describe how to validate wwwdocs changes

2018-05-14 Thread Martin Sebor

On 05/13/2018 10:19 PM, Gerald Pfeifer wrote:

This is triggered by a report from Martin who rightfully pointed
out that it's not straightforward to validate wwwdocs changes before
committing a change.

Martin, what do you think?  Would that have avoided the challenges
your ran into?  Anything to better clarify or otherwise improve?


Thanks for the improvement!  I think it will help going forward
assuming one knows about the page and remembers to check it.
I have to confess I forgot about it so I didn't check it before
running the validator the last time.  Would it be possible to add
the snippet to each page permanently?

Alternatively, what do you think of the idea to have a script (or
makefile) to post-process changes to these pages on the client side,
before checking them in?  I.e., adding the example annotation David
Malcolm prefers (black background) and also validating the HTML).

Martin



Gerald

Index: about.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/about.html,v
retrieving revision 1.28
diff -u -r1.28 about.html
--- about.html  6 May 2018 16:19:24 -   1.28
+++ about.html  14 May 2018 04:11:25 -
+Validating a change
+
+To validate any changes, you can use the https://validator.w3.org;>W3 Validator. Just replace the
+html tag at the beginning of the document with the following
+snippet and use the "Validate by File Upload" functionality.
+
+
+?xml version="1.0" encoding="utf-8"?
+!DOCTYPE html
+  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";
+html xmlns="http://www.w3.org/1999/xhtml; xml:lang="en" lang="en"
+
+
 Checking in a change

 We recommend you list files explicitly to avoid accidental checkins





Re: [PATCH, rs6000] Improved Documentation of Built-in Functions Part 2

2018-05-14 Thread Segher Boessenkool
Hi Kelvin,

On Mon, May 14, 2018 at 09:59:25AM -0500, Kelvin Nilsen wrote:
> 2018-05-14  Kelvin Nilsen  
> 
>   * doc/extend.texi (Basic PowerPC Built-in Functions): Rename this
>   subsection to be "PowerPC Built-in Functions".
>   (PowerPC Altivec/VSX Built-in Functions): Change this
>   subsection to subsubsection and rename as "PowerPC Altivec
>   Built-in Functions Available on ISA 2.05".
>   (PowerPC Built-in Functions Available on ISA 2.06): New
>   subsubsection.
>   (PowerPC Built-in Functions Available on ISA 2.07): Likewise.
>   (PowerPC Built-in Functions Available on ISA 3.0): Likewise.
>   (PowerPC Hardware Transactional Memory Built-in Functions): Split
>   this subsection into two subsubsections named "Basic PowerPC
>   Hardware Transactional Memory Built-in Functions" and "PowerPC
>   Hardware Transactional Memory Built-in Functions".  Move the basic
>   subsubsection forward to be next to other basic subsubsections.
>   (PowerPC Atomic Memory Operation Functions): Change this
>   subsection to subsubsection.

I don't think basic is such a great name.  Essentially all builtins are
"basic".  It's not immediately clear what basic means.  Maybe describe
the things defined by the header files in (subsub)sections named after
those header files?

The patch is okay for trunk, thanks!


Segher


Re: [PATCH][AArch64] Add combine pattern to fuse AESE/AESMC instructions

2018-05-14 Thread Richard Earnshaw (lists)
On 11/05/18 14:29, Kyrill Tkachov wrote:
> Hi all,
> 
> When the AESE,AESD and AESMC, AESMC instructions are generated through
> the appropriate arm_neon.h intrinsics
> we really want to keep them together when the AESE feeds into an AESMC
> and fusion is supported by the target CPU.
> We have macro-fusion hooks and scheduling model forwarding paths defined
> to facilitate that.
> It is, however, not always enough.
> 
> This patch adds another mechanism for doing that.
> When we can detect during combine that the required dependency is exists
> (AESE -> AESMC, AESD -> AESIMC)
> just keep them together with a combine pattern throughout the rest of
> compilation.
> We won't ever want to split them.
> 
> The testcases generate 4 AESE(D) instructions in a block followed by 4
> AES(I)MC instructions that
> consume the corresponding results and it also adds a bunch of
> computations in-between so that the
> AESE and AESMC instructions are not trivially back-to-back, thus
> exercising the compiler's ability
> to bring them together.
> 
> With this patch all 4 pairs are fused whereas before a couple of fusions
> would be missed due to intervening
> arithmetic and memory instructions.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2018-05-11  Kyrylo Tkachov  
> 
>     * config/aarch64/aarch64-simd.md (*aarch64_crypto_aese_fused):
>     New pattern.
>     (aarch64_crypto_aesd_fused): Likewise.
> 
> 2018-05-11  Kyrylo Tkachov  
> 
>     * gcc.target/aarch64/crypto-fuse-1.c: New test.
>     * gcc.target/aarch64/crypto-fuse-2.c: Likewise.

Your testcases are missing a newline at the end of each file.  Otherwise OK.

R.

> 
> fuse-combine.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 7c166b6c8ec40475d1e01561b613b590b6690ad5..9a6ed304432af0ca23ec7d3797783a3128776a6e
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -5790,6 +5790,44 @@ (define_insn "aarch64_crypto_aesv16qi"
>(const_string "yes")])]
>  )
>  
> +;; When AESE/AESMC fusion is enabled we really want to keep the two together
> +;; and enforce the register dependency without scheduling or register
> +;; allocation messing up the order or introducing moves inbetween.
> +;;  Mash the two together during combine.
> +
> +(define_insn "*aarch64_crypto_aese_fused"
> +  [(set (match_operand:V16QI 0 "register_operand" "=")
> + (unspec:V16QI
> +   [(unspec:V16QI
> + [(match_operand:V16QI 1 "register_operand" "0")
> +  (match_operand:V16QI 2 "register_operand" "w")] UNSPEC_AESE)
> +   ] UNSPEC_AESMC))]
> +  "TARGET_SIMD && TARGET_AES
> +   && aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)"
> +  "aese\\t%0.16b, %2.16b\;aesmc\\t%0.16b, %0.16b"
> +  [(set_attr "type" "crypto_aese")
> +   (set_attr "length" "8")]
> +)
> +
> +;; When AESD/AESIMC fusion is enabled we really want to keep the two together
> +;; and enforce the register dependency without scheduling or register
> +;; allocation messing up the order or introducing moves inbetween.
> +;;  Mash the two together during combine.
> +
> +(define_insn "*aarch64_crypto_aesd_fused"
> +  [(set (match_operand:V16QI 0 "register_operand" "=")
> + (unspec:V16QI
> +   [(unspec:V16QI
> + [(match_operand:V16QI 1 "register_operand" "0")
> +  (match_operand:V16QI 2 "register_operand" "w")] UNSPEC_AESD)
> +   ] UNSPEC_AESIMC))]
> +  "TARGET_SIMD && TARGET_AES
> +   && aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)"
> +  "aesd\\t%0.16b, %2.16b\;aesimc\\t%0.16b, %0.16b"
> +  [(set_attr "type" "crypto_aese")
> +   (set_attr "length" "8")]
> +)
> +
>  ;; sha1
>  
>  (define_insn "aarch64_crypto_sha1hsi"
> diff --git a/gcc/testsuite/gcc.target/aarch64/crypto-fuse-1.c 
> b/gcc/testsuite/gcc.target/aarch64/crypto-fuse-1.c
> new file mode 100644
> index 
> ..79fd6011ed946d746ed5f03d26c7fe661f3f8154
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/crypto-fuse-1.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mcpu=cortex-a72+crypto -dp" } */
> +
> +#include 
> +
> +#define AESE(r, v, key) (r = vaeseq_u8 ((v), (key)));
> +#define AESMC(r, i) (r = vaesmcq_u8 (i))
> +
> +uint8x16_t dummy;
> +uint8x16_t a;
> +uint8x16_t b;
> +uint8x16_t c;
> +uint8x16_t d;
> +uint8x16_t e;
> +
> +void
> +foo (void)
> +{
> +  AESE (a, a, e);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  AESE (b, b, e);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  AESE (c, c, e);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  AESE (d, d, e);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  dummy = vaddq_u8 (dummy, dummy);
> +
> +  AESMC (a, a);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  dummy = vaddq_u8 (dummy, dummy);
> +  

Re: PING^1: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-05-14 Thread Martin Sebor

On 05/14/2018 07:44 AM, H.J. Lu wrote:

On Wed, Apr 25, 2018 at 7:54 PM, H.J. Lu  wrote:

When address of packed member of struct or union is taken, it may result
in an unaligned pointer value.  This patch adds -Waddress-of-packed-member
to check alignment at pointer assignment and warn unaligned address as
well as unaligned pointer:


This isn't a complete review, just some high level observations
and suggestions for things I noticed.


$ cat x.i
struct pair_t
{
  char c;
  int i;
} __attribute__ ((packed));

extern struct pair_t p;
int *addr = 
$ gcc -O2 -S x.i
x.i:8:13:  warning: taking address of packed member of 'struct pair_t' may 
result in an unaligned pointer value [-Waddress-of-packed-member]
 int *addr = 
 ^
$ cat c.i
struct B { int i; };
struct C { struct B b; } __attribute__ ((packed));

long* g8 (struct C *p) { return p; }
$ gcc -O2 -S c.i -Wno-incompatible-pointer-types
c.i: In function ‘g8’:
c.i:4:33: warning: taking value of packed 'struct C *' may result in an 
unaligned pointer value [-Waddress-of-packed-member]

 ^
That should read "taking address" (not value) but...

...to help explain the problem I would suggest to mention the expected
and actual alignment in the warning message.  E.g.,

  storing the address of a packed 'struct C' in 'struct C *' increases 
the alignment of the pointer from 1 to 4.


(IIUC, the source type and destination type need not be the same so
including both should be helpful in those cases.)

Adding a note pointing to the declaration of either the struct or
the member would help users find it if it's a header far removed
from the point of use.

Here's a question: Should the following trigger the warning (it
doesn't, either with your patch or with Clang, even though the
pointer is misaligned).

  struct __attribute__ ((packed)) A { int i; } a;
  struct B: A { int j; } b;

  int B::*pbi = ::i;


 long* g8 (struct C *p) { return p; }
 ^
$

This warning is enabled by default.


Can you please explain the reasoning behind this decision?

Is it because the warning is meant to have no false positives
(and the GCC diagnostic guidelines suggest that such warnings
be on by default), or because Clang enables it by default?
I ask because the phrasing of the warning "may result" suggests
it's not free of false positives.  I'm probably missing something,
and so...

...I would suggest to expand the documentation a bit and add
an example showing when the warning triggers, when it doesn't.
The added text says that taking the address of a packed member
"usually results in an unaligned pointer value."  When does it
not result in one?  Would a warning in such cases be a false
positive?


 Since read_encoded_value_with_base
in unwind-pe.h has

  union unaligned
{
  void *ptr;
  unsigned u2 __attribute__ ((mode (HI)));
  unsigned u4 __attribute__ ((mode (SI)));
  unsigned u8 __attribute__ ((mode (DI)));
  signed s2 __attribute__ ((mode (HI)));
  signed s4 __attribute__ ((mode (SI)));
  signed s8 __attribute__ ((mode (DI)));
} __attribute__((__packed__));
  _Unwind_Internal_Ptr result;

and GCC warns:

gcc/libgcc/unwind-pe.h:210:37: warning: taking address of packed member of 
'union unaligned' may result in an unaligned pointer value 
[-Waddress-of-packed-member]
result = (_Unwind_Internal_Ptr) u->ptr;
^
we need to add GCC pragma to ignore -Waddress-of-packed-member.

OK for trunk?

H.J.

gcc/c/

PR c/51628
* doc/invoke.texi: Document -Wno-address-of-packed-member.

gcc/c-family/

PR c/51628
* c-common.h (warn_for_address_of_packed_member): New.
* c-warn.c (check_address_of_packed_member): New function.
(warn_for_address_of_packed_member): Likewise.
* c.opt: Add -Wno-address-of-packed-member.

gcc/c/

PR c/51628
* c-typeck.c (warn_for_pointer_of_packed_member): New function.
(convert_for_assignment): Call warn_for_address_of_packed_member
and warn_for_pointer_of_packed_member.


The comment in the hunk below refers to symbols that don't correspond
to any of the arguments (ERRTYPE, PARNUM, and NAME).

+/* Warn if the right hand poiner value RHS isn't aligned to a
+   pointer type TYPE.  ERRTYPE says whether it is argument passing,
+   assignment, initialization or return.  PARMNUM is the number of
+   the argument, for printing in error messages.  NAME is the name
+   of the function.  */
+
+static void
+warn_for_pointer_of_packed_member (location_t location, tree type,
+  tree rhs)
+{

Similarly, in the hunk below, LHS doesn't refer to any variable
either used or declared in the context.

@@ -6986,6 +7037,13 @@ convert_for_assignment (location_t location, 
location_t expr_loc, tree type,

}
}

+  /* If LHS is't an address, check pointer or array of packed
+   

Re: [PATCH v2][AArch64] Remove remaining uses of * in patterns

2018-05-14 Thread Wilco Dijkstra
James Greenhalgh  wrote:
> On Tue, Jan 16, 2018 at 04:32:36PM +, Wilco Dijkstra wrote:
>> v2: Rebased after the big SVE commits
>> 
>> Remove the remaining uses of '*' from aarch64.md.
>> Using '*' in alternatives is typically incorrect as it tells the register
>> allocator to ignore those alternatives.  Also add a missing '?' so we
>> prefer a floating point register for same-size int<->fp conversions.
>> 
>> Passes regress & bootstrap, OK for commit?
>
> Please queue for GCC 9. OK when trunk is back open for new code.

Rebased, fixed more broken tests and retested yet again. Committed as r260233.

Remove the remaining uses of '*' from aarch64.md.
Using '*' in alternatives is typically incorrect as it tells the register
allocator to ignore those alternatives.  Also add a missing '?' so we
prefer a floating point register for same-size int<->fp conversions.

Passes regress and bootstrap on AArch64.

ChangeLog:
2018-05-14  Wilco Dijkstra  

* config/aarch64/aarch64.md (mov): Remove '*' in alternatives.
(movsi_aarch64): Likewise.
(load_pairsi): Likewise.
(load_pairdi): Likewise.
(store_pairsi): Likewise.
(store_pairdi): Likewise.
(load_pairsf): Likewise.
(load_pairdf): Likewise.
(store_pairsf): Likewise.
(store_pairdf): Likewise.
(zero_extend): Likewise.
(trunc): Swap alternatives.
(fcvt_target): Add '?' to prefer w over r.

gcc/testsuite/
* gcc.target/aarch64/vmov_n_1.c: Update test.
* gcc.target/aarch64/vfp-1.c: Update test.

--
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
3bb29c7190a7e9a2472a6891ff4aef551dc88a30..98e0732b2c16de4816cf93199b35c868a77af485
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -928,8 +928,8 @@ (define_expand "mov"
 )
 
 (define_insn "*mov_aarch64"
-  [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r,   *w,r ,r,*w, m, 
m, r,*w,*w")
-   (match_operand:SHORT 1 "aarch64_mov_operand"  " r,M,D,Usv,m, 
m,rZ,*w,*w, r,*w"))]
+  [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r,w,r  ,r,w, 
m,m,r,w,w")
+   (match_operand:SHORT 1 "aarch64_mov_operand"  " 
r,M,D,Usv,m,m,rZ,w,w,r,w"))]
   "(register_operand (operands[0], mode)
 || aarch64_reg_or_zero (operands[1], mode))"
 {
@@ -995,7 +995,7 @@ (define_expand "mov"
 
 (define_insn_and_split "*movsi_aarch64"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  
r,  r, w,r,w, w")
-   (match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
+   (match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Ds"))]
   "(register_operand (operands[0], SImode)
 || aarch64_reg_or_zero (operands[1], SImode))"
   "@
@@ -1302,9 +1302,9 @@ (define_expand "movmemdi"
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pairsi"
-  [(set (match_operand:SI 0 "register_operand" "=r,*w")
+  [(set (match_operand:SI 0 "register_operand" "=r,w")
(match_operand:SI 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:SI 2 "register_operand" "=r,*w")
+   (set (match_operand:SI 2 "register_operand" "=r,w")
(match_operand:SI 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
plus_constant (Pmode,
@@ -1318,9 +1318,9 @@ (define_insn "load_pairsi"
 )
 
 (define_insn "load_pairdi"
-  [(set (match_operand:DI 0 "register_operand" "=r,*w")
+  [(set (match_operand:DI 0 "register_operand" "=r,w")
(match_operand:DI 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:DI 2 "register_operand" "=r,*w")
+   (set (match_operand:DI 2 "register_operand" "=r,w")
(match_operand:DI 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
plus_constant (Pmode,
@@ -1338,9 +1338,9 @@ (define_insn "load_pairdi"
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pairsi"
   [(set (match_operand:SI 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-   (match_operand:SI 1 "aarch64_reg_or_zero" "rZ,*w"))
+   (match_operand:SI 1 "aarch64_reg_or_zero" "rZ,w"))
(set (match_operand:SI 2 "memory_operand" "=m,m")
-   (match_operand:SI 3 "aarch64_reg_or_zero" "rZ,*w"))]
+   (match_operand:SI 3 "aarch64_reg_or_zero" "rZ,w"))]
   "rtx_equal_p (XEXP (operands[2], 0),
plus_constant (Pmode,
   XEXP (operands[0], 0),
@@ -1354,9 +1354,9 @@ (define_insn "store_pairsi"
 
 (define_insn "store_pairdi"
   [(set (match_operand:DI 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-   (match_operand:DI 1 "aarch64_reg_or_zero" "rZ,*w"))
+   (match_operand:DI 1 "aarch64_reg_or_zero" "rZ,w"))
(set (match_operand:DI 2 "memory_operand" "=m,m")
-   (match_operand:DI 3 

Re: [Patch / RFC] Rename POINTER_TYPE_P to INDIRECT_TYPE_P ?

2018-05-14 Thread Jeff Law
On 05/11/2018 09:06 AM, Jakub Jelinek wrote:
> On Fri, May 11, 2018 at 10:42:13AM -0400, Jason Merrill wrote:
>> On Fri, May 11, 2018 at 8:48 AM, Paolo Carlini  
>> wrote:
>>> we got this very old comment in tree.h:
>>>
>>> /* Nonzero if TYPE represents a pointer or reference type.
>>>(It should be renamed to INDIRECT_TYPE_P.)  Keep these checks in
>>>ascending code order.  */
>>>
>>> #define POINTER_TYPE_P(TYPE) \
>>>   (TREE_CODE (TYPE) == POINTER_TYPE || TREE_CODE (TYPE) == REFERENCE_TYPE)
>>>
>>> and, FWIW  my personal experience, over the years I got confused a couple of
>>> times because of that name: for example I tried, incorrectly, to replace a
>>> few TREE_CODE (type) == POINTER_TYPE checks with POINTER_TYPE_P (type) in
>>> the C++ front-end.
>>
>> I think my inclination would be to keep this change local to the C++
> 
> If any change needs to be done, yeah.  For the middle-end, having
> POINTER_TYPE_P including reference type is highly desirable, otherwise
> people will just forget to handle REFERENCE_TYPE; after all,
> useless_type_conversion_p says that POINTER_TYPE <-> REFERENCE_TYPE
> conversions are useless, so the distinction is lost during optimizations
> really soon.
Exactly.  We've seen this several times in the past in the gimple/ssa
optimizers.

Jeff


Re: [PATCH] Disallow minus in mem {+,-,&,|,^}= x; mem != 0 peepholes (PR target/85756)

2018-05-14 Thread Uros Bizjak
On Mon, May 14, 2018 at 5:08 PM, Marek Polacek  wrote:
> On Mon, May 14, 2018 at 03:07:54PM +0200, Jakub Jelinek wrote:
>> On Mon, May 14, 2018 at 02:54:18PM +0200, Uros Bizjak wrote:
>> > > --- gcc/config/i386/i386.md.jj  2018-05-11 18:44:32.0 +0200
>> > > +++ gcc/config/i386/i386.md 2018-05-14 13:50:28.100482520 +0200
>> > > @@ -19397,11 +19397,11 @@
>> > >   (set (match_dup 0) (match_dup 2))])
>> > > (set (match_dup 1) (match_dup 0))]
>> > >"(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
>> > > +   && GET_CODE (operands[2]) != MINUS
>> >
>> > && COMMUTATIVE_ARITH_P (operands[2])
>>
>> That works, but then we should change the peephole2 above it as well.
>> MINUS is the only non-commutative plusminuslogic_operator, so it doesn't
>> change any behavior.
>>
>> BTW, wonder if we shouldn't allow giving peephole2's a name, it is too hard
>> to refer to them...
>
> It might be too late but I've regtested/bootstrapped this patch on 
> x86_64-linux.

Also works for me.

Committed to mainline SVN.

Uros.


[PATCH] Add __attribute__((malloc) to allocator and remove unused code

2018-05-14 Thread Jonathan Wakely

As discussed at https://gcc.gnu.org/ml/libstdc++/2018-01/msg00073.html
we can simplify the allocator function for valarray memory. I also
noticed that the _Array(size_t) constructor is never used.

* include/bits/valarray_array.h (__valarray_get_memory): Remove.
(__valarray_get_storage): Call operator new directly. Remove ignored
top-level restrict qualifier and add malloc attribute instead.
(_Array<_Tp>::_Array(size_t)): Remove unused constructor.

Tested powerpc64le-linux, committed to trunk.


commit 71983b7d0901159af2bca65af783460721fc0a76
Author: Jonathan Wakely 
Date:   Mon May 14 16:02:18 2018 +0100

Add __attribute__((malloc) to allocator and remove unused code

* include/bits/valarray_array.h (__valarray_get_memory): Remove.
(__valarray_get_storage): Call operator new directly. Remove ignored
top-level restrict qualifier and add malloc attribute instead.
(_Array<_Tp>::_Array(size_t)): Remove unused constructor.

diff --git a/libstdc++-v3/include/bits/valarray_array.h 
b/libstdc++-v3/include/bits/valarray_array.h
index 07f38ed03ed..6759d6003e9 100644
--- a/libstdc++-v3/include/bits/valarray_array.h
+++ b/libstdc++-v3/include/bits/valarray_array.h
@@ -47,18 +47,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Helper functions on raw pointers
   //
 
-  // We get memory by the old fashion way
-  inline void*
-  __valarray_get_memory(size_t __n)
-  { return operator new(__n); }
+  // We get memory the old fashioned way
+  template
+_Tp*
+__valarray_get_storage(size_t) __attribute__((__malloc__));
 
   template
-inline _Tp*__restrict__
+inline _Tp*
 __valarray_get_storage(size_t __n)
-{
-  return static_cast<_Tp*__restrict__>
-   (std::__valarray_get_memory(__n * sizeof(_Tp)));
-}
+{ return static_cast<_Tp*>(operator new(__n * sizeof(_Tp))); }
 
   // Return memory to the system
   inline void
@@ -410,7 +407,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct _Array
 {
-  explicit _Array(size_t);
   explicit _Array(_Tp* const __restrict__);
   explicit _Array(const valarray<_Tp>&);
   _Array(const _Tp* __restrict__, size_t);
@@ -503,12 +499,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__dst._M_data, __j._M_data);
 }
 
-  template
-inline
-_Array<_Tp>::_Array(size_t __n)
-: _M_data(__valarray_get_storage<_Tp>(__n))
-{ std::__valarray_default_construct(_M_data, _M_data + __n); }
-
   template
 inline
 _Array<_Tp>::_Array(_Tp* const __restrict__ __p)


[PATCH] PR libstdc++/67554 Do not pass null pointers to memcpy

2018-05-14 Thread Jonathan Wakely

PR libstdc++/67554
* include/bits/valarray_array.h (_Array_copy_ctor<_Tp, true>)
(_Array_copier<_Tp, true>): Do not pass null pointers to memcpy.

Tested powerpc64le-linux, committed to trunk.

Backports to follow.

There's no new test for this, as we're not able to use sanitizers in
the testsuite (something I still hope to fix one day).

commit cd61c4ce0bfd69065018417dfe6f3e03b2010af3
Author: Jonathan Wakely 
Date:   Mon May 14 15:50:16 2018 +0100

PR libstdc++/67554 Do not pass null pointers to memcpy

PR libstdc++/67554
* include/bits/valarray_array.h (_Array_copy_ctor<_Tp, true>)
(_Array_copier<_Tp, true>): Do not pass null pointers to memcpy.

diff --git a/libstdc++-v3/include/bits/valarray_array.h 
b/libstdc++-v3/include/bits/valarray_array.h
index f1d2c43044f..07f38ed03ed 100644
--- a/libstdc++-v3/include/bits/valarray_array.h
+++ b/libstdc++-v3/include/bits/valarray_array.h
@@ -152,7 +152,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   inline static void
   _S_do_it(const _Tp* __b, const _Tp* __e, _Tp* __restrict__ __o)
-  { __builtin_memcpy(__o, __b, (__e - __b) * sizeof(_Tp)); }
+  {
+   if (__b)
+ __builtin_memcpy(__o, __b, (__e - __b) * sizeof(_Tp));
+  }
 };
 
   template
@@ -258,7 +261,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   inline static void
   _S_do_it(const _Tp* __restrict__ __a, size_t __n, _Tp* __restrict__ __b)
-  { __builtin_memcpy(__b, __a, __n * sizeof (_Tp)); }
+  {
+   if (__n != 0)
+ __builtin_memcpy(__b, __a, __n * sizeof (_Tp));
+  }
 };
 
   // Copy a plain array __a[<__n>] into a play array __b[<>]


Re: [PATCH] Disallow minus in mem {+,-,&,|,^}= x; mem != 0 peepholes (PR target/85756)

2018-05-14 Thread Marek Polacek
On Mon, May 14, 2018 at 03:07:54PM +0200, Jakub Jelinek wrote:
> On Mon, May 14, 2018 at 02:54:18PM +0200, Uros Bizjak wrote:
> > > --- gcc/config/i386/i386.md.jj  2018-05-11 18:44:32.0 +0200
> > > +++ gcc/config/i386/i386.md 2018-05-14 13:50:28.100482520 +0200
> > > @@ -19397,11 +19397,11 @@
> > >   (set (match_dup 0) (match_dup 2))])
> > > (set (match_dup 1) (match_dup 0))]
> > >"(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> > > +   && GET_CODE (operands[2]) != MINUS
> > 
> > && COMMUTATIVE_ARITH_P (operands[2])
> 
> That works, but then we should change the peephole2 above it as well.
> MINUS is the only non-commutative plusminuslogic_operator, so it doesn't
> change any behavior.
> 
> BTW, wonder if we shouldn't allow giving peephole2's a name, it is too hard
> to refer to them...
 
It might be too late but I've regtested/bootstrapped this patch on x86_64-linux.

> 2018-05-14  Jakub Jelinek  
> 
>   PR target/85756
>   * config/i386/i386.md: Disallow non-commutative arithmetics in
>   last twpeephole for mem {+,-,&,|,^}= x; mem != 0 after cmpelim
>   optimization.  Use COMMUTATIVE_ARITH_P test rather than != MINUS
>   in the peephole2 before it.
> 
>   * gcc.c-torture/execute/pr85756.c: New test.
> 
> --- gcc/config/i386/i386.md.jj2018-05-11 18:44:32.0 +0200
> +++ gcc/config/i386/i386.md   2018-05-14 15:02:48.363662960 +0200
> @@ -19367,7 +19367,7 @@ (define_peephole2
> (set (match_dup 1) (match_dup 0))
> (set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
>"(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> -   && GET_CODE (operands[2]) != MINUS
> +   && COMMUTATIVE_ARITH_P (operands[2])
> && peep2_reg_dead_p (3, operands[0])
> && !reg_overlap_mentioned_p (operands[0], operands[1])
> && ix86_match_ccmode (peep2_next_insn (2),
> @@ -19397,11 +19397,11 @@ (define_peephole2
> (set (match_dup 0) (match_dup 2))])
> (set (match_dup 1) (match_dup 0))]
>"(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && COMMUTATIVE_ARITH_P (operands[2])
> && peep2_reg_dead_p (2, operands[0])
> && !reg_overlap_mentioned_p (operands[0], operands[1])
> && ix86_match_ccmode (peep2_next_insn (0),
> -  (GET_CODE (operands[2]) == PLUS
> -   || GET_CODE (operands[2]) == MINUS)
> +  GET_CODE (operands[2]) == PLUS
>? CCGOCmode : CCNOmode)"
>[(parallel [(set (match_dup 3) (match_dup 5))
> (set (match_dup 1) (match_dup 4))])]
> --- gcc/testsuite/gcc.c-torture/execute/pr85756.c.jj  2018-05-14 
> 13:50:44.384307289 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr85756.c 2018-05-14 
> 13:48:17.0 +0200
> @@ -0,0 +1,50 @@
> +/* PR target/85756 */
> +
> +#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 && __SIZEOF_INT__ == 4
> +int a, c, *e, f, h = 10;
> +short b;
> +unsigned int p;
> +
> +__attribute__((noipa)) void
> +bar (int a)
> +{
> +  asm volatile ("" : : "r" (a) : "memory");
> +}
> +
> +void
> +foo ()
> +{
> +  unsigned j = 1, m = 430523;
> +  int k, n = 1, *l = 
> +lab:
> +  p = m;
> +  m = -((~65535U | j) - n);
> +  f = b << ~(n - 8);
> +  n = (m || b) ^ f;
> +  j = p;
> +  if (p < m)
> +*l = k < 3;
> +  if (!n)
> +l = 
> +  if (c)
> +{
> +  bar (a);
> +  goto lab;
> +}
> +  if (!*l)
> +*e = 1;
> +}
> +
> +int
> +main ()
> +{
> +  foo ();
> +  return 0;
> +}
> +#else
> +int
> +main ()
> +{
> +  return 0;
> +}
> +#endif
> 
> 
>   Jakub

Marek


[PATCH, rs6000] Improved Documentation of Built-in Functions Part 2

2018-05-14 Thread Kelvin Nilsen
The focus of this patch is to restructure the section headers within the PowerPC
portion of the extend.texi documentation file.  Restructuring section headers 
prepares
the foundation for subsequent documentation improvements which will be 
delivered in
follow-on patches.

I have bootstrapped and regression tested without regressions on
powerpc64le-unknown-linux (P8).  I have also confirmed that this patch builds
on powerpc-linux (P7 bing-endian, both -m32 and -m64 target options).

I have also built and reviewed the gcc.pdf file.

Is this ok for the trunk?

gcc/ChangeLog:

2018-05-14  Kelvin Nilsen  

* doc/extend.texi (Basic PowerPC Built-in Functions): Rename this
subsection to be "PowerPC Built-in Functions".
(PowerPC Altivec/VSX Built-in Functions): Change this
subsection to subsubsection and rename as "PowerPC Altivec
Built-in Functions Available on ISA 2.05".
(PowerPC Built-in Functions Available on ISA 2.06): New
subsubsection.
(PowerPC Built-in Functions Available on ISA 2.07): Likewise.
(PowerPC Built-in Functions Available on ISA 3.0): Likewise.
(PowerPC Hardware Transactional Memory Built-in Functions): Split
this subsection into two subsubsections named "Basic PowerPC
Hardware Transactional Memory Built-in Functions" and "PowerPC
Hardware Transactional Memory Built-in Functions".  Move the basic
subsubsection forward to be next to other basic subsubsections.
(PowerPC Atomic Memory Operation Functions): Change this
subsection to subsubsection.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 260182)
+++ gcc/doc/extend.texi (working copy)
@@ -12477,10 +12477,7 @@ instructions, but allow the compiler to schedule t
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
 * picoChip Built-in Functions::
-* Basic PowerPC Built-in Functions::
-* PowerPC AltiVec/VSX Built-in Functions::
-* PowerPC Hardware Transactional Memory Built-in Functions::
-* PowerPC Atomic Memory Operation Functions::
+* PowerPC Built-in Functions::
 * RX Built-in Functions::
 * S/390 System z Built-in Functions::
 * SH Built-in Functions::
@@ -15536,25 +15533,35 @@ implementing assertions.
 
 @end table
 
-@node Basic PowerPC Built-in Functions
-@subsection Basic PowerPC Built-in Functions
+@node PowerPC Built-in Functions
+@subsection PowerPC Built-in Functions
 
+This section describes built-in functions that are supported for
+various configurations of the PowerPC processor.
+
 @menu
 * Basic PowerPC Built-in Functions Available on all Configurations::
 * Basic PowerPC Built-in Functions Available on ISA 2.05::
 * Basic PowerPC Built-in Functions Available on ISA 2.06::
 * Basic PowerPC Built-in Functions Available on ISA 2.07::
+* Basic PowerPC Hardware Transactional Memory Built-in Functions::
 * Basic PowerPC Built-in Functions Available on ISA 3.0::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.05::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.06::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.07::
+* PowerPC AltiVec Built-in Functions Available on ISA 3.0::
+* PowerPC Hardware Transactional Memory Built-in Functions::
+* PowerPC Atomic Memory Operation Functions::
 @end menu
 
-This section describes PowerPC built-in functions that do not require
-the inclusion of any special header files to declare prototypes or
-provide macro definitions.  The sections that follow describe
-additional PowerPC built-in functions.
-
 @node Basic PowerPC Built-in Functions Available on all Configurations
 @subsubsection Basic PowerPC Built-in Functions Available on all Configurations
 
+This section describes PowerPC built-in functions that are supported
+on all configurations and do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.
+
 @deftypefn {Built-in Function} void __builtin_cpu_init (void)
 This function is a @code{nop} on the PowerPC platform and is included solely
 to maintain API compatibility with the x86 builtins.
@@ -15889,6 +15896,150 @@ addition to the @option{-mpower8-fusion}, @option{
 
 This section intentionally empty.
 
+@node Basic PowerPC Hardware Transactional Memory Built-in Functions
+@subsubsection Basic PowerPC Hardware Transactional Memory Built-in Functions
+
+The following basic built-in functions are available with
+@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later.
+They all generate the machine instruction that is part of the name.
+
+The Hardware Transactional Memory (HTM) builtins (with the exception
+of @code{__builtin_tbegin}) return 
+the full 4-bit condition register value set by their associated hardware
+instruction.  The header file @code{htmintrin.h} defines some macros that can
+be used to decipher the return value.  The @code{__builtin_tbegin} builtin

[PATCH] PR libstdc++/82966 fix swapping of node handles

2018-05-14 Thread Jonathan Wakely

PR libstdc++/82966
* include/bits/node_handle.h (_Node_handle_common::_M_swap): Use value
instead of type.
* testsuite/23_containers/set/modifiers/node_swap.cc: New.

Tested powerpc64le-linux, committed to trunk.

Backports to gcc-7 and gcc-8 will follow.

commit b6848af169ab5e7d344b01fd124c1f07a92d951e
Author: Jonathan Wakely 
Date:   Mon May 14 15:10:39 2018 +0100

PR libstdc++/82966 fix swapping of node handles

PR libstdc++/82966
* include/bits/node_handle.h (_Node_handle_common::_M_swap): Use 
value
instead of type.
* testsuite/23_containers/set/modifiers/node_swap.cc: New.

diff --git a/libstdc++-v3/include/bits/node_handle.h 
b/libstdc++-v3/include/bits/node_handle.h
index c02aca024bd..8bb4f3c0abc 100644
--- a/libstdc++-v3/include/bits/node_handle.h
+++ b/libstdc++-v3/include/bits/node_handle.h
@@ -109,7 +109,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
using std::swap;
swap(_M_ptr, __nh._M_ptr);
-   if (_AllocTraits::propagate_on_container_swap
+   if (_AllocTraits::propagate_on_container_swap::value
|| !_M_alloc || !__nh._M_alloc)
  _M_alloc.swap(__nh._M_alloc);
else
diff --git a/libstdc++-v3/testsuite/23_containers/set/modifiers/node_swap.cc 
b/libstdc++-v3/testsuite/23_containers/set/modifiers/node_swap.cc
new file mode 100644
index 000..8957d6e125a
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/set/modifiers/node_swap.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do run { target c++17 } }
+
+#include 
+#include 
+
+void
+test01()
+{
+  // PR libstdc++/82966
+  std::set::node_type n1, n2;
+  n1.swap(n2);
+  VERIFY( n1.empty() );
+  VERIFY( n2.empty() );
+}
+
+void
+test02()
+{
+  std::set s{1, 2};
+  std::set::node_type n1 = s.extract(1), n2;
+  swap(n1, n2);
+  VERIFY( n1.empty() );
+  VERIFY( !n2.empty() );
+}
+
+int main()
+{
+  test01();
+  test02();
+}


Re: PR85734

2018-05-14 Thread Richard Biener
On May 14, 2018 2:30:27 PM GMT+02:00, Prathamesh Kulkarni 
 wrote:
>On 14 May 2018 at 15:06, Richard Biener  wrote:
>> On Mon, 14 May 2018, Prathamesh Kulkarni wrote:
>>
>>> On 14 May 2018 at 14:46, Richard Biener  wrote:
>>> > On Mon, 14 May 2018, Prathamesh Kulkarni wrote:
>>> >
>>> >> Hi,
>>> >> The attached patch tries to fix PR85734, by gating on
>>> >> !function_always_visible_to_compiler_p.
>>> >> Bootstrap+test in progress on x86_64.
>>> >> OK to commit if passes ?
>>> >
>>> > This looks redundant - suggest_attribute does that very same check
>>> > but warn_function_malloc passes it 'false' as known_finite
>>> > (warn_function_noreturn passes it true for example).
>>> Indeed, thanks for pointing it out.
>>> >
>>> > So I suggest to simply pass true as that arg.
>>> Is the attached version OK to commit after bootstrap+test ?
>>
>> Yes.
>Also on a related bug PR85562, should we change the wording of
>suggest_attribute for malloc
>to omit the text "if it is known to return normally" ?

Yes. 

Richard. 

>Thanks,
>Prathamesh
>> Richard.
>>
>>> Thanks,
>>> Prathamesh
>>> >
>>> > Richard
>>> >
>>> >> Thanks,
>>> >> Prathamesh
>>> >>
>>> >
>>> > --
>>> > Richard Biener 
>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>Norton, HRB 21284 (AG Nuernberg)
>>>
>>
>> --
>> Richard Biener 
>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>Norton, HRB 21284 (AG Nuernberg)



Re: [Patch / RFC] Rename POINTER_TYPE_P to INDIRECT_TYPE_P ?

2018-05-14 Thread Nathan Sidwell

On 05/11/2018 11:06 AM, Jakub Jelinek wrote:


If any change needs to be done, yeah.  For the middle-end, having
POINTER_TYPE_P including reference type is highly desirable, otherwise
people will just forget to handle REFERENCE_TYPE; after all,
useless_type_conversion_p says that POINTER_TYPE <-> REFERENCE_TYPE
conversions are useless, so the distinction is lost during optimizations
really soon.


The middle ends don't know that in C++ REFERENCE_TYPE is never a null 
pointer (IIUC).  It might be nice if they knew that.


nathan

--
Nathan Sidwell


Re: [C++ Patch] Add TYPE_REF_P

2018-05-14 Thread Jason Merrill
OK.

On Sun, May 13, 2018 at 7:56 PM, Paolo Carlini  wrote:
> Hi,
>
> this simply adds the missing TYPE_REF_P - counterpart of TYPE_PTR_P - and
> uses it throughout (also adjusts a few places to consistently use the
> existing TYPE_PTR_P). Tested x86_64-linux.
>
> Thanks, Paolo.
>
> 
>


PING^2: [PATCH] Don't mark IFUNC resolver as only called directly

2018-05-14 Thread H.J. Lu
On Wed, Apr 25, 2018 at 8:49 PM, H.J. Lu  wrote:
> On Thu, Apr 12, 2018 at 3:50 PM, H.J. Lu  wrote:
>> On Thu, Apr 12, 2018 at 6:39 AM, H.J. Lu  wrote:
>>> On Thu, Apr 12, 2018 at 5:17 AM, Jan Hubicka  wrote:
> On Thu, Apr 12, 2018 at 1:29 PM, H.J. Lu  wrote:
> > Since IFUNC resolver is called indirectly, don't mark IFUNC resolver as
> > only called directly.
> >
> > OK for trunk?
> >
> >
> > H.J.
> > ---
> > gcc/
> >
> > PR target/85345
> > * cgraph.h: Include stringpool.h" and "attribs.h".
> > (cgraph_node::only_called_directly_or_aliased_p): Return false
> > for IFUNC resolver.
> >
> > gcc/testsuite/
> >
> > PR target/85345
> > * gcc.target/i386/pr85345.c: New test.
> > ---
> >  gcc/cgraph.h|  5 +++-
> >  gcc/testsuite/gcc.target/i386/pr85345.c | 44 
> > +
> >  2 files changed, 48 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr85345.c
> >
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index d1ef8408497..9e195824fcc 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "profile-count.h"
> >  #include "ipa-ref.h"
> >  #include "plugin-api.h"
> > +#include "stringpool.h"
> > +#include "attribs.h"
> >
> >  class ipa_opt_pass_d;
> >  typedef ipa_opt_pass_d *ipa_opt_pass;
> > @@ -2894,7 +2896,8 @@ cgraph_node::only_called_directly_or_aliased_p 
> > (void)
> >   && !DECL_STATIC_CONSTRUCTOR (decl)
> >   && !DECL_STATIC_DESTRUCTOR (decl)
> >   && !used_from_object_file_p ()
> > - && !externally_visible);
> > + && !externally_visible
> > + && !lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)));
>
> How's it handled for our own generated resolver functions?  That is,
> isn't there sth cheaper than doing a lookup_attribute here?  I see
> that make_dispatcher_decl nor ix86_get_function_versions_dispatcher
> adds the 'ifunc' attribute (though they are TREE_PUBLIC there).

 Is there any drawback of setting force_output flag?
 Honza
>>>
>>> Setting force_output may prevent some optimizations.  Can we add a bit
>>> for IFUNC resolver?
>>>
>>
>> Here is the patch to add ifunc_resolver to cgraph_node. Tested on x86-64
>> and i686.  Any comments?
>>
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00647.html
>

PING.


-- 
H.J.


PING^1: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-05-14 Thread H.J. Lu
On Wed, Apr 25, 2018 at 7:54 PM, H.J. Lu  wrote:
> When address of packed member of struct or union is taken, it may result
> in an unaligned pointer value.  This patch adds -Waddress-of-packed-member
> to check alignment at pointer assignment and warn unaligned address as
> well as unaligned pointer:
>
> $ cat x.i
> struct pair_t
> {
>   char c;
>   int i;
> } __attribute__ ((packed));
>
> extern struct pair_t p;
> int *addr = 
> $ gcc -O2 -S x.i
> x.i:8:13:  warning: taking address of packed member of 'struct pair_t' may 
> result in an unaligned pointer value [-Waddress-of-packed-member]
>  int *addr = 
>  ^
> $ cat c.i
> struct B { int i; };
> struct C { struct B b; } __attribute__ ((packed));
>
> long* g8 (struct C *p) { return p; }
> $ gcc -O2 -S c.i -Wno-incompatible-pointer-types
> c.i: In function ‘g8’:
> c.i:4:33: warning: taking value of packed 'struct C *' may result in an 
> unaligned pointer value [-Waddress-of-packed-member]
>  long* g8 (struct C *p) { return p; }
>  ^
> $
>
> This warning is enabled by default.  Since read_encoded_value_with_base
> in unwind-pe.h has
>
>   union unaligned
> {
>   void *ptr;
>   unsigned u2 __attribute__ ((mode (HI)));
>   unsigned u4 __attribute__ ((mode (SI)));
>   unsigned u8 __attribute__ ((mode (DI)));
>   signed s2 __attribute__ ((mode (HI)));
>   signed s4 __attribute__ ((mode (SI)));
>   signed s8 __attribute__ ((mode (DI)));
> } __attribute__((__packed__));
>   _Unwind_Internal_Ptr result;
>
> and GCC warns:
>
> gcc/libgcc/unwind-pe.h:210:37: warning: taking address of packed member of 
> 'union unaligned' may result in an unaligned pointer value 
> [-Waddress-of-packed-member]
> result = (_Unwind_Internal_Ptr) u->ptr;
> ^
> we need to add GCC pragma to ignore -Waddress-of-packed-member.
>
> OK for trunk?
>
> H.J.
> 
> gcc/c/
>
> PR c/51628
> * doc/invoke.texi: Document -Wno-address-of-packed-member.
>
> gcc/c-family/
>
> PR c/51628
> * c-common.h (warn_for_address_of_packed_member): New.
> * c-warn.c (check_address_of_packed_member): New function.
> (warn_for_address_of_packed_member): Likewise.
> * c.opt: Add -Wno-address-of-packed-member.
>
> gcc/c/
>
> PR c/51628
> * c-typeck.c (warn_for_pointer_of_packed_member): New function.
> (convert_for_assignment): Call warn_for_address_of_packed_member
> and warn_for_pointer_of_packed_member.
>
> gcc/cp/
>
> PR c/51628
> * call.c (convert_for_arg_passing): Call
> warn_for_address_of_packed_member.
> * typeck.c (convert_for_assignment): Likewise.
>
> gcc/testsuite/
>
> PR c/51628
> * c-c++-common/pr51628-1.c: New test.
> * c-c++-common/pr51628-2.c: Likewise.
> * c-c++-common/pr51628-3.c: Likewise.
> * c-c++-common/pr51628-4.c: Likewise.
> * c-c++-common/pr51628-5.c: Likewise.
> * c-c++-common/pr51628-6.c: Likewise.
> * c-c++-common/pr51628-7.c: Likewise.
> * c-c++-common/pr51628-8.c: Likewise.
> * c-c++-common/pr51628-9.c: Likewise.
> * c-c++-common/pr51628-10.c: Likewise.
> * c-c++-common/pr51628-11.c: Likewise.
> * c-c++-common/pr51628-12.c: Likewise.
> * c-c++-common/pr51628-13.c: Likewise.
> * c-c++-common/pr51628-14.c: Likewise.
> * c-c++-common/pr51628-15.c: Likewise.
> * gcc.dg/pr51628-16.c: Likewise.
> * gcc.dg/pr51628-17.c: Likewise.
> * gcc.dg/pr51628-18.c: Likewise.
> * gcc.dg/pr51628-19.c: Likewise.
> * gcc.dg/pr51628-20.c: Likewise.
> * gcc.dg/pr51628-21.c: Likewise.
> * gcc.dg/pr51628-22.c: Likewise.
> * gcc.dg/pr51628-23.c: Likewise.
> * gcc.dg/pr51628-24.c: Likewise.
> * c-c++-common/asan/misalign-1.c: Add
> -Wno-address-of-packed-member.
> * c-c++-common/asan/misalign-2.c: Likewise.
> * c-c++-common/ubsan/align-2.c: Likewise.
> * c-c++-common/ubsan/align-4.c: Likewise.
> * c-c++-common/ubsan/align-6.c: Likewise.
> * c-c++-common/ubsan/align-7.c: Likewise.
> * c-c++-common/ubsan/align-8.c: Likewise.
> * c-c++-common/ubsan/align-10.c: Likewise.
> * g++.dg/ubsan/align-2.C: Likewise.
> * gcc.target/i386/avx512bw-vmovdqu16-2.c: Likewise.
> * gcc.target/i386/avx512f-vmovdqu32-2.c: Likewise.
> * gcc.target/i386/avx512f-vmovdqu64-2.c: Likewise.
> * gcc.target/i386/avx512vl-vmovdqu16-2.c: Likewise.
> * gcc.target/i386/avx512vl-vmovdqu32-2.c: Likewise.
> * gcc.target/i386/avx512vl-vmovdqu64-2.c: Likewise.
>
> libgcc/
>
> * unwind-pe.h (read_encoded_value_with_base): Add GCC pragma
> to ignore -Waddress-of-packed-member.

Jason, Joseph, is this


Re: RFC: bash completion

2018-05-14 Thread Martin Liška
On 05/14/2018 02:41 PM, Martin Liška wrote:
> Yes, I will work on that as soon as GCC's part is ready.

Hi.

I've just done branch of bash-completion where I implemented that:
https://github.com/marxin/bash-completion/tree/gcc-completion

Martin


Re: [PATCH] DWARF: Emit DWARF5 forms for indirect addresses and string offsets.

2018-05-14 Thread Mark Wielaard
On Mon, 2018-04-30 at 14:35 +0200, Mark Wielaard wrote:
> We already emit DWARF5 attributes and tables for indirect addresses
> and string offsets, but still use GNU forms. Add a new helper function
> dwarf_FORM () for emitting the right form.
> 
> Currently we only use the uleb128 forms. But DWARF5 also allows
> 1, 2, 3 and 4 byte forms (DW_FORM_strx[1234] and DW_FORM_addrx[1234])
> which might be more space efficient.

Ping.

gcc/ChangeLog:

* dwarf2out.c (dwarf_FORM): New function.
(set_indirect_string): Use dwarf_FORM.
(reset_indirect_string): Likewise.
(size_of_die): Likewise.
(value_format): Likewise.
(output_die): Likewise.
(add_skeleton_AT_string): Likewise.
(output_macinfo_op): Likewise.
(index_string): Likewise.
(output_index_string_offset): Likewise.
(output_index_string): Likewise.
(count_index_strings): Likewise.
 
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 340de5b..85a1a8b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -246,7 +246,7 @@ static GTY (()) hash_table 
*debug_line_str_hash;
That is, the comp_dir and dwo_name will appear in both places.
 
2) Strings can use four forms: DW_FORM_string, DW_FORM_strp,
-   DW_FORM_line_strp or DW_FORM_GNU_str_index.
+   DW_FORM_line_strp or DW_FORM_strx/GNU_str_index.
 
3) GCC chooses the form to use late, depending on the size and
reference count.
@@ -1757,6 +1757,28 @@ dwarf_TAG (enum dwarf_tag tag)
   return tag;
 }
 
+/* And similarly for forms.  */
+static inline enum dwarf_form
+dwarf_FORM (enum dwarf_form form)
+{
+  switch (form)
+{
+case DW_FORM_addrx:
+  if (dwarf_version < 5)
+   return DW_FORM_GNU_addr_index;
+  break;
+
+case DW_FORM_strx:
+  if (dwarf_version < 5)
+   return DW_FORM_GNU_str_index;
+  break;
+
+default:
+  break;
+}
+  return form;
+}
+
 static unsigned long int get_base_type_offset (dw_die_ref);
 
 /* Return the size of a location descriptor.  */
@@ -4387,8 +4409,8 @@ AT_class (dw_attr_node *a)
 }
 
 /* Return the index for any attribute that will be referenced with a
-   DW_FORM_GNU_addr_index or DW_FORM_GNU_str_index.  String indices
-   are stored in dw_attr_val.v.val_str for reference counting
+   DW_FORM_addrx/GNU_addr_index or DW_FORM_strx/GNU_str_index.  String
+   indices are stored in dw_attr_val.v.val_str for reference counting
pruning.  */
 
 static inline unsigned int
@@ -4652,7 +4674,7 @@ set_indirect_string (struct indirect_string_node *node)
   /* Already indirect is a no op.  */
   if (node->form == DW_FORM_strp
   || node->form == DW_FORM_line_strp
-  || node->form == DW_FORM_GNU_str_index)
+  || node->form == dwarf_FORM (DW_FORM_strx))
 {
   gcc_assert (node->label);
   return;
@@ -4668,7 +4690,7 @@ set_indirect_string (struct indirect_string_node *node)
 }
   else
 {
-  node->form = DW_FORM_GNU_str_index;
+  node->form = dwarf_FORM (DW_FORM_strx);
   node->index = NO_INDEX_ASSIGNED;
 }
 }
@@ -4681,7 +4703,7 @@ int
 reset_indirect_string (indirect_string_node **h, void *)
 {
   struct indirect_string_node *node = *h;
-  if (node->form == DW_FORM_strp || node->form == DW_FORM_GNU_str_index)
+  if (node->form == DW_FORM_strp || node->form == dwarf_FORM (DW_FORM_strx))
 {
   free (node->label);
   node->label = NULL;
@@ -9419,7 +9441,7 @@ size_of_die (dw_die_ref die)
   form = AT_string_form (a);
  if (form == DW_FORM_strp || form == DW_FORM_line_strp)
size += DWARF_OFFSET_SIZE;
- else if (form == DW_FORM_GNU_str_index)
+ else if (form == dwarf_FORM (DW_FORM_strx))
size += size_of_uleb128 (AT_index (a));
  else
size += strlen (a->dw_attr_val.v.val_str->str) + 1;
@@ -9666,7 +9688,7 @@ value_format (dw_attr_node *a)
case DW_AT_entry_pc:
case DW_AT_trampoline:
   return (AT_index (a) == NOT_INDEXED
-  ? DW_FORM_addr : DW_FORM_GNU_addr_index);
+  ? DW_FORM_addr : dwarf_FORM (DW_FORM_addrx));
default:
  break;
}
@@ -9839,7 +9861,7 @@ value_format (dw_attr_node *a)
   return DW_FORM_data;
 case dw_val_class_lbl_id:
   return (AT_index (a) == NOT_INDEXED
-  ? DW_FORM_addr : DW_FORM_GNU_addr_index);
+  ? DW_FORM_addr : dwarf_FORM (DW_FORM_addrx));
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
 case dw_val_class_loclistsptr:
@@ -10807,7 +10829,7 @@ output_die (dw_die_ref die)
   a->dw_attr_val.v.val_str->label,
   debug_line_str_section,
   "%s: \"%s\"", name, AT_string (a));
-  else if (a->dw_attr_val.v.val_str->form == DW_FORM_GNU_str_index)
+  else if (a->dw_attr_val.v.val_str->form == dwarf_FORM (DW_FORM_strx))
 dw2_asm_output_data_uleb128 

Re: [PATCH] DWARF: Add header for .debug_str_offsets table for dwarf_version 5.

2018-05-14 Thread Mark Wielaard
On Mon, 2018-04-30 at 14:34 +0200, Mark Wielaard wrote:
> DWARF5 defines a small header for .debug_str_offsets.  Since we only use
> it for split dwarf .dwo files we don't need to keep track of the actual
> index offset in an attribute.

Ping.

gcc/ChangeLog:

* dwarf2out.c (count_index_strings): New function.
(output_indirect_strings): Call count_index_strings and generate
header for dwarf_version >= 5.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index d2d4ec0..340de5b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -28732,6 +28732,19 @@ output_index_string (indirect_string_node **h, 
unsigned int *cur_idx)
   return 1;
 }
 
+/* A helper function for output_indirect_strings.  Counts the number
+   of index strings offsets.  Must match the logic of the functions
+   output_index_string[_offsets] above.  */
+int
+count_index_strings (indirect_string_node **h, unsigned int *last_idx)
+{
+  struct indirect_string_node *node = *h;
+
+  if (node->form == DW_FORM_GNU_str_index && node->refcount > 0)
+    *last_idx += 1;
+  return 1;
+}
+
 /* A helper function for dwarf2out_finish called through
    htab_traverse.  Emit one queued .debug_str string.  */
 
@@ -28769,6 +28782,33 @@ output_indirect_strings (void)
  output_indirect_string> 
(DW_FORM_strp);
 
   switch_to_section (debug_str_offsets_section);
+  /* For DWARF5 the .debug_str_offsets[.dwo] section needs a unit
+    header.  Note that we don't need to generate a label to the
+    actual index table following the header here, because this is
+    for the split dwarf case only.  In an .dwo file there is only
+    one string offsets table (and one debug info section).  But
+    if we would start using string offset tables for the main (or
+    skeleton) unit, then we have to add a DW_AT_str_offsets_base
+    pointing to the actual index after the header.  Split dwarf
+    units will never have a string offsets base attribute.  When
+    a split unit is moved into a .dwp file the string offsets can
+    be found through the .debug_cu_index section table.  */
+  if (dwarf_version >= 5)
+   {
+ unsigned int last_idx = 0;
+ unsigned long str_offsets_length;
+
+ debug_str_hash->traverse_noresize
+    (_idx);
+ str_offsets_length = last_idx * DWARF_OFFSET_SIZE + 4;
+ if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
+   dw2_asm_output_data (4, 0x,
+    "Escape value for 64-bit DWARF extension");
+ dw2_asm_output_data (DWARF_OFFSET_SIZE, str_offsets_length,
+  "Length of string offsets unit");
+ dw2_asm_output_data (2, 5, "DWARF string offsets version");
+ dw2_asm_output_data (2, 0, "Header zero padding");
+   }
   debug_str_hash->traverse_noresize
 ();
   switch_to_section (debug_str_dwo_section);

[PATCH][AArch64] Implement usadv16qi and ssadv16qi standard names

2018-05-14 Thread Kyrill Tkachov

Hi all,

This patch implements the usadv16qi and ssadv16qi standard names.
See the thread at on g...@gcc.gnu.org [1] for background.

The V16QImode variant is important to get right as it is the most commonly used 
pattern:
reducing vectors of bytes into an int.
The midend expects the optab to compute the absolute differences of operands 1 
and 2 and
reduce them while widening along the way up to SImode. So the inputs are 
V16QImode and
the output is V4SImode.

I've tried out a few different strategies for that, the one I settled with is 
to emit:
UABDL2tmp.8h, op1.16b, op2.16b
UABALtmp.8h, op1.16b, op2.16b
UADALPop3.4s, tmp.8h

To work through the semantics let's say operands 1 and 2 are:
op1 { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 }
op2 { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 }
op3 { c0, c1, c2, c3 }

The UABDL2 takes the upper V8QI elements, computes their absolute differences, 
widens them and stores them into the V8HImode tmp:

tmp { ABS(a[8]-b[8]), ABS(a[9]-b[9]), ABS(a[10]-b[10]), ABS(a[11]-b[11]), 
ABS(a[12]-b[12]), ABS(a[13]-b[13]), ABS(a[14]-b[14]), ABS(a[15]-b[15]) }

The UABAL after that takes the lower V8QI elements, computes their absolute 
differences, widens them and accumulates them into the V8HImode tmp from the 
previous step:

tmp { ABS(a[8]-b[8])+ABS (a[0]-b[0]), ABS(a[9]-b[9])+ABS(a[1]-b[1]), 
ABS(a[10]-b[10])+ABS(a[2]-b[2]), ABS(a[11]-b[11])+ABS(a[3]-b[3]), 
ABS(a[12]-b[12])+ABS(a[4]-b[4]), ABS(a[13]-b[13])+ABS(a[5]-b[5]), 
ABS(a[14]-b[14])+ABS(a[6]-b[6]), ABS(a[15]-b[15])+ABS(a[7]-b[7]) }

Finally the UADALP does a pairwise widening reduction and accumulation into the 
V4SImode op3:
op3 { c0+ABS(a[8]-b[8])+ABS(a[0]-b[0])+ABS(a[9]-b[9])+ABS(a[1]-b[1]), 
c1+ABS(a[10]-b[10])+ABS(a[2]-b[2])+ABS(a[11]-b[11])+ABS(a[3]-b[3]), 
c2+ABS(a[12]-b[12])+ABS(a[4]-b[4])+ABS(a[13]-b[13])+ABS(a[5]-b[5]), 
c3+ABS(a[14]-b[14])+ABS(a[6]-b[6])+ABS(a[15]-b[15])+ABS(a[7]-b[7]) }

(sorry for the text dump)

Remember, according to [1] the exact reduction sequence doesn't matter (for 
integer arithmetic at least).
I've considered other sequences as well (thanks Wilco), for example
* UABD + UADDLP + UADALP
* UABLD2 + UABDL + UADALP + UADALP

I ended up settling in the sequence in this patch as it's short (3 
instructions) and in the future we can potentially
look to optimise multiple occurrences of these into something even faster (for 
example accumulating into H registers for longer
before doing a single UADALP in the end to accumulate into the final S 
register).

If your microarchitecture has some some strong preferences for a particular 
sequence, please let me know or, even better, propose a patch
to parametrise the generation sequence by code (or the appropriate RTX cost).


This expansion allows the vectoriser to avoid unpacking the bytes in two steps 
and performing V4SI arithmetic on them.
So, for the code:

unsigned char pix1[N], pix2[N];

int foo (void)
{
  int i_sum = 0;
  int i;

  for (i = 0; i < 16; i++)
i_sum += __builtin_abs (pix1[i] - pix2[i]);

  return i_sum;
}

we now generate on aarch64:
foo:
adrpx1, pix1
add x1, x1, :lo12:pix1
moviv0.4s, 0
adrpx0, pix2
add x0, x0, :lo12:pix2
ldr q2, [x1]
ldr q3, [x0]
uabdl2  v1.8h, v2.16b, v3.16b
uabal   v1.8h, v2.8b, v3.8b
uadalp  v0.4s, v1.8h
addvs0, v0.4s
umovw0, v0.s[0]
ret


instead of:
foo:
adrpx1, pix1
adrpx0, pix2
add x1, x1, :lo12:pix1
add x0, x0, :lo12:pix2
ldr q0, [x1]
ldr q4, [x0]
ushll   v1.8h, v0.8b, 0
ushll2  v0.8h, v0.16b, 0
ushll   v2.8h, v4.8b, 0
ushll2  v4.8h, v4.16b, 0
usubl   v3.4s, v1.4h, v2.4h
usubl2  v1.4s, v1.8h, v2.8h
usubl   v2.4s, v0.4h, v4.4h
usubl2  v0.4s, v0.8h, v4.8h
abs v3.4s, v3.4s
abs v1.4s, v1.4s
abs v2.4s, v2.4s
abs v0.4s, v0.4s
add v1.4s, v3.4s, v1.4s
add v1.4s, v2.4s, v1.4s
add v0.4s, v0.4s, v1.4s
addvs0, v0.4s
umovw0, v0.s[0]
ret

So I expect this new expansion to be better than the status quo in any case.
Bootstrapped and tested on aarch64-none-linux-gnu.
This gives about 8% on 525.x264_r from SPEC2017 on a Cortex-A72.

Ok for trunk?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc/2018-05/msg00070.html


2018-05-11  Kyrylo Tkachov  

* config/aarch64/aarch64.md ("unspec"): Define UNSPEC_SABAL,
UNSPEC_SABDL2, UNSPEC_SADALP, UNSPEC_UABAL, UNSPEC_UABDL2,
UNSPEC_UADALP values.
* config/aarch64/iterators.md (ABAL): New int iterator.
(ABDL2): Likewise.
(ADALP): Likewise.
(sur): Add mappings for the above.
* config/aarch64/aarch64-simd.md (aarch64_abdl2_3):
New define_insn.
(aarch64_abal_4): 

[PATCH] DWARF calculate the number of indexed addresses.

2018-05-14 Thread Mark Wielaard
The length in the .debug_addr unit header was calculated using the number
of elements in the addr_index_table. This is wrong because the entries in
the table are refcounted and only those with a refcount > 0 are actually
put in the index. Add a helper function count_index_addrs to get the
correct number of addresses in the index.

gcc/ChangeLog:

* dwarf2out.c (count_index_addrs): New function.
(dwarf2out_finish): Use count_index_addrs to calculate addrs_length.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 9c1d7c4..c05bfe4 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -28898,6 +28898,19 @@ output_addr_table_entry (addr_table_entry **slot, 
unsigned int *cur_index)
   return 1;
 }
 
+/* A helper function for dwarf2out_finish.  Counts the number
+   of indexed addresses.  Must match the logic of the functions
+   output_addr_table_entry above.  */
+int
+count_index_addrs (addr_table_entry **slot, unsigned int *last_idx)
+{
+  addr_table_entry *entry = *slot;
+
+  if (entry->refcount > 0)
+*last_idx += 1;
+  return 1;
+}
+
 /* Produce the .debug_addr section.  */
 
 static void
@@ -31393,8 +31406,12 @@ dwarf2out_finish (const char *)
 DWARF5 specifies a small header when address tables are used.  */
   if (dwarf_version >= 5)
{
- unsigned long addrs_length
-   = addr_index_table->elements () * DWARF2_ADDR_SIZE + 4;
+ unsigned int last_idx = 0;
+ unsigned long addrs_length;
+
+ addr_index_table->traverse_noresize
+(_idx);
+ addrs_length = last_idx * DWARF2_ADDR_SIZE + 4;
 
  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
dw2_asm_output_data (4, 0x,



[PATCH] DWARF: Use DW_OP_addrx and DW_OP_constx for DWARF5.

2018-05-14 Thread Mark Wielaard
For older DWARF and -gsplit-dwarf we want to emit DW_OP_GNU_addr_index
and DW_OP_GNU_const_index, but for DWARF5 we should use DW_OP_addrx
and DW_OP_constx.

gcc/ChangeLog:

* dwarf2out.c (dwarf_OP): Handle DW_OP_addrx and DW_OP_constx.
(size_of_loc_descr): Likewise.
(output_loc_operands): Likewise.
(output_loc_operands_raw): Likewise.
(dw_addr_op): Use dwarf_OP () for DW_OP_constx and DW_OP_addrx.
(resolve_addr_in_expr): Handle DW_OP_addrx and DW_OP_constx.
(hash_loc_operands): Likewise.
(compare_loc_operands): Likewise.
---
 gcc/ChangeLog   | 11 +++
 gcc/dwarf2out.c | 33 +
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8589065..6bc342d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2018-05-13  Mark Wielaard  
+
+   * dwarf2out.c (dwarf_OP): Handle DW_OP_addrx and DW_OP_constx.
+   (size_of_loc_descr): Likewise.
+   (output_loc_operands): Likewise.
+   (output_loc_operands_raw): Likewise.
+   (dw_addr_op): Use dwarf_OP () for DW_OP_constx and DW_OP_addrx.
+   (resolve_addr_in_expr): Handle DW_OP_addrx and DW_OP_constx.
+   (hash_loc_operands): Likewise.
+   (compare_loc_operands): Likewise.
+
 2018-04-30  Mark Wielaard  
 
* dwarf2out.c (dwarf_FORM): New function.
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 0d2af85..9c1d7c4 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1657,6 +1657,16 @@ dwarf_OP (enum dwarf_location_atom op)
return DW_OP_GNU_reinterpret;
   break;
 
+case DW_OP_addrx:
+  if (dwarf_version < 5)
+   return DW_OP_GNU_addr_index;
+  break;
+
+case DW_OP_constx:
+  if (dwarf_version < 5)
+   return DW_OP_GNU_const_index;
+  break;
+
 default:
   break;
 }
@@ -1794,7 +1804,9 @@ size_of_loc_descr (dw_loc_descr_ref loc)
   size += DWARF2_ADDR_SIZE;
   break;
 case DW_OP_GNU_addr_index:
+case DW_OP_addrx:
 case DW_OP_GNU_const_index:
+case DW_OP_constx:
   gcc_assert (loc->dw_loc_oprnd1.val_entry->index != NO_INDEX_ASSIGNED);
   size += size_of_uleb128 (loc->dw_loc_oprnd1.val_entry->index);
   break;
@@ -2294,7 +2306,9 @@ output_loc_operands (dw_loc_descr_ref loc, int 
for_eh_or_skip)
   break;
 
 case DW_OP_GNU_addr_index:
+case DW_OP_addrx:
 case DW_OP_GNU_const_index:
+case DW_OP_constx:
   gcc_assert (loc->dw_loc_oprnd1.val_entry->index != NO_INDEX_ASSIGNED);
   dw2_asm_output_data_uleb128 (loc->dw_loc_oprnd1.val_entry->index,
"(index into .debug_addr)");
@@ -2525,7 +2539,9 @@ output_loc_operands_raw (dw_loc_descr_ref loc)
 {
 case DW_OP_addr:
 case DW_OP_GNU_addr_index:
+case DW_OP_addrx:
 case DW_OP_GNU_const_index:
+case DW_OP_constx:
 case DW_OP_implicit_value:
   /* We cannot output addresses in .cfi_escape, only bytes.  */
   gcc_unreachable ();
@@ -3925,10 +3941,10 @@ static inline enum dwarf_location_atom
 dw_addr_op (enum dtprel_bool dtprel)
 {
   if (dtprel == dtprel_true)
-return (dwarf_split_debug_info ? DW_OP_GNU_const_index
+return (dwarf_split_debug_info ? dwarf_OP (DW_OP_constx)
 : (DWARF2_ADDR_SIZE == 4 ? DW_OP_const4u : DW_OP_const8u));
   else
-return dwarf_split_debug_info ? DW_OP_GNU_addr_index : DW_OP_addr;
+return dwarf_split_debug_info ? dwarf_OP (DW_OP_addrx) : DW_OP_addr;
 }
 
 /* Return a pointer to a newly allocated address location description.  If
@@ -29746,9 +29762,14 @@ resolve_addr_in_expr (dw_attr_node *a, 
dw_loc_descr_ref loc)
  }
break;
   case DW_OP_GNU_addr_index:
+  case DW_OP_addrx:
   case DW_OP_GNU_const_index:
-   if (loc->dw_loc_opc == DW_OP_GNU_addr_index
-|| (loc->dw_loc_opc == DW_OP_GNU_const_index && loc->dtprel))
+  case DW_OP_constx:
+   if ((loc->dw_loc_opc == DW_OP_GNU_addr_index
+|| loc->dw_loc_opc == DW_OP_addrx)
+   || ((loc->dw_loc_opc == DW_OP_GNU_const_index
+|| loc->dw_loc_opc == DW_OP_constx)
+   && loc->dtprel))
   {
 rtx rtl = loc->dw_loc_oprnd1.val_entry->addr.rtl;
 if (!resolve_one_addr ())
@@ -30534,7 +30555,9 @@ hash_loc_operands (dw_loc_descr_ref loc, inchash::hash 
)
   inchash::add_rtx (val1->v.val_addr, hstate);
   break;
 case DW_OP_GNU_addr_index:
+case DW_OP_addrx:
 case DW_OP_GNU_const_index:
+case DW_OP_constx:
   {
 if (loc->dtprel)
   {
@@ -30775,7 +30798,9 @@ compare_loc_operands (dw_loc_descr_ref x, 
dw_loc_descr_ref y)
 hash_addr:
   return rtx_equal_p (valx1->v.val_addr, valy1->v.val_addr);
 case DW_OP_GNU_addr_index:
+case DW_OP_addrx:
 case DW_OP_GNU_const_index:
+case DW_OP_constx:
   {
 rtx ax1 = 

Re: [PATCH] Disallow minus in mem {+,-,&,|,^}= x; mem != 0 peepholes (PR target/85756)

2018-05-14 Thread Jakub Jelinek
On Mon, May 14, 2018 at 02:54:18PM +0200, Uros Bizjak wrote:
> > --- gcc/config/i386/i386.md.jj  2018-05-11 18:44:32.0 +0200
> > +++ gcc/config/i386/i386.md 2018-05-14 13:50:28.100482520 +0200
> > @@ -19397,11 +19397,11 @@
> >   (set (match_dup 0) (match_dup 2))])
> > (set (match_dup 1) (match_dup 0))]
> >"(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> > +   && GET_CODE (operands[2]) != MINUS
> 
> && COMMUTATIVE_ARITH_P (operands[2])

That works, but then we should change the peephole2 above it as well.
MINUS is the only non-commutative plusminuslogic_operator, so it doesn't
change any behavior.

BTW, wonder if we shouldn't allow giving peephole2's a name, it is too hard
to refer to them...

2018-05-14  Jakub Jelinek  

PR target/85756
* config/i386/i386.md: Disallow non-commutative arithmetics in
last twpeephole for mem {+,-,&,|,^}= x; mem != 0 after cmpelim
optimization.  Use COMMUTATIVE_ARITH_P test rather than != MINUS
in the peephole2 before it.

* gcc.c-torture/execute/pr85756.c: New test.

--- gcc/config/i386/i386.md.jj  2018-05-11 18:44:32.0 +0200
+++ gcc/config/i386/i386.md 2018-05-14 15:02:48.363662960 +0200
@@ -19367,7 +19367,7 @@ (define_peephole2
(set (match_dup 1) (match_dup 0))
(set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
-   && GET_CODE (operands[2]) != MINUS
+   && COMMUTATIVE_ARITH_P (operands[2])
&& peep2_reg_dead_p (3, operands[0])
&& !reg_overlap_mentioned_p (operands[0], operands[1])
&& ix86_match_ccmode (peep2_next_insn (2),
@@ -19397,11 +19397,11 @@ (define_peephole2
  (set (match_dup 0) (match_dup 2))])
(set (match_dup 1) (match_dup 0))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && COMMUTATIVE_ARITH_P (operands[2])
&& peep2_reg_dead_p (2, operands[0])
&& !reg_overlap_mentioned_p (operands[0], operands[1])
&& ix86_match_ccmode (peep2_next_insn (0),
-(GET_CODE (operands[2]) == PLUS
- || GET_CODE (operands[2]) == MINUS)
+GET_CODE (operands[2]) == PLUS
 ? CCGOCmode : CCNOmode)"
   [(parallel [(set (match_dup 3) (match_dup 5))
  (set (match_dup 1) (match_dup 4))])]
--- gcc/testsuite/gcc.c-torture/execute/pr85756.c.jj2018-05-14 
13:50:44.384307289 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr85756.c   2018-05-14 
13:48:17.0 +0200
@@ -0,0 +1,50 @@
+/* PR target/85756 */
+
+#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 && __SIZEOF_INT__ == 4
+int a, c, *e, f, h = 10;
+short b;
+unsigned int p;
+
+__attribute__((noipa)) void
+bar (int a)
+{
+  asm volatile ("" : : "r" (a) : "memory");
+}
+
+void
+foo ()
+{
+  unsigned j = 1, m = 430523;
+  int k, n = 1, *l = 
+lab:
+  p = m;
+  m = -((~65535U | j) - n);
+  f = b << ~(n - 8);
+  n = (m || b) ^ f;
+  j = p;
+  if (p < m)
+*l = k < 3;
+  if (!n)
+l = 
+  if (c)
+{
+  bar (a);
+  goto lab;
+}
+  if (!*l)
+*e = 1;
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
+#else
+int
+main ()
+{
+  return 0;
+}
+#endif


Jakub


Re: [PATCH] Disallow minus in mem {+,-,&,|,^}= x; mem != 0 peepholes (PR target/85756)

2018-05-14 Thread Uros Bizjak
On Mon, May 14, 2018 at 2:48 PM, Jakub Jelinek  wrote:
> Hi!
>
> The last peephole I've recently added is as the testcase shows fundamentally
> incompatible with non-commutative operations, because we need to swap the
> operands.
>
> The pattern right before this one already is:
> (define_peephole2
>   [(parallel [(set (match_operand:SWI 0 "register_operand")
>(match_operator:SWI 2 "plusminuslogic_operator"
>  [(match_dup 0)
>   (match_operand:SWI 1 "memory_operand")]))
>   (clobber (reg:CC FLAGS_REG))])
>(set (match_dup 1) (match_dup 0))
>(set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
>   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
>&& GET_CODE (operands[2]) != MINUS
>  disallow non-commutative plusminuslogic_operator
>&& peep2_reg_dead_p (3, operands[0])
>&& !reg_overlap_mentioned_p (operands[0], operands[1])
>&& ix86_match_ccmode (peep2_next_insn (2),
>  GET_CODE (operands[2]) == PLUS
> ^^ no need to check for MINUS here
>  ? CCGOCmode : CCNOmode)"
>   [(parallel [(set (match_dup 3) (match_dup 5))
>   (set (match_dup 1) (match_dup 4))])]
> {
>   operands[3] = SET_DEST (PATTERN (peep2_next_insn (2)));
>   operands[4]
> = gen_rtx_fmt_ee (GET_CODE (operands[2]), GET_MODE (operands[2]),
>   copy_rtx (operands[1]),
>   operands[0]);
> ^^^ and here is where it swaps the operands,
> ^^ the memory originally in the last input is here as the first 
> input
>   operands[5]
> = gen_rtx_COMPARE (GET_MODE (operands[3]),
>copy_rtx (operands[4]),
>const0_rtx);
> })
>
> The following patch handles the cmpelim peephole the same.  Ok for trunk?
>
> Unfortunately, I'm travelling and can't test it right now, Marek, do you
> think you could bootstrap/regtest it for me?  Thanks.

I'll take care of this.

Uros.


Re: [PATCH] PR 85075, Fix PowerPC __float182/__ibm128 types and mangling

2018-05-14 Thread Segher Boessenkool
On Tue, Apr 24, 2018 at 05:42:39PM +, Joseph Myers wrote:
> On Mon, 16 Apr 2018, Segher Boessenkool wrote:
> 
> > > The manglings that are now used are:
> > > 
> > > For -mabi=ieeelongdouble:
> > > 
> > >   __float128  "u10__float128"
> > >   __ibm128"u8__ibm128"
> > >   long double "u9__ieee128"
> > > 
> > > For -mabi=ibmlongdouble:
> > > 
> > >   __float128  "u10__float128"
> > >   __ibm128"u8__ibm128"
> > >   long double "g"
> > > 
> > > For -mlong-double-64:
> > > 
> > >   __float128  "u10__float128"
> > >   __ibm128"u8__ibm128"
> > >   long double "e"
> > 
> > If __float128 is the same type as _Float128, as which of those should
> > it be mangled?  The C++ ABI has "DF" for _FloatN, but the demangler uses
> > that same code for fixed points types.  Ugh.
> > 
> > I don't think mangling "long double" as "u9__ieee128" works well with for
> > example GDB.
> > 
> > Cc:ing Joseph Myers; Joseph, do you have any advice here?
> 
> I believe it has been stated that the goal is for IEEE long double not to 
> require separate multilibs of GCC's libraries; that is, for both libgcc 
> and libstdc++ to provide all the required functions under appropriate 
> names, whichever is the default when GCC is built, and for the right 
> functions to be used automatically.

Yes, as long as no incompatible ABI is required, and that is part of
why currently ieee128 is only supported on systems with VSX.  Supporting
it on other systems sould be akin to soft float.

> For libgcc, this is achieved by the existing *tf* names continuing to be 
> bound to IBM long double, and new *kf* names being used for IEEE long 
> double.  (I don't know if the correct functions now get built for both 
> choices of the default ABI - i.e. all *tf* functions, including those from 
> libgcc2.c, always being built for IBM long double even when TFmode is IEEE 
> long double, all *kf* functions always being built for IEEE long double, 
> and the compiler always generating calls to the correct functions for 
> IFmode, KFmode and TFmode, whichever format TFmode has.  But if there are 
> any remaining issues in this area, they are orthogonal to the C++ issues.)

I checked, and it seems to work correctly.

> For libstdc++, avoiding multilibs means the same set of mangled names 
> should be present, with the same ABIs, regardless of the default ABI.  

Yes.

> This is why, for example, the existing IBM long double uses "g" rather 
> than "e" (the normal long double mangling) - because when long double 
> originally had the same ABI as double, that meant "e" symbols were used 
> for that 64-bit long double.

So we need a new mangling for IEEE 128-bit float.  Mangling it the same
as the "__ieee128" type is probably going to cause problems, and at the
very least it is a confusing name.

(The demangler shows "e" as "long double", but it shows "g" as "__float128",
which is wrong; adding "__ieee128" to the mix will not help).

> Returning to the long double, __ibm128 and __float128 types, I suppose 
> there are questions:
> 
> * Which types need to be different types at the C++ language level?  If 
> people do want to use these types in templates and overloads, I suppose 
> that indicates always having three separate types as you suggest.
> 
> * Where does __ieee128 fit in?
> 
> * Where does __typeof (__builtin_inff128 ()) (GCC's internal _Float128 
> type, cf. bug 85518) fit in?
> 
> * What is the mangling for all those types?
> 
> * Which types are supported by libstdc++ (in that it's valid to use them 
> as arguments to libstdc++ functions / templates)?
> 
> * Which function names are provided by libstdc++ (possibly as aliases)?

Yes, lots of questions :-(

> The "DF_" mangling was introduced to distinguish _Float16 from 
> __fp16, see .  *If* 
> it were decided to address bug 85518 by generically supporting mangling 
> for the not-quite-hidden-for-C++ _FloatN types (as opposed to arranging 
> for float64_type_node etc. to actually be copies of double etc. for C++, 
> or for the types to otherwise be completely hidden), I suppose it would be 
> natural to use that mangling for those types - but the C++ ABI would need 
> to be extended to cover _FloatNx as well if that bug is to be completely 
> fixed that way.  (And this ignores the fixed-point issue - I don't know 
> why that support is in the demangler, since we don't support fixed-point 
> for C++, though I suppose mode attributes might allow fixed-point to creep 
> into C++ code anyway - bug 39059 was constants formerly allowing them into 
> C++.)

And another, in generic code.

> As I understand the proposed patch, it would mean __float128 and __ieee128 
> are always aliases for the not-quite-hidden _Float128 type.  In that case, 
> generic DF128_ mangling would apply to those.  __ibm128, being always 
> different from long double after the patch, would then have u8__ibm128 
> mangling as suggested.  

Re: [PATCH] Disallow minus in mem {+,-,&,|,^}= x; mem != 0 peepholes (PR target/85756)

2018-05-14 Thread Uros Bizjak
On Mon, May 14, 2018 at 2:48 PM, Jakub Jelinek  wrote:
> Hi!
>
> The last peephole I've recently added is as the testcase shows fundamentally
> incompatible with non-commutative operations, because we need to swap the
> operands.
>
> The pattern right before this one already is:
> (define_peephole2
>   [(parallel [(set (match_operand:SWI 0 "register_operand")
>(match_operator:SWI 2 "plusminuslogic_operator"
>  [(match_dup 0)
>   (match_operand:SWI 1 "memory_operand")]))
>   (clobber (reg:CC FLAGS_REG))])
>(set (match_dup 1) (match_dup 0))
>(set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
>   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
>&& GET_CODE (operands[2]) != MINUS
>  disallow non-commutative plusminuslogic_operator
>&& peep2_reg_dead_p (3, operands[0])
>&& !reg_overlap_mentioned_p (operands[0], operands[1])
>&& ix86_match_ccmode (peep2_next_insn (2),
>  GET_CODE (operands[2]) == PLUS
> ^^ no need to check for MINUS here
>  ? CCGOCmode : CCNOmode)"
>   [(parallel [(set (match_dup 3) (match_dup 5))
>   (set (match_dup 1) (match_dup 4))])]
> {
>   operands[3] = SET_DEST (PATTERN (peep2_next_insn (2)));
>   operands[4]
> = gen_rtx_fmt_ee (GET_CODE (operands[2]), GET_MODE (operands[2]),
>   copy_rtx (operands[1]),
>   operands[0]);
> ^^^ and here is where it swaps the operands,
> ^^ the memory originally in the last input is here as the first 
> input
>   operands[5]
> = gen_rtx_COMPARE (GET_MODE (operands[3]),
>copy_rtx (operands[4]),
>const0_rtx);
> })
>
> The following patch handles the cmpelim peephole the same.  Ok for trunk?
>
> Unfortunately, I'm travelling and can't test it right now, Marek, do you
> think you could bootstrap/regtest it for me?  Thanks.
>
> 2018-05-14  Jakub Jelinek  
>
> PR target/85756
> * config/i386/i386.md: Disallow MINUS in last peephole for
> mem {+,-,&,|,^}= x; mem != 0 after cmpelim optimization.
>
> * gcc.c-torture/execute/pr85756.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2018-05-11 18:44:32.0 +0200
> +++ gcc/config/i386/i386.md 2018-05-14 13:50:28.100482520 +0200
> @@ -19397,11 +19397,11 @@
>   (set (match_dup 0) (match_dup 2))])
> (set (match_dup 1) (match_dup 0))]
>"(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> +   && GET_CODE (operands[2]) != MINUS

&& COMMUTATIVE_ARITH_P (operands[2])

Uros.


[PATCH 3/3] Come up with new --completion option.

2018-05-14 Thread Martin Liška
Main part where I still need to write ChangeLog file and
gcc.sh needs to be moved to bash-completions project.

Martin
>From cb544b3136559eb3710790dcd582d7bbf36d3792 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 May 2018 11:49:52 +0200
Subject: [PATCH 3/3] Come up with new --completion option.

---
 gcc.sh   |  51 +
 gcc/common.opt   |   4 +
 gcc/gcc.c|  15 +++
 gcc/opt-suggestions.c| 290 +--
 gcc/opt-suggestions.h|  15 ++-
 gcc/opts.c   |   3 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.c   |  33 ++
 gcc/selftest.h   |  21 
 9 files changed, 418 insertions(+), 15 deletions(-)
 create mode 100644 gcc.sh

diff --git a/gcc.sh b/gcc.sh
new file mode 100644
index 000..06b16b3152b
--- /dev/null
+++ b/gcc.sh
@@ -0,0 +1,51 @@
+# Please add "source /path/to/bash-autocomplete.sh" to your .bashrc to use this.
+
+log()
+{
+  echo $1 >> /tmp/bash-completion.log
+}
+
+_gcc()
+{
+local cur prev prev2 words cword argument prefix
+_init_completion || return
+_expand || return
+
+# extract also for situations like: -fsanitize=add
+if [[ $cword > 2 ]]; then
+  prev2="${COMP_WORDS[$cword - 2]}"
+fi
+
+log "cur: '$cur', prev: '$prev': prev2: '$prev2' cword: '$cword'"
+
+# sample: -fsan
+if [[ "$cur" == -* ]]; then
+  argument=$cur
+# sample: -fsanitize=
+elif [[ "$cur" == "=" && $prev == -* ]]; then
+  argument=$prev$cur
+  prefix=$prev$cur
+# sample: -fsanitize=add
+elif [[ "$prev" == "=" && $prev2 == -* ]]; then
+  argument=$prev2$prev$cur
+  prefix=$prev2$prev
+# sample: --param lto-
+elif [[ "$prev" == "--param" ]]; then
+  argument="$prev $cur"
+  prefix="$prev "
+fi
+
+log  "argument: '$argument', prefix: '$prefix'"
+
+if [[ "$argument" == "" ]]; then
+  _filedir
+else
+  # In situation like '-fsanitize=add' $cur is equal to last token.
+  # Thus we need to strip the beginning of suggested option.
+  flags=$( gcc --completion="$argument" 2>/dev/null | sed "s/^$prefix//")
+  log "compgen: $flags"
+  [[ "${flags: -1}" == '=' ]] && compopt -o nospace 2> /dev/null
+  COMPREPLY=( $( compgen -W "$flags" -- "") )
+fi
+}
+complete -F _gcc gcc
diff --git a/gcc/common.opt b/gcc/common.opt
index d6ef85928f3..c57edfb9c94 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -254,6 +254,10 @@ Driver Alias(S)
 -compile
 Driver Alias(c)
 
+-completion=
+Common Driver Joined Undocumented
+Provide bash completion for options starting with provided string.
+
 -coverage
 Driver Alias(coverage)
 
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 08e76c92df4..c02c2d73b12 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -221,6 +221,10 @@ static int print_help_list;
 
 static int print_version;
 
+/* Flag that stores string value for which we provide bash completion.  */
+
+static const char *completion = NULL;
+
 /* Flag indicating whether we should ONLY print the command and
arguments (like verbose_flag) without executing the command.
Displayed arguments are quoted so that the generated command
@@ -3819,6 +3823,11 @@ driver_handle_option (struct gcc_options *opts,
   add_linker_option ("--version", strlen ("--version"));
   break;
 
+case OPT__completion_:
+  validated = true;
+  completion = decoded->arg;
+  break;
+
 case OPT__help:
   print_help_list = 1;
 
@@ -7292,6 +7301,12 @@ driver::main (int argc, char **argv)
   maybe_putenv_OFFLOAD_TARGETS ();
   handle_unrecognized_options ();
 
+  if (completion)
+{
+  m_option_proposer.suggest_completion (completion);
+  return 0;
+}
+
   if (!maybe_print_and_exit ())
 return 0;
 
diff --git a/gcc/opt-suggestions.c b/gcc/opt-suggestions.c
index e322fcd91c5..2da094a5960 100644
--- a/gcc/opt-suggestions.c
+++ b/gcc/opt-suggestions.c
@@ -28,18 +28,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "opt-suggestions.h"
 #include "selftest.h"
 
-option_proposer::~option_proposer ()
-{
-  if (m_option_suggestions)
-{
-  int i;
-  char *str;
-  FOR_EACH_VEC_ELT (*m_option_suggestions, i, str)
-   free (str);
-  delete m_option_suggestions;
-}
-}
-
 const char *
 option_proposer::suggest_option (const char *bad_opt)
 {
@@ -54,6 +42,58 @@ option_proposer::suggest_option (const char *bad_opt)
  (auto_vec  *) m_option_suggestions);
 }
 
+void
+option_proposer::get_completions (const char *option_prefix,
+  auto_string_vec )
+{
+  /* Bail out for an invalid input.  */
+  if (option_prefix == NULL || option_prefix[0] == '\0')
+return;
+
+  /* Option suggestions are built without first leading dash character.  */
+  if (option_prefix[0] == '-')
+option_prefix++;
+
+  size_t length = strlen (option_prefix);
+
+  /* Handle parameters.  */
+  const char *prefix = "-param";
+  if (length >= strlen (prefix)

[PATCH 2/3] Refactoring to opt-suggestions.[ch].

2018-05-14 Thread Martin Liška
Second part refactors function from gcc.c into a new class
option_proposer.

Martin
>From c42bb9072242ab44884a71effee9398c6bcf998e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 May 2018 11:47:08 +0200
Subject: [PATCH 2/3] Refactoring to opt-suggestions.[ch].

gcc/ChangeLog:

2018-05-14  Martin Liska  

	* Makefile.in: Add opt-suggestions.o.
	* gcc-main.c: Include opt-suggestions.h.
	* gcc.c (driver::driver): Likewise.
	(driver::~driver): Remove m_option_suggestions.
	(driver::build_option_suggestions): Moved to option_proposer.
	(driver::suggest_option): Likewise.
	(driver::handle_unrecognized_options): Use option_proposer.
	* gcc.h (class driver): Add new memver m_option_proposer.
	* opt-suggestions.c: New file.
	* opt-suggestions.h: New file.

gcc/c-family/ChangeLog:

2018-05-14  Martin Liska  

	* cppspec.c: Include opt-suggestions.h.

gcc/fortran/ChangeLog:

2018-05-14  Martin Liska  

	* gfortranspec.c: Include opt-suggestions.h.

gcc/jit/ChangeLog:

2018-05-14  Martin Liska  

	* jit-playback.c: Include opt-suggestions.h.
---
 gcc/Makefile.in|   2 +-
 gcc/c-family/cppspec.c |   1 +
 gcc/fortran/gfortranspec.c |   1 +
 gcc/gcc-main.c |   1 +
 gcc/gcc.c  | 114 ++-
 gcc/gcc.h  |   4 +-
 gcc/jit/jit-playback.c |   1 +
 gcc/opt-suggestions.c  | 129 +
 gcc/opt-suggestions.h  |  59 +
 9 files changed, 197 insertions(+), 115 deletions(-)
 create mode 100644 gcc/opt-suggestions.c
 create mode 100644 gcc/opt-suggestions.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ec0511704d..52d485cf36f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1617,7 +1617,7 @@ OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
 # compiler and containing target-dependent code.
 OBJS-libcommon-target = $(common_out_object_file) prefix.o params.o \
 	opts.o opts-common.o options.o vec.o hooks.o common/common-targhooks.o \
-	hash-table.o file-find.o spellcheck.o selftest.o
+	hash-table.o file-find.o spellcheck.o selftest.o opt-suggestions.o
 
 # This lists all host objects for the front ends.
 ALL_HOST_FRONTEND_OBJS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_OBJS))
diff --git a/gcc/c-family/cppspec.c b/gcc/c-family/cppspec.c
index 1e0a8bcd294..66540239f53 100644
--- a/gcc/c-family/cppspec.c
+++ b/gcc/c-family/cppspec.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include "opt-suggestions.h"
 #include "gcc.h"
 #include "opts.h"
 
diff --git a/gcc/fortran/gfortranspec.c b/gcc/fortran/gfortranspec.c
index fe1ec0447b3..4ba3a8dba60 100644
--- a/gcc/fortran/gfortranspec.c
+++ b/gcc/fortran/gfortranspec.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "opt-suggestions.h"
 #include "gcc.h"
 #include "opts.h"
 
diff --git a/gcc/gcc-main.c b/gcc/gcc-main.c
index 9e6de743adc..6a759bbcc1c 100644
--- a/gcc/gcc-main.c
+++ b/gcc/gcc-main.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "obstack.h"
 #include "intl.h"
 #include "prefix.h"
+#include "opt-suggestions.h"
 #include "gcc.h"
 
 /* Implement the top-level "main" within the driver in terms of
diff --git a/gcc/gcc.c b/gcc/gcc.c
index a716f708259..08e76c92df4 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -36,6 +36,7 @@ compilation is specified by a string called a "spec".  */
 #include "obstack.h"
 #include "intl.h"
 #include "prefix.h"
+#include "opt-suggestions.h"
 #include "gcc.h"
 #include "diagnostic.h"
 #include "flags.h"
@@ -7262,8 +7263,7 @@ compare_files (char *cmpfile[])
 
 driver::driver (bool can_finalize, bool debug) :
   explicit_link_files (NULL),
-  decoded_options (NULL),
-  m_option_suggestions (NULL)
+  decoded_options (NULL)
 {
   env.init (can_finalize, debug);
 }
@@ -7272,14 +7272,6 @@ driver::~driver ()
 {
   XDELETEVEC (explicit_link_files);
   XDELETEVEC (decoded_options);
-  if (m_option_suggestions)
-{
-  int i;
-  char *str;
-  FOR_EACH_VEC_ELT (*m_option_suggestions, i, str)
-	free (str);
-  delete m_option_suggestions;
-}
 }
 
 /* driver::main is implemented as a series of driver:: method calls.  */
@@ -7768,106 +7760,6 @@ driver::maybe_putenv_OFFLOAD_TARGETS () const
   offload_targets = NULL;
 }
 
-/* Helper function for driver::suggest_option.  Populate
-   m_option_suggestions with candidate strings for misspelled options.
-   The strings will be freed by the driver's dtor.  */
-
-void
-driver::build_option_suggestions (void)
-{
-  gcc_assert (m_option_suggestions == NULL);
-  m_option_suggestions = new auto_vec  ();
-
-  /* We build a vec of m_option_suggestions, using add_misspelling_candidates
- to add copies of strings, without 

[PATCH 1/3] Introduce auto_string_vec class.

2018-05-14 Thread Martin Liška
First part with introduction of auto_string_vec class.

Martin
>From c05ad6d3715fcc99e94dbe2fd3fa1ef94552bea4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 May 2018 14:00:07 +0200
Subject: [PATCH 1/3] Introduce auto_string_vec class.

gcc/ChangeLog:

2018-05-14  Martin Liska  

	* vec.h (class auto_string_vec): New (moved from auto_argvec).
	(auto_string_vec::~auto_string_vec): Likewise.

gcc/jit/ChangeLog:

2018-05-14  Martin Liska  

	* jit-playback.c (class auto_argvec): Moved to vec.h.
	(auto_argvec::~auto_argvec): Likewise.
	(compile): Use the renamed name.
	(invoke_driver): Likewise.
---
 gcc/jit/jit-playback.c | 24 ++--
 gcc/vec.h  | 39 +++
 2 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 258ebe8ef86..01c4567de05 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -1749,26 +1749,6 @@ block (function *func,
   m_label_expr = NULL;
 }
 
-/* A subclass of auto_vec  that frees all of its elements on
-   deletion.  */
-
-class auto_argvec : public auto_vec 
-{
- public:
-  ~auto_argvec ();
-};
-
-/* auto_argvec's dtor, freeing all contained strings, automatically
-   chaining up to ~auto_vec , which frees the internal buffer.  */
-
-auto_argvec::~auto_argvec ()
-{
-  int i;
-  char *str;
-  FOR_EACH_VEC_ELT (*this, i, str)
-free (str);
-}
-
 /* Compile a playback::context:
 
- Use the context's options to cconstruct command-line options, and
@@ -1822,7 +1802,7 @@ compile ()
   /* Acquire the JIT mutex and set "this" as the active playback ctxt.  */
   acquire_mutex ();
 
-  auto_argvec fake_args;
+  auto_string_vec fake_args;
   make_fake_args (_args, ctxt_progname, _dumps);
   if (errors_occurred ())
 {
@@ -2440,7 +2420,7 @@ invoke_driver (const char *ctxt_progname,
   /* Currently this lumps together both assembling and linking into
  TV_ASSEMBLE.  */
   auto_timevar assemble_timevar (get_timer (), tv_id);
-  auto_argvec argvec;
+  auto_string_vec argvec;
 #define ADD_ARG(arg) argvec.safe_push (xstrdup (arg))
 
   ADD_ARG (gcc_driver_name);
diff --git a/gcc/vec.h b/gcc/vec.h
index 2d1f468ca1c..905985c41b0 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1447,19 +1447,11 @@ public:
   ~auto_vec () { this->release (); }
 };
 
-
-/* Allocate heap memory for pointer V and create the internal vector
-   with space for NELEMS elements.  If NELEMS is 0, the internal
-   vector is initialized to empty.  */
-
-template
-inline void
-vec_alloc (vec *, unsigned nelems CXX_MEM_STAT_INFO)
+class auto_string_vec : public auto_vec 
 {
-  v = new vec;
-  v->create (nelems PASS_MEM_STAT);
-}
-
+ public:
+  ~auto_string_vec ();
+};
 
 /* Conditionally allocate heap memory for VEC and its internal vector.  */
 
@@ -1553,6 +1545,29 @@ vec::iterate (unsigned ix, T **ptr) const
vec_safe_iterate ((V), (I), &(P));	\
(I)--)
 
+/* auto_string_vec's dtor, freeing all contained strings, automatically
+   chaining up to ~auto_vec , which frees the internal buffer.  */
+
+inline
+auto_string_vec::~auto_string_vec ()
+{
+  int i;
+  char *str;
+  FOR_EACH_VEC_ELT (*this, i, str)
+free (str);
+}
+
+/* Allocate heap memory for pointer V and create the internal vector
+   with space for NELEMS elements.  If NELEMS is 0, the internal
+   vector is initialized to empty.  */
+
+template
+inline void
+vec_alloc (vec *, unsigned nelems CXX_MEM_STAT_INFO)
+{
+  v = new vec;
+  v->create (nelems PASS_MEM_STAT);
+}
 
 /* Return a copy of this vector.  */
 
-- 
2.16.3



[PATCH] Disallow minus in mem {+,-,&,|,^}= x; mem != 0 peepholes (PR target/85756)

2018-05-14 Thread Jakub Jelinek
Hi!

The last peephole I've recently added is as the testcase shows fundamentally
incompatible with non-commutative operations, because we need to swap the
operands.

The pattern right before this one already is:
(define_peephole2
  [(parallel [(set (match_operand:SWI 0 "register_operand")
   (match_operator:SWI 2 "plusminuslogic_operator"
 [(match_dup 0)
  (match_operand:SWI 1 "memory_operand")]))
  (clobber (reg:CC FLAGS_REG))])
   (set (match_dup 1) (match_dup 0))
   (set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
   && GET_CODE (operands[2]) != MINUS
 disallow non-commutative plusminuslogic_operator
   && peep2_reg_dead_p (3, operands[0])
   && !reg_overlap_mentioned_p (operands[0], operands[1])
   && ix86_match_ccmode (peep2_next_insn (2),
 GET_CODE (operands[2]) == PLUS
^^ no need to check for MINUS here
 ? CCGOCmode : CCNOmode)"
  [(parallel [(set (match_dup 3) (match_dup 5))
  (set (match_dup 1) (match_dup 4))])]
{
  operands[3] = SET_DEST (PATTERN (peep2_next_insn (2)));
  operands[4]
= gen_rtx_fmt_ee (GET_CODE (operands[2]), GET_MODE (operands[2]),
  copy_rtx (operands[1]),
  operands[0]);
^^^ and here is where it swaps the operands,
^^ the memory originally in the last input is here as the first 
input
  operands[5]
= gen_rtx_COMPARE (GET_MODE (operands[3]),
   copy_rtx (operands[4]),
   const0_rtx);
})

The following patch handles the cmpelim peephole the same.  Ok for trunk?

Unfortunately, I'm travelling and can't test it right now, Marek, do you
think you could bootstrap/regtest it for me?  Thanks.

2018-05-14  Jakub Jelinek  

PR target/85756
* config/i386/i386.md: Disallow MINUS in last peephole for
mem {+,-,&,|,^}= x; mem != 0 after cmpelim optimization.

* gcc.c-torture/execute/pr85756.c: New test.

--- gcc/config/i386/i386.md.jj  2018-05-11 18:44:32.0 +0200
+++ gcc/config/i386/i386.md 2018-05-14 13:50:28.100482520 +0200
@@ -19397,11 +19397,11 @@
  (set (match_dup 0) (match_dup 2))])
(set (match_dup 1) (match_dup 0))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && GET_CODE (operands[2]) != MINUS
&& peep2_reg_dead_p (2, operands[0])
&& !reg_overlap_mentioned_p (operands[0], operands[1])
&& ix86_match_ccmode (peep2_next_insn (0),
-(GET_CODE (operands[2]) == PLUS
- || GET_CODE (operands[2]) == MINUS)
+GET_CODE (operands[2]) == PLUS
 ? CCGOCmode : CCNOmode)"
   [(parallel [(set (match_dup 3) (match_dup 5))
  (set (match_dup 1) (match_dup 4))])]
--- gcc/testsuite/gcc.c-torture/execute/pr85756.c.jj2018-05-14 
13:50:44.384307289 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr85756.c   2018-05-14 
13:48:17.0 +0200
@@ -0,0 +1,50 @@
+/* PR target/85756 */
+
+#if __CHAR_BIT__ == 8 && __SIZEOF_SHORT__ == 2 && __SIZEOF_INT__ == 4
+int a, c, *e, f, h = 10;
+short b;
+unsigned int p;
+
+__attribute__((noipa)) void
+bar (int a)
+{
+  asm volatile ("" : : "r" (a) : "memory");
+}
+
+void
+foo ()
+{
+  unsigned j = 1, m = 430523;
+  int k, n = 1, *l = 
+lab:
+  p = m;
+  m = -((~65535U | j) - n);
+  f = b << ~(n - 8);
+  n = (m || b) ^ f;
+  j = p;
+  if (p < m)
+*l = k < 3;
+  if (!n)
+l = 
+  if (c)
+{
+  bar (a);
+  goto lab;
+}
+  if (!*l)
+*e = 1;
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
+#else
+int
+main ()
+{
+  return 0;
+}
+#endif

Jakub


Re: RFC: bash completion

2018-05-14 Thread Martin Liška
On 04/26/2018 01:13 AM, David Malcolm wrote:
> [moving from gcc to gcc-patches mailing list]
> 
> On Wed, 2018-04-25 at 15:12 +0200, Martin Liška wrote:
>> On 04/24/2018 06:27 PM, David Malcolm wrote:
>>> On Tue, 2018-04-24 at 16:45 +0200, Martin Liška wrote:
 Hi.

 Some time ago, I investigated quite new feature of clang which
 is support of --autocomplete argument. That can be run from bash
 completion
 script and one gets more precise completion hints:

 http://blog.llvm.org/2017/09/clang-bash-better-auto-completion-is
 .htm
 l
 https://www.youtube.com/watch?v=zLPwPdZBpSY

 I like the idea and I would describe how is that better to
 current
 GCC completion
 script sitting here:
 https://github.com/scop/bash-completion/blob/master/completions/g
 cc

 1) gcc -fsanitize=^ - no support for option enum values
 2) gcc -fno-sa^ - no support for negative options
 3) gcc --param=^ - no support for param names

 These are main limitations I see. I'm attaching working
 prototype,
 which you
 can test by installed GCC, setting it on $PATH and doing:
 $ source gcc.sh

 Then bash completion is provided via the newly added option. Some
 examples:

 1)
 $ gcc -fsanitize=
 addressbounds enum   
 
 integer-divide-by-zero nonnull-
 attribute  pointer-
 comparereturn shift-
 base thread vla-bound
 alignment  bounds-strict  float-cast-
 overflowkernel-
 address null   pointer-
 overflow   returns-nonnull-attribute  shift-
 exponent undefined  vptr
 bool   builtinfloat-
 divide-
 by-zero   leak   object-
 sizepointer-
 subtract   shift  signed-integer-
 overflowunreachable

 2)
 $ gcc -fno-ipa-
 -fno-ipa-bit-cp -fno-ipa-cp-alignment   -fno-ipa-
 icf-fno-ipa-icf-variables  -fno-ipa-profile-
 fno-
 ipa-pure-const -fno-ipa-reference  -fno-ipa-struct-
 reorg   
 -fno-ipa-cp -fno-ipa-cp-clone   -fno-ipa-icf-
 functions  -fno-ipa-matrix-reorg   -fno-ipa-pta-fno-
 ipa-
 ra -fno-ipa-sra-fno-ipa-vrp

 3)
 $ gcc --param=lto-
 lto-max-partition  lto-min-partition  lto-partitions

 4)
 gcc --param lto-
 lto-max-partition  lto-min-partition  lto-partitions 

 The patch is quite lean and if people like, I will prepare a
 proper
 patch submission. I know about some limitations that can be then
 handled incrementally.

 Thoughts?
 Martin
>>>
>>> Overall, looks good (albeit with various nits).  I like how you're
>>> reusing the m_option_suggestions machinery from the misspelled
>>> options
>>> code.  There are some awkward issues e.g. arch-specific
>>> completions,
>>> lang-specific completions, custom option-handling etc, but given
>>> that
>>> as-is this patch seems to be an improvement over the status quo,
>>> I'd
>>> prefer to tackle those later.
>>
>> I'm sending second version of the patch. I did isolation of
>> m_option_suggestions machinery
>> to a separate file. Mainly due to selftests that are linked with cc1.
>>
>>>
>>> The patch doesn't have tests.  There would need to be some way to
>>> achieve test coverage for the completion code (especially as we
>>> start
>>> to tackle the more interesting cases).  I wonder what the best way
>>> to
>>> do that is; perhaps a combination of selftest and DejaGnu?  (e.g.
>>> what
>>> about arch-specific completions? what about the interaction with
>>> bash?
>>> etc)
>>
>> For now I come up with quite some selftests. Integration with
>> bash
>> would be challenging.
>>
>>>
>>> A few nits:
>>> * Do we want to hardcode that logging path in gcc.sh?
>>
>> Sure, that needs to be purged. Crucial question here is where the
>> gcc.sh script
>> should live. Note that clang has it in: ./tools/clang/utils/bash-
>> autocomplete.sh
>> and:
>>
>> head -n1 ./tools/clang/utils/bash-autocomplete.sh
>> # Please add "source /path/to/bash-autocomplete.sh" to your .bashrc
>> to use this.
>>
>> Which is not ideal. I would prefer to integrate the script into:
>> https://github.com/scop/bash-completion/blob/master/completions/gcc
>>
>> Thoughts?

Hi David.

Thanks a lot for so detailed review feedback. If not said otherwise I adapted
all your suggestions.

> 
> Maybe our goal should be to update that upstream bash-completion script
> so that it uses "--completion" if it exists (for newer GCCs), falling
> back to their existing 

Re: PR85734

2018-05-14 Thread Prathamesh Kulkarni
On 14 May 2018 at 15:06, Richard Biener  wrote:
> On Mon, 14 May 2018, Prathamesh Kulkarni wrote:
>
>> On 14 May 2018 at 14:46, Richard Biener  wrote:
>> > On Mon, 14 May 2018, Prathamesh Kulkarni wrote:
>> >
>> >> Hi,
>> >> The attached patch tries to fix PR85734, by gating on
>> >> !function_always_visible_to_compiler_p.
>> >> Bootstrap+test in progress on x86_64.
>> >> OK to commit if passes ?
>> >
>> > This looks redundant - suggest_attribute does that very same check
>> > but warn_function_malloc passes it 'false' as known_finite
>> > (warn_function_noreturn passes it true for example).
>> Indeed, thanks for pointing it out.
>> >
>> > So I suggest to simply pass true as that arg.
>> Is the attached version OK to commit after bootstrap+test ?
>
> Yes.
Also on a related bug PR85562, should we change the wording of
suggest_attribute for malloc
to omit the text "if it is known to return normally" ?

Thanks,
Prathamesh
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard
>> >
>> >> Thanks,
>> >> Prathamesh
>> >>
>> >
>> > --
>> > Richard Biener 
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
>> > 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [RFC] Improve tree DSE

2018-05-14 Thread Richard Biener
On Mon, May 14, 2018 at 5:32 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
>>> Given the simple testcases you add I wonder if you want a cheaper
>>> implementation,
>>> namely check that when reaching a loop PHI the only aliasing stmt in
>>> its use-chain
>>> is the use_stmt you reached the PHI from.  That would avoid this and the 
>>> tests
>>> for the store being redundant and simplify the patch considerably.
>
> Tried implementing above in the attached patch.  Bootstrapped on
> x86_64-linux-gnu. Full testing is ongoing.

I think your phi_aliases_stmt_only is equal to

   has_single_use (PHI_RESULT (phi))
   && PHI_RESULT (phi) == gimple_vuse (defvar_def)

that is, we are sure we have

  phidef = PHI < , ... defvar>

  # defvar = VUSE 
  defvar_def stmt

testing it this way is also more clearly matchign the intended
structure.  Maybe you can
add the above IL picture as comment.

OK with that changes.

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2018-05-14  Kugan Vivekanandarajah  
>
> * tree-ssa-dse.c (phi_aliases_stmt_only): New.
> (dse_classify_store): Use phi_aliases_stmt_only.
>
> gcc/testsuite/ChangeLog:
>
> 2018-05-14  Kugan Vivekanandarajah  
>
> * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
> * gcc.dg/tree-ssa/ssa-dse-32.c: New test.


[PATCH] Merge SLP and non-SLP vectorization costing

2018-05-14 Thread Richard Biener

One difficulty in vectorizer costing is currently that SLP and non-SLP
go a different path and while the latter is costed during vectorizable_*
the former is done in a separate walk over the SLP trees.  This leads
to defects in the former for example costing all "operations" the same
rather than using the proper promote/demote stuff for example.

So this patch is merging both costing paths.  It does so with preserving
any issues that were present before besides those that get magical
treatment (like now using promote/demote stuff).

This should make it possible to compare SLP and non-SLP costs since
they are (should) no longer be apples and oranges.

I've added extra dumping to the actual cost registering function
to aid future testcases that come up with actually fixing SLP costing
issues - most prominently imprecise costing of {permuted,strided}
group loads.

I'm not 100% sure I catched all things that need adjusting, I've
mostly followed hints from the vectorizer testsuite here.  I'm now
doing a proper 3-run of SPEC CPU 2006 with/without the patch
(a previous 1-run didn't come up with anything major).

As this may suggest this is a first patch in a series that is
supposed to improve costing and make costings comparable so we
can decide on vector variants by means of costs.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress
after a few last minute changes.

Comments welcome.

Thanks,
Richard.

>From 5521b0bde5bd354869997017744d5cb7f9a3a391 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Fri, 4 May 2018 14:25:46 +0200
Subject: [PATCH] Merge SLP and non-SLP costing

2018-05-14  Richard Biener  

* tree-vectorizer.h (struct stmt_info_for_cost): Add where member.
(dump_stmt_cost): Declare.
(add_stmt_cost): Dump cost we add.
(add_stmt_costs): New function.
(vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
No longer exported.
(vect_analyze_stmt): Adjust prototype.
(vectorizable_condition): Likewise.
(vectorizable_live_operation): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
* tree-vect-loop.c (vect_analyze_loop_operations): Create local
cost vector to pass to vectorizable_ and record afterwards.
(vect_model_reduction_cost): Take cost vector argument and adjust.
(vect_model_induction_cost): Likewise.
(vectorizable_reduction): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_NUMBER_OF_VEC_STMTS.
(vect_analyze_slp_cost_1): Remove.
(vect_analyze_slp_cost): Likewise.
(vect_slp_analyze_node_operations): Take visited args and
a target cost vector.  Avoid processing already visited stmt sets.
(vect_slp_analyze_operations): Use a local cost vector to gather
costs and register those of non-discarded instances.
(vect_bb_vectorization_profitable_p): Use add_stmt_costs.
(vect_schedule_slp_instance): Remove copying of
SLP_TREE_NUMBER_OF_VEC_STMTS.  Instead assert that it is not
zero.
* tree-vect-stmts.c (record_stmt_cost): Remove path directly
adding cost.  Record cost entry location.
(vect_prologue_cost_for_slp_op): Function to compute cost of
a constant or invariant generated for SLP vect in the prologue,
split out from vect_analyze_slp_cost_1.
(vect_model_simple_cost): Make static.  Adjust for SLP costing.
(vect_model_promotion_demotion_cost): Likewise.
(vect_model_store_cost): Likewise, make static.
(vect_model_load_cost): Likewise.
(vectorizable_bswap): Add cost vector arg and adjust.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(can_vectorize_live_stmts): Likewise.
(vect_analyze_stmt): Likewise.
(vect_transform_stmt): Adjust calls to vectorizable_*.
* tree-vectorizer.c: Include gimple-pretty-print.h.
(dump_stmt_cost): New function.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 4ce721ed478..fdf0d9c481a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1831,6 +1831,9 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 dump_printf_loc (MSG_NOTE, vect_location,
 "=== vect_analyze_loop_operations ===\n");
 
+  stmt_vector_for_cost cost_vec;
+  cost_vec.create (2);
+
   for (i = 0; i < nbbs; i++)
 {
  

[PATCH] Adjust match.pd :s docs

2018-05-14 Thread Richard Biener

Committed.

Richard.

2018-05-14  Richard Biener  

* doc/match-and-simplify.texi: Adjust :s documentation.

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index 6bc2c47bee7..024d747cafd 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -250,7 +250,9 @@ come second in commutative expressions.
 
 The second supported flag is @code{s} which tells the code
 generator to fail the pattern if the expression marked with
-@code{s} does have more than one use.  For example in
+@code{s} does have more than one use and the simplification
+results in an expression with more than one operator.
+For example in
 
 @smallexample
 (simplify
@@ -261,6 +263,14 @@ generator to fail the pattern if the expression marked with
 this avoids the association if @code{(pointer_plus @@0 @@1)} is
 used outside of the matched expression and thus it would stay
 live and not trivially removed by dead code elimination.
+Now consider @code{((x + 3) + -3)} with the temporary
+holding @code{(x + 3)} used elsewhere.  This simplifies down
+to @code{x} which is desirable and thus flagging with @code{s}
+does not prevent the transform.  Now consider @code{((x + 3) + 1)}
+which simplifies to @code{(x + 4)}.  Despite being flagged with
+@code{s} the simplification will be performed.  The
+simplification of @code{((x + a) + 1)} to @code{(x + (a + 1))} will
+not performed in this case though.
 
 More features exist to avoid too much repetition.
 


Re: PR85734

2018-05-14 Thread Richard Biener
On Mon, 14 May 2018, Prathamesh Kulkarni wrote:

> On 14 May 2018 at 14:46, Richard Biener  wrote:
> > On Mon, 14 May 2018, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> The attached patch tries to fix PR85734, by gating on
> >> !function_always_visible_to_compiler_p.
> >> Bootstrap+test in progress on x86_64.
> >> OK to commit if passes ?
> >
> > This looks redundant - suggest_attribute does that very same check
> > but warn_function_malloc passes it 'false' as known_finite
> > (warn_function_noreturn passes it true for example).
> Indeed, thanks for pointing it out.
> >
> > So I suggest to simply pass true as that arg.
> Is the attached version OK to commit after bootstrap+test ?

Yes.
Richard.

> Thanks,
> Prathamesh
> >
> > Richard
> >
> >> Thanks,
> >> Prathamesh
> >>
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: PR85734

2018-05-14 Thread Prathamesh Kulkarni
On 14 May 2018 at 14:46, Richard Biener  wrote:
> On Mon, 14 May 2018, Prathamesh Kulkarni wrote:
>
>> Hi,
>> The attached patch tries to fix PR85734, by gating on
>> !function_always_visible_to_compiler_p.
>> Bootstrap+test in progress on x86_64.
>> OK to commit if passes ?
>
> This looks redundant - suggest_attribute does that very same check
> but warn_function_malloc passes it 'false' as known_finite
> (warn_function_noreturn passes it true for example).
Indeed, thanks for pointing it out.
>
> So I suggest to simply pass true as that arg.
Is the attached version OK to commit after bootstrap+test ?

Thanks,
Prathamesh
>
> Richard
>
>> Thanks,
>> Prathamesh
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
2018-05-14  Prathamesh Kulkarni  

PR ipa/85734
* ipa-pure-const.c (warn_function_malloc): Pass value of known_finite 
param
as true in call to suggest_attribute.

testsuite/
* gcc.dg/ipa/pr85734.c: New test.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a80b6845633..7665358a9a5 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -249,7 +249,7 @@ warn_function_malloc (tree decl)
   static hash_set *warned_about;
   warned_about
 = suggest_attribute (OPT_Wsuggest_attribute_malloc, decl,
-false, warned_about, "malloc");
+true, warned_about, "malloc");
 }
 
 /* Emit suggestion about __attribute__((noreturn)) for DECL.  */
diff --git a/gcc/testsuite/gcc.dg/ipa/pr85734.c 
b/gcc/testsuite/gcc.dg/ipa/pr85734.c
new file mode 100644
index 000..e5fa21f0548
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr85734.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wsuggest-attribute=malloc" } */
+
+__attribute__((noinline))
+static void *f1(__SIZE_TYPE__ sz) /* { dg-bogus "function might be candidate 
for attribute 'malloc'" } */
+{
+  return __builtin_malloc (sz);
+}
+
+__attribute__((noinline))
+static void *f2(__SIZE_TYPE__ sz) /* { dg-bogus "function might be candidate 
for attribute 'malloc'" } */
+{
+  return f1 (sz);
+}
+
+void *f3(__SIZE_TYPE__ sz) /* { dg-warning "function might be candidate for 
attribute 'malloc'" } */
+{
+  return f2(sz);
+}


Re: PR85734

2018-05-14 Thread Richard Biener
On Mon, 14 May 2018, Prathamesh Kulkarni wrote:

> Hi,
> The attached patch tries to fix PR85734, by gating on
> !function_always_visible_to_compiler_p.
> Bootstrap+test in progress on x86_64.
> OK to commit if passes ?

This looks redundant - suggest_attribute does that very same check
but warn_function_malloc passes it 'false' as known_finite
(warn_function_noreturn passes it true for example).

So I suggest to simply pass true as that arg.

Richard

> Thanks,
> Prathamesh
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


PR85734

2018-05-14 Thread Prathamesh Kulkarni
Hi,
The attached patch tries to fix PR85734, by gating on
!function_always_visible_to_compiler_p.
Bootstrap+test in progress on x86_64.
OK to commit if passes ?

Thanks,
Prathamesh
2018-05-14  Prathamesh Kulkarni  

PR ipa/85734
* ipa-pure-const.c (propagate_malloc): Gate call towarn_function_malloc 
on !funcion_always_visible_to_compiler_p.
(pass_local_pure_const::execute): Likewise.

testsuite/
* gcc.dg/ipa/pr85734.c: New test.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a80b6845633..ce8028c1639 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1987,7 +1987,8 @@ propagate_malloc (void)
 
bool malloc_decl_p = DECL_IS_MALLOC (node->decl);
node->set_malloc_flag (true);
-   if (!malloc_decl_p && warn_suggest_attribute_malloc)
+   if (!malloc_decl_p && !function_always_visible_to_compiler_p 
(node->decl)
+   && warn_suggest_attribute_malloc)
warn_function_malloc (node->decl);
  }
   }
@@ -2221,7 +,8 @@ pass_local_pure_const::execute (function *fun)
   && !DECL_IS_MALLOC (current_function_decl))
 {
   node->set_malloc_flag (true);
-  if (warn_suggest_attribute_malloc)
+  if (warn_suggest_attribute_malloc
+ && !function_always_visible_to_compiler_p (current_function_decl))
warn_function_malloc (node->decl);
   changed = true;
   if (dump_file)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr85734.c 
b/gcc/testsuite/gcc.dg/ipa/pr85734.c
new file mode 100644
index 000..e5fa21f0548
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr85734.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wsuggest-attribute=malloc" } */
+
+__attribute__((noinline))
+static void *f1(__SIZE_TYPE__ sz) /* { dg-bogus "function might be candidate 
for attribute 'malloc'" } */
+{
+  return __builtin_malloc (sz);
+}
+
+__attribute__((noinline))
+static void *f2(__SIZE_TYPE__ sz) /* { dg-bogus "function might be candidate 
for attribute 'malloc'" } */
+{
+  return f1 (sz);
+}
+
+void *f3(__SIZE_TYPE__ sz) /* { dg-warning "function might be candidate for 
attribute 'malloc'" } */
+{
+  return f2(sz);
+}


Re: [PATCH 1/2] gcc_qsort: source code changes

2018-05-14 Thread Richard Biener
On Mon, May 14, 2018 at 8:37 AM, Alexander Monakov  wrote:
> On Sun, 13 May 2018, H.J. Lu wrote:
>> This breaks bootstrap on Fedora 28/i686:
>>
>> https://gcc.gnu.org/ml/gcc-regression/2018-05/msg00088.html
>>
>> ../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
>>  REORDER_45 (8, 8, 0);
>>  ^~
>> ../../src-trunk/gcc/sort.cc:100:10: error: ‘void* memcpy(void*, const
>> void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
>> object ‘t2’ with type ‘size_t’ {aka ‘unsigned int’}
>> [-Werror=array-bounds]
>>memcpy (, e2 + OFFSET, SIZE);  \
>>~~~^~~~
>
> Hm, on 32-bit this is trivially dead code, I wonder why we issue the warning?
>
> In any case, due to PR 85757 it's desirable to use types with sizes matching
> the memcpy size; is the following OK to apply? Bootstrapped on 32-bit x86.

OK.

Richard.

> * sort.cc (REORDER_23): Pass the type for the temporaries instead of
> intended memcpy size.
> (REORDER_45): Likewise.
> ---
>  gcc/sort.cc | 72 
> ++---
>  1 file changed, 36 insertions(+), 36 deletions(-)
>
> diff --git a/gcc/sort.cc b/gcc/sort.cc
> index 4faf6d45dc6..c41683c91dd 100644
> --- a/gcc/sort.cc
> +++ b/gcc/sort.cc
> @@ -62,29 +62,29 @@ struct sort_ctx
>  static void
>  reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
>  {
> -#define REORDER_23(SIZE, STRIDE, OFFSET)\
> -do {\
> -  size_t t0, t1;\
> -  memcpy (, e0 + OFFSET, SIZE);  \
> -  memcpy (, e1 + OFFSET, SIZE);  \
> -  char *out = c->out + OFFSET;  \
> -  if (likely (c->n == 3))   \
> -memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
> -  memcpy (out, , SIZE); out += STRIDE;   \
> -  memcpy (out, , SIZE);  \
> +#define REORDER_23(TYPE, STRIDE, OFFSET) \
> +do { \
> +  TYPE t0, t1;   \
> +  memcpy (, e0 + OFFSET, sizeof (TYPE));  \
> +  memcpy (, e1 + OFFSET, sizeof (TYPE));  \
> +  char *out = c->out + OFFSET;   \
> +  if (likely (c->n == 3))\
> +memcpy (out + 2*STRIDE, e2 + OFFSET, sizeof (TYPE)); \
> +  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
> +  memcpy (out, , sizeof (TYPE));  \
>  } while (0)
>
> -  if (sizeof (size_t) == 8 && likely (c->size == 8))
> -REORDER_23 (8, 8, 0);
> -  else if (likely (c->size == 4))
> -REORDER_23 (4, 4, 0);
> +  if (likely (c->size == sizeof (size_t)))
> +REORDER_23 (size_t, sizeof (size_t), 0);
> +  else if (likely (c->size == sizeof (int)))
> +REORDER_23 (int, sizeof (int), 0);
>else
>  {
>size_t offset = 0, step = sizeof (size_t);
>for (; offset + step <= c->size; offset += step)
> -   REORDER_23 (step, c->size, offset);
> +   REORDER_23 (size_t, c->size, offset);
>for (; offset < c->size; offset++)
> -   REORDER_23 (1, c->size, offset);
> +   REORDER_23 (char, c->size, offset);
>  }
>  }
>
> @@ -92,33 +92,33 @@ do {\
>  static void
>  reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char *e4)
>  {
> -#define REORDER_45(SIZE, STRIDE, OFFSET)\
> -do {\
> -  size_t t0, t1, t2, t3;\
> -  memcpy (, e0 + OFFSET, SIZE);  \
> -  memcpy (, e1 + OFFSET, SIZE);  \
> -  memcpy (, e2 + OFFSET, SIZE);  \
> -  memcpy (, e3 + OFFSET, SIZE);  \
> -  char *out = c->out + OFFSET;  \
> -  if (likely (c->n == 5))   \
> -memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
> -  memcpy (out, , SIZE); out += STRIDE;   \
> -  memcpy (out, , SIZE); out += STRIDE;   \
> -  memcpy (out, , SIZE); out += STRIDE;   \
> -  memcpy (out, , SIZE);  \
> +#define REORDER_45(TYPE, STRIDE, OFFSET) \
> +do { \
> +  TYPE t0, t1, t2, t3;   \
> +  memcpy (, e0 + OFFSET, sizeof (TYPE));  \
> +  memcpy (, e1 + OFFSET, sizeof (TYPE));  \
> +  memcpy (, e2 + OFFSET, sizeof (TYPE));  \
> +  memcpy (, e3 + OFFSET, sizeof (TYPE));  \
> +  char *out = c->out + OFFSET;   \
> +  if (likely (c->n == 5))\
> +memcpy (out + 4*STRIDE, e4 + OFFSET, sizeof (TYPE)); \
> +  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
> +  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
> +  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
> +  memcpy (out, , 

Re: [PATCH 1/2] gcc_qsort: source code changes

2018-05-14 Thread Alexander Monakov
On Sun, 13 May 2018, H.J. Lu wrote:
> This breaks bootstrap on Fedora 28/i686:
> 
> https://gcc.gnu.org/ml/gcc-regression/2018-05/msg00088.html
> 
> ../../src-trunk/gcc/sort.cc:112:5: note: in expansion of macro ‘REORDER_45’
>  REORDER_45 (8, 8, 0);
>  ^~
> ../../src-trunk/gcc/sort.cc:100:10: error: ‘void* memcpy(void*, const
> void*, size_t)’ forming offset [5, 8] is out of the bounds [0, 4] of
> object ‘t2’ with type ‘size_t’ {aka ‘unsigned int’}
> [-Werror=array-bounds]
>memcpy (, e2 + OFFSET, SIZE);  \
>~~~^~~~

Hm, on 32-bit this is trivially dead code, I wonder why we issue the warning?

In any case, due to PR 85757 it's desirable to use types with sizes matching
the memcpy size; is the following OK to apply? Bootstrapped on 32-bit x86.

* sort.cc (REORDER_23): Pass the type for the temporaries instead of
intended memcpy size.
(REORDER_45): Likewise.
---
 gcc/sort.cc | 72 ++---
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/gcc/sort.cc b/gcc/sort.cc
index 4faf6d45dc6..c41683c91dd 100644
--- a/gcc/sort.cc
+++ b/gcc/sort.cc
@@ -62,29 +62,29 @@ struct sort_ctx
 static void
 reorder23 (sort_ctx *c, char *e0, char *e1, char *e2)
 {
-#define REORDER_23(SIZE, STRIDE, OFFSET)\
-do {\
-  size_t t0, t1;\
-  memcpy (, e0 + OFFSET, SIZE);  \
-  memcpy (, e1 + OFFSET, SIZE);  \
-  char *out = c->out + OFFSET;  \
-  if (likely (c->n == 3))   \
-memcpy (out + 2*STRIDE, e2 + OFFSET, SIZE); \
-  memcpy (out, , SIZE); out += STRIDE;   \
-  memcpy (out, , SIZE);  \
+#define REORDER_23(TYPE, STRIDE, OFFSET) \
+do { \
+  TYPE t0, t1;   \
+  memcpy (, e0 + OFFSET, sizeof (TYPE));  \
+  memcpy (, e1 + OFFSET, sizeof (TYPE));  \
+  char *out = c->out + OFFSET;   \
+  if (likely (c->n == 3))\
+memcpy (out + 2*STRIDE, e2 + OFFSET, sizeof (TYPE)); \
+  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
+  memcpy (out, , sizeof (TYPE));  \
 } while (0)
 
-  if (sizeof (size_t) == 8 && likely (c->size == 8))
-REORDER_23 (8, 8, 0);
-  else if (likely (c->size == 4))
-REORDER_23 (4, 4, 0);
+  if (likely (c->size == sizeof (size_t)))
+REORDER_23 (size_t, sizeof (size_t), 0);
+  else if (likely (c->size == sizeof (int)))
+REORDER_23 (int, sizeof (int), 0);
   else
 {
   size_t offset = 0, step = sizeof (size_t);
   for (; offset + step <= c->size; offset += step)
-   REORDER_23 (step, c->size, offset);
+   REORDER_23 (size_t, c->size, offset);
   for (; offset < c->size; offset++)
-   REORDER_23 (1, c->size, offset);
+   REORDER_23 (char, c->size, offset);
 }
 }
 
@@ -92,33 +92,33 @@ do {\
 static void
 reorder45 (sort_ctx *c, char *e0, char *e1, char *e2, char *e3, char *e4)
 {
-#define REORDER_45(SIZE, STRIDE, OFFSET)\
-do {\
-  size_t t0, t1, t2, t3;\
-  memcpy (, e0 + OFFSET, SIZE);  \
-  memcpy (, e1 + OFFSET, SIZE);  \
-  memcpy (, e2 + OFFSET, SIZE);  \
-  memcpy (, e3 + OFFSET, SIZE);  \
-  char *out = c->out + OFFSET;  \
-  if (likely (c->n == 5))   \
-memcpy (out + 4*STRIDE, e4 + OFFSET, SIZE); \
-  memcpy (out, , SIZE); out += STRIDE;   \
-  memcpy (out, , SIZE); out += STRIDE;   \
-  memcpy (out, , SIZE); out += STRIDE;   \
-  memcpy (out, , SIZE);  \
+#define REORDER_45(TYPE, STRIDE, OFFSET) \
+do { \
+  TYPE t0, t1, t2, t3;   \
+  memcpy (, e0 + OFFSET, sizeof (TYPE));  \
+  memcpy (, e1 + OFFSET, sizeof (TYPE));  \
+  memcpy (, e2 + OFFSET, sizeof (TYPE));  \
+  memcpy (, e3 + OFFSET, sizeof (TYPE));  \
+  char *out = c->out + OFFSET;   \
+  if (likely (c->n == 5))\
+memcpy (out + 4*STRIDE, e4 + OFFSET, sizeof (TYPE)); \
+  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
+  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
+  memcpy (out, , sizeof (TYPE)); out += STRIDE;   \
+  memcpy (out, , sizeof (TYPE));  \
 } while (0)
 
-  if (sizeof (size_t) == 8 && likely (c->size == 8))
-REORDER_45 (8, 8, 0);
-  else if (likely(c->size == 4))
-REORDER_45 (4, 4, 0);
+  if (likely (c->size == sizeof (size_t)))
+REORDER_45 (size_t, sizeof (size_t), 0);
+