date:20161118

On 11/19/16 00:52, Martin Sebor wrote:
> If you or others are concerned about the rate of false positives
> of this warning please point me at a code base that might be a good
> test bed to to try it on.  Besides GCC I've built Binutils and the
> Linux kernel with just the 5 instances in GCC.
>

you should also include glibc and busybox, to be sure.


Bernd.

Re: [PATCH] avoid calling alloca(0)

On 11/19/16 00:52, Martin Sebor wrote:
> On 11/18/2016 03:51 PM, Bernd Edlinger wrote:
>>  > of the builtin (the function is not declared without attribute
>>  > alloc_size, at least in Glibc, but GCC still expands it inline).
>>  > This is as simple as enclosing alloca in parentheses:
>>  >
>>  >   void *p = (alloca)(n);
>>  >
>>  > Ugly?  Perhaps.  One might say that code that does tricky or
>>
>> No. I doubt that will work, unless you use -ffreestanding on the
>> command line.
>
> It works because "__builtin_alloca" is distinct from "alloca" and
> the warning code can simply compare the strings, just as you say
> you did below.  It's not ideal and I would prefer to avoid it but
> I offer it as another solution if the performance overhead of
> (n + 1) is thought to be too high.
>

So far I thought the warning is trying to make no differences between
malloc, realloc and alloca.

I would say that using realloc(x,0) is for sure always wrong.
Nobody will object against a warning for that.

And yes, malloc(0) is unportable, but if the result is not used and
directly fed into free that should be no problem, but a warning would
be also helpful because it is unportable code.  But I hope that it is
not the same as with the false positive array bounds warnings where
the compiler first transforms the source code into something completely
different and then starts to warn about it.

Regarding alloca, there are probably three different forms.

First a function that is not a builtin but has the name "alloca"
like special_function_p understands it.  It has no attributes
unless the header mentions them.  Calling this function with
0 should not be warned about, right?

Then a function that has the name "alloca" and matches the signature
of the builtin "alloca" function.

And last a function that has the name "__builtin_alloca".

You can distinguish between these, and possibly only warn for
__builtin_alloca?  Note, that will depend on the way the glibc
header works.

Everything would be more easy if the glibc header would not do
that, and just use the alloca and no macro.  Then it would be
more natural to warn about alloca and not about __builtin_alloca,
because that is always implemented in a sensible way.

But even if the __builtin_alloca is called with 0, what do we
do with the result?  I mean any use of the value would be wrong.
So why is there a warning, when the value is not used?

Bernd.

Re: [PATCH] [AArch64] Fix PR78382

2016-11-18 Thread kugan


Hi Naveen,

On 18/11/16 16:30, Hurugalawadi, Naveen wrote:

@@ -1374,10 +1374,17 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 case SYMBOL_SMALL_TLSGD:
   {
rtx_insn *insns;
-   rtx result = gen_rtx_REG (Pmode, R0_REGNUM);
+   rtx result;
+   if (TARGET_ILP32)
+ result = gen_rtx_REG (SImode, R0_REGNUM);
+   else
+ result = gen_rtx_REG (DImode, R0_REGNUM);


Why don't you use the mode of dest as done in other similar places. Like:

+   machine_mode mode = GET_MODE (dest);
+   rtx result = gen_rtx_REG (mode, R0_REGNUM);


Thanks,
Kugan

[PATCH, committed] TILEPro: link against libgcc.a when creating shared libraries

2016-11-18 Thread Walter Lee

This patch forces gcc to link against libgcc.a when creating shared
libraries, needed for 64-bit multiplies.

Bootstrapped and tested on tilepro hardware, also backported to GCC 6.

2016-11-18  Walter Lee  

* config.host (tilepro*-*-linux*): Add t-slibgcc-libgcc.

diff --git a/libgcc/config.host b/libgcc/config.host
index 64beb21..e7e5413 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1267,7 +1267,7 @@ tilegx*-*-linux*)
md_unwind_header=tilepro/linux-unwind.h
 ;;
 tilepro*-*-linux*)
-   tmake_file="${tmake_file} tilepro/t-crtstuff t-softfp-sfdf t-softfp
tilepro/t-tilepro"
+   tmake_file="${tmake_file} tilepro/t-crtstuff t-softfp-sfdf t-softfp
tilepro/t-tilepro t-slibgcc-libgcc"
md_unwind_header=tilepro/linux-unwind.h
 ;;
 v850*-*-*)

[PATCH, committed] TILE-Gx: fix barrier bundling

2016-11-18 Thread Walter Lee

This patch fixes a bundling bug.  When there are consecutive barriers,
the end-of-bundle marker of the last barrier is getting dropped.

Bootstrapped and tested on tilegx hardware, also backported to GCC 6.

2016-11-18  Walter Lee  

* config/tilegx/tilegx.c (tilegx_gen_bundles): Preserve
  end-of-bundle marker for consecutive barriers.

diff --git a/gcc/config/tilegx/tilegx.c b/gcc/config/tilegx/tilegx.c
index 76a7455..0403e8e 100644
--- a/gcc/config/tilegx/tilegx.c
+++ b/gcc/config/tilegx/tilegx.c
@@ -4469,8 +4469,7 @@ tilegx_gen_bundles (void)
   rtx_insn *end = NEXT_INSN (BB_END (bb));

   prev = NULL;
-  for (insn = next_insn_to_bundle (BB_HEAD (bb), end); insn;
-  prev = insn, insn = next)
+  for (insn = next_insn_to_bundle (BB_HEAD (bb), end); insn; insn = next)
{
  next = next_insn_to_bundle (NEXT_INSN (insn), end);

@@ -4506,7 +4505,11 @@ tilegx_gen_bundles (void)
PUT_MODE (prev, QImode);
  }
delete_insn (insn);
+
+// Note: prev remains the same for next iteration.
  }
+  else
+prev = insn;
}
 }
 }

[PATCH, committed] TILE-Gx: fix clzsi2 for big-endian

2016-11-18 Thread Walter Lee

This patch fixes the clzsi2 pattern, which was broken for big-endian.

Bootstrapped and tested on tilegx hardware, also backported to GCC 6.

2016-11-18  Walter Lee  

* config/tilegx/tilegx.md (clzsi2): Fix for big-endian.

--- a/gcc/config/tilegx/tilegx.md
+++ b/gcc/config/tilegx/tilegx.md
@@ -1798,19 +1798,20 @@
   [(set_attr "type" "Y0")])

 (define_expand "clzsi2"
-  [(set (match_dup 2)
-   (zero_extend:DI (match_operand:SI 1 "reg_or_0_operand" "")))
-   (set (match_dup 2)
-   (ashift:DI (match_dup 2)
-   (const_int 32)))
-   (set (match_dup 2)
-   (clz:DI (match_dup 2)))
-   (set (match_operand:SI 0 "register_operand" "")
-   (subreg:SI (match_dup 2) 0))]
-   ""
-   {
- operands[2] = gen_reg_rtx (DImode);
-   })
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (clz:SI (match_operand:SI 1 "reg_or_0_operand" "rO")))]
+  ""
+  {
+rtx tmp1 = gen_reg_rtx (DImode);
+rtx tmp2 = gen_reg_rtx (DImode);
+rtx tmp3 = gen_reg_rtx (DImode);
+
+emit_insn (gen_zero_extendsidi2 (tmp1, operands[1]));
+emit_insn (gen_ashldi3 (tmp2, tmp1, (GEN_INT (32;
+emit_insn (gen_clzdi2 (tmp3, tmp2));
+emit_move_insn (operands[0], gen_lowpart (SImode, tmp3));
+DONE;
+  })

 (define_insn "ctz2"
   [(set (match_operand:I48MODE 0 "register_operand" "=r")

Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

On 11/18/16 23:15, Bernd Edlinger wrote:
>>> -  TREE_NOTHROW (olddecl) = 0;
>>> +  TREE_NOTHROW (olddecl) = TREE_NOTHROW (newdecl);
>>
>> I still think a better fix would be to add a copy of TREE_NOTHROW to the
>> else block of the if (types_match), to go with the existing copies of
>> TREE_READONLY and TREE_THIS_VOLATILE.
>>

Hmm, looking at it again, I start to think you are right.

It looks like it is *not* possible that something other than
a builtin function can reach the bottom of that function with
!types_match.

The upper half of duplicate_decls is structured like this:

   /* Check for redeclaration and other discrepancies.  */
   if (TREE_CODE (olddecl) == FUNCTION_DECL
   && DECL_ARTIFICIAL (olddecl))
 {
 if (TREE_CODE (newdecl) != FUNCTION_DECL)
   {

 return NULL_TREE;
   }

 TREE_NOTHROW (olddecl) = 0;

 }
   else if (TREE_CODE (olddecl) != TREE_CODE (newdecl))
 {

 return error_mark_node;
 }
   else if (!types_match)
 {
 ...
 if (TREE_CODE (newdecl) == TEMPLATE_DECL)
   {

 return NULL_TREE;
   }
 if (TREE_CODE (newdecl) == FUNCTION_DECL)
   {
  if (DECL_EXTERN_C_P (newdecl) && DECL_EXTERN_C_P (olddecl))
{
   ...
   return NULL_TREE;
}
  else if (...)
{

return error_mark_node;
}
  else
return NULL_TREE;
}
  else
{
  ...
  return error_mark_node;
}
  }
else if ()

So in the case types_match=true, it can neither be that olddecl
is something other than a builtin function decl,
nor can newdecl be something other than a function decl.
Everything else makes the function return already here.

Bernd.

[gomp4] remove use of CUDA unified memory in libgomp

2016-11-18 Thread Cesar Philippidis

This patch eliminates the use of CUDA unified shared memory via cuMemcpy
inside nvptx_exec. The major problem with unified memory is that the
CUDA driver needs to copy all of the host addresses to a special data
region prior to transferring the data to and from the accelerator. I'm
not sure why the use CUDA unified shared memory is necessary here
because libgomp already has the necessary machinery to manage the data
mappings itself directly.

The only usage of CUDA unified memory occurs inside nvptx_exec.
Specifically, that function uses 'dp', which is managed by the map_*
functions, as a staging area to upload the omp struct arguments to the
PTX kernel. This particular use of unified memory is particularly
unfortunately as it imposes a synchronization barrier each time a PTX
kernel is launched.

At first I tried to eliminate those map* functions altogether, but that
caused failures in asyncwait-1.c. This happens because the device memory
used by async PTX kernel X may be overridden by async PTX kernel Y.
Apparently, the map functions try to rectify this situation by
implementing a pseudo circular queue in a paged sized block of memory.
However, that queue implementation is fragile because map_pop 'frees'
all of the allocations from the queue, not just the memory for the most
recently used kernel.

This patch does two things. First, it replaces the call to cuMemcpy
inside nvptx_exec with cuMemcpyHtoD. Second, it teaches the map routines
how to use a linked-list data structure to manage asynchronous PTX
stream arguments. As an optimization map_init reserves a large block
of memory for the first cuda_map so that non-async functions do not have
to waste time allocating device memory. Additional cuda_maps are created
for each async PTX stream with the requested SIZE as necessary.

Because of the asynchronous nature, the map routines need to be guarded
by a pthread lock. map_init and map_fini are already guarded by
ptx_dev_lock, but map_push and map_pop need to be locked by ptx_event_lock.

I don't really like the use of malloc for the cuda_maps, but that can be
optimized with a different data structure later.

I've applied this patch to gomp-4_0-branch.

Cesar



2016-11-18  Cesar Philippidis  

	libgomp/
	* plugin/plugin-nvptx.c (struct cuda_map): New.
	(struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev,
	h_tail with (cuda_map *) map.
	(cuda_map_create): New function.
	(cuda_map_destroy): New function.
	(map_init): Update to use a linked list of cuda_map objects.
	(map_fini): Likewise.
	(map_pop): Likewise.
	(map_push): Likewise.  Return CUdeviceptr instead of void.
	(init_streams_for_device): Remove stales references to ptx_stream
	members.
	(select_stream_for_async): Likewise.
	(nvptx_exec): Update call to map_init.


diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index e4fcc0e..c435012 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -95,20 +95,20 @@ cuda_error (CUresult r)
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct cuda_map
+{
+  CUdeviceptr d;
+  size_t size;
+  bool active;
+  struct cuda_map *next;
+};
+
 struct ptx_stream
 {
   CUstream stream;
   pthread_t host_thread;
   bool multithreaded;
-
-  CUdeviceptr d;
-  void *h;
-  void *h_begin;
-  void *h_end;
-  void *h_next;
-  void *h_prev;
-  void *h_tail;
-
+  struct cuda_map *map;
   struct ptx_stream *next;
 };
 
@@ -120,101 +120,114 @@ struct nvptx_thread
   struct ptx_device *ptx_dev;
 };
 
+static struct cuda_map *
+cuda_map_create (size_t size)
+{
+  struct cuda_map *map = GOMP_PLUGIN_malloc (sizeof (struct cuda_map));
+
+  assert (map);
+
+  map->next = NULL;
+  map->size = size;
+  map->active = false;
+
+  CUDA_CALL_ERET (NULL, cuMemAlloc, >d, size);
+  assert (map->d);
+
+  return map;
+}
+
+static void
+cuda_map_destroy (struct cuda_map *map)
+{
+  CUDA_CALL_ASSERT (cuMemFree, map->d);
+  free (map);
+}
+
+/* The following map_* routines manage the CUDA device memory that
+   contains the data mapping arguments for cuLaunchKernel.  Each
+   asynchronous PTX stream may have multiple pending kernel
+   invocations, which are launched in a FIFO order.  As such, the map
+   routines maintains a queue of cuLaunchKernel arguments.
+
+   Calls to map_push and map_pop must be guarded by ptx_event_lock.
+   Likewise, calls to map_init and map_fini are guarded by
+   ptx_dev_lock inside GOMP_OFFLOAD_init_device and
+   GOMP_OFFLOAD_fini_device, respectively.  */
+
 static bool
 map_init (struct ptx_stream *s)
 {
   int size = getpagesize ();
 
   assert (s);
-  assert (!s->d);
-  assert (!s->h);
-
-  CUDA_CALL (cuMemAllocHost, >h, size);
-  CUDA_CALL (cuMemHostGetDevicePointer, >d, s->h, 0);
 
-  assert (s->h);
+  s->map = cuda_map_create (size);
 
-  s->h_begin = s->h;
-  s->h_end = s->h_begin + size;
-  s->h_next = s->h_prev = s->h_tail = s->h_begin;
-
-  assert

[committed] Fix g++.dg/cpp1y/pr68180.C on i686-linux

Hi!

I've noticed in i686-linux bootstrap:
FAIL: g++.dg/cpp1y/pr68180.C  -std=c++14 (test for excess errors)
This is due to using of vector argument in a function, without
corresponding vector ISA enabled (SSE).

Fixed thusly, committed to trunk as obvious.

2016-11-18  Jakub Jelinek  

PR c++/68180
* g++.dg/cpp1y/pr68180.C: Add -Wno-psabi as dg-additional-options.

--- gcc/testsuite/g++.dg/cpp1y/pr68180.C.jj 2016-11-17 18:08:10.554741910 
+0100
+++ gcc/testsuite/g++.dg/cpp1y/pr68180.C2016-11-18 22:31:18.242429801 
+0100
@@ -1,5 +1,6 @@
 // PR c++/68180
 // { dg-do compile { target c++14 } }
+// { dg-additional-options "-Wno-psabi" }
 
 typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t;
 constexpr float32x4_t fill(float x) {

Jakub

Re: [PATCH] avoid calling alloca(0)

On 11/18/2016 03:51 PM, Bernd Edlinger wrote:

 > of the builtin (the function is not declared without attribute
 > alloc_size, at least in Glibc, but GCC still expands it inline).
 > This is as simple as enclosing alloca in parentheses:
 >
 >   void *p = (alloca)(n);
 >
 > Ugly?  Perhaps.  One might say that code that does tricky or

No. I doubt that will work, unless you use -ffreestanding on the
command line.

It works because "__builtin_alloca" is distinct from "alloca" and
the warning code can simply compare the strings, just as you say
you did below.  It's not ideal and I would prefer to avoid it but
I offer it as another solution if the performance overhead of
(n + 1) is thought to be too high.

Although the macro is not used in this case, even a declaraion
like

extern void *alloca (size_t __size);

goes thru the decl-anticipated path, that means it inherits
all the attributes from the builtin, unless the parameter don't
match, in that case you get a warning in C but no warning
in C++ (I am going to change the latter).

There is currently an attribute malloc but no explicit attribute
alloca, that's one of the reasons why special_function_p still
has to do a string compare of the function name aginst "alloca".

I think Jakub is right that we need a better way to fix the
warning for instance with a comment like the fall thru thing.

I don't believe this rare, corner case justifies special treatment.
There are many warnings in both -Wall and -Wextra that are orders
of magnitude noisier and that people find useful nonetheless.  They
don't need a special mechanism to suppress or fine tune beyond what
GCC already provides: an option and a pragma, and neither should
this one.  I certainly don't think that extending the complicated
(and controversial) machinery that was put in place to deal with
the exceedingly noisy fallthrough warning would be worth the effort
or even appropriate.

If you or others are concerned about the rate of false positives
of this warning please point me at a code base that might be a good
test bed to to try it on.  Besides GCC I've built Binutils and the
Linux kernel with just the 5 instances in GCC.

Martin

Re: [PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.

2016-11-18 Thread Allan Sandfeld Jensen

On Wednesday 02 November 2016, Mark Wielaard wrote:
> -case 11: c+=((hashval_t)k[10]<<24);
> -case 10: c+=((hashval_t)k[9]<<16);
> -case 9 : c+=((hashval_t)k[8]<<8);
> +case 11: c+=((hashval_t)k[10]<<24);  /* fall through */
> +case 10: c+=((hashval_t)k[9]<<16);   /* fall through */
> +case 9 : c+=((hashval_t)k[8]<<8);/* fall through */
>/* the first byte of c is reserved for the length */

This really highlights another exception -Wimplicit-fallthough should tolerate 
at least on level 1. Single line of code.

case X: my_statement();
case y: my_statement();

and 
case X:
my_statement();
case y:
my_statement();

In both cases, the lack of break is obvious at a glance and thus a comment 
highlighting it has never been needed and shouldn't be enforced.

Well, at least in my opinion, and it would take away all the rest of false 
positives I run into.

Best regards
`Allan

Re: [PATCH], Tweak PowerPC movdi constraints

On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> > > constraints instead of "?*".  This allows the register allocator to more 
> > > often
> > > allocate DImode to a floating point/vector register when it is desirable 
> > > to do
> > > so.
> > 
> > It also changes some plain "?" to "^" or "$" or even "*" (which cannot
> > work for multi-character constraints, it skips one character, not one
> > constraint).  Wrong version of the patch?
> 
> Note, if '*' does not work with multi-character prefixes, that is a bug.  All
> of rs6000.md assumes that ?*wa means that the register allocator will not
> consider VSX vector registers for when calculating register preferences.

The documentation is out of date.

>From ira-costs.c:

  /* Scan all the constraint letters.  See if the operand
 matches any of the constraints.  Collect the valid
 register classes and see if this operand accepts
 memory.  */
  while ((c = *p))
{
  switch (c)
{
case '*':
  /* Ignore the next letter for this pass.  */
  c = *++p;
  break;

and then 83 lines later:

  p += CONSTRAINT_LEN (c, p);
  if (c == ',')
break;
}

so it does in fact work.

Neither the patch description nor the changelog says you are doing these
changes though.

> > > I built bootstrap compilers and did make check with no regressions on:
> > > 1)Little endian power8, --with-cpu=power8
> > > 2)Big endian power8, --with-cpu=power8 (no 32-bit support)
> > > 3)Big endian power7, --with-cpu=power7 (both 32/64-bit support)
> > 
> > Could you also test with reload please?  Just LE is enough I guess.
> > We'd like to keep reload working for GCC 7 at least, and these cost
> > prefixes tend to break mov patterns :-/
> 
> Argh, I guess you are right, but then if reload doesn't work, I will likely
> submit the patch where there are three different movdi's (one for 32-bit
> without the change, one for 64-bit with reload, and one for 64-bit with lra).
> I would prefer not to do that.

Let's hope it just works :-)


Segher

Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-18 Thread Michael Meissner

On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> > constraints instead of "?*".  This allows the register allocator to more 
> > often
> > allocate DImode to a floating point/vector register when it is desirable to 
> > do
> > so.
> 
> It also changes some plain "?" to "^" or "$" or even "*" (which cannot
> work for multi-character constraints, it skips one character, not one
> constraint).  Wrong version of the patch?

Note, if '*' does not work with multi-character prefixes, that is a bug.  All
of rs6000.md assumes that ?*wa means that the register allocator will not
consider VSX vector registers for when calculating register preferences.

> > I built bootstrap compilers and did make check with no regressions on:
> > 1)  Little endian power8, --with-cpu=power8
> > 2)  Big endian power8, --with-cpu=power8 (no 32-bit support)
> > 3)  Big endian power7, --with-cpu=power7 (both 32/64-bit support)
> 
> Could you also test with reload please?  Just LE is enough I guess.
> We'd like to keep reload working for GCC 7 at least, and these cost
> prefixes tend to break mov patterns :-/

Argh, I guess you are right, but then if reload doesn't work, I will likely
submit the patch where there are three different movdi's (one for 32-bit
without the change, one for 64-bit with reload, and one for 64-bit with lra).
I would prefer not to do that.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] avoid calling alloca(0)

 > of the builtin (the function is not declared without attribute
 > alloc_size, at least in Glibc, but GCC still expands it inline).
 > This is as simple as enclosing alloca in parentheses:
 >
 >   void *p = (alloca)(n);
 >
 > Ugly?  Perhaps.  One might say that code that does tricky or

No. I doubt that will work, unless you use -ffreestanding on the
command line.

Although the macro is not used in this case, even a declaraion
like

extern void *alloca (size_t __size);

goes thru the decl-anticipated path, that means it inherits
all the attributes from the builtin, unless the parameter don't
match, in that case you get a warning in C but no warning
in C++ (I am going to change the latter).

There is currently an attribute malloc but no explicit attribute
alloca, that's one of the reasons why special_function_p still
has to do a string compare of the function name aginst "alloca".

I think Jakub is right that we need a better way to fix the
warning for instance with a comment like the fall thru thing.

Complicated code changes can also introduce new errors.


Bernd.

Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)


On 11/18/2016 03:57 PM, David Malcolm wrote:

On Fri, 2016-11-18 at 09:51 -0700, Martin Sebor wrote:

Martin: are the changes to your test cases OK by you, or is there
a better way to rewrite them?


Thanks for looking into it!

Since the purpose of the test_sprintf_note function in the test is
to verify the location of the caret within the warnings I think we
should keep it if it's possible.  Would either removing the P macro
or moving the function to a different file that doesn't use the
-ftrack-macro-expansion=0 option work?


To get substring locations with the proposed patch, that code will need to
be in a file without -ftrack-macro-expansion=0.

builtin-sprintf-warn-4.c seems to fit the bill, and has caret-printing
enabled too, so here's another attempt at the patch, just covering the
affected test cases, which moves test_sprintf_note to that file, and
drops "P".

The carets/underlines from the warnings look sane, and the test case
verifies that via dg-begin/end-multiline-output directives.  The test
case also verifies the carets/underlins from the *notes*.

[FWIW, I'm less convinced by the carets/underlines from the notes:
they all underline the whole of the __builtin_sprintf expression,
though looking at gimple-ssa-sprintf.c, I see:

  location_t callloc = gimple_location (info.callstmt);

which is used for the "inform" calls in question.  Hence I think
it's faithfully printing what that code is asking it to.  I'd prefer
to not touch that location in this patch, since it seems
orthogonal to fixing the PR preprocessor/78324; perhaps something
to address as part of PR middle-end/77696 ?].

Martin: how does this look to you?  Any objections to this change
as part of my fix for PR preprocessor/78324?


Not at all.  It looks great -- using the multiline output is even
better than the original.  You also noticed the comment about the
caret limitation being out of data and removed it.  Thanks for
that too!

I agree that the underlining in the notes could stand to be
improved at some point.  I'll see if I can get to it sometime
after I'm done with all my pending patches.

Thanks again!
Martin

PS If there's something I can help with while you're working on
the rest of the bug let me know.

[PATCH] Handle EOF in c_parser_parse_rtl_body

On Fri, 2016-11-18 at 22:13 +, Joseph Myers wrote:
> On Fri, 18 Nov 2016, David Malcolm wrote:
>
> > +  /* Consume all tokens, up to the closing brace, handling
> > + matching pairs of braces in the rtl dump.  */
> > +  int num_open_braces = 1;
> > +  while (1)
> > +{
> > +  switch (c_parser_peek_token (parser)->type)
> > +   {
> > +   case CPP_OPEN_BRACE:
> > + num_open_braces++;
> > + break;
> > +   case CPP_CLOSE_BRACE:
> > + if (--num_open_braces == 0)
> > +   goto found_closing_brace;
> > + break;
> > +   default:
> > + break;
> > +   }
> > +  c_parser_consume_token (parser);
> > +}
>
> What if you have an EOF without the close brace being found?  I'd
> expect
> you to hit the
>
>   gcc_assert (parser->tokens[0].type != CPP_EOF);
>
> in c_parser_consume_token.

Oops; thanks.  Here's a patch on top of v5 that (I hope) addresses that.

gcc/c/ChangeLog:
* c-parser.c (c_parser_parse_rtl_body): Handle CPP_EOF.

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/truncated-rtl-file.c: New test case.
---
 gcc/c/c-parser.c  | 4 
 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c | 2 ++
 2 files changed, 6 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d645d29..fef882a 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -18326,6 +18326,10 @@ c_parser_parse_rtl_body (c_parser *parser, char 
*start_with_pass)
  if (--num_open_braces == 0)
goto found_closing_brace;
  break;
+   case CPP_EOF:
+ error_at (start_loc, "no closing brace");
+ free (start_with_pass);
+ return;
default:
  break;
}
diff --git a/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c 
b/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c
new file mode 100644
index 000..4dd8214
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c
@@ -0,0 +1,2 @@
+void __RTL test (void)
+{ /* { dg-error "no closing brace" } */
-- 
1.8.5.3

Re: [PATCH], Tweak PowerPC movdi constraints

On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> constraints instead of "?*".  This allows the register allocator to more often
> allocate DImode to a floating point/vector register when it is desirable to do
> so.

It also changes some plain "?" to "^" or "$" or even "*" (which cannot
work for multi-character constraints, it skips one character, not one
constraint).  Wrong version of the patch?

> I built bootstrap compilers and did make check with no regressions on:
> 1)Little endian power8, --with-cpu=power8
> 2)Big endian power8, --with-cpu=power8 (no 32-bit support)
> 3)Big endian power7, --with-cpu=power7 (both 32/64-bit support)

Could you also test with reload please?  Just LE is enough I guess.
We'd like to keep reload working for GCC 7 at least, and these cost
prefixes tend to break mov patterns :-/

Segher

Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)

On Fri, 2016-11-18 at 09:51 -0700, Martin Sebor wrote:
> > Martin: are the changes to your test cases OK by you, or is there
> > a better way to rewrite them?
> 
> Thanks for looking into it!
> 
> Since the purpose of the test_sprintf_note function in the test is
> to verify the location of the caret within the warnings I think we
> should keep it if it's possible.  Would either removing the P macro
> or moving the function to a different file that doesn't use the
> -ftrack-macro-expansion=0 option work?

To get substring locations with the proposed patch, that code will need to
be in a file without -ftrack-macro-expansion=0.

builtin-sprintf-warn-4.c seems to fit the bill, and has caret-printing
enabled too, so here's another attempt at the patch, just covering the
affected test cases, which moves test_sprintf_note to that file, and
drops "P".

The carets/underlines from the warnings look sane, and the test case
verifies that via dg-begin/end-multiline-output directives.  The test
case also verifies the carets/underlins from the *notes*.

[FWIW, I'm less convinced by the carets/underlines from the notes:
they all underline the whole of the __builtin_sprintf expression,
though looking at gimple-ssa-sprintf.c, I see:

  location_t callloc = gimple_location (info.callstmt);

which is used for the "inform" calls in question.  Hence I think
it's faithfully printing what that code is asking it to.  I'd prefer
to not touch that location in this patch, since it seems
orthogonal to fixing the PR preprocessor/78324; perhaps something
to address as part of PR middle-end/77696 ?].

Martin: how does this look to you?  Any objections to this change
as part of my fix for PR preprocessor/78324?

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test_sprintf_note):
Move to...
* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: ...here.  Drop
-ftrack-macro-expansion=0.
(test_sprintf_note): Remove "P" macro.  Add
dg-begin/end-multiline-output directives.
(LINE, buffer, ptr): Copy from builtin-sprintf-warn-1.c.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
index 5779a95..a24889b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
@@ -170,35 +170,6 @@ void test_sprintf_zero_length_array (void *p, int i)
   __builtin_sprintf (buffer (1), "%s",  s [i].a);
 }
 
-/* Verify that the note printed along with the diagnostic mentions
-   the correct sizes and refers to the location corresponding to
-   the affected directive.  */
-
-void test_sprintf_note (void)
-{
-#define P __builtin_sprintf
-
-  /* Diagnostic column numbers are 1-based.  */
-
-  P (buffer (0),/* { dg-message "format output 4 bytes into a 
destination of size 0" } */
- "%c%s%i", '1', "2", 3);/* { dg-warning "7:.%c. directive writing 1 
byte into a region of size 0" } */
-
-  P (buffer (1),/* { dg-message "format output 6 bytes into a 
destination of size 1" } */
- "%c%s%i", '1', "23", 45);  /* { dg-warning "9:.%s. directive writing 2 
bytes into a region of size 0" } */
-
-  P (buffer (2),/* { dg-message "format output 6 bytes into a 
destination of size 2" } */
- "%c%s%i", '1', "2", 345);  /* { dg-warning "11:.%i. directive writing 3 
bytes into a region of size 0" } */
-
-  /* It would be nice if the caret in the location range for the format
- string below could be made to point at the closing quote of the format
- string, like so:
-   sprintf (d, "%c%s%i", '1', "2", 3456);
-   ~~^
- Unfortunately, that doesn't work with the current setup.  */
-  P (buffer (6),/* { dg-message "format output 7 bytes into a 
destination of size 6" } */
- "%c%s%i", '1', "2", 3456); /* { dg-warning "writing a terminating nul 
past the end of the destination" } */
-}
-
 #undef T
 #define T(size, fmt, ...)\
   __builtin___sprintf_chk (buffer (size), 0, objsize (size), fmt, \
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
index faa5806..3b3fb68 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret 
-ftrack-macro-expansion=0" } */
+/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret" } */
 
 extern int sprintf (char*, const char*, ...);
 
@@ -91,3 +91,81 @@ void test (void)
 }
 
 /* { dg-prune-output "too many arguments for format" } */
+
+/* When debugging, define LINE to the line number of the test case to exercise
+   and avoid exercising any of the others.  The buffer macro
+   below makes use of LINE to avoid warnings for

Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

On 11/18/16 22:19, Jason Merrill wrote:
> On 11/05/2016 12:44 PM, Bernd Edlinger wrote:
>> +  warning_at (DECL_SOURCE_LOCATION (newdecl), 0,
>> +  "declaration of %q+#D conflicts with built-in "
>> +  "declaration %q#D", newdecl, olddecl);
>
> There needs to be a way to disable this warning, even if it's enabled by
> default.
>

It is mostly disabled by -Wno-system-headers.
So it will only warn for declarations in .cc files.

Then I will probably need a better name for it.
I'd like -Wbuiltin-declaration-mismatch for instance?

I wonder if that should also control the C warning, and the case where
the internal __builtin_ is declared differently.  It is somehow
hard to explain why it has to be a C++ only warning, except for
historical reasons.

>> -  TREE_NOTHROW (olddecl) = 0;
>> +  TREE_NOTHROW (olddecl) = TREE_NOTHROW (newdecl);
>
> I still think a better fix would be to add a copy of TREE_NOTHROW to the
> else block of the if (types_match), to go with the existing copies of
> TREE_READONLY and TREE_THIS_VOLATILE.
>

I can give it a try, but I don't understand why that does apparently
not give us problems with other types of declaration mismatches.

It would be good to have a test case where a redeclaration with a
non-builtin causes similar slightly-wrong-code issues.

Possible that some hidden issues exist already, but also possible
that I create new issues, if I don't understand the code completely.

Is the else block ever used for non-builtin functions?
Can variable declarations end up there?
What other types of objects are possible there?

Given that stage1 is already over, should I split the warning
part from the wrong code issue?

What do you think?

Thanks
Bernd.

Re: [PATCH] Add "__RTL" to cc1 (v5)

2016-11-18 Thread Joseph Myers

On Fri, 18 Nov 2016, David Malcolm wrote:

> +  /* Consume all tokens, up to the closing brace, handling
> + matching pairs of braces in the rtl dump.  */
> +  int num_open_braces = 1;
> +  while (1)
> +{
> +  switch (c_parser_peek_token (parser)->type)
> + {
> + case CPP_OPEN_BRACE:
> +   num_open_braces++;
> +   break;
> + case CPP_CLOSE_BRACE:
> +   if (--num_open_braces == 0)
> + goto found_closing_brace;
> +   break;
> + default:
> +   break;
> + }
> +  c_parser_consume_token (parser);
> +}

What if you have an EOF without the close brace being found?  I'd expect 
you to hit the

  gcc_assert (parser->tokens[0].type != CPP_EOF);

in c_parser_consume_token.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix divmod expansion (PR middle-end/78416)

2016-11-18 Thread Richard Biener

On November 18, 2016 10:45:10 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>As the testcase shows, expand_divmod doesn't handle properly
>division/remainder in modes larger than HOST_BITS_PER_WIDE_INT like
>TImode - if op1 is "negative" CONST_INT, it means it has 65 most
>significant
>bits set, but EXACT_POWER_OF_2_OR_ZERO_P will happily return true on
>that
>and optimize the division into a right shift.
>Fixed by taking into account that EXACT_POWER_OF_2_OR_ZERO_P works only
>if either the mode bitsize/precision are smaller or equal than HWI, or
>if the value doesn't have MSB bit set.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if transforming the const-int to wide int makes this all easier to 
read?

>2016-11-18  Jakub Jelinek  
>
>   PR middle-end/78416
>   * expmed.c (expand_divmod): For modes wider than HWI, take into
>   account implicit 1 bits above msb for EXACT_POWER_OF_2_OR_ZERO_P.
>   Formatting fixes.  Use size <= HOST_BITS_PER_WIDE_INT instead of
>   HOST_BITS_PER_WIDE_INT >= size.
>
>   * gcc.dg/torture/pr78416.c: New test.
>
>--- gcc/expmed.c.jj2016-10-31 13:28:12.0 +0100
>+++ gcc/expmed.c   2016-11-18 16:57:57.608703605 +0100
>@@ -3998,7 +3998,15 @@ expand_divmod (int rem_flag, enum tree_c
>   if (unsignedp)
>   ext_op1 &= GET_MODE_MASK (mode);
>   op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1)
>-   || (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P 
>(-ext_op1;
>+/* If mode is wider than HWI and op1 has msb set,
>+   then it has there extra implicit 1 bits above it.  */
>+&& (GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT
>+|| INTVAL (op1) >= 0))
>+   || (! unsignedp
>+   && EXACT_POWER_OF_2_OR_ZERO_P (-ext_op1)
>+   && (GET_MODE_PRECISION (mode)
>+   <= HOST_BITS_PER_WIDE_INT
>+   || INTVAL (op1) < 0)));
> }
> 
>   /*
>@@ -4141,8 +4149,17 @@ expand_divmod (int rem_flag, enum tree_c
>   op1_is_constant = CONST_INT_P (op1);
>   op1_is_pow2 = (op1_is_constant
>&& ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
>-|| (! unsignedp
>-&& EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL 
>(op1));
>+/* If mode is wider than HWI and op1 has msb set,
>+   then it has there extra implicit 1 bits above
>+   it.  */
>+&& (GET_MODE_PRECISION (compute_mode)
>+<= HOST_BITS_PER_WIDE_INT
>+|| INTVAL (op1) >= 0))
>+   || (! unsignedp
>+   && EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL (op1))
>+   && (GET_MODE_PRECISION (compute_mode)
>+   <= HOST_BITS_PER_WIDE_INT
>+   || INTVAL (op1) < 0;
> }
> 
>/* If one of the operands is a volatile MEM, copy it into a register. 
>*/
>@@ -4185,7 +4202,8 @@ expand_divmod (int rem_flag, enum tree_c
>   unsigned HOST_WIDE_INT d = (INTVAL (op1)
>   & GET_MODE_MASK (compute_mode));
> 
>-  if (EXACT_POWER_OF_2_OR_ZERO_P (d))
>+  if (EXACT_POWER_OF_2_OR_ZERO_P (d)
>+  && (INTVAL (op1) >= 0 || size <= HOST_BITS_PER_WIDE_INT))
> {
>   pre_shift = floor_log2 (d);
>   if (rem_flag)
>@@ -4325,7 +4343,7 @@ expand_divmod (int rem_flag, enum tree_c
>   else if (d == -1)
> quotient = expand_unop (compute_mode, neg_optab, op0,
> tquotient, 0);
>-  else if (HOST_BITS_PER_WIDE_INT >= size
>+  else if (size <= HOST_BITS_PER_WIDE_INT
>&& abs_d == HOST_WIDE_INT_1U << (size - 1))
> {
>   /* This case is not handled correctly below.  */
>@@ -4335,6 +4353,7 @@ expand_divmod (int rem_flag, enum tree_c
> goto fail1;
> }
>   else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
>+   && (size <= HOST_BITS_PER_WIDE_INT || d >= 0)
>&& (rem_flag
>? smod_pow2_cheap (speed, compute_mode)
>: sdiv_pow2_cheap (speed, compute_mode))
>@@ -4348,7 +4367,9 @@ expand_divmod (int rem_flag, enum tree_c
>   compute_mode)
>!= CODE_FOR_nothing)))
> ;
>-  else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
>+  else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d)
>+   && (size <= HOST_BITS_PER_WIDE_INT
>+

Re: [PATCH] Fix target_clones attribute handling (PR middle-end/78419)

2016-11-18 Thread Richard Biener

On November 18, 2016 10:41:06 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The following testcase ICEs because of buffer overflow in the
>expand_target_clones function.  The main bug is that get_attr_str
>doesn't count the commas in the strings, so while it handles
>__attribute__((target_clones ("avx", "foo", "avx2", "avx512f",
>"default")))
>it doesn't handle
>__attribute__((target_clones ("avx,foo,avx2,avx512f,default")))
>which is also meant to be valid.
>This is fixed by the get_attr_str hunk.
>The rest of the changes are to avoid leaking memory, avoid double
>diagnostics (if targetm.target_option.valid_attribute_p fails,
>it has reported errors already, so there is no point to report
>further error or warning), fixing location_t of the diagnostics
>(targetm.target_option.valid_attribute_p uses input_location),
>and various formatting fixes and cleanups.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

>2016-11-18  Jakub Jelinek  
>
>   PR middle-end/78419
>   * multiple_target.c (get_attr_len): Start with argnum and increment
>   argnum on every arg.  Use strchr in a loop instead of counting commas
>   manually.
>   (get_attr_str): Increment argnum for every comma in the string.
>   (separate_attrs): Use for instead of while loop, simplify.
>   (expand_target_clones): Rename defenition argument to definition.
>   Free attrs and attr_str even when diagnosing errors.  Temporarily
>   change input_location around targetm.target_option.valid_attribute_p
>   calls.  Don't emit warning or errors if that function fails.
>
>   * gcc.target/i386/pr78419.c: New test.
>
>--- gcc/multiple_target.c.jj   2016-08-29 12:17:37.0 +0200
>+++ gcc/multiple_target.c  2016-11-18 14:14:11.684489030 +0100
>@@ -101,22 +101,18 @@ get_attr_len (tree arglist)
> {
>   tree arg;
>   int str_len_sum = 0;
>-  int argnum = 1;
>+  int argnum = 0;
> 
>   for (arg = arglist; arg; arg = TREE_CHAIN (arg))
> {
>-  unsigned int i;
>   const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
>-  int len = strlen (str);
>-
>+  size_t len = strlen (str);
>   str_len_sum += len + 1;
>-  if (arg != arglist)
>+  for (const char *p = strchr (str, ','); p; p = strchr (p + 1,
>','))
>   argnum++;
>-  for (i = 0; i < strlen (str); i++)
>-  if (str[i] == ',')
>-argnum++;
>+  argnum++;
> }
>-  if (argnum == 1)
>+  if (argnum <= 1)
> return -1;
>   return str_len_sum;
> }
>@@ -134,8 +130,9 @@ get_attr_str (tree arglist, char *attr_s
>   for (arg = arglist; arg; arg = TREE_CHAIN (arg))
> {
>   const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
>-
>   size_t len = strlen (str);
>+  for (const char *p = strchr (str, ','); p; p = strchr (p + 1,
>','))
>+  argnum++;
>   memcpy (attr_str + str_len_sum, str, len);
>   attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
>   str_len_sum += len + 1;
>@@ -152,19 +149,16 @@ separate_attrs (char *attr_str, char **a
> {
>   int i = 0;
>   bool has_default = false;
>-  char *attr = strtok (attr_str, ",");
> 
>-  while (attr != NULL)
>+  for (char *attr = strtok (attr_str, ",");
>+   attr != NULL; attr = strtok (NULL, ","))
> {
>   if (strcmp (attr, "default") == 0)
>   {
> has_default = true;
>-attr = strtok (NULL, ",");
> continue;
>   }
>-  attrs[i] = attr;
>-  attr = strtok (NULL, ",");
>-  i++;
>+  attrs[i++] = attr;
> }
>   if (!has_default)
> return -1;
>@@ -235,7 +229,7 @@ create_target_clone (cgraph_node *node,
>create the appropriate clone for each valid target attribute.  */
> 
> static bool
>-expand_target_clones (struct cgraph_node *node, bool defenition)
>+expand_target_clones (struct cgraph_node *node, bool definition)
> {
>   int i;
>   /* Parsing target attributes separated by comma.  */
>@@ -266,6 +260,8 @@ expand_target_clones (struct cgraph_node
> {
>   error_at (DECL_SOURCE_LOCATION (node->decl),
>   "default target was not set");
>+  XDELETEVEC (attrs);
>+  XDELETEVEC (attr_str);
>   return false;
> }
> 
>@@ -286,22 +282,24 @@ expand_target_clones (struct cgraph_node
> 
>   create_new_asm_name (attr, suffix);
>   /* Create new target clone.  */
>-  cgraph_node *new_node = create_target_clone (node, defenition,
>suffix);
>+  cgraph_node *new_node = create_target_clone (node, definition,
>suffix);
>   XDELETEVEC (suffix);
> 
>   /* Set new attribute for the clone.  */
>   tree attributes = make_attribute ("target", attr,
>   DECL_ATTRIBUTES (new_node->decl));
>   DECL_ATTRIBUTES (new_node->decl) = attributes;
>+  location_t saved_loc = input_location;
>+  input_location = DECL_SOURCE_LOCATION (node->decl);
>if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL,

Re: [PATCH] avoid calling alloca(0)


On 11/18/2016 10:25 AM, Jakub Jelinek wrote:

On Fri, Nov 18, 2016 at 10:14:09AM -0700, Martin Sebor wrote:

Because people make mistakes and warnings help us avoid them (isn't
that obvious?)  Just because we get it right most of the time doesn't
mean we get right all of the time.  The papers and exploits I pointed
to should provide ample evidence that zero allocations are a problem
that can have serious (and costly) consequences.  We (i.e., Red Hat)
recognize this risk and have made it our goal to help stem some of
these problems.


Are you talking about cases where user writes malloc (0) explicitly, or
where user writes malloc (n); and the optimizers figure out n is 0,
either always, or e.g. because the compiler decided to duplicate the code
and and in one of the branches it proves it is 0, or you want to add a
runtime warning when malloc is called with 0?


Both.  The existing -Walloca-larger-than warning and the proposed
-Walloc-zero warning diagnose both.  The non-constant case (i.e.,
one where the zero is the result of constant propagation) is the
more interesting (and dangerous) one but I don't think there is
value in trying to distinguish the two.


E.g. I don't see how writing malloc (0) in the code should have anything
to do with any overflows.  The cases where jump threading creates cases
where we call functions with constant arguments and we then warn on them
is also problematic, often such code is even dead except the compiler is not
able to prove it.


It IMHO doesn't belong to either of these.


You've made that quite clear.  I struggle to reconcile your
position in this case with the one in debate about the
-Wimplicit-fallthrough option where you insisted on the exact
opposite despite the overwhelming number of false positives
caused by it.  Why is it appropriate for that option to be in
-Wextra and not this one?


It also matters what one can do to tell the compiler the code is fine.
For e.g. -Wimplicit-fallthrough and many other warnings one can add
a comment, or attribute, etc. to get the warning out of the way.
But the patch you've posted for the alloca (0) stuff contained
changes of n to n + !n, so the warning basically asks people to slow
down their code for no reason, just to get the warning out of the way.


No, it does not.  There are ways to avoid the warning without
changing the value of the argument.  The obvious one is to set
-Wno-alloc-zero, either on the command line or via pragma GCC
diagnostic.  Another one is to call the alloca function instead
of the builtin (the function is not declared without attribute
alloc_size, at least in Glibc, but GCC still expands it inline).
This is as simple as enclosing alloca in parentheses:

  void *p = (alloca)(n);

Ugly?  Perhaps.  One might say that code that does tricky or
unportable things is ugly for that reason alone.  IMO, it's
a good thing when it also looks unusual (or even ugly).  It
draws attention to itself and is less likely to be copied
or reused, or broken by those who don't realize that it's
fragile.

The specific patch we're discussing touches just 5 functions
in all of GCC, and it changes an alloca(n) to alloca(n + !n).
Your original objection was that the fix was too ugly.  I'd
be happy to change it to (n + 1) if that makes it less
offensive to you (or to anything else that lets us move
forward, including the (alloca)(n) above).  Either way, none
of these calls is in hot loops so the runtime impact of any
of this on GCC hardly seems worth talking about.

Having said that, after another review of the functions changed
by the patch it looks like avoiding the zero case (e.g., by
returning early or branching) might actually improve their
runtime performance (though I doubt that would be worth the
effort either).  The same could well be true for other such
instances of the warning.

Martin

PS I resisted changing the XALLOCAVEC macro itself but I'm not
opposed to it and it's also a possibility.

[PR target/25512] Optimize certain equality tests on m68k

The yearly m68k day or two of bugfixing for the retro-computing folks is 
under way.  It's earlier than last year, so I can do more than just fix 
regressions.


This BZ is a simple request to optimize sequences like:

moveq #1,%d1
cmp.l %d0, %d1
beq/bne

into

subq.l #1, %d0
beq/bne

When d0/d1 are both dead at the cmp.l.

Similarly for small negative numbers which get turned into an addq.l.

I've chosen to implement this as a 3 insn -> 3 insn peephole2.  That may 
not seem useful at first.  But when done right, the 2nd insn (the 
compare) becomes trivial for final to eliminate resulting in the desired 
code.


Each time this applies we save 2 bytes and some number of cycles 
depending on the particular m68k implementation we're on.


This hits just over 4000 times building newlib.  Of course we have so 
many multilib configurations that any one target is going to see 
considerably less impact.


Tested with m68k.exp in the GCC testsuite and by building GCC's target 
libraries as well as newlib/libgloss for the m68k-elf toolchain 
(libgfortran doesn't build due to an unrelated issue).


It'll get further testing once I start my yearly m68k bootstrap.  But 
obviously that'll be running for a good long while.


Installing on the trunk.

Jeff


commit fe059a93ff2768570cab30c2bcf58e71cfb59a66
Author: law 
Date:   Fri Nov 18 21:52:32 2016 +

PR target/25112
* config/m68k/m68k.c (moveq feeding equality comparison): New
peepholes.
* config/m68k/predicates.md (addq_subq_operand): New predicate.
(equality_comparison_operator): Likewise.

PR target/25112
* gcc.target/m68k/pr25112: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242605 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ff3da21..534ef4b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2016-11-18  Jeff Law  
+
+   PR target/25112
+   * config/m68k/m68k.c (moveq feeding equality comparison): New
+   peepholes.
+   * config/m68k/predicates.md (addq_subq_operand): New predicate.
+   (equality_comparison_operator): Likewise.
+
 2016-11-18  Richard Sandiford  
 
* rtlanal.c (load_extend_op): Move to...
diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 3d7895d..7b7f373 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -7641,6 +7641,46 @@
(const_int 0))]
   "operands[5] = (operands[0] == operands[3]) ? operands[4] : operands[3];")
 
+;; We want to turn
+;;   moveq const,dX
+;;   cmp.l dX,dY
+;;   je/jne
+;;
+;; into
+;;   addq/subq -const,dY
+;;   cmp.l dY, 0
+;;   je/jne
+;;
+;; dX and dY must both be dead at the end of the sequence and the constant
+;; must be valid for addq/subq.
+;;
+;; Essentially we're making it trivial for final to realize the comparison
+;; is not needed
+;;
+;; Testing has shown a variant where the operands are reversed in the
+;; comparison never hits, so I have not included that variant.
+;;
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "addq_subq_operand" ""))
+   (set (cc0) (compare (match_operand:SI 2 "register_operand" "")
+  (match_dup 0)))
+   (set (pc) (if_then_else (match_operator 5 "equality_comparison_operator"
+   [(cc0) (const_int 0)])
+  (match_operand 3 "pc_or_label_operand")
+  (match_operand 4 "pc_or_label_operand")))]
+  "peep2_reg_dead_p (2, operands[0])
+   && peep2_reg_dead_p (2, operands[2])
+   && (operands[3] == pc_rtx || operands[4] == pc_rtx)
+   && DATA_REG_P (operands[2])"
+  [(set (match_dup 2) (plus:SI (match_dup 2) (match_dup 6)))
+   (set (cc0) (compare (match_dup 2) (const_int 0)))
+   (set (pc) (if_then_else (match_op_dup 5 [(cc0) (const_int 0)])
+  (match_dup 3)
+  (match_dup 4)))]
+  "operands[6] = GEN_INT (-INTVAL (operands[1]));")
+
 (define_peephole2
   [(set (match_operand:SI 0 "register_operand" "")
(match_operand:SI 1 "pow2_m1_operand" ""))
diff --git a/gcc/config/m68k/predicates.md b/gcc/config/m68k/predicates.md
index 186436c..bfb548a 100644
--- a/gcc/config/m68k/predicates.md
+++ b/gcc/config/m68k/predicates.md
@@ -245,6 +245,18 @@
 || reload_completed));
 })
 
+;; Used to detect constants that are valid for addq/subq instructions
+(define_predicate "addq_subq_operand"
+  (match_code "const_int")
+{
+  return ((INTVAL (op) <= 8 && INTVAL (op) > 0)
+ || (INTVAL (op) >= -8 && INTVAL (op) < 0));
+})
+
+;; Used to detect equality and non-equality operators
+(define_predicate "equality_comparison_operator"
+  (match_code "eq,ne"))
+
 ;; Used to detect when an operand is either a register
 ;; or a constant that is all ones in its lower bits.
 ;; Used by insv pattern to help

[PATCH] Fix divmod expansion (PR middle-end/78416)

Hi!

As the testcase shows, expand_divmod doesn't handle properly
division/remainder in modes larger than HOST_BITS_PER_WIDE_INT like
TImode - if op1 is "negative" CONST_INT, it means it has 65 most significant
bits set, but EXACT_POWER_OF_2_OR_ZERO_P will happily return true on that
and optimize the division into a right shift.
Fixed by taking into account that EXACT_POWER_OF_2_OR_ZERO_P works only
if either the mode bitsize/precision are smaller or equal than HWI, or
if the value doesn't have MSB bit set.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-18  Jakub Jelinek  

PR middle-end/78416
* expmed.c (expand_divmod): For modes wider than HWI, take into
account implicit 1 bits above msb for EXACT_POWER_OF_2_OR_ZERO_P.
Formatting fixes.  Use size <= HOST_BITS_PER_WIDE_INT instead of
HOST_BITS_PER_WIDE_INT >= size.

* gcc.dg/torture/pr78416.c: New test.

--- gcc/expmed.c.jj 2016-10-31 13:28:12.0 +0100
+++ gcc/expmed.c2016-11-18 16:57:57.608703605 +0100
@@ -3998,7 +3998,15 @@ expand_divmod (int rem_flag, enum tree_c
   if (unsignedp)
ext_op1 &= GET_MODE_MASK (mode);
   op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1)
-|| (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P 
(-ext_op1;
+ /* If mode is wider than HWI and op1 has msb set,
+then it has there extra implicit 1 bits above it.  */
+ && (GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT
+ || INTVAL (op1) >= 0))
+|| (! unsignedp
+&& EXACT_POWER_OF_2_OR_ZERO_P (-ext_op1)
+&& (GET_MODE_PRECISION (mode)
+<= HOST_BITS_PER_WIDE_INT
+|| INTVAL (op1) < 0)));
 }
 
   /*
@@ -4141,8 +4149,17 @@ expand_divmod (int rem_flag, enum tree_c
   op1_is_constant = CONST_INT_P (op1);
   op1_is_pow2 = (op1_is_constant
 && ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1))
- || (! unsignedp
- && EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL 
(op1));
+ /* If mode is wider than HWI and op1 has msb set,
+then it has there extra implicit 1 bits above
+it.  */
+ && (GET_MODE_PRECISION (compute_mode)
+ <= HOST_BITS_PER_WIDE_INT
+ || INTVAL (op1) >= 0))
+|| (! unsignedp
+&& EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL (op1))
+&& (GET_MODE_PRECISION (compute_mode)
+<= HOST_BITS_PER_WIDE_INT
+|| INTVAL (op1) < 0;
 }
 
   /* If one of the operands is a volatile MEM, copy it into a register.  */
@@ -4185,7 +4202,8 @@ expand_divmod (int rem_flag, enum tree_c
unsigned HOST_WIDE_INT d = (INTVAL (op1)
& GET_MODE_MASK (compute_mode));
 
-   if (EXACT_POWER_OF_2_OR_ZERO_P (d))
+   if (EXACT_POWER_OF_2_OR_ZERO_P (d)
+   && (INTVAL (op1) >= 0 || size <= HOST_BITS_PER_WIDE_INT))
  {
pre_shift = floor_log2 (d);
if (rem_flag)
@@ -4325,7 +4343,7 @@ expand_divmod (int rem_flag, enum tree_c
else if (d == -1)
  quotient = expand_unop (compute_mode, neg_optab, op0,
  tquotient, 0);
-   else if (HOST_BITS_PER_WIDE_INT >= size
+   else if (size <= HOST_BITS_PER_WIDE_INT
 && abs_d == HOST_WIDE_INT_1U << (size - 1))
  {
/* This case is not handled correctly below.  */
@@ -4335,6 +4353,7 @@ expand_divmod (int rem_flag, enum tree_c
  goto fail1;
  }
else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
+&& (size <= HOST_BITS_PER_WIDE_INT || d >= 0)
 && (rem_flag
 ? smod_pow2_cheap (speed, compute_mode)
 : sdiv_pow2_cheap (speed, compute_mode))
@@ -4348,7 +4367,9 @@ expand_divmod (int rem_flag, enum tree_c
compute_mode)
 != CODE_FOR_nothing)))
  ;
-   else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
+   else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d)
+&& (size <= HOST_BITS_PER_WIDE_INT
+|| abs_d != (unsigned HOST_WIDE_INT) d))
  {
if (rem_flag)
  {
@@ -4483,7 +4504,7 @@ expand_divmod (int rem_flag, enum tree_c

[PATCH] Fix target_clones attribute handling (PR middle-end/78419)

Hi!

The following testcase ICEs because of buffer overflow in the
expand_target_clones function.  The main bug is that get_attr_str
doesn't count the commas in the strings, so while it handles
__attribute__((target_clones ("avx", "foo", "avx2", "avx512f", "default")))
it doesn't handle
__attribute__((target_clones ("avx,foo,avx2,avx512f,default")))
which is also meant to be valid.
This is fixed by the get_attr_str hunk.
The rest of the changes are to avoid leaking memory, avoid double
diagnostics (if targetm.target_option.valid_attribute_p fails,
it has reported errors already, so there is no point to report
further error or warning), fixing location_t of the diagnostics
(targetm.target_option.valid_attribute_p uses input_location),
and various formatting fixes and cleanups.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-18  Jakub Jelinek  

PR middle-end/78419
* multiple_target.c (get_attr_len): Start with argnum and increment
argnum on every arg.  Use strchr in a loop instead of counting commas
manually.
(get_attr_str): Increment argnum for every comma in the string.
(separate_attrs): Use for instead of while loop, simplify.
(expand_target_clones): Rename defenition argument to definition.
Free attrs and attr_str even when diagnosing errors.  Temporarily
change input_location around targetm.target_option.valid_attribute_p
calls.  Don't emit warning or errors if that function fails.

* gcc.target/i386/pr78419.c: New test.

--- gcc/multiple_target.c.jj2016-08-29 12:17:37.0 +0200
+++ gcc/multiple_target.c   2016-11-18 14:14:11.684489030 +0100
@@ -101,22 +101,18 @@ get_attr_len (tree arglist)
 {
   tree arg;
   int str_len_sum = 0;
-  int argnum = 1;
+  int argnum = 0;
 
   for (arg = arglist; arg; arg = TREE_CHAIN (arg))
 {
-  unsigned int i;
   const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
-  int len = strlen (str);
-
+  size_t len = strlen (str);
   str_len_sum += len + 1;
-  if (arg != arglist)
+  for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ','))
argnum++;
-  for (i = 0; i < strlen (str); i++)
-   if (str[i] == ',')
- argnum++;
+  argnum++;
 }
-  if (argnum == 1)
+  if (argnum <= 1)
 return -1;
   return str_len_sum;
 }
@@ -134,8 +130,9 @@ get_attr_str (tree arglist, char *attr_s
   for (arg = arglist; arg; arg = TREE_CHAIN (arg))
 {
   const char *str = TREE_STRING_POINTER (TREE_VALUE (arg));
-
   size_t len = strlen (str);
+  for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ','))
+   argnum++;
   memcpy (attr_str + str_len_sum, str, len);
   attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0';
   str_len_sum += len + 1;
@@ -152,19 +149,16 @@ separate_attrs (char *attr_str, char **a
 {
   int i = 0;
   bool has_default = false;
-  char *attr = strtok (attr_str, ",");
 
-  while (attr != NULL)
+  for (char *attr = strtok (attr_str, ",");
+   attr != NULL; attr = strtok (NULL, ","))
 {
   if (strcmp (attr, "default") == 0)
{
  has_default = true;
- attr = strtok (NULL, ",");
  continue;
}
-  attrs[i] = attr;
-  attr = strtok (NULL, ",");
-  i++;
+  attrs[i++] = attr;
 }
   if (!has_default)
 return -1;
@@ -235,7 +229,7 @@ create_target_clone (cgraph_node *node,
create the appropriate clone for each valid target attribute.  */
 
 static bool
-expand_target_clones (struct cgraph_node *node, bool defenition)
+expand_target_clones (struct cgraph_node *node, bool definition)
 {
   int i;
   /* Parsing target attributes separated by comma.  */
@@ -266,6 +260,8 @@ expand_target_clones (struct cgraph_node
 {
   error_at (DECL_SOURCE_LOCATION (node->decl),
"default target was not set");
+  XDELETEVEC (attrs);
+  XDELETEVEC (attr_str);
   return false;
 }
 
@@ -286,22 +282,24 @@ expand_target_clones (struct cgraph_node
 
   create_new_asm_name (attr, suffix);
   /* Create new target clone.  */
-  cgraph_node *new_node = create_target_clone (node, defenition, suffix);
+  cgraph_node *new_node = create_target_clone (node, definition, suffix);
   XDELETEVEC (suffix);
 
   /* Set new attribute for the clone.  */
   tree attributes = make_attribute ("target", attr,
DECL_ATTRIBUTES (new_node->decl));
   DECL_ATTRIBUTES (new_node->decl) = attributes;
+  location_t saved_loc = input_location;
+  input_location = DECL_SOURCE_LOCATION (node->decl);
   if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL,
-   TREE_VALUE (attributes), 0))
+   TREE_VALUE (attributes),
+

Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)


On 11/05/2016 12:44 PM, Bernd Edlinger wrote:

+ warning_at (DECL_SOURCE_LOCATION (newdecl), 0,
+ "declaration of %q+#D conflicts with built-in "
+ "declaration %q#D", newdecl, olddecl);


There needs to be a way to disable this warning, even if it's enabled by 
default.



-  TREE_NOTHROW (olddecl) = 0;
+  TREE_NOTHROW (olddecl) = TREE_NOTHROW (newdecl);


I still think a better fix would be to add a copy of TREE_NOTHROW to the 
else block of the if (types_match), to go with the existing copies of 
TREE_READONLY and TREE_THIS_VOLATILE.


Jason

Re: [PATCH] Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)

OK.

On Thu, Nov 3, 2016 at 12:42 PM, Jakub Jelinek  wrote:
> Hi!
>
> My recent optimize_abbrev_table optimization apparently broke Solaris
> bootstrap.
> The bug is specific I think just to -gdwarf-{2,3} -gno-strict-dwarf, where
> DW_FORM_exprloc can't be used for location expressions and some location
> expression contains DW_OP_GNU_{{const,regval,deref}_type,convert,reinterpret}
> and there are at least 128 die abbreviations used by more than one DIE.
> Because of the lack of DW_FORM_exprloc we need to decide on
> DW_FORM_block{1,2,4} and size the location expression, for that we need to 
> know
> the DIE offsets of base offset DIEs that are referenced (earlier code
> ensures such types appear very early in the CU, right after the
> DW_TAG_compile_unit DIE) and for that their size and their abbreviation values
> need to be constant, so the optimize_abbrev_table opts that reorder 
> abbreviation
> numbers by decreasing usage count or for -gdwarf-5 attempt to optimize 
> implicit
> constants can't be done for the CU and base types.
> For -gdwarf-4 and above, we can emit DW_FORM_exprloc, so value_format doesn't
> need to compute any sizes and thus calc_base_type_die_sizes shouldn't be
> called.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly
> tested it on Solaris, ok for trunk?
>
> 2016-11-03  Jakub Jelinek  
>
> * dwarf2out.c (size_of_discr_list): Fix typo in function comment.
>
> PR debug/78191
> * dwarf2out.c (abbrev_opt_base_type_end): New variable.
> (die_abbrev_cmp): Sort dies with die_abbrev smaller than
> abbrev_opt_base_type_end only by increasing die_abbrev, before
> any other dies.
> (optimize_abbrev_table): Don't change abbrev numbers of
> base types and CU or optimize implicit consts in them if
> calc_base_type_die_sizes has been called during build_abbrev_table.
> (calc_base_type_die_sizes): If abbrev_opt_start, set
> abbrev_opt_base_type_end to one plus largest base type's
> die_abbrev.
>
> --- gcc/dwarf2out.c.jj  2016-11-03 08:47:59.0 +0100
> +++ gcc/dwarf2out.c 2016-11-03 12:26:03.192459170 +0100
> @@ -1909,7 +1909,7 @@ size_of_discr_value (dw_discr_value *dis
>  return size_of_sleb128 (discr_value->v.sval);
>  }
>
> -/* Return the size of the value in a DW_discr_list attribute.  */
> +/* Return the size of the value in a DW_AT_discr_list attribute.  */
>
>  static int
>  size_of_discr_list (dw_discr_list_ref discr_list)
> @@ -8548,6 +8548,11 @@ optimize_external_refs (dw_die_ref die)
>  /* First abbrev_id that can be optimized based on usage.  */
>  static unsigned int abbrev_opt_start;
>
> +/* Maximum abbrev_id of a base type plus one (we can't optimize DIEs with
> +   abbrev_id smaller than this, because they must be already sized
> +   during build_abbrev_table).  */
> +static unsigned int abbrev_opt_base_type_end;
> +
>  /* Vector of usage counts during build_abbrev_table.  Indexed by
> abbrev_id - abbrev_opt_start.  */
>  static vec abbrev_usage_count;
> @@ -8646,12 +8651,16 @@ die_abbrev_cmp (const void *p1, const vo
>gcc_checking_assert (die1->die_abbrev >= abbrev_opt_start);
>gcc_checking_assert (die2->die_abbrev >= abbrev_opt_start);
>
> -  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
> -  > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
> -return -1;
> -  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
> -  < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
> -return 1;
> +  if (die1->die_abbrev >= abbrev_opt_base_type_end
> +  && die2->die_abbrev >= abbrev_opt_base_type_end)
> +{
> +  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
> + > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
> +   return -1;
> +  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
> + < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
> +   return 1;
> +}
>
>/* Stabilize the sort.  */
>if (die1->die_abbrev < die2->die_abbrev)
> @@ -8729,10 +8738,12 @@ optimize_abbrev_table (void)
>sorted_abbrev_dies.qsort (die_abbrev_cmp);
>
>unsigned int abbrev_id = abbrev_opt_start - 1;
> -  unsigned int first_id = 0;
> +  unsigned int first_id = ~0U;
>unsigned int last_abbrev_id = 0;
>unsigned int i;
>dw_die_ref die;
> +  if (abbrev_opt_base_type_end > abbrev_opt_start)
> +   abbrev_id = abbrev_opt_base_type_end - 1;
>/* Reassign abbreviation ids from abbrev_opt_start above, so that
>  most commonly used abbreviations come first.  */
>FOR_EACH_VEC_ELT (sorted_abbrev_dies, i, die)
> @@ -8740,10 +8751,15 @@ optimize_abbrev_table (void)
>   dw_attr_node *a;
>   unsigned ix;
>
> + /* If calc_base_type_die_sizes has been called, the CU and
> +base types after it can't

Re: [C++ PATCH, ABI] Fix mangling of TLS init and wrapper fns (PR c++/77285)

On Fri, Nov 11, 2016 at 7:23 AM, Jakub Jelinek  wrote:
> Not really sure if we want to call just check_abi_tags (based on the idea
> that likely any uses of the TLS var from other TUs will be broken),

I think that's fine.  OK.

Jason

Re: [Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979

2016-11-18 Thread Janus Weil

Hi Dominique,

>> the attached patch fixes an ice-on-valid problem, simply by removing an 
>> assert. ...
>
> I have several instances in my test suite showing that the proposed patch 
> removes the ICE but generates wrong code:
>
> pr42359, second test, => ICE on another place
> pr54613, sixth and eighth tests,

thanks for the comments, those cases are closely related.

I previously assumed that the test case for this PR would be legal,
but by now I think that's wrong. The test case should be rejected, and
we already have checking mechanisms for this (see
resolve_fl_variable), but apparently they are not working.

My current suspicion is that 'gfc_is_constant_expr' has a bug, because
it claims the call to the function 'get_i' to be a constant
expression. This is not true, because get_i() can not be reduced to a
compile-time constant.

In any case the patch I proposed is wrong and the assert should stay.

Cheers,
Janus

[PATCH] Add "__RTL" to cc1 (v5)

On Wed, 2016-11-16 at 14:24 +0100, Richard Biener wrote:
> On Tue, Nov 15, 2016 at 10:07 PM, David Malcolm 
> wrote:
> > On Mon, 2016-11-14 at 16:14 +0100, Richard Biener wrote:
> > > On Fri, Nov 11, 2016 at 10:15 PM, David Malcolm <
> > > dmalc...@redhat.com>
> > > wrote:
> > > > Changed in this version:
> > > > 
> > > > * Rather than running just one pass, run *all* passes, but
> > > > start at
> > > >   the given pass; support for "dg-do run" tests that execute
> > > > the
> > > >   resulting code.
> > > > * Updated test cases to new "compact" dump format; more test
> > > > cases;
> > > >   use "dg-do run" in various places.
> > > > * Lots of bugfixing
> > > > 
> > > > Links to previous versions:
> > > >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00263.html
> > > >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00500.html
> > 
> > > Does running the RTL passes right from the parser work with 
> > > -fsyntax
> > > -only?
> > 
> > It depends what you mean by "work".  If I run it with -fsyntax
> > -only,
> > then pass_rest_of_compilation::gate returns false, and none of the
> > RTL
> > passes are run.  Is this behavior correct?
> 
> Yes.

Thanks.

> > > Doing it like __GIMPLE has the advantage of not exposing
> > > "rest_of_compilation", etc..
> > 
> > The gimple part of the compiler supports having multiple functions
> > in
> > memory at once, and then compiling them in arbitrary order based on
> > decisions made by the callgraph.
> > 
> > By contrast, the RTL part of the compiler is full of singleton
> > state:
> > things like crtl (aka x_rtl), the state of emit-rtl.c,
> > "reload_completed", etc etc.
> 
> Ah, of course - I forgot that...
> 
> > To try to limit the scope of the project, for the RTL frontend work
> > I'm
> > merely attempting to restore the singleton RTL state from a dump,
> > rather than to support having per function stashes of RTL state.
> > 
> > Hence the rest of compilation gets invoked directly from the
> > frontend
> > for the __RTL case, since it will get overwritten if there's a
> > second
> > __RTL function in the source file (which sounds like an idea for a
> > test
> > case; I'll attempt such a test case).
> > 
> > I hope this is a reasonable approach.  If not, I suppose I can
> > attempt
> > to bundle it up into some kind of RTL function state, but that
> > seems
> > like significant scope creep.
> 
> Yeah, I think it's a good approach for now.

OK.

> > > I'm now handling __GIMPLE from within declspecs (the GIMPLE FE
> > > stuff
> > > has been committed), it would be nice to match the __RTL piece
> > > here.
> > 
> > (Congratulations on getting the GIMPLE FE stuff in)
> > 
> > I'm not sure I understand you here - do you want me to rewrite the
> > __RTL parsing to match the __GIMPLE piece, or the other way around?
> > If the former, presumably I should reuse (and rename)
> > c_parser_gimple_pass_list?
> 
> Handle __RTL like __GIMPLE, thus as declspec.  Of course ultimatively
> Joseph has the last word here.

I've updated the patch to do so.

> > 
> > > > gcc/ChangeLog:
> > > > * Makefile.in (OBJS): Add run-rtl-passes.o.
> > > > 
> > > > gcc/c-family/ChangeLog:
> > > > * c-common.c (c_common_reswords): Add "__RTL".
> > > > * c-common.h (enum rid): Add RID_RTL.
> > > > 
> > > > gcc/c/ChangeLog:
> > > > * c-parser.c: Include "read-rtl-function.h" and
> > > > "run-rtl-passes.h".
> > > > (c_parser_declaration_or_fndef): In the "GNU
> > > > extensions"
> > > > part of
> > > > the leading comment, add an alternate production for
> > > > "function-definition", along with new "rtl-body
> > > > -specifier"
> > > > and
> > > > "rtl-body-pass-specifier" productions.  Handle "__RTL"
> > > > by
> > > > calling
> > > > c_parser_parse_rtl_body.  Convert a timevar_push/pop
> > > > pair
> > > > to an auto_timevar, to cope with early exit.
> > > > (c_parser_parse_rtl_body): New function.
> > > > 
> > > > gcc/ChangeLog:
> > > > * cfg.c (free_original_copy_tables): Remove assertion
> > > > on original_copy_bb_pool.
> > > 
> > > How can that trigger?
> > 
> > It happens when running pass_outof_cfg_layout_mode::execute; seen
> > with
> > gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c.
> > 
> > The input file is a dump taken in cfg_layout mode (although in this
> > case it's a trivial 3-basic-block CFG, but ideally there would be
> > cases
> > with non-trivial control flow); it has "fwprop1" as its starting
> > pass.
> > 
> > Running without -quiet shows:
> > 
> > skipping pass: *rest_of_compilation
> > skipping pass: vregs
> > skipping pass: into_cfglayout
> > skipping pass: jump
> > skipping pass: subreg1
> > skipping pass: cse1
> > found starting pass: fwprop1
> > 
> > i.e. it skips the into_cfglayout (amongst others), to start with
> > fwprop1.
> > 
> > In theory skipping a pass ought to be a no-op, assuming that we're
> > faithfully reconstructing

[PATCH] Introduce emit_status::ensure_regno_capacity (v5)

On Wed, 2016-10-05 at 17:55 +0200, Bernd Schmidt wrote:
> On 10/05/2016 06:15 PM, David Malcolm wrote:
> > -  /* Make sure regno_pointer_align, and regno_reg_rtx are large
> > - enough to have an element for this pseudo reg number.  */
> > +  int cur_size = crtl->emit.regno_pointer_align_length;
> > +  if (reg_rtx_no == cur_size)
> > +crtl->emit.ensure_regno_capacity (cur_size * 2);
> 
> Patch looks ok in principle, but maybe this manipulation of the size
> should be part of the new function as well - i.e. don't pass a
> new_size
> to it, make it check reg_rtx_no itself.

Changed in this version
- dropped "new_size" argument
- added a while loop to cover the case where reg_rtx_no
  needs to grow by more the double (potentially needed
  when loading RTL dumps)
- added a gcc_assert to ensure that the buffer is large enough

Successfully bootstrapped on x86_64-pc-linux-gnu as
part of the patch kit.

OK for trunk? (assuming it tests OK individually)

gcc/ChangeLog:
* emit-rtl.c (gen_reg_rtx): Move regno_pointer_align and
regno_reg_rtx resizing logic to...
(emit_status::ensure_regno_capacity): ...this new method,
and ensure that the buffers are large enough.
(init_emit): Allocate regno_reg_rtx using ggc_cleared_vec_alloc
rather than ggc_vec_alloc.
* function.h (emit_status::ensure_regno_capacity): New method.
---
 gcc/emit-rtl.c | 48 +---
 gcc/function.h |  2 ++
 2 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index e995899..b2b5fde 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1057,29 +1057,38 @@ gen_reg_rtx (machine_mode mode)
   /* Do not call gen_reg_rtx with uninitialized crtl.  */
   gcc_assert (crtl->emit.regno_pointer_align_length);
 
-  /* Make sure regno_pointer_align, and regno_reg_rtx are large
- enough to have an element for this pseudo reg number.  */
+  crtl->emit.ensure_regno_capacity ();
+  gcc_assert (reg_rtx_no < crtl->emit.regno_pointer_align_length);
 
-  if (reg_rtx_no == crtl->emit.regno_pointer_align_length)
-{
-  int old_size = crtl->emit.regno_pointer_align_length;
-  char *tmp;
-  rtx *new1;
+  val = gen_raw_REG (mode, reg_rtx_no);
+  regno_reg_rtx[reg_rtx_no++] = val;
+  return val;
+}
 
-  tmp = XRESIZEVEC (char, crtl->emit.regno_pointer_align, old_size * 2);
-  memset (tmp + old_size, 0, old_size);
-  crtl->emit.regno_pointer_align = (unsigned char *) tmp;
+/* Make sure m_regno_pointer_align, and regno_reg_rtx are large
+   enough to have elements in the range 0 <= idx <= reg_rtx_no.  */
 
-  new1 = GGC_RESIZEVEC (rtx, regno_reg_rtx, old_size * 2);
-  memset (new1 + old_size, 0, old_size * sizeof (rtx));
-  regno_reg_rtx = new1;
+void
+emit_status::ensure_regno_capacity ()
+{
+  int old_size = regno_pointer_align_length;
 
-  crtl->emit.regno_pointer_align_length = old_size * 2;
-}
+  if (reg_rtx_no < old_size)
+return;
 
-  val = gen_raw_REG (mode, reg_rtx_no);
-  regno_reg_rtx[reg_rtx_no++] = val;
-  return val;
+  int new_size = old_size * 2;
+  while (reg_rtx_no >= new_size)
+new_size *= 2;
+
+  char *tmp = XRESIZEVEC (char, regno_pointer_align, new_size);
+  memset (tmp + old_size, 0, new_size - old_size);
+  regno_pointer_align = (unsigned char *) tmp;
+
+  rtx *new1 = GGC_RESIZEVEC (rtx, regno_reg_rtx, new_size);
+  memset (new1 + old_size, 0, (new_size - old_size) * sizeof (rtx));
+  regno_reg_rtx = new1;
+
+  crtl->emit.regno_pointer_align_length = new_size;
 }
 
 /* Return TRUE if REG is a PARM_DECL, FALSE otherwise.  */
@@ -5667,7 +5676,8 @@ init_emit (void)
   crtl->emit.regno_pointer_align
 = XCNEWVEC (unsigned char, crtl->emit.regno_pointer_align_length);
 
-  regno_reg_rtx = ggc_vec_alloc (crtl->emit.regno_pointer_align_length);
+  regno_reg_rtx =
+ggc_cleared_vec_alloc (crtl->emit.regno_pointer_align_length);
 
   /* Put copies of all the hard registers into regno_reg_rtx.  */
   memcpy (regno_reg_rtx,
diff --git a/gcc/function.h b/gcc/function.h
index b564f45..cabffb9 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -34,6 +34,8 @@ struct GTY(()) sequence_stack {
 };
 
 struct GTY(()) emit_status {
+  void ensure_regno_capacity ();
+
   /* This is reset to LAST_VIRTUAL_REGISTER + 1 at the start of each function.
  After rtl generation, it is 1 plus the largest register number used.  */
   int x_reg_rtx_no;
-- 
1.8.5.3

[PATCH], Tweak PowerPC movdi constraints

2016-11-18 Thread Michael Meissner

This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
constraints instead of "?*".  This allows the register allocator to more often
allocate DImode to a floating point/vector register when it is desirable to do
so.

I did a full Spec 2006 run with this patch installed, and most of the
benchmarks were neutral in terms of performance.  The 482.sphinx3 benchmark had
a 2.5% performance boost with these patches.  There were no benchmarks that
regressed with this patch.

I built bootstrap compilers and did make check with no regressions on:
1)  Little endian power8, --with-cpu=power8
2)  Big endian power8, --with-cpu=power8 (no 32-bit support)
3)  Big endian power7, --with-cpu=power7 (both 32/64-bit support)

Can I check this patch into the trunk?

[gcc]
2016-11-18  Michael Meissner  

* config/rs6000/rs6000.md (movdi_internal32): Change FPR/VSX "?*"
load/store constraints to "^" if the instruction allows d-form
addressing or "$" if it only allows x-form addressing.  Change
FPR/VSX move constraints to "^".
(movdi_internal64): Likewise.

[gcc/testsuite]
2016-11-18  Michael Meissner  

* gcc.target/powerpc/ppc-round2.c: Allow XSCVDPSXWS and XSCVDPUXWS
to be generated instead of FCTIWUZ or FCTIWZ.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 242556)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -8118,10 +8118,10 @@ (define_insn "p8_mfvsrd_4_disf"
 
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "rs6000_nonimmediate_operand"
- "=Y,r, r, ?m,?*d,?*d,
-  r, ?wY,   ?Z,?*wb,  ?*wv,   ?wi,
-  ?wo,   ?wo,   ?wv,   ?wi,   ?wi,?wv,
-  ?wv")
+ "=Y,r, r, ^m,^d, ^d,
+  r, ^wY,   $Z,^wb,   $wv,^wi,
+  *wo,   *wo,   *wv,   *wi,   *wi,*wv,
+  *wv")
 
(match_operand:DI 1 "input_operand"
   "r,Y, r, d, m,  d,
@@ -8195,9 +8195,9 @@ (define_split
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
"=Y,r, r, r, r,  r,
-?m,?*d,   ?*d,   ?wY,   ?Z, ?*wb,
-?*wv,  ?wi,   ?wo,   ?wo,   ?wv,?wi,
-?wi,   ?wv,   ?wv,   r, *h, *h,
+^m,^d,^d,^Y,$Z, $wb,
+$wv,   ^wi,   *wo,   *wo,   *wv,*wi,
+*wi,   *wv,   *wv,   r, *h, *h,
 ?*r,   ?*wg,  ?*r,   ?*wj")
 
(match_operand:DI 1 "input_operand"
Index: gcc/testsuite/gcc.target/powerpc/ppc-round2.c
===
--- gcc/testsuite/gcc.target/powerpc/ppc-round2.c   
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)
 (revision 242556)
+++ gcc/testsuite/gcc.target/powerpc/ppc-round2.c   
(.../gcc/testsuite/gcc.target/powerpc)  (working copy)
@@ -5,8 +5,8 @@
 /* { dg-options "-O2 -mcpu=power8" } */
 /* { dg-final { scan-assembler-times "fcfid "  2 } } */
 /* { dg-final { scan-assembler-times "fcfids " 2 } } */
-/* { dg-final { scan-assembler-times "fctiwuz "2 } } */
-/* { dg-final { scan-assembler-times "fctiwz " 2 } } */
+/* { dg-final { scan-assembler-times "fctiwuz \|xscvdpuxws " 2 } } */
+/* { dg-final { scan-assembler-times "fctiwz \|xscvdpsxws "  2 } } */
 /* { dg-final { scan-assembler-times "mfvsrd " 4 } } */
 /* { dg-final { scan-assembler-times "mtvsrwa "2 } } */
 /* { dg-final { scan-assembler-times "mtvsrwz "2 } } */

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> I'm seeing lots of ICEs with this.

Here is untested fix for that, will bootstrap/regtest it soon (after my
current set of bootstraps finishes).

2016-11-18  Jakub Jelinek  

* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
don't initialize it, don't use it for the case where it isn't
provable %{z} nor using the same argument, instead move merge
argument into a new pseudo and use that as target.  Formatting fixes.

--- gcc/config/i386/i386.c.jj   2016-11-18 20:04:31.0 +0100
+++ gcc/config/i386/i386.c  2016-11-18 21:21:17.764190127 +0100
@@ -38220,14 +38220,12 @@ rdseed_step:
   rtx (*fcn) (rtx, rtx, rtx, rtx);
   rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
   rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
-  rtx (*msk_mov) (rtx, rtx, rtx, rtx);
   int masked = 1;
   machine_mode mode, wide_mode, nar_mode;
 
   nar_mode  = V4SFmode;
   mode  = V16SFmode;
   wide_mode = V64SFmode;
-  msk_mov   = gen_avx512f_loadv16sf_mask;
   fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
   fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
 
@@ -38270,7 +38268,6 @@ rdseed_step:
  wide_mode = V64SImode;
  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
- msk_mov   = gen_avx512f_loadv16si_mask;
  goto v4fma_expand;
 
case IX86_BUILTIN_4DPWSSDS_MASK:
@@ -38279,7 +38276,6 @@ rdseed_step:
  wide_mode = V64SImode;
  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
- msk_mov   = gen_avx512f_loadv16si_mask;
  goto v4fma_expand;
 
case IX86_BUILTIN_4FMAPS_MASK:
@@ -38295,11 +38291,11 @@ v4fma_expand:
wide_reg = gen_reg_rtx (wide_mode);
for (i = 0; i < 4; i++)
  {
-   args[i] = CALL_EXPR_ARG (exp, i);
+   args[i] = CALL_EXPR_ARG (exp, i);
ops[i] = expand_normal (args[i]);
 
-   emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
- ops[i]);
+   emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, i * 64),
+   ops[i]);
  }
 
accum = expand_normal (CALL_EXPR_ARG (exp, 4));
@@ -38318,7 +38314,7 @@ v4fma_expand:
  emit_insn (fcn (target, accum, wide_reg, mem));
else
  {
-   rtx merge, mask;
+   rtx merge, mask;
merge = expand_normal (CALL_EXPR_ARG (exp, 6));
 
mask = expand_normal (CALL_EXPR_ARG (exp, 7));
@@ -38340,18 +38336,16 @@ v4fma_expand:
merge = force_reg (mode, merge);
emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
  }
-   /* Merge with something unknown might happen if we z-mask w/ 
-O0.  */
+   /* Merge with something unknown might happen if we z-mask w/ 
-O0.  */
else
  {
-   rtx tmp = target;
-   emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
-
-   target = force_reg (mode, merge);
-   emit_insn (msk_mov (target, tmp, target, mask));
+   target = gen_reg_rtx (mode);
+   emit_move_insn (target, merge);
+   emit_insn (fcn_mask (target, wide_reg, mem, target, mask));
  }
  }
- return target;
-   }
+   return target;
+ }
 
case IX86_BUILTIN_4FNMASS:
  fcn = gen_avx5124fmaddps_4fnmaddss;
@@ -38366,7 +38360,6 @@ v4fma_expand:
case IX86_BUILTIN_4FNMASS_MASK:
  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
- msk_mov   = gen_avx512vl_loadv4sf_mask;
  goto s4fma_expand;
 
case IX86_BUILTIN_4FMASS_MASK:
@@ -38380,22 +38373,21 @@ v4fma_expand:
 
fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
-   msk_mov   = gen_avx512vl_loadv4sf_mask;
 
 s4fma_expand:
mode = V4SFmode;
wide_reg = gen_reg_rtx (V64SFmode);
for (i = 0; i < 4; i++)
  {
-rtx tmp;
-args[i] = CALL_EXPR_ARG (exp, i);
-ops[i] = expand_normal (args[i]);
+   rtx tmp;
+   args[i] = CALL_EXPR_ARG (exp, i);
+   ops[i] = expand_normal (args[i]);
 
-tmp = gen_reg_rtx (SFmode);
-emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+   tmp = gen_reg_rtx (SFmode);
+   emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
 
-emit_move_insn

Re: [PATCH] avoid calling alloca(0)


On 11/18/2016 11:46 AM, Jeff Law wrote:

On 11/18/2016 12:58 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote:

The point is warning on an alloca (0) may not be as clear cut as it
might seem.  It's probably a reasonable thing to do on the host, but on
a target, which might be embedded and explicitly overriding the builtin
alloca, warning on alloca (0) is less of a slam dunk.


I confess I haven't heard of such an implementation before but after
some searching I have managed to find a few examples of it online,
such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
Texas Instruments.  For instance:


In the libiberty/alloca.c, I don't see anything special about alloca (0).
Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca
(4035).
alloca (0) just returns NULL after it.  The diffutils link is the same
alloca.c as in libiberty.  Returning NULL or returning a non-unique
pointer
for alloca (0) is just fine, it is the same thing as returning NULL or
returning a non-unique pointer from malloc (0).  We definitely don't want
to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
because it behaves the same.

Maybe that's the key.  We don't want to warn for

alloca (0);

But we should warn for

whatever = alloca (0);


The former is a clear request for GC of the alloca space.  The latter it
much more likely a request for space where something went wrong
computing the amount of space.


That makes sense to me.  In fact, it already works this way, though
not thanks to anything I did.

GCC optimizes calls to alloca away when their return value isn't used
so they don't trigger the warning.  With -fno-builtin-alloca (and with
the Glibc alloca macro suppressed), a call to alloca(0) does not emit
the warning because Glibc alloca isn't declared with attribute
alloc_size (or any other attribute except C++ nothrow).

Only calls to the built-in whose return value is used do trigger it,
whether the argument is a literal zero or the result of constant
propagation.

Martin

documenting command-line options (was Re: PING [PATCH] enable -fprintf-return-value by default)

2016-11-18 Thread Sandra Loosemore


On 11/18/2016 11:52 AM, Martin Sebor wrote:


[snip]
I think it would be be ideal if all the options were sorted the same
way in all sections.  Is there some command to have texinfo sort them
for us?  If not, can we write a script to sort them, either each time
just before generating the docs or manually?  (I'm happy to help.)
Otherwise, consistency will continue to be an elusive goal.

There are at least two reasons why I don't think following the style
used by each section is likely to yield good results (and clearly
hasn't to date).  First, the big sections already have examples of
both approaches (or simply options out of order).  In some of them
it can also be hard to tell if the radix of the options you're
looking to for guidance starts with an 'n'.  Second, when adding
more than one option to different sections (such as the
-Wformat-length and -fprintf-format-length options) having to
remember to apply a different sort order for each (warnings are
sorted by radix but optimization options, for the most parts,
strictly alphabetically), seems also likely to trip people up.


This is wandering off into a general documentation maintenance 
discussion that has little to do with the original patch review request.


GCC has way too many command-line options and most of them are 
interesting to only some tiny fraction of users.  The documentation 
needs to be structured to focus on the options users are most likely to 
need or find useful, and not bury the information about the most 
important options in the middle of a huge pile of barely-useful 
documentation about barely-useful options.  For this reason, I think a 
strict alphabetical sorting across the board is not helpful.  E.g., for 
the list of optimization options, the -O options are clearly more 
important than the -f options controlling individual optimizations, so 
-O should be the first thing users see when they look at that section. 
Similarly for debug options, the vast majority of users won't care about 
anything other than -g.  For target-specific options, it may make sense 
to list -march/-mcpu/-mtune together regardless of their alphabetization 
with respect to other -m options for that target.


My last round of cleanup on invoke.texi was focused on refining the 
categorization of options and moving some option documentation that was 
clearly mis-categorized to more appropriate places or newly-created 
categories.  Having fewer items listed in each category makes the 
ordering of things within the group less critical.


I think people looking for documentation for a specific option should be 
looking in the index first.  We should have @cindex entries for both the 
-ffoo and -fno-foo variants so the alphabetization works either way.


Anyway  long story short  there's probably no perfect 
organization for this chapter, and the sheer number of options being 
documented makes it a pile of work even to implement small changes in 
convention across the board.  That shouldn't stop us from making 
incremental improvements in organization, one section at a time, or 
taking care not to make the current situation worse when adding 
documentation for new options.


-Sandra

C++ PATCH for c++/67631, list-init and explicit

Just need to pass the flags down.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6aee56eb70ff04c9c1fbcbef388144d4a0338154
Author: Jason Merrill 
Date:   Fri Nov 18 10:04:49 2016 -0500

PR c++/67631 - list-init and explicit conversions

* semantics.c (finish_compound_literal): Call digest_init_flags.
* typeck2.c (digest_init_flags): Add complain parm.
(store_init_value): Pass it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index e5f9113..5674886 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6839,7 +6839,7 @@ extern tree store_init_value  (tree, 
tree, vec**, int);
 extern tree split_nonconstant_init (tree, tree);
 extern bool check_narrowing(tree, tree, tsubst_flags_t);
 extern tree digest_init(tree, tree, 
tsubst_flags_t);
-extern tree digest_init_flags  (tree, tree, int);
+extern tree digest_init_flags  (tree, tree, int, 
tsubst_flags_t);
 extern tree digest_nsdmi_init  (tree, tree);
 extern tree build_scoped_ref   (tree, tree, tree *);
 extern tree build_x_arrow  (location_t, tree,
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 96c67a5..389e7f1 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2713,7 +2713,8 @@ finish_compound_literal (tree type, tree compound_literal,
   if (type == error_mark_node)
return error_mark_node;
 }
-  compound_literal = digest_init (type, compound_literal, complain);
+  compound_literal = digest_init_flags (type, compound_literal, LOOKUP_NORMAL,
+   complain);
   if (TREE_CODE (compound_literal) == CONSTRUCTOR)
 TREE_HAS_CONSTRUCTOR (compound_literal) = true;
 
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 2ca4bf2..b214c99 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -794,7 +794,7 @@ store_init_value (tree decl, tree init, vec** 
cleanups, int flags)
 value = init;
   else
 /* Digest the specified initializer into an expression.  */
-value = digest_init_flags (type, init, flags);
+value = digest_init_flags (type, init, flags, tf_warning_or_error);
 
   value = extend_ref_init_temps (decl, value, cleanups);
 
@@ -1165,9 +1165,9 @@ digest_init (tree type, tree init, tsubst_flags_t 
complain)
 }
 
 tree
-digest_init_flags (tree type, tree init, int flags)
+digest_init_flags (tree type, tree init, int flags, tsubst_flags_t complain)
 {
-  return digest_init_r (type, init, false, flags, tf_warning_or_error);
+  return digest_init_r (type, init, false, flags, complain);
 }
 
 /* Process the initializer INIT for an NSDMI DECL (a FIELD_DECL).  */
@@ -1183,7 +1183,7 @@ digest_nsdmi_init (tree decl, tree init)
   if (BRACE_ENCLOSED_INITIALIZER_P (init)
   && CP_AGGREGATE_TYPE_P (type))
 init = reshape_init (type, init, tf_warning_or_error);
-  init = digest_init_flags (type, init, flags);
+  init = digest_init_flags (type, init, flags, tf_warning_or_error);
   if (TREE_CODE (init) == TARGET_EXPR)
 /* This represents the whole initialization.  */
 TARGET_EXPR_DIRECT_INIT_P (init) = true;
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-explicit1.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist-explicit1.C
new file mode 100644
index 000..5e00b2d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-explicit1.C
@@ -0,0 +1,11 @@
+// PR c++/67631
+// { dg-do compile { target c++11 } }
+
+struct X
+{
+  explicit operator unsigned ();
+};
+unsigned foo ()
+{
+  return unsigned{ X () };
+}

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

Hi!

On Thu, Nov 17, 2016 at 02:18:57PM -0800, H.J. Lu wrote:
> > Hi HJ, could you please commit it?
> 
> Done.

I'm seeing lots of ICEs with this.

E.g. reduced:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
typedef unsigned char __mmask8;
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

static inline  __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
_mm_setzero_ps (void)
{
  return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
}

 __m128
foo (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C, __m128 __D, __m128 __E, 
__m128 *__F)
{
  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
  (__v4sf) __C,
  (__v4sf) __D,
  (__v4sf) __E,
  (__v4sf) __A,
  (const __v4sf *) __F,
  (__v4sf) _mm_setzero_ps (),
  (__mmask8) __U);
}

ICEs with -mavx5124fmaps -O0, but succeeds with
-mavx512vl -mavx5124fmaps -O0 or -mavx5124fmaps -O2.

fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
msk_mov   = gen_avx512vl_loadv4sf_mask;

looks wrong, while -mavx5124fmaps implies -mavx512f, it doesn't
imply -mavx512vl, so using -mavx512vl insns unconditionally is just wrong.
You need some fallback if avx512vl isn't available, perhaps use
avx512f 512-bit masked insns with bits in the mask forced to pick only the
ones you want?

Also, seems there are various formatting issues in the change,
e.g. shortly after s4fma_expand: there is indentation by 3 chars relative to
above { instead of 2, gen_rtx_SUBREG (V16SFmode, tmp, 0)); has extra 1 char
indentation, some lines too long.

Jakub

Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)

2016-11-18 Thread Bruce Korb

On Fri, Nov 18, 2016 at 9:42 AM, Mike Stump  wrote:
> On Nov 18, 2016, at 2:45 AM, Rainer Orth  
> wrote:
>> So the current suggestion is to combine my fixincludes patch and Jack's
>> patch to disable  use if !__BLOCKS__.
>
>> I guess this is ok for mainline now to restore bootstrap?
>
> I think we are down to everyone likes this, Ok.  The big question is, does 
> this need a back port?

My thinking on that subject is that since include fixes do not directly affect
the compiler and, instead, facilitate its use on various platforms, it
is therefore
reasonably safe to back port.  Especially if adequate guards (selection tests)
are included in the fix to prevent it from taking action on the wrong files.
But I also think it is really a "steering committee" decision.

For me, I think it is safe enough.

Re: PING [PATCH] enable -fprintf-return-value by default


On 11/18/2016 11:52 AM, Martin Sebor wrote:


I think it would be be ideal if all the options were sorted the same
way in all sections.  Is there some command to have texinfo sort them
for us?  If not, can we write a script to sort them, either each time
just before generating the docs or manually?  (I'm happy to help.)
Otherwise, consistency will continue to be an elusive goal.

I'm not aware of texinfo way to do this automatically.




There are at least two reasons why I don't think following the style
used by each section is likely to yield good results (and clearly
hasn't to date).  First, the big sections already have examples of
both approaches (or simply options out of order).  In some of them
it can also be hard to tell if the radix of the options you're
looking to for guidance starts with an 'n'.  Second, when adding
more than one option to different sections (such as the
-Wformat-length and -fprintf-format-length options) having to
remember to apply a different sort order for each (warnings are
sorted by radix but optimization options, for the most parts,
strictly alphabetically), seems also likely to trip people up.
Let's split this issue off by moving the option into the location Sandra 
has asked so that we're at least kindof, sorta, locally consistent. 
That allows your patch to go forward.


Then separately we can see if we can bring more sense to the larger 
issue.  Sandra has tried to work towards bring sanity to our 
documentation (which has grown like field bindweed over time) and we can 
include a discussion about this issue in that larger effort.




PS I don't mind moving the -fno-printf-return-value option up or
down and I will do it before committing the patch.  I would just
prefer to be able not to have to remember and worry about all
these subtle rules.  There are too many of them and they tend
to take time and energy away from things that matter more (like
fixing bugs).
Understood.  But that's also part of the reason why we delegate things 
-- there's a million little things to remember and nobody can remember 
them all.  So it's a balance between saying "we should clean this up and 
bring consistency here now" and "the maintainer has asked for a change, 
let's do that and address the consistency issues separately".


There's obviously pros and cons to each decision which I don't enumerate ;-)

Jeff

Re: PING [PATCH] enable -fprintf-return-value by default


On 11/18/2016 10:38 AM, Sandra Loosemore wrote:

On 11/18/2016 09:01 AM, Martin Sebor wrote:

On 11/17/2016 10:34 PM, Sandra Loosemore wrote:

On 11/16/2016 09:49 AM, Martin Sebor wrote:

I'm looking for an approval of the attached patch.

I've adjusted the documentation based on Sandra's input (i.e.,
documented the negative of the option rather than the positive;
thank you for the review, btw.)

On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.


[snip]

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi(revision 242500)
+++ gcc/doc/invoke.texi(working copy)
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss
@gol
 -fomit-frame-pointer -foptimize-sibling-calls @gol
 -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
--fprefetch-loop-arrays -fprintf-return-value @gol
+-fprefetch-loop-arrays -fno-printf-return-value @gol
 -fprofile-correction @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fprofile-reorder-functions @gol


Please keep this list alphabetized -- the other "-fno-*" options are
sorted as such.  The documentation parts of the patch are OK with that
fixed, but I can't approve changing the default for the option.


I kept the option in the same place on the assumption that it was
the "printf" radix of the name, not the "no-" prefix, that these
were alphabetized by.

But now that you point it out and I've looked more carefully at
some of the other options, I see that in some sections they are
listed strictly alphabetically (-fno-foo after -fmoo) while in
others it's the way you suggest.  AFAICS, the former style looks
prevalent in the C++ Language Options and the in Warning Options,
for example.  The latter seems to be more popular in the
Optimization Options section (though there are counterexamples).

I'm happy to follow either of these conventions as long as its
consistent.  Otherwise, without both kinds of examples to choose
from, I don't think I can trust my brain to remember yet another
rule.


Well, how about the rule is that you look at the convention of the
specific list you are adding something to?  At least that way we retain
local consistency and don't mess up those parts of the documentation
that we have already tried to organize in some way.  There's so much
material in the command-line options chapter that it would be hard to
figure out how to present it even if the content were completely static,
much less when people are adding/renaming/modifying options all the time.


I think it would be be ideal if all the options were sorted the same
way in all sections.  Is there some command to have texinfo sort them
for us?  If not, can we write a script to sort them, either each time
just before generating the docs or manually?  (I'm happy to help.)
Otherwise, consistency will continue to be an elusive goal.

There are at least two reasons why I don't think following the style
used by each section is likely to yield good results (and clearly
hasn't to date).  First, the big sections already have examples of
both approaches (or simply options out of order).  In some of them
it can also be hard to tell if the radix of the options you're
looking to for guidance starts with an 'n'.  Second, when adding
more than one option to different sections (such as the
-Wformat-length and -fprintf-format-length options) having to
remember to apply a different sort order for each (warnings are
sorted by radix but optimization options, for the most parts,
strictly alphabetically), seems also likely to trip people up.

Martin

PS I don't mind moving the -fno-printf-return-value option up or
down and I will do it before committing the patch.  I would just
prefer to be able not to have to remember and worry about all
these subtle rules.  There are too many of them and they tend
to take time and energy away from things that matter more (like
fixing bugs).

Re: [PATCH] avoid calling alloca(0)


On 11/18/2016 12:58 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote:

The point is warning on an alloca (0) may not be as clear cut as it
might seem.  It's probably a reasonable thing to do on the host, but on
a target, which might be embedded and explicitly overriding the builtin
alloca, warning on alloca (0) is less of a slam dunk.


I confess I haven't heard of such an implementation before but after
some searching I have managed to find a few examples of it online,
such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
Texas Instruments.  For instance:


In the libiberty/alloca.c, I don't see anything special about alloca (0).
Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035).
alloca (0) just returns NULL after it.  The diffutils link is the same
alloca.c as in libiberty.  Returning NULL or returning a non-unique pointer
for alloca (0) is just fine, it is the same thing as returning NULL or
returning a non-unique pointer from malloc (0).  We definitely don't want
to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
because it behaves the same.

Maybe that's the key.  We don't want to warn for

alloca (0);

But we should warn for

whatever = alloca (0);


The former is a clear request for GC of the alloca space.  The latter it 
much more likely a request for space where something went wrong 
computing the amount of space.


jeff

Re: [PATCH] avoid calling alloca(0)

On 11/17/2016 05:32 PM, Martin Sebor wrote:

I confess I haven't heard of such an implementation before but after
some searching I have managed to find a few examples of it online,
such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
Texas Instruments. For instance:
I have the "advantage" of being a GCC gray beard and had the misfortune
of maintaining a system that had one of those allocas (hpux) *and*
having to debug heap exhaustions in GCC that would occur due to this
kind of construct

for (some large number of iterations)
frobit (...)

Where frobit would allocate a metric ton of stuff with alloca.

Note the calls to alloca down in frobit would all appear to be at the
same stack depth (alloca literally recorded the SP value and the PA was
a fixed frame target). So the calls to alloca within frobit wouldn't
deallocate anything.

http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html

Their GCC-derived compilers apparently enable it with -fno-builtins.

Right.

That said, I don't view this as reason to exclude the warning from
-Wextra. The vendor-provided compilers are obviously customized
versions of GCC with their own features, including their own set
of options and warnings. It shouldn't stop us from enabling those
we expect to be useful to the typical GCC user on typical targets,
and especially those that can help expose sources of common bugs.
Recognizing I'm preaching to choir here: Calling alloca with any
argument is considered dangerous enough to be discouraged in most
man pages. Alloca(0) is a potentially dangerous corner case of
an already dangerous API. It seems at least as problematic as
any of the warnings already in -Wextra, and I'd say as many in
-Wall.
Right. We're well in the "living dangerously" space. But I could claim
that on one of these targets, warning about an alloca (0) would likely
result in a developer removing it or forcing it to allocate some small
amount of space to shut up the warning. That in turn runs the risk of
making the target code more prone to heap exhaustion and possibly less
secure, particularly if the developer isn't aware of the special nature
of alloca (0).

Even on systems with this unusual alloca implementation, since
alloca(0) is special (and expensive), warning on such calls would
likely be useful. In fact, since calling alloca(0) in these
environments is actually important, warning on them coukd help
detect their unintended absence. The few such calls that are
intended can be easily tweaked to suppress the warning.
It certainly is special and often expensive. But I'd come back to the
likely behavior of the developer. I suspect unless there's a comment
indicating the alloca (0) is intentional, they're likely to just remove it.

So I see both sides here and I'm not sure what the best path forward
should be.

jeff

Re: [PATCH] enable -fprintf-return-value by default


On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.

Thanks
Martin


gcc-fprintf-return-value.diff


gcc/c-family/ChangeLog:

* c.opt (-fprintf-return-value): Enable by default.

gcc/ChangeLog:

* doc/invoke.texi (-fprintf-return-value): Document that option
is enabled by default.

OK once you and Sandra are in-sync on the doc changes.

jeff

Re: [Patch v4 0/17] Add support for _Float16 to AArch64 and ARM

2016-11-18 Thread James Greenhalgh

On Fri, Nov 11, 2016 at 03:37:17PM +, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch set enables the _Float16 type specified in ISO/IEC TS 18661-3
> for AArch64 and ARM. The patch set has been posted over the past two months,
> with many of the target-independent changes approved. I'm reposting it in
> entirity in the form I hope to commit it to trunk.
> 
> The patch set can be roughly split in three; first, hookization of
> TARGET_FLT_EVAL_METHOD, and changes to the excess precision logic in the
> compiler to handle the new values for FLT_EVAL_METHOD defined in
> ISO/IEC TS-18661-3. Second, the AArch64 changes required to enable _Float16,
> and finally the ARM changes required to enable _Float16.
> 
> The broad goals and an outline of each patch in the patch set were
> described in https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02383.html .
> As compared to the original submission, the patch set has grown an ARM
> port, and has had several rounds of technical review on the target
> independent aspects.
> 
> This has resulted in many of the patches already being approved, a full
> summary of the status of each ticket is immediately below.
> 
> Clearly the focus for review of this patch set now needs to be the AArch64
> and ARM ports, I hope the appropriate maintainers will be able to do so in
> time for the patch set to be accepted for GCC 7.
> 
> I've built and tested the full patch set on ARM (cross and native),
> AArch64 (cross and native) and x86_64 (native) with no identified issues.

All the target independent changes are now approved, and all the ARM patches
have been approved (two are conditional on minor changes).

I'd like to apply the target independent and ARM changes to trunk, while I
wait for approval of the AArch64 changes. The changes for the two ports should
be independent. Would that be acceptable, or would you prefer me to wait
for review of the AArch64 changes?

I will then send a follow-up patch for doc/extend.texi detailing the
availability of _Float16 on ARM (I'm holding off on doing this until I know
which order the ARM and AArch64 parts will go in).

Thanks,
James

> --
> Target independent changes
> 
> 10 patches, 9 previously approved, 1 New implementing testsuite
> changes to enable _Float16 tests in more circumstances on ARM.
> --
> 
> [Patch 1/17] Add a new target hook for describing excess precision intentions
> 
>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00781.html
> 
> [Patch 2/17] Implement TARGET_C_EXCESS_PRECISION for i386
> 
>   Blanket approved by Jeff in:
> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html
> 
> [Patch 3/17] Implement TARGET_C_EXCESS_PRECISION for s390
> 
>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01554.html
> 
> [Patch 4/17] Implement TARGET_C_EXCESS_PRECISION for m68k
> 
>   Blanket approved by Jeff in:
> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html
>   And by Andreas: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02414.html
> 
>   There was a typo in the original patch, fixed in:
> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01173.html
>   which I would apply as an "obvious" fix to the original patch.
> 
> [Patch 5/17] Add -fpermitted-flt-eval-methods=[c11|ts-18661-3]
> 
>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02405.html
> 
>   Joseph had a comment in
>   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00335.html that the tests
>   should check FLT_EVAL_METHOD from  rather than
>   __FLT_EVAL_METHOD__. Rather than implement that suggestion, I added tests
>   to patch 6 which tested the  macro, and left the tests in this
>   patch testing the internal macro.
> 
> [Patch 6/17] Migrate excess precision logic to use TARGET_EXCESS_PRECISION
> 
>   Approved (after removing a rebase bug):
>   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00231.html
> 
> [Patch 7/17] Delete TARGET_FLT_EVAL_METHOD and poison it.
> 
>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02401.html
> 
> [Patch 8/17] Make _Float16 available if HFmode is available
> 
>   Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02403.html
> 
> [Patch libgcc 9/17] Update soft-fp from glibc
> 
>   Self approved under policy that we can update libraries which GCC mirrors
>   without further approval.
> 
> [Patch testsuite patch 10/17] Add options for floatN when checking effective 
> target for support
> 
>   NEW!
> 
> 
> AArch64 changes
> 
> 3 patches, none reviewed
> 
> 
> [Patch AArch64 11/17] Add floatdihf2 and floatunsdihf2 patterns
> 
>   Not reviewed, last pinged (^6):
>   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00584.html
> 
> [Patch libgcc AArch64 12/17] Enable hfmode soft-float conversions and 
> truncations
> 
>   Not reviewed:
>   https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02395.html
> 
> [Patch AArch64 13/17] Enable _Float16 for AArch64
> 
>   Not reviewed:
>

libgo patch committed: Move sched variable from C to Go

2016-11-18 Thread Ian Lance Taylor

This patch simply moves the schedt (aka Sched) type and the sched
variable from C to Go.  This is a step toward further changes.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242594)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-fc4ca600b2fc6de81fd3c4014542d6a50593db1a
+bf4762823c4543229867436399be3ae30b4d13bb
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/runtime2.go
===
--- libgo/go/runtime/runtime2.go(revision 242581)
+++ libgo/go/runtime/runtime2.go(working copy)
@@ -550,9 +550,6 @@ const (
_MaxGomaxprocs = 1 << 8
 )
 
-/*
-Commented out for gccgo for now.
-
 type schedt struct {
// accessed atomically. keep at top to ensure alignment on 32-bit 
systems.
goidgen  uint64
@@ -578,18 +575,17 @@ type schedt struct {
runqsize int32
 
// Global cache of dead G's.
-   gflock   mutex
-   gfreeStack   *g
-   gfreeNoStack *g
-   ngfree   int32
+   gflock mutex
+   gfree  *g
+   ngfree int32
 
// Central cache of sudog structs.
sudoglock  mutex
sudogcache *sudog
 
-   // Central pool of available defer structs of different sizes.
+   // Central pool of available defer structs.
deferlock mutex
-   deferpool [5]*_defer
+   deferpool *_defer
 
gcwaiting  uint32 // gc is waiting to run
stopwait   int32
@@ -608,7 +604,6 @@ type schedt struct {
procresizetime int64 // nanotime() of last change to gomaxprocs
totaltime  int64 // ∫gomaxprocs dt up to procresizetime
 }
-*/
 
 // The m.locked word holds two pieces of state counting active calls to 
LockOSThread/lockOSThread.
 // The low bit (LockExternal) is a boolean reporting whether any LockOSThread 
call is active.
@@ -772,8 +767,10 @@ var (
 
ncpu int32
 
-// forcegc forcegcstate
-// sched   schedt
+   //  forcegc forcegcstate
+
+   sched schedt
+
 // newprocsint32
 
 // Information about what cpu features are available.
Index: libgo/go/runtime/stubs.go
===
--- libgo/go/runtime/stubs.go   (revision 242581)
+++ libgo/go/runtime/stubs.go   (working copy)
@@ -520,3 +520,9 @@ func dumpregs(*_siginfo_t, unsafe.Pointe
 
 // Temporary for gccgo until we port panic.go.
 func startpanic()
+
+// Temporary for gccgo until we port proc.go.
+//go:linkname getsched runtime.getsched
+func getsched() *schedt {
+   return 
+}
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c(revision 242581)
+++ libgo/runtime/proc.c(working copy)
@@ -351,48 +351,18 @@ runtime_mcall(void (*pfn)(G*))
 //
 // Design doc at http://golang.org/s/go11sched.
 
-typedef struct Sched Sched;
-struct Sched {
-   Lock;
-
-   uint64  goidgen;
-   M*  midle;   // idle m's waiting for work
-   int32   nmidle;  // number of idle m's waiting for work
-   int32   nmidlelocked; // number of locked m's waiting for work
-   int32   mcount;  // number of m's that have been created
-   int32   maxmcount;  // maximum number of m's allowed (or die)
-
-   P*  pidle;  // idle P's
-   uint32  npidle;
-   uint32  nmspinning;
-
-   // Global runnable queue.
-   G*  runqhead;
-   G*  runqtail;
-   int32   runqsize;
-
-   // Global cache of dead G's.
-   Lockgflock;
-   G*  gfree;
-
-   uint32  gcwaiting;  // gc is waiting to run
-   int32   stopwait;
-   Notestopnote;
-   uint32  sysmonwait;
-   Notesysmonnote;
-   uint64  lastpoll;
-
-   int32   profilehz;  // cpu profiling rate
-};
+typedef struct schedt Sched;
 
 enum
 {
-   // Number of goroutine ids to grab from runtime_sched.goidgen to local 
per-P cache at once.
+   // Number of goroutine ids to grab from runtime_sched->goidgen to local 
per-P cache at once.
// 16 seems to provide enough amortization, but other than that it's 
mostly arbitrary number.
GoidCacheBatch = 16,
 };
 
-Sched  runtime_sched;
+extern Sched* runtime_getsched() __asm__ (GOSYM_PREFIX "runtime.getsched");
+
+static Sched*  runtime_sched;
 int32  runtime_gomaxprocs;
 uint32 runtime_needextram = 1;
 M  runtime_m0;
@@ -471,6 +441,8 @@ runtime_schedinit(void)
const byte *p;
Eface i;
 
+   runtime_sched = runtime_getsched();
+
m = _m0;
g = _g0;
m->g0 = g;
@@ -479,7 +451,7 @@ runtime_schedinit(void)
 
initcontext();
 
-   runtime_sched.maxmcount = 1;
+

Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)

2016-11-18 Thread Mike Stump

On Nov 18, 2016, at 2:45 AM, Rainer Orth  wrote:
> So the current suggestion is to combine my fixincludes patch and Jack's
> patch to disable  use if !__BLOCKS__.

> I guess this is ok for mainline now to restore bootstrap?

I think we are down to everyone likes this, Ok.  The big question is, does this 
need a back port?

If you fix includes virtual members or data members of C/C++ classes, just be 
careful disappearing content as that can change the layout of the structure or 
class.

Re: PING [PATCH] enable -fprintf-return-value by default

2016-11-18 Thread Sandra Loosemore


On 11/18/2016 09:01 AM, Martin Sebor wrote:

On 11/17/2016 10:34 PM, Sandra Loosemore wrote:

On 11/16/2016 09:49 AM, Martin Sebor wrote:

I'm looking for an approval of the attached patch.

I've adjusted the documentation based on Sandra's input (i.e.,
documented the negative of the option rather than the positive;
thank you for the review, btw.)

On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.


[snip]

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi(revision 242500)
+++ gcc/doc/invoke.texi(working copy)
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss
@gol
 -fomit-frame-pointer -foptimize-sibling-calls @gol
 -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
--fprefetch-loop-arrays -fprintf-return-value @gol
+-fprefetch-loop-arrays -fno-printf-return-value @gol
 -fprofile-correction @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fprofile-reorder-functions @gol


Please keep this list alphabetized -- the other "-fno-*" options are
sorted as such.  The documentation parts of the patch are OK with that
fixed, but I can't approve changing the default for the option.


I kept the option in the same place on the assumption that it was
the "printf" radix of the name, not the "no-" prefix, that these
were alphabetized by.

But now that you point it out and I've looked more carefully at
some of the other options, I see that in some sections they are
listed strictly alphabetically (-fno-foo after -fmoo) while in
others it's the way you suggest.  AFAICS, the former style looks
prevalent in the C++ Language Options and the in Warning Options,
for example.  The latter seems to be more popular in the
Optimization Options section (though there are counterexamples).

I'm happy to follow either of these conventions as long as its
consistent.  Otherwise, without both kinds of examples to choose
from, I don't think I can trust my brain to remember yet another
rule.


Well, how about the rule is that you look at the convention of the 
specific list you are adding something to?  At least that way we retain 
local consistency and don't mess up those parts of the documentation 
that we have already tried to organize in some way.  There's so much 
material in the command-line options chapter that it would be hard to 
figure out how to present it even if the content were completely static, 
much less when people are adding/renaming/modifying options all the time.


-Sandra

Re: Import libcilkrts Build 4467 (PR target/68945)

2016-11-18 Thread Ilya Verbin

2016-11-17 20:01 GMT+03:00 Jeff Law :
> On 11/17/2016 09:56 AM, Ilya Verbin wrote:
>>
>> 2016-11-17 18:50 GMT+03:00 Rainer Orth :
>>>
>>> Rainer Orth  writes:
>>>
 I happened to notice that my libcilkrts SPARC port has been applied
 upstream.  So to reach closure on this issue for the GCC 7 release, I'd
 like to import upstream into mainline which seems to be covered by the
 free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
 though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
 specific and we have no listed maintainer.
>>>
>>>
>>> I initially used Ilya's intel.com address, which bounced.  Now using the
>>> current address listed in MAINTAINERS...
>>
>>
>> Yeah, I don't work for Intel anymore. And I'm not a libcilkrts
>> maintainer, so I can't approve it.
>
> Do you want to be?  IIRC I was going to nominate you, but held off knowing
> your situation was going to change.
>
> If you're interested in maintainer positions, I can certainly still nominate
> you.

I have little experience with this library, and no longer have a
connection with Cilk developers an Intel, so I'm not interested.

  -- Ilya

Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.

On Fri, Nov 18, 2016 at 05:19:31PM +, Iain Sandoe wrote:
> I’d like to do that;  is there a way to force a jump table for a limited set 
> of cases? 
> (the example was about the smallest I could get where GCC elected to produce 
> a jump table instead of branches)

--param case-values-threshold ?


Segher

Re: [PATCH] avoid calling alloca(0)

On Fri, Nov 18, 2016 at 10:14:09AM -0700, Martin Sebor wrote:
> Because people make mistakes and warnings help us avoid them (isn't
> that obvious?)  Just because we get it right most of the time doesn't
> mean we get right all of the time.  The papers and exploits I pointed
> to should provide ample evidence that zero allocations are a problem
> that can have serious (and costly) consequences.  We (i.e., Red Hat)
> recognize this risk and have made it our goal to help stem some of
> these problems.

Are you talking about cases where user writes malloc (0) explicitly, or
where user writes malloc (n); and the optimizers figure out n is 0,
either always, or e.g. because the compiler decided to duplicate the code
and and in one of the branches it proves it is 0, or you want to add a
runtime warning when malloc is called with 0?
E.g. I don't see how writing malloc (0) in the code should have anything
to do with any overflows.  The cases where jump threading creates cases
where we call functions with constant arguments and we then warn on them
is also problematic, often such code is even dead except the compiler is not
able to prove it.

> >It IMHO doesn't belong to either of these.
> 
> You've made that quite clear.  I struggle to reconcile your
> position in this case with the one in debate about the
> -Wimplicit-fallthrough option where you insisted on the exact
> opposite despite the overwhelming number of false positives
> caused by it.  Why is it appropriate for that option to be in
> -Wextra and not this one?

It also matters what one can do to tell the compiler the code is fine.
For e.g. -Wimplicit-fallthrough and many other warnings one can add
a comment, or attribute, etc. to get the warning out of the way.
But the patch you've posted for the alloca (0) stuff contained
changes of n to n + !n, so the warning basically asks people to slow
down their code for no reason, just to get the warning out of the way.

Jakub

Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes

2016-11-18 Thread Pierre-Marie de Rodat


Hi Christophe,

On 11/18/2016 03:03 PM, Christophe Lyon wrote:

Hi,

Part of the new testcase added with this commit fails on arm* targets:
  g++.dg/pr78112.C   scan-assembler-times DW_AT_object_pointer 37


Thank you for the report.

As I said on the bugzilla 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78112#c17), I reproduce 
inconsistent results on this even on a very reduced testcase. So I 
wonder if I should just remove this testcase.


--
Pierre-Marie de Rodat

Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)

2016-11-18 Thread Bruce Korb

I think restoring bootstrap is likely a good thing.

On Fri, Nov 18, 2016 at 2:45 AM, Rainer Orth
 wrote:
>
> I guess this is ok for mainline now to restore bootstrap?

Re: [PATCH] DWARF: make signedness explicit for enumerator const values

2016-11-18 Thread Pierre-Marie de Rodat


On 11/14/2016 01:05 PM, Mark Wielaard wrote:

You can either choose a signed/unsigned form not giving the consumer
a hint about the size of the underlying constant value or one of the
sized data forms that don't give a hint about the value
representation/signedness. So in both cases the consumer looses
without an actual (base) type telling them how to interpret the
constant.


I see, thank you for the explanation! :-)

I agree with you, so I’ll revise to instead add a DW_AT_type attribute 
to enumeration DIEs to point to a base type.


--
Pierre-Marie de Rodat

Re: [PATCH v2][PR libgfortran/78314] Fix ieee_support_halting

2016-11-18 Thread Szabolcs Nagy

On 16/11/16 16:53, Szabolcs Nagy wrote:
> ieee_support_halting only checked the availability of status
> flags, not trapping support.  On some targets the later can
> only be checked at runtime: feenableexcept reports if
> enabling traps failed.
> 
> So check trapping support by enabling/disabling it.
> 
> Updated the test that enabled trapping to check if it is
> supported.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> gcc/testsuite/
> 2016-11-16  Szabolcs Nagy  
> 
>   PR libgfortran/78314
>   * gfortran.dg/ieee/ieee_6.f90: Use ieee_support_halting.
> 
> libgfortran/
> 2016-11-16  Szabolcs Nagy  
> 
>   PR libgfortran/78314
>   * config/fpu-glibc.h (support_fpu_trap): Use feenableexcept.
> 

it seems this broke ieee_8.f90 which tests compile time
vs runtime value of ieee_support_halting

if fortran needs this, then support_halting should be
always false on arm and aarch64.

but i'm not familiar enough with fortran to tell if
there is some better workaround.

Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.

2016-11-18 Thread Iain Sandoe

Hi Mike,

> On 18 Nov 2016, at 17:16, Mike Stump  wrote:
> 
> On Nov 18, 2016, at 4:33 AM, Iain Sandoe  wrote:
>> 
>> As discussed on IRC, I was under the impression that it is desired to move 
>> away from #ifdef towards if() and  I have been adding those where locally 
>> things have been touched - in this case it was only partially possible.
>> 
>> However, FAOD - you pointed out that for the RS6000 back-end #ifdef are 
>> still preferred,
> 
> Shudder.  We can encourage anyone that likes #if, to like if () instead.
> 
>> OK now for trunk?
> 
> Ok; I'm pretty sure that change can only impact darwin.  If you wanted to 
> reduce the test case to 3 cases, I think that would also show the problem 
> that show that your patch fixes it, ok with such a change, if you want.

I’d like to do that;  is there a way to force a jump table for a limited set of 
cases? 
(the example was about the smallest I could get where GCC elected to produce a 
jump table instead of branches)

Iain

> 
>> Open branches?
> 
> I'm fine with back porting.
>

Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.

2016-11-18 Thread Mike Stump

On Nov 18, 2016, at 4:33 AM, Iain Sandoe  wrote:
> 
> As discussed on IRC, I was under the impression that it is desired to move 
> away from #ifdef towards if() and  I have been adding those where locally 
> things have been touched - in this case it was only partially possible.
> 
> However, FAOD - you pointed out that for the RS6000 back-end #ifdef are still 
> preferred,

Shudder.  We can encourage anyone that likes #if, to like if () instead.

> OK now for trunk?

Ok; I'm pretty sure that change can only impact darwin.  If you wanted to 
reduce the test case to 3 cases, I think that would also show the problem that 
show that your patch fixes it, ok with such a change, if you want.

> Open branches?

I'm fine with back porting.

Re: [PATCH] avoid calling alloca(0)


On 11/18/2016 09:29 AM, Jakub Jelinek wrote:

On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote:

In the libiberty/alloca.c, I don't see anything special about alloca (0).
Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035).
alloca (0) just returns NULL after it.  The diffutils link is the same
alloca.c as in libiberty.  Returning NULL or returning a non-unique pointer
for alloca (0) is just fine, it is the same thing as returning NULL or
returning a non-unique pointer from malloc (0).  We definitely don't want
to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
because it behaves the same.


I disagree.  At a minimum, the return value of malloc(0) is
implementation-defined and so relying on it being either null
or non-null is non-portable and can be a source of subtle bugs.


Most apps know what malloc (0) means and treat it correctly, they know they
shouldn't dereference the pointer, because it is either NULL or holds an
array with 0 elements.  I fail to see why you would want to warn.


Because people make mistakes and warnings help us avoid them (isn't
that obvious?)  Just because we get it right most of the time doesn't
mean we get right all of the time.  The papers and exploits I pointed
to should provide ample evidence that zero allocations are a problem
that can have serious (and costly) consequences.  We (i.e., Red Hat)
recognize this risk and have made it our goal to help stem some of
these problems.


But malloc(0) has also been known to result from unsigned overflow
which has led to vulnerabilities and exploits (famously by the JPG
COM vulnerability exploit, but there are plenty of others, including
for instance CVE-2016-2851).  Much academic research has been devoted
to this problem and to solutions to detect and prevent it.


How is it different from overflowing to 1 or 2 or 27?  What is special on
the value 0?


It's a case that, unlike the others, can be readily detected.  It
would be nice to detect the others as well but GCC can't do that
yet.  That doesn't mean we shouldn't try to detect at least the
small subset for now.  It doesn't have to be perfect for it to be
useful.




In the absence of a policy or guidelines it's a matter of opinion
whether or not this warning belongs in either -Wall or -Wextra.


It IMHO doesn't belong to either of these.


You've made that quite clear.  I struggle to reconcile your
position in this case with the one in debate about the
-Wimplicit-fallthrough option where you insisted on the exact
opposite despite the overwhelming number of false positives
caused by it.  Why is it appropriate for that option to be in
-Wextra and not this one?

Martin

Patch ping

Hi!

I'd like to ping 2 patches:

http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01074.html
- C++ ABI - mangling of TLS aux symbols; either the posted
  patch or one with if (abi_version_at_least (11))

http://gcc.gnu.org/ml/gcc-patches/2016-11/msg00351.html
- DWARF Solaris bootstrap fix

Jakub

Re: [PATCH] Fix combine's make_extraction (PR rtl-optimization/78378)

2016-11-18 Thread Michael Matz

Hi,

On Wed, 16 Nov 2016, Jakub Jelinek wrote:

> If inner is a MEM, make_extraction requires that pos is a multiple of 
> bytes and deals with offsetting it.  Or otherwise requires that pos is a 
> multiple of BITS_PER_WORD and for REG inner it handles that too.  But if 
> inner is something different, it calls just force_to_mode to the target 
> mode, which only really works if pos is 0.

This should also fix the aarch64 fail for gcc.c-torture/execute/cbrt.c .  
(At least I came up with the same patch in PR78390)

Ciao,
Michael.

Re: [PATCH, Fortran, pr78395, v1] [OOP] error on polymorphic assignment

2016-11-18 Thread Thomas Koenig


Hi Andre,



Btw, when using the in gcc-7 available
polymorphic assign, then v6 is actually auto-allocated and the program runs
fine. So what are your opinions on the auto-allocation issue?


It is my experience that such questions can speedily and correctly
be answered by the regulars on comp.lang.fortran.

I have no opinion either way :-)

Regards

Thomas

[PATCH, Fortran, pr78395, v1] [OOP] error on polymorphic assignment

2016-11-18 Thread Andre Vehreschild

Hi all,

attached patch fixes the issue which was given by nesting calls to typebound
procedures. The expression of the inner typebound procedure call was resolved
correctly, but in the case of it's having a class type the ref-list was
discarded. Leaving the list of references untouched, resolves the wrong
error-message and generates correct code.

When checking the shortened example in comment #3 one gets a segfault, because
v6 is not allocated explicitly. The initial example made sure, that v6 was
allocated. Reading through the standard, I did not find, whether the
auto-allocation is applicable here. I therefore have extended the testcase by
an allocate(v6). Dominique pointed out, that there are already some prs for
adding an on-demand -fcheck=something runtime check for not allocated objects.
But that does not solve the question, whether v6 should be auto-allocated
when assigned by a typebound-procedure (ifort and cray need v6 allocated do,
i.e., they don't auto-allocate). Btw, when using the in gcc-7 available
polymorphic assign, then v6 is actually auto-allocated and the program runs
fine. So what are your opinions on the auto-allocation issue?

This patch fixes the wrong error messages in both gcc-7 and gcc-6.
Bootstraped and regtested on x86_64-linux/F23 for gcc-7 and -6. Ok for trunk
and gcc-6?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/testsuite/ChangeLog:

2016-11-18  Andre Vehreschild  

PR fortran/78395
* gfortran.dg/typebound_operator_21.f03: New test.


gcc/fortran/ChangeLog:

2016-11-18  Andre Vehreschild  

PR fortran/78395
* resolve.c (resolve_typebound_function): Prevent stripping of refs,
when the base-expression is a class' typed one.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 825bb12..589a673 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -6140,7 +6140,7 @@ resolve_typebound_function (gfc_expr* e)
 	  gfc_free_ref_list (class_ref->next);
 	  class_ref->next = NULL;
 	}
-  else if (e->ref && !class_ref)
+  else if (e->ref && !class_ref && expr->ts.type != BT_CLASS)
 	{
 	  gfc_free_ref_list (e->ref);
 	  e->ref = NULL;
diff --git a/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 b/gcc/testsuite/gfortran.dg/typebound_operator_21.f03
new file mode 100644
index 000..ea374a1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/typebound_operator_21.f03
@@ -0,0 +1,78 @@
+! { dg-do run }
+!
+! Test that pr78395 is fixed.
+! Contributed by Chris and Janus Weil
+
+module types_mod
+  implicit none
+
+  type, public :: t1
+integer :: a
+  contains
+procedure :: get_t2
+  end type
+
+  type, public :: t2
+integer :: b
+  contains
+procedure, pass(rhs) :: mul2
+procedure :: assign
+generic :: operator(*) => mul2
+generic :: assignment(=) => assign
+  end type
+
+contains
+
+  function get_t2(this)
+class(t1), intent(in) :: this
+class(t2), allocatable :: get_t2
+type(t2), allocatable :: local
+allocate(local)
+local%b = this%a
+call move_alloc(local, get_t2)
+  end function
+
+  function mul2(lhs, rhs)
+class(t2), intent(in) :: rhs
+integer, intent(in) :: lhs
+class(t2), allocatable :: mul2
+type(t2), allocatable :: local
+allocate(local)
+local%b = rhs%b*lhs
+call move_alloc(local, mul2)
+  end function
+
+  subroutine assign(this, rhs)
+class(t2), intent(out) :: this
+class(t2), intent(in)  :: rhs
+select type(rhs)
+type is(t2)
+  this%b = rhs%b
+class default
+  error stop
+end select
+  end subroutine
+
+end module
+
+
+program minimal
+  use types_mod
+  implicit none
+
+  class(t1), allocatable :: v4
+  class(t2), allocatable :: v6
+
+  allocate(v4, source=t1(4))
+  allocate(v6)
+  v6 = 3 * v4%get_t2() 
+
+  select type (v6)
+type is (t2)
+  if (v6%b /= 12) error stop
+class default
+  error stop
+  end select
+  deallocate(v4, v6)
+end
+

Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f

2016-11-18 Thread Bin.Cheng

On Fri, Nov 18, 2016 at 4:52 PM, Michael Matz  wrote:
> Hi,
>
> On Thu, 17 Nov 2016, Bin.Cheng wrote:
>
>> B) Depending on ilp, I think below test strings fail for long time with 
>> haswell:
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>> without unrolling" 1 "pcom" { target lp64 } } }
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>> without unrolling" 2 "pcom" { target ia32 } } }
>> Because vectorizer choose vf==4 in this case, and there is no
>> predictive commoning opportunities at all.
>> Also the newly added test string fails in this case too because the
>> prolog peeled iterates more than 1 times.
>
> Btw, this probably means that on haswell (or other archs with vf==4) mgrid
> is slower than necessary.  On mgrid you really really want predictive
> commoning to happen.  Vectorization isn't that interesting there.
Interesting, I will check if there is difference between 2/4 vf.  we
do have cases that smaller vf is better and should be chosen, though
for different reasons.

Thanks,
bin
>
>
> Ciao,
> Michael.

Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-18 Thread Mike Stump

On Nov 18, 2016, at 3:13 AM, Iain Sandoe  wrote:
> 
> Thanks, at least I’m not going completely crazy ;-)

I'll just note for completeness that Jeff also couldn't explain a failure of 
your latest patch.  If you run into one, let me know.

> OK now for trunk?

Ok.

> open branches?

Ok.

[PATCH GCC]Simplify (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2)

2016-11-18 Thread Bin Cheng

Hi,
This is a rework of https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02007.html.  
Though review comments suggested it could be merged with last kind 
simplification
of fold_cond_expr_with_comparison, it's not really applicable.  As a matter of 
fact, 
the suggestion stands for patch 
@https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02005.html.
I had previous patch (https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01898.html)
moving fold_cond_expr_with_comparison to match.pd pattern and incorporated
that patch into it.  For this one, the rework is trivial, just renames several 
variable
tags as suggested.  Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin

2016-11-17  Bin Cheng  

* match.pd: Add new pattern:
(cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2).

gcc/testsuite/ChangeLog
2016-11-17  Bin Cheng  

* gcc.dg/fold-bopcond-1.c: New test.
* gcc.dg/fold-bopcond-2.c: New test.diff --git a/gcc/match.pd b/gcc/match.pd
index a8d94de..e47f77a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2012,6 +2012,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (code == EQ_EXPR)
   (cond (cmp @1 (convert:from_type @3)) (convert:from_type @3) @2)))
 
+/* (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2) if:
+
+ 1) OP is PLUS or MINUS.
+ 2) CMP is LT, LE, GT or GE.
+ 3) C3 == (C1 op C2), and computation doesn't have undefined behavior.
+
+   This pattern also handles special cases like:
+
+ A) Operand x is a unsigned to signed type conversion and c1 is
+   integer zero.  In this case,
+ (signed type)x  < 0  <=>  x  > MAX_VAL(signed type)
+ (signed type)x >= 0  <=>  x <= MAX_VAL(signed type)
+ B) Const c1 may not equal to (C3 op' C2).  In this case we also
+   check equality for (c1+1) and (c1-1) by adjusting comparison
+   code.
+
+   TODO: Though signed type is handled by this pattern, it cannot be
+   simplified at the moment because C standard requires additional
+   type promotion.  In order to match it here, the IR needs
+   to be cleaned up by other optimizers, i.e, VRP.  */
+(for op (plus minus)
+ (for cmp (lt le gt ge)
+  (simplify
+   (cond (cmp (convert? @X) INTEGER_CST@1) (op @X INTEGER_CST@2) INTEGER_CST@3)
+   (with { tree from_type = TREE_TYPE (@X), to_type = TREE_TYPE (@1); }
+(if (types_match (from_type, to_type)
+/* Check if it is special case A).  */
+|| (TYPE_UNSIGNED (from_type)
+&& !TYPE_UNSIGNED (to_type)
+&& TYPE_PRECISION (from_type) == TYPE_PRECISION (to_type)
+&& integer_zerop (@1)
+&& (cmp == LT_EXPR || cmp == GE_EXPR)))
+ (with
+  {
+   bool overflow = false;
+   enum tree_code code, cmp_code = cmp;
+   tree real_c1, c1 = @1, c2 = @2, c3 = @3;
+   tree op_type = TREE_TYPE (@X);
+   signop sgn = TYPE_SIGN (op_type);
+   widest_int wmin = widest_int::from (wi::min_value (op_type), sgn);
+   widest_int wmax = widest_int::from (wi::max_value (op_type), sgn);
+
+   /* Handle special case A), given x of unsigned type:
+   ((signed type)x  < 0) <=> (x  > MAX_VAL(signed type))
+   ((signed type)x >= 0) <=> (x <= MAX_VAL(signed type))  */
+   if (!types_match (from_type, to_type))
+ {
+   if (cmp_code == LT_EXPR)
+ cmp_code = GT_EXPR;
+   if (cmp_code == GE_EXPR)
+ cmp_code = LE_EXPR;
+   c1 = wide_int_to_tree (op_type, wi::max_value (to_type));
+ }
+   /* To simplify this pattern, we require c3 = (c1 op c2).  Here we
+  compute (c3 op' c2) and check if it equals to c1 with op' being
+  the inverted operator of op.  Make sure overflow doesn't happen
+  if it is undefined.  */
+   if (op == PLUS_EXPR)
+ real_c1 = wide_int_to_tree (op_type,
+ wi::sub (c3, c2, sgn, ));
+   else
+ real_c1 = wide_int_to_tree (op_type,
+ wi::add (c3, c2, sgn, ));
+   code = cmp_code;
+   if (!overflow || !TYPE_OVERFLOW_UNDEFINED (op_type))
+ {
+   /* Check if c1 equals to real_c1.  Boundary condition is handled
+  by adjusting comparison operation if necessary.  */
+   if (wi::to_widest (c1) == (wi::to_widest (real_c1) - 1))
+ {
+   /* X <= Y - 1 equals to X < Y.  */
+   if (cmp_code == LE_EXPR)
+ code = LT_EXPR;
+   /* X > Y - 1 equals to X >= Y.  */
+   if (cmp_code == GT_EXPR)
+ code = GE_EXPR;
+ }
+   if (wi::to_widest (c1) == (wi::to_widest (real_c1) + 1))
+ {
+   /* X < Y + 1 equals to X <= Y.  */
+   if (cmp_code == LT_EXPR)
+ code = LE_EXPR;
+   /* X >= Y + 1 equals to X > Y.  */
+   if (cmp_code ==

[PATCH GCC]Move simplification of (A == C1) ? A : C2 to match.pd

2016-11-18 Thread Bin Cheng

Hi,
This is a follow up patch for 
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01898.html
It moves remaining simplification for (A == C1) ? A : C2 in 
fold_cond_expr_with_comparison
to match.pd.  Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin

2016-11-17  Bin Cheng  

* fold-const.c (fold_cond_expr_with_comparison): Move simplification
for A == C1 ? A : C2 to below.
* match.pd: Move from above to here:
(cond (eq (convert1? x) c1) (convert2? x) c2)
  -> (cond (eq x c1) c1 c2).diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 1e61ccf..1877dac 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -5210,19 +5210,6 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
}
 }
 
-  /* If this is A == C1 ? A : C2 with C1 and C2 constant integers,
- we simplify it into A == C1 ? C1 : C2.  */
-
-  if (comp_code == EQ_EXPR
-  && INTEGRAL_TYPE_P (type)
-  && TREE_CODE (arg01) == INTEGER_CST
-  && TREE_CODE (arg1) != INTEGER_CST
-  && TREE_CODE (arg2) == INTEGER_CST)
-{
-  arg1 = fold_convert_loc (loc, type, arg01);
-  return fold_build3_loc (loc, COND_EXPR, type, arg0, arg1, arg2);
-}
-
   return NULL_TREE;
 }
 
diff --git a/gcc/match.pd b/gcc/match.pd
index 4beac4e..a8d94de 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1939,15 +1939,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
be extended.  */
-/* (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)) if:
+/* This pattern implements two kinds simplification:
+
+   Case 1)
+   (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)) if:
  1) Conversions are type widening from smaller type.
  2) Const c1 equals to c2 after canonicalizing comparison.
  3) Comparison has tree code LT, LE, GT or GE.
This specific pattern is needed when (cmp (convert x) c) may not
be simplified by comparison patterns because of multiple uses of
x.  It also makes sense here because simplifying across multiple
-   referred var is always benefitial for complicated cases.  */
-(for cmp (lt le gt ge)
+   referred var is always benefitial for complicated cases.
+
+   Case 2)
+   (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).  */
+(for cmp (lt le gt ge eq)
  (simplify
   (cond (cmp@0 (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
   (with
@@ -1966,37 +1972,45 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 && (TYPE_UNSIGNED (from_type)
 || TYPE_SIGN (c2_type) == TYPE_SIGN (from_type)
{
-if (wi::to_widest (@3) == (wi::to_widest (@2) - 1))
-  {
-/* X <= Y - 1 equals to X < Y.  */
-if (cmp_code == LE_EXPR)
-  code = LT_EXPR;
-/* X > Y - 1 equals to X >= Y.  */
-if (cmp_code == GT_EXPR)
-  code = GE_EXPR;
-  }
-if (wi::to_widest (@3) == (wi::to_widest (@2) + 1))
-  {
-/* X < Y + 1 equals to X <= Y.  */
-if (cmp_code == LT_EXPR)
-  code = LE_EXPR;
-/* X >= Y + 1 equals to X > Y.  */
-if (cmp_code == GE_EXPR)
-  code = GT_EXPR;
-  }
-if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3))
+if (code != EQ_EXPR)
   {
-if (cmp_code == LT_EXPR || cmp_code == LE_EXPR)
-  code = MIN_EXPR;
-if (cmp_code == GT_EXPR || cmp_code == GE_EXPR)
-  code = MAX_EXPR;
+if (wi::to_widest (@3) == (wi::to_widest (@2) - 1))
+  {
+/* X <= Y - 1 equals to X < Y.  */
+if (cmp_code == LE_EXPR)
+  code = LT_EXPR;
+/* X > Y - 1 equals to X >= Y.  */
+if (cmp_code == GT_EXPR)
+  code = GE_EXPR;
+  }
+if (wi::to_widest (@3) == (wi::to_widest (@2) + 1))
+  {
+/* X < Y + 1 equals to X <= Y.  */
+if (cmp_code == LT_EXPR)
+  code = LE_EXPR;
+/* X >= Y + 1 equals to X > Y.  */
+if (cmp_code == GE_EXPR)
+  code = GT_EXPR;
+  }
+if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3))
+  {
+if (cmp_code == LT_EXPR || cmp_code == LE_EXPR)
+  code = MIN_EXPR;
+if (cmp_code == GT_EXPR || cmp_code == GE_EXPR)
+  code = MAX_EXPR;
+  }
   }
+/* Can do A == C1 ? A : C2  ->  A == C1 ? C1 : C2?  */
+else if (!int_fits_type_p (@3, from_type))
+  code = ERROR_MARK;
}
}
(if (code == MAX_EXPR)
 (convert (max @1 (convert:from_type @2)))
 (if (code == MIN_EXPR)
- (convert (min @1 (convert:from_type @2
+

Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f

2016-11-18 Thread Michael Matz

Hi,

On Thu, 17 Nov 2016, Bin.Cheng wrote:

> B) Depending on ilp, I think below test strings fail for long time with 
> haswell:
> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
> without unrolling" 1 "pcom" { target lp64 } } }
> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
> without unrolling" 2 "pcom" { target ia32 } } }
> Because vectorizer choose vf==4 in this case, and there is no
> predictive commoning opportunities at all.
> Also the newly added test string fails in this case too because the
> prolog peeled iterates more than 1 times.

Btw, this probably means that on haswell (or other archs with vf==4) mgrid 
is slower than necessary.  On mgrid you really really want predictive 
commoning to happen.  Vectorization isn't that interesting there.

Ciao,
Michael.

Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)


Martin: are the changes to your test cases OK by you, or is there
a better way to rewrite them?


Thanks for looking into it!

Since the purpose of the test_sprintf_note function in the test is
to verify the location of the caret within the warnings I think we
should keep it if it's possible.  Would either removing the P macro
or moving the function to a different file that doesn't use the
-ftrack-macro-expansion=0 option work?

Martin


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
index 5779a95..b6a6011 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
@@ -181,13 +181,13 @@ void test_sprintf_note (void)
   /* Diagnostic column numbers are 1-based.  */

   P (buffer (0),/* { dg-message "format output 4 bytes into a 
destination of size 0" } */
- "%c%s%i", '1', "2", 3);/* { dg-warning "7:.%c. directive writing 1 byte 
into a region of size 0" } */
+ "%c%s%i", '1', "2", 3);/* { dg-warning ".%c. directive writing 1 byte into 
a region of size 0" } */

   P (buffer (1),/* { dg-message "format output 6 bytes into a 
destination of size 1" } */
- "%c%s%i", '1', "23", 45);  /* { dg-warning "9:.%s. directive writing 2 bytes 
into a region of size 0" } */
+ "%c%s%i", '1', "23", 45);  /* { dg-warning ".%s. directive writing 2 bytes 
into a region of size 0" } */

   P (buffer (2),/* { dg-message "format output 6 bytes into a 
destination of size 2" } */
- "%c%s%i", '1', "2", 345);  /* { dg-warning "11:.%i. directive writing 3 bytes 
into a region of size 0" } */
+ "%c%s%i", '1', "2", 345);  /* { dg-warning ".%i. directive writing 3 bytes 
into a region of size 0" } */

   /* It would be nice if the caret in the location range for the format
  string below could be made to point at the closing quote of the format
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
index faa5806..b587d00 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret 
-ftrack-macro-expansion=0" } */
+/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret" } */

 extern int sprintf (char*, const char*, ...);

Re: [PATCH] avoid calling alloca(0)

2016-11-18 Thread Eric Gallager

On 11/18/16, Jakub Jelinek  wrote:
> On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote:
>> >In the libiberty/alloca.c, I don't see anything special about alloca
>> > (0).
>> >Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca
>> > (4035).
>> >alloca (0) just returns NULL after it.  The diffutils link is the same
>> >alloca.c as in libiberty.  Returning NULL or returning a non-unique
>> > pointer
>> >for alloca (0) is just fine, it is the same thing as returning NULL or
>> >returning a non-unique pointer from malloc (0).  We definitely don't
>> > want
>> >to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
>> >because it behaves the same.
>>
>> I disagree.  At a minimum, the return value of malloc(0) is
>> implementation-defined and so relying on it being either null
>> or non-null is non-portable and can be a source of subtle bugs.
>
> Most apps know what malloc (0) means and treat it correctly, they know they
> shouldn't dereference the pointer, because it is either NULL or holds an
> array with 0 elements.  I fail to see why you would want to warn.
>

Just as a reference point, my old version of the clang static analyzer
warns for malloc(0); but not alloca(0); though. For example:

$ cat alloc_0.c
#include 
#include 

void fn(void)
{
char *ptr0 = (char *)malloc(0);
void *ptr1 = alloca(0);
*ptr0 = *(char *)ptr1;
}

$ clang -Wall -Wextra -pedantic --analyze -c alloc_0.c
alloc_0.c:6:23: warning: Call to 'malloc' has an allocation size of 0 bytes
char *ptr0 = (char *)malloc(0);
 ^  ~
1 warning generated.

Doing some more Googling on the topic finds debates as to whether this
warning is warranted or not, but it seems like people are pretty used
to dealing with it from clang already, so they probably wouldn't mind
too much if gcc started being consistent with it.

Re: [PATCH, GCC/ARM, ping] Optional -mthumb for Thumb only targets

2016-11-18 Thread Thomas Preudhomme


On 11/11/16 14:35, Kyrill Tkachov wrote:


On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 25/10/16 18:07, Thomas Preudhomme wrote:

Hi,

Currently when a user compiles for a thumb-only target (such as Cortex-M
processors) without specifying the -mthumb option GCC throws the error "target
CPU does not support ARM mode". This is suboptimal from a usability point of
view: the -mthumb could be deduced from the -march or -mcpu option when there is
no ambiguity.

This patch implements this behavior by extending the DRIVER_SELF_SPECS to
automatically append -mthumb to the command line for thumb-only targets. It does
so by checking the last -march option if any is given or the last -mcpu option
otherwise. There is no ordering issue because conflicting -mcpu and -march is
already handled.

Note that the logic cannot be implemented in function arm_option_override
because we need to provide the modified command line to the GCC driver for
finding the right multilib path and the function arm_option_override is executed
too late for that effect.

ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-10-18  Terry Guo  
Thomas Preud'homme 

PR target/64802
* common/config/arm/arm-common.c (arm_target_thumb_only): New function.
* config/arm/arm-opts.h: Include arm-flags.h.
(struct arm_arch_core_flag): Define.
(arm_arch_core_flags): Define.
* config/arm/arm-protos.h: Include arm-flags.h.
(FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32,
FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E,
FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM,
FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV,
FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT,
FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST,
FL_TUNE, FL_FOR_ARCH2, FL_FOR_ARCH3, FL_FOR_ARCH3M, FL_FOR_ARCH4,
FL_FOR_ARCH4T, FL_FOR_ARCH5, FL_FOR_ARCH5T, FL_FOR_ARCH5E,
FL_FOR_ARCH5TE, FL_FOR_ARCH5TEJ, FL_FOR_ARCH6, FL_FOR_ARCH6J,
FL_FOR_ARCH6K, FL_FOR_ARCH6Z, FL_FOR_ARCH6ZK, FL_FOR_ARCH6KZ,
FL_FOR_ARCH6T2, FL_FOR_ARCH6M, FL_FOR_ARCH7, FL_FOR_ARCH7A,
FL_FOR_ARCH7VE, FL_FOR_ARCH7R, FL_FOR_ARCH7M, FL_FOR_ARCH7EM,
FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A, FL2_FOR_ARCH8_2A, FL_FOR_ARCH8M_BASE,
FL_FOR_ARCH8M_MAIN, arm_feature_set, ARM_FSET_MAKE,
ARM_FSET_MAKE_CPU1, ARM_FSET_MAKE_CPU2, ARM_FSET_CPU1, ARM_FSET_CPU2,
ARM_FSET_EMPTY, ARM_FSET_ANY, ARM_FSET_HAS_CPU1, ARM_FSET_HAS_CPU2,
ARM_FSET_HAS_CPU, ARM_FSET_ADD_CPU1, ARM_FSET_ADD_CPU2,
ARM_FSET_DEL_CPU1, ARM_FSET_DEL_CPU2, ARM_FSET_UNION, ARM_FSET_INTER,
ARM_FSET_XOR, ARM_FSET_EXCLUDE, ARM_FSET_IS_EMPTY,
ARM_FSET_CPU_SUBSET): Move to ...
* config/arm/arm-flags.h: This new file.
* config/arm/arm.h (TARGET_MODE_SPEC_FUNCTIONS): Define.
(EXTRA_SPEC_FUNCTIONS): Add TARGET_MODE_SPEC_FUNCTIONS to its value.
(TARGET_MODE_SPECS): Define.
(DRIVER_SELF_SPECS): Add TARGET_MODE_SPECS to its value.


*** gcc/testsuite/ChangeLog ***

2016-10-11  Thomas Preud'homme 

PR target/64802
* gcc.target/arm/optional_thumb-1.c: New test.
* gcc.target/arm/optional_thumb-2.c: New test.
* gcc.target/arm/optional_thumb-3.c: New test.


No regression when running the testsuite for -mcpu=cortex-m0 -mthumb,
-mcpu=cortex-m0 -marm and -mcpu=cortex-a8 -marm

Is this ok for trunk?



This looks like a useful usability improvement.
This is ok after a bootstrap on an arm-none-linux-gnueabihf target.

Sorry for the delay,
Kyrill


I've rebased the patch on top of the arm_feature_set type consistency fix [1] 
and committed it. The committed patch is in attachment for reference.


[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01680.html

Best regards,

Thomas
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index f3b674339a50460d55920ca8d26275a550bbbc1e..473417a2e5f04488197c27ead2b65680bddec274 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -98,6 +98,29 @@ arm_rewrite_mcpu (int argc, const char **argv)
   return arm_rewrite_selected_cpu (argv[argc - 1]);
 }
 
+/* Called by the driver to check whether the target denoted by current
+   command line options is a Thumb-only target.  ARGV is an array of
+   -march and -mcpu values (ie. it contains the rhs after the equal
+   sign) and we use the last one of them to make a decision.  The
+   number of elements in ARGV is given in ARGC.  */
+const char *
+arm_target_thumb_only (int argc, const char **argv)
+{
+  unsigned int opt;
+
+  if (argc)
+{
+  for (opt = 0; opt < (ARRAY_SIZE (arm_arch_core_flags)); opt++)
+	if ((strcmp (argv[argc - 1],

Re: [PATCH] Fix PR78413

2016-11-18 Thread Bill Schmidt


> On Nov 18, 2016, at 10:33 AM, Markus Trippelsdorf  
> wrote:
> 
> On 2016.11.18 at 10:27 -0600, Bill Schmidt wrote:
>> ===
>> --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c  (revision 0)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c  (working copy)
>> @@ -0,0 +1,35 @@
>> +/* PR78413.  These previously failed in tree if-conversion due to a loop
>> +   latch with multiple predecessors that the code did not anticipate.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3 -ffast-math" } */
> 
> Please add -fno-strict-aliasing, otherwise fn1() wouldn't ICE.

Whoops.  Yes indeed, sorry I missed the flag difference for the second failure.

Bill

[COMMITTED] Add myself to MAINTAINERS (Write After Approval).

2016-11-18 Thread Toma Tabacu

Committed as r242595.

Thanks,
Toma Tabacu

Index: ChangeLog
===
--- ChangeLog   (revision 242594)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2016-11-18  Toma Tabacu  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2016-11-15  Matthias Klose  
 
* Makefile.def: Remove references to GCJ.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 242594)
+++ MAINTAINERS (working copy)
@@ -587,6 +587,7 @@
 Robert Suchanek

 Andrew Sutton  
 Gabriele Svelto
+Toma Tabacu
 Sriraman Tallam
 Chung-Lin Tang 
 Samuel Tardieu

Re: [PATCH] Fix PR78413

2016-11-18 Thread Markus Trippelsdorf

On 2016.11.18 at 10:27 -0600, Bill Schmidt wrote:
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c   (working copy)
> @@ -0,0 +1,35 @@
> +/* PR78413.  These previously failed in tree if-conversion due to a loop
> +   latch with multiple predecessors that the code did not anticipate.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -ffast-math" } */

Please add -fno-strict-aliasing, otherwise fn1() wouldn't ICE.

> +extern long long int llrint(double x);
> +int a;
> +double b;
> +__attribute__((cold)) void decode_init() {
> +  int c, d = 0;
> +  for (; d < 12; d++) {
> +if (d)
> +  b = 0;
> +c = 0;
> +for (; c < 6; c++)
> +  a = b ? llrint(b) : 0;
> +  }
> +}
> +
> +struct S {
> +  _Bool bo;
> +};
> +int a, bb, c, d;
> +void fn1() {
> +  do
> +do
> +  do {
> + struct S *e = (struct S *)1;
> + do
> +   bb = a / (e->bo ? 2 : 1);
> + while (bb);
> +  } while (0);
> +while (d);
> +  while (c);
> +}

-- 
Markus

Re: [PATCH] avoid calling alloca(0)

On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote:
> >In the libiberty/alloca.c, I don't see anything special about alloca (0).
> >Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035).
> >alloca (0) just returns NULL after it.  The diffutils link is the same
> >alloca.c as in libiberty.  Returning NULL or returning a non-unique pointer
> >for alloca (0) is just fine, it is the same thing as returning NULL or
> >returning a non-unique pointer from malloc (0).  We definitely don't want
> >to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
> >because it behaves the same.
> 
> I disagree.  At a minimum, the return value of malloc(0) is
> implementation-defined and so relying on it being either null
> or non-null is non-portable and can be a source of subtle bugs.

Most apps know what malloc (0) means and treat it correctly, they know they
shouldn't dereference the pointer, because it is either NULL or holds an
array with 0 elements.  I fail to see why you would want to warn.

> But malloc(0) has also been known to result from unsigned overflow
> which has led to vulnerabilities and exploits (famously by the JPG
> COM vulnerability exploit, but there are plenty of others, including
> for instance CVE-2016-2851).  Much academic research has been devoted
> to this problem and to solutions to detect and prevent it.

How is it different from overflowing to 1 or 2 or 27?  What is special on
the value 0?

> In the absence of a policy or guidelines it's a matter of opinion
> whether or not this warning belongs in either -Wall or -Wextra.

It IMHO doesn't belong to either of these.

Jakub

[PATCH] Fix PR78413

2016-11-18 Thread Bill Schmidt

Hi,

The if-conversion patch for PR77848 missed a case where an outer loop
should not be versioned for vectorization; this was caught by an assert
in tests recorded in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78413.
This patch fixes the problem by ensuring that both inner and outer loop
latches have a single predecessor before versioning an outer loop.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2016-11-18  Bill Schmidt  

PR tree-optimization/78413
* tree-if-conv.c (versionable_outer_loop_p): Require that both
inner and outer loop latches have single predecessors.

[gcc/testsuite]

2016-11-18  Bill Schmidt  

PR tree-optimization/78413
* gcc.dg/tree-ssa/pr78413.c: New test.


Index: gcc/testsuite/gcc.dg/tree-ssa/pr78413.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy)
@@ -0,0 +1,35 @@
+/* PR78413.  These previously failed in tree if-conversion due to a loop
+   latch with multiple predecessors that the code did not anticipate.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math" } */
+
+extern long long int llrint(double x);
+int a;
+double b;
+__attribute__((cold)) void decode_init() {
+  int c, d = 0;
+  for (; d < 12; d++) {
+if (d)
+  b = 0;
+c = 0;
+for (; c < 6; c++)
+  a = b ? llrint(b) : 0;
+  }
+}
+
+struct S {
+  _Bool bo;
+};
+int a, bb, c, d;
+void fn1() {
+  do
+do
+  do {
+   struct S *e = (struct S *)1;
+   do
+ bb = a / (e->bo ? 2 : 1);
+   while (bb);
+  } while (0);
+while (d);
+  while (c);
+}
Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 242551)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2575,6 +2575,8 @@ version_loop_for_if_conversion (struct loop *loop)
 - The loop has a single exit.
 - The loop header has a single successor, which is the inner
   loop header.
+- Each of the inner and outer loop latches have a single
+  predecessor.
 - The loop exit block has a single predecessor, which is the
   inner loop's exit block.  */
 
@@ -2586,7 +2588,9 @@ versionable_outer_loop_p (struct loop *loop)
   || loop->inner->next
   || !single_exit (loop)
   || !single_succ_p (loop->header)
-  || single_succ (loop->header) != loop->inner->header)
+  || single_succ (loop->header) != loop->inner->header
+  || !single_pred_p (loop->latch)
+  || !single_pred_p (loop->inner->latch))
 return false;
   
   basic_block outer_exit = single_pred (loop->latch);

Re: [PATCH] avoid calling alloca(0)


On 11/18/2016 12:58 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote:

The point is warning on an alloca (0) may not be as clear cut as it
might seem.  It's probably a reasonable thing to do on the host, but on
a target, which might be embedded and explicitly overriding the builtin
alloca, warning on alloca (0) is less of a slam dunk.


I confess I haven't heard of such an implementation before but after
some searching I have managed to find a few examples of it online,
such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
Texas Instruments.  For instance:


In the libiberty/alloca.c, I don't see anything special about alloca (0).
Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035).
alloca (0) just returns NULL after it.  The diffutils link is the same
alloca.c as in libiberty.  Returning NULL or returning a non-unique pointer
for alloca (0) is just fine, it is the same thing as returning NULL or
returning a non-unique pointer from malloc (0).  We definitely don't want
to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
because it behaves the same.


I disagree.  At a minimum, the return value of malloc(0) is
implementation-defined and so relying on it being either null
or non-null is non-portable and can be a source of subtle bugs.
But malloc(0) has also been known to result from unsigned overflow
which has led to vulnerabilities and exploits (famously by the JPG
COM vulnerability exploit, but there are plenty of others, including
for instance CVE-2016-2851).  Much academic research has been devoted
to this problem and to solutions to detect and prevent it.
The following paper has some good background and references:

https://www.usenix.org/legacy/event/woot10/tech/full_papers/Vanegue.pdf

Coding standards rules have been developed requiring conforming
software to avoid allocating zero bytes for these reasons.  TS
1796, the C Secure Coding Rules technical specification, has such
a requirement.  It was derived from the CERT C Secure Coding rule
MEM04-C. Beware of zero-length allocations:

  https://www.securecoding.cert.org/confluence/x/GQI

The same argument applies to alloca(0) with the added caveat that,
unlike with the other allocation functions, a non-null return value
need not be distinct.

In the absence of a policy or guidelines it's a matter of opinion
whether or not this warning belongs in either -Wall or -Wextra.
Based on the severity of the problems that allocating zero size
has been linked to I think it could be successfully argued that
it should be in -Wall (the problems are obviously more serious
than those that have ever been associated with the -Wunused-type
warnings, for example).  I put it in -Wextra only because I was
trying to be sensitive to the "no false positive" argument.

All this said, before debating the fine details of under which
conditions each of the new warninsg should be enabled, I was
hoping to get comments on the meat of the patch that implements
the warnings.

Martin

libgo patch committed: Don't call __go_alloc/__go_free from environment routines

2016-11-18 Thread Ian Lance Taylor

This patch to libgo changes the environment support routines to not
call __go_alloc/__go_free, but to instead use malloc/free.  This code
just needs temporary buffers, it is natural to write it in C because
it calls setenv/unsetenv, and C code can safely call malloc/free but
with the future GC can not safely allocate memory on the Go heap.
Patch bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242592)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2ab785788691ad289f838a0b3a6bc9013d0fc337
+fc4ca600b2fc6de81fd3c4014542d6a50593db1a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-setenv.c
===
--- libgo/runtime/go-setenv.c   (revision 242581)
+++ libgo/runtime/go-setenv.c   (working copy)
@@ -9,10 +9,7 @@
 #include 
 #include 
 
-#include "go-alloc.h"
 #include "runtime.h"
-#include "arch.h"
-#include "malloc.h"
 
 /* Set the C environment from Go.  This is called by syscall.Setenv.  */
 
@@ -25,7 +22,6 @@ setenv_c (String k, String v)
   unsigned char *kn;
   const byte *vs;
   unsigned char *vn;
-  intgo len;
 
   ks = k.str;
   if (ks == NULL)
@@ -39,25 +35,23 @@ setenv_c (String k, String v)
 
 #ifdef HAVE_SETENV
 
-  if (ks != NULL && ks[k.len] != 0)
+  if (ks[k.len] != 0)
 {
-  // Objects that are explicitly freed must be at least 16 bytes in size,
-  // so that they are not allocated using tiny alloc.
-  len = k.len + 1;
-  if (len < TinySize)
-   len = TinySize;
-  kn = __go_alloc (len);
+  kn = malloc (k.len + 1);
+  if (kn == NULL)
+   runtime_throw ("out of malloc memory");
   __builtin_memcpy (kn, ks, k.len);
+  kn[k.len] = '\0';
   ks = kn;
 }
 
-  if (vs != NULL && vs[v.len] != 0)
+  if (vs[v.len] != 0)
 {
-  len = v.len + 1;
-  if (len < TinySize)
-   len = TinySize;
-  vn = __go_alloc (len);
+  vn = malloc (v.len + 1);
+  if (vn == NULL)
+   runtime_throw ("out of malloc memory");
   __builtin_memcpy (vn, vs, v.len);
+  vn[v.len] = '\0';
   vs = vn;
 }
 
@@ -66,19 +60,20 @@ setenv_c (String k, String v)
 #else /* !defined(HAVE_SETENV) */
 
   len = k.len + v.len + 2;
-  if (len < TinySize)
-len = TinySize;
-  kn = __go_alloc (len);
+  kn = malloc (len);
+  if (kn == NULL)
+runtime_throw ("out of malloc memory");
   __builtin_memcpy (kn, ks, k.len);
   kn[k.len] = '=';
   __builtin_memcpy (kn + k.len + 1, vs, v.len);
   kn[k.len + v.len + 1] = '\0';
   putenv ((char *) kn);
+  kn = NULL; /* putenv takes ownership of the string.  */
 
 #endif /* !defined(HAVE_SETENV) */
 
   if (kn != NULL)
-__go_free (kn);
+free (kn);
   if (vn != NULL)
-__go_free (vn);
+free (vn);
 }
Index: libgo/runtime/go-unsetenv.c
===
--- libgo/runtime/go-unsetenv.c (revision 242581)
+++ libgo/runtime/go-unsetenv.c (working copy)
@@ -9,10 +9,7 @@
 #include 
 #include 
 
-#include "go-alloc.h"
 #include "runtime.h"
-#include "arch.h"
-#include "malloc.h"
 
 /* Unset an environment variable from Go.  This is called by
syscall.Unsetenv.  */
@@ -24,7 +21,6 @@ unsetenv_c (String k)
 {
   const byte *ks;
   unsigned char *kn;
-  intgo len;
 
   ks = k.str;
   if (ks == NULL)
@@ -33,14 +29,11 @@ unsetenv_c (String k)
 
 #ifdef HAVE_UNSETENV
 
-  if (ks != NULL && ks[k.len] != 0)
+  if (ks[k.len] != 0)
 {
-  // Objects that are explicitly freed must be at least 16 bytes in size,
-  // so that they are not allocated using tiny alloc.
-  len = k.len + 1;
-  if (len < TinySize)
-   len = TinySize;
-  kn = __go_alloc (len);
+  kn = malloc (k.len + 1);
+  if (kn == NULL)
+   runtime_throw ("out of malloc memory");
   __builtin_memcpy (kn, ks, k.len);
   ks = kn;
 }
@@ -50,5 +43,5 @@ unsetenv_c (String k)
 #endif /* !defined(HAVE_UNSETENV) */
 
   if (kn != NULL)
-__go_free (kn);
+free (kn);
 }

Re: PING [PATCH] enable -fprintf-return-value by default


On 11/17/2016 10:34 PM, Sandra Loosemore wrote:

On 11/16/2016 09:49 AM, Martin Sebor wrote:

I'm looking for an approval of the attached patch.

I've adjusted the documentation based on Sandra's input (i.e.,
documented the negative of the option rather than the positive;
thank you for the review, btw.)

On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.


[snip]

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi(revision 242500)
+++ gcc/doc/invoke.texi(working copy)
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss
@gol
 -fomit-frame-pointer -foptimize-sibling-calls @gol
 -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
--fprefetch-loop-arrays -fprintf-return-value @gol
+-fprefetch-loop-arrays -fno-printf-return-value @gol
 -fprofile-correction @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fprofile-reorder-functions @gol


Please keep this list alphabetized -- the other "-fno-*" options are
sorted as such.  The documentation parts of the patch are OK with that
fixed, but I can't approve changing the default for the option.


I kept the option in the same place on the assumption that it was
the "printf" radix of the name, not the "no-" prefix, that these
were alphabetized by.

But now that you point it out and I've looked more carefully at
some of the other options, I see that in some sections they are
listed strictly alphabetically (-fno-foo after -fmoo) while in
others it's the way you suggest.  AFAICS, the former style looks
prevalent in the C++ Language Options and the in Warning Options,
for example.  The latter seems to be more popular in the
Optimization Options section (though there are counterexamples).

I'm happy to follow either of these conventions as long as its
consistent.  Otherwise, without both kinds of examples to choose
from, I don't think I can trust my brain to remember yet another
rule.

Martin

Re: Fix PR77881: combine improvement

2016-11-18 Thread Michael Matz

Hi,

On Fri, 18 Nov 2016, Bin.Cheng wrote:

> On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab  wrote:
> > On Nov 14 2016, Michael Matz  wrote:
> >
> >>   PR missed-optimization/77881
> >>   * combine.c (simplify_comparison): Remove useless subregs
> >>   also inside the loop, not just after it.
> >>   (make_compound_operation): Recognize some subregs as being
> >>   masking as well.
> >
> > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
> Hi,
> I can confirm that, also new PR opened for tracking.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

See PR78390 for a patch (comment #8) fixing the aarch64 problem.


Ciao,
Michael.

Re: [PATCH, vec-tails] Support loop epilogue vectorization

On 18 November 2016 at 16:46, Yuri Rumyantsev  wrote:
> It is very strange that this test failed on arm, since it requires
> target avx2 to check vectorizer dumps:
>
> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
> target avx2_runtime } } } */
> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>
> Could you please clarify what is the reason of the failure?

It's not the scan-dumps that fail, but the execution.
The test calls abort() for some reason.

It will take me a while to rebuild the test manually in the right
debug environment to provide you with more traces.



>
> Thanks.
>
> 2016-11-18 16:20 GMT+03:00 Christophe Lyon :
>> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is patch for non-masked epilogue vectoriziation.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>>
>>> Thanks.
>>> Changelog:
>>>
>>> 2016-11-15  Yuri Rumyantsev  
>>>
>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>> * * tree-if-conv.h: New file.
>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>>> dynamic alias checks for epilogues.
>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>>> * tree-vect-loop.c: include tree-if-conv.h.
>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>>> using passed argument.
>>> (vect_transform_loop): Check if created epilogue should be returned
>>> for further vectorization with less vf.  If-convert epilogue if
>>> required. Print vectorization success for epilogue.
>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>>> if it is required, pass loop_vinfo produced during vectorization of
>>> loop body to vect_analyze_loop.
>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>>> orig_loop_info.
>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>>> (LOOP_VINFO_EPILOGUE_P): New.
>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>>> (vect_do_peeling): Change prototype to return epilogue.
>>> (vect_analyze_loop): Add argument of loop_vec_info type.
>>> (vect_transform_loop): Return created loop.
>>>
>>> gcc/testsuite/
>>>
>>> * lib/target-supports.exp (check_avx2_hw_available): New.
>>> (check_effective_target_avx2_runtime): New.
>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>>
>>
>> Hi,
>>
>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>
>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>
>> Christophe
>>
>>
>>
>>>
>>> 2016-11-14 20:04 GMT+03:00 Richard Biener :
 On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
  wrote:
>Richard,
>
>I checked one of the tests designed for epilogue vectorization using
>patches 1 - 3 and found out that build compiler performs vectorization
>of epilogues with --param vect-epilogues-nomask=1 passed:
>
>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>t1.new-nomask.s -fdump-tree-vect-details
>$ grep VECTORIZED -c t1.c.156t.vect
>4
> Without param only 2 loops are vectorized.
>
>Should I simply add a part of tests related to this feature or I must
>delete all not necessary changes also?

 Please remove all not necessary changes.

 Richard.

>Thanks.
>Yuri.
>
>2016-11-14 16:40 GMT+03:00 Richard Biener :
>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>
>>> Richard,
>>>
>>> In my previous patch I forgot to remove couple lines related to aux
>field.
>>> Here is the correct updated patch.
>>
>> Yeah, I noticed.  This patch would be ok for trunk (together with
>> necessary parts from 1 and 2) if all not required parts are removed
>> (and you'd add the testcases covering non-masked tail vect).
>>
>> Thus, can you please produce a single complete patch containing only
>> non-masked epilogue vectoriziation?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-14 15:51 GMT+03:00 Richard Biener :
>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>> >
>>> >> Richard,
>>> >>
>>> >> I prepare updated 3 patch with passing additional argument to
>>> >> vect_analyze_loop as you proposed (untested).
>>> >>
>>> >> You wrote:
>>> >> tw, I wonder if you can produce a single patch containing

Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

On 11/18/16 12:58, Christophe Lyon wrote:
> On 17 November 2016 at 10:23, Kyrill Tkachov
>  wrote:
>>
>> On 09/11/16 12:58, Bernd Edlinger wrote:
>>>
>>> Hi!
>>>
>>>
>>> This patch enables the ldrd/strd peephole rules unconditionally.
>>>
>>> It is meant to fix cases, where the patch to reduce the sha512
>>> stack usage splits ldrd/strd instructions into separate ldr/str insns,
>>> but is technically independent from the other patch:
>>>
>>> See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html
>>>
>>> It was necessary to change check_effective_target_arm_prefer_ldrd_strd
>>> to retain the true prefer_ldrd_strd tuning flag.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>
>>
>> This is ok.
>> Thanks,
>> Kyrill
>>
>
> Hi Bernd,
>
> Since you committed this patch (r242549), I'm seeing the new test
> failing on some arm*-linux-gnueabihf configurations:
>
> FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10
> FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times strd 9
>
> See 
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html
> for a map of failures.
>
> Am I missing something?

Hi Christophe,

as always many thanks for your testing...

I have apparently only looked at the case -mfloat-abi=soft here, which
is what my other patch is going to address.  But all targets with
-mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd
and vstr.64 instead of strd, which should be accepted as well.

So the attached patch should fix at least most of the fallout.

Is it OK for trunk?


Thanks
Bernd.
2016-11-18  Bernd Edlinger  

	* gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu.

Index: gcc/testsuite/gcc.target/arm/pr53447-5.c
===
--- gcc/testsuite/gcc.target/arm/pr53447-5.c	(revision 242588)
+++ gcc/testsuite/gcc.target/arm/pr53447-5.c	(working copy)
@@ -15,5 +15,8 @@ void foo(long long* p)
   p[9] -= p[10];
 }
 
-/* { dg-final { scan-assembler-times "ldrd" 10 } } */
-/* { dg-final { scan-assembler-times "strd" 9 } } */
+/* We accept neon instructions vldr.64 and vstr.64 as well.
+   Note: DejaGnu counts patterns with alternatives twice,
+   so actually there are only 10 loads and 9 stores.  */
+/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */
+/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-18 Thread Yuri Rumyantsev

It is very strange that this test failed on arm, since it requires
target avx2 to check vectorizer dumps:

/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
target avx2_runtime } } } */
/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
\\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */

Could you please clarify what is the reason of the failure?

Thanks.

2016-11-18 16:20 GMT+03:00 Christophe Lyon :
> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is patch for non-masked epilogue vectoriziation.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>>
>> Thanks.
>> Changelog:
>>
>> 2016-11-15  Yuri Rumyantsev  
>>
>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>> * tree-if-conv.c (tree_if_conversion): Make public.
>> * * tree-if-conv.h: New file.
>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>> dynamic alias checks for epilogues.
>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>> * tree-vect-loop.c: include tree-if-conv.h.
>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>> using passed argument.
>> (vect_transform_loop): Check if created epilogue should be returned
>> for further vectorization with less vf.  If-convert epilogue if
>> required. Print vectorization success for epilogue.
>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>> if it is required, pass loop_vinfo produced during vectorization of
>> loop body to vect_analyze_loop.
>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>> orig_loop_info.
>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>> (LOOP_VINFO_EPILOGUE_P): New.
>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>> (vect_do_peeling): Change prototype to return epilogue.
>> (vect_analyze_loop): Add argument of loop_vec_info type.
>> (vect_transform_loop): Return created loop.
>>
>> gcc/testsuite/
>>
>> * lib/target-supports.exp (check_avx2_hw_available): New.
>> (check_effective_target_avx2_runtime): New.
>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>
>
> Hi,
>
> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>
> It does pass on the same target if configured --with-cpu=cortex-a9.
>
> Christophe
>
>
>
>>
>> 2016-11-14 20:04 GMT+03:00 Richard Biener :
>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>>>  wrote:
Richard,

I checked one of the tests designed for epilogue vectorization using
patches 1 - 3 and found out that build compiler performs vectorization
of epilogues with --param vect-epilogues-nomask=1 passed:

$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
t1.new-nomask.s -fdump-tree-vect-details
$ grep VECTORIZED -c t1.c.156t.vect
4
 Without param only 2 loops are vectorized.

Should I simply add a part of tests related to this feature or I must
delete all not necessary changes also?
>>>
>>> Please remove all not necessary changes.
>>>
>>> Richard.
>>>
Thanks.
Yuri.

2016-11-14 16:40 GMT+03:00 Richard Biener :
> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> In my previous patch I forgot to remove couple lines related to aux
field.
>> Here is the correct updated patch.
>
> Yeah, I noticed.  This patch would be ok for trunk (together with
> necessary parts from 1 and 2) if all not required parts are removed
> (and you'd add the testcases covering non-masked tail vect).
>
> Thus, can you please produce a single complete patch containing only
> non-masked epilogue vectoriziation?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-14 15:51 GMT+03:00 Richard Biener :
>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>> >
>> >> Richard,
>> >>
>> >> I prepare updated 3 patch with passing additional argument to
>> >> vect_analyze_loop as you proposed (untested).
>> >>
>> >> You wrote:
>> >> tw, I wonder if you can produce a single patch containing just
>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>> >> changes only needed by later patches?
>> >>
>> >> Did you mean that I exclude all support for vectorization
epilogues,
>> >> i.e. exclude from 2-nd patch all non-related changes
>> >> like
>> >>
>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> >> index 11863af..32011c1 100644
>> >> ---

Re: Fix PR77881: combine improvement

2016-11-18 Thread Bin.Cheng

On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab  wrote:
> On Nov 14 2016, Michael Matz  wrote:
>
>>   PR missed-optimization/77881
>>   * combine.c (simplify_comparison): Remove useless subregs
>>   also inside the loop, not just after it.
>>   (make_compound_operation): Recognize some subregs as being
>>   masking as well.
>
> This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
Hi,
I can confirm that, also new PR opened for tracking.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

Thanks,
bin
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

[PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)

gcc.dg/tree-ssa/builtin-sprintf-2.c is showing intermittent failures, which
valgrind shows to be a read past the end of a buffer.

The root cause is that the on-demand substring loc code isn't set up
to cope with -ftrack-macro-expansion != 2 (the default).

In the failing case, it attempts to use this location as the location
of the literal:

../../src/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c:95:1: note:
 RNG (0,  0,  0, "%hhi", i)
 ^~~

rather than:

 RNG (0,  0,  0, "%hhi", i)
  ^~~~

On re-parsing to generate the on-demand substring locations, it thus
erroneously attempts to parse the 'R' as a raw string, and so this
code within cpp_interpret_string_1 fires:

  if (*p == 'R')
{
  const uchar *prefix;

  /* Skip over 'R"'.  */
  p += 2;
  prefix = p;
  while (*p != '(')
p++;

and the issue happens in the "while" loop, as it erroneously
walks through this memory:

(gdb) p strs.m_vec.m_vecdata[0]
$20 = {len = 3, text = 0xc9bcca0 "RNG"}

trying to find the open parenthesis, and starts reading beyond the
allocated buffer.

The fix is to require that -ftrack-macro-expansion == 2 (the default)
for on-demand string literal locations to be available, failing
gracefully to simply using the whole location_t otherwise.

Doing so requires some fixups to existing test cases:
[gcc.dg/tree-ssa/builtin-sprintf-warn-{1,4}.c both have
-ftrack-macro-expansion=0.  In the latter case, it seems to be
unnecessary, so this patch removes it.  In the former case, it
seems to be needed, but there are some expected locations in
test_sprintf_note that are changed by the patch.  It's not clear to
me how to fix that, so for now the patch removes the expected
column numbers from that function within the test case.

Successfully bootstrapped on x86_64-pc-linux-gnu.

I think I can self-approve the input.c change and new test cases,
but I'm not sure about the changes to Martin's test cases.

Martin: are the changes to your test cases OK by you, or is there
a better way to rewrite them?

gcc/ChangeLog:
PR preprocessor/78324
* input.c (get_substring_ranges_for_loc): Fail gracefully if
-ftrack-macro-expansion has a value other than 2.

gcc/testsuite/ChangeLog:
PR preprocessor/78324
* gcc.dg/plugin/diagnostic-test-string-literals-1.c
(test_multitoken_macro): New function.
* gcc.dg/plugin/diagnostic-test-string-literals-3.c: New test
case.
* gcc.dg/plugin/diagnostic-test-string-literals-4.c: New test
case.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the new test
cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test_sprintf_note):
Drop expected column numbers from dg-warning directives.
* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: Drop
-ftrack-macro-expansion=0.
---
 gcc/input.c|  9 +
 .../plugin/diagnostic-test-string-literals-1.c | 16 
 .../plugin/diagnostic-test-string-literals-3.c | 43 ++
 .../plugin/diagnostic-test-string-literals-4.c | 43 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp |  4 +-
 .../gcc.dg/tree-ssa/builtin-sprintf-warn-1.c   |  6 +--
 .../gcc.dg/tree-ssa/builtin-sprintf-warn-4.c   |  2 +-
 7 files changed, 118 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-3.c
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-4.c

diff --git a/gcc/input.c b/gcc/input.c
index 728f4dd..611e18b 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1322,6 +1322,15 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
   if (strloc == UNKNOWN_LOCATION)
 return "unknown location";
 
+  /* Reparsing the strings requires accurate location information.
+ If -ftrack-macro-expansion has been overridden from its default
+ of 2, then we might have a location of a macro expansion point,
+ rather than the location of the literal itself.
+ Avoid this by requiring that we have full macro expansion tracking
+ for substring locations to be available.  */
+  if (cpp_get_options (pfile)->track_macro_expansion != 2)
+return "track_macro_expansion != 2";
+
   /* If string concatenation has occurred at STRLOC, get the locations
  of all of the literal tokens making up the compound string.
  Otherwise, just use STRLOC.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c 
b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
index 3e44936..76a085e 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
@@ -243,6 +243,22 @@ test_macro (void)
{ dg-end-multiline-output "" } */
 }
 
+void
+test_multitoken_macro (void)
+{
+#define RANGE ("0123456789")  /* { dg-error "unable to

Re: [PATCH 1/2] PR77822

2016-11-18 Thread Dominik Vogt

On Fri, Nov 18, 2016 at 08:02:08AM -0600, Segher Boessenkool wrote:
> On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote:
> > IN_RANGE(POS...) makes sure that POS is a non-negative number
> > smaller than UPPER, so (UPPER) - (POS) does not wrap.  Or is there
> > some case that the new macro does not handle?
> 
> I think it works fine.
> 
> With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like
> IN_RANGE, i.e. UPPER is inclusive then.  Dunno.

Yeah, maybe rather call it RANGE to avoid too much similarity.
Some name that is so vague that one has to read the documentation.
I'll post an updated patch later.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

libgo patch committed: Remove long-obsolete "old" directories

2016-11-18 Thread Ian Lance Taylor

The directories old/regexp and old/template were removed from the
master Go library in 2012 (https://golang.org/cl/5979046) but somehow
that was not reflected in libgo.  This patch removes them from libgo.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian


patch.txt.bz2
Description: BZip2 compressed data

Re: [PATCH] S390: Lower requirements for successful htm tests.

2016-11-18 Thread Andreas Krebbel

On Fri, Nov 18, 2016 at 02:48:30PM +0100, Dominik Vogt wrote:
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/s390/htm-builtins-1.c (DEFAULT_MAX_REPETITIONS)
>   (DEFAULT_REQUIRED_QUORUM, NUM_WARMUP_RUNS): Lower requirements for
>   successful test.
>   * gcc.target/s390/htm-builtins-2.c (DEFAULT_MAX_REPETITIONS)
>   (DEFAULT_REQUIRED_QUORUM): Likewise.

Applied. Thanks!

-Andreas-

Re: [PATCH v3] PR77359: Properly align local variables in functions calling alloca.

2016-11-18 Thread Andreas Krebbel

On Fri, Nov 11, 2016 at 10:33:16AM +0100, Dominik Vogt wrote:
> gcc/ChangeLog
> 
>   * config/rs6000/rs6000.c (rs6000_stack_info): PR/77359: Properly align
>   local variables in functions calling alloca.  Also update the ASCII
>   drawings
>   * config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
>   PR/77359: Likewise.
>   * config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET):
>   PR/77359: Copy AIX specific versions of the rs6000.h macros to aix.h.

Applied. Thanks!

-Andreas-

Re: [Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979

2016-11-18 Thread Dominique d'Humières

Hi Janus,

> the attached patch fixes an ice-on-valid problem, simply by removing an 
> assert. ...

I have several instances in my test suite showing that the proposed patch 
removes the ICE but generates wrong code:

pr42359, second test, => ICE on another place
pr54613, sixth and eighth tests,

Thanks for working on the issue,

Dominique

Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes

On 10 November 2016 at 12:06, Pierre-Marie de Rodat  wrote:
> On 11/09/2016 10:02 AM, Richard Biener wrote:
>>
>> Using scan-assembler-times on the dwarf?
>
>
> I always have a bad feeling about this kind of check as I imagine it can
> break very easily with legit changes. But I have nothing better to
> contribute, so I’ve added one such testcase. ;-)
>
>>> Ok to commit?
>>
>>
>> Ok.
>
>
> This is committed as r242035. Thanks!
>
Hi,

Part of the new testcase added with this commit fails on arm* targets:
  g++.dg/pr78112.C   scan-assembler-times DW_AT_object_pointer 37

Christophe



> --
> Pierre-Marie de Rodat

Re: [PATCH 1/2] PR77822

On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote:
> How about this:
> 
> --
> /* A convenience macro to determine whether a SIZE lies inclusively
>within [1, UPPER], POS lies inclusively within between
>[0, UPPER - 1] and the sum lies inclusively within [1, UPPER].
>UPPER must be >= 1.  */
> #define SIZE_POS_IN_RANGE(SIZE, POS, UPPER) \
>   (IN_RANGE ((POS), 0, (unsigned HOST_WIDE_INT) (UPPER) - 1) \
>&& IN_RANGE ((SIZE), 1, (unsigned HOST_WIDE_INT) (UPPER) \
>- (unsigned HOST_WIDE_INT)(POS)))
   ^ missing space here

> IN_RANGE(POS...) makes sure that POS is a non-negative number
> smaller than UPPER, so (UPPER) - (POS) does not wrap.  Or is there
> some case that the new macro does not handle?

I think it works fine.

With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like
IN_RANGE, i.e. UPPER is inclusive then.  Dunno.


Segher

Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-18 Thread Ville Voutilainen

On 18 November 2016 at 15:54, Jonathan Wakely  wrote:
> I agree, but if we'd just refused to support such undefined behaviour
> in stage 1 we wouldn't now be in a position of saying we can't change
> it in stage 3.

I want to support the code that the previous attempt breaks. I don't
think I can do so without
concepts.

Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-18 Thread Jonathan Wakely


On 17/11/16 23:38 +0200, Ville Voutilainen wrote:

The first patch does everything but the container adaptor parts (which
are wrong in the p/r) and the tuple part (which is icky). The second
part does the tuple parts, and needs some serious RFC. I know
it's ugly as sin, but it works. I don't think we should try to teach
our tuple to give the right answer for is_constructible etc., because
the last attempt at it broke boost and would break reasonable existing
code in addition to just boost - such things are not Stage 3 material,
imnsho.


I agree, but if we'd just refused to support such undefined behaviour
in stage 1 we wouldn't now be in a position of saying we can't change
it in stage 3.

Oh well, I'll review this ASAP.

Re: Rework subreg_get_info

2016-11-18 Thread Richard Sandiford

Joseph Myers  writes:
> On Tue, 15 Nov 2016, Richard Sandiford wrote:
>> Richard Sandiford  writes:
>> > This isn't intended to change the behaviour, just rewrite the
>> > existing logic in a different (and hopefully clearer) way.
>> > The new form -- particularly the part based on the "block"
>> > concept -- is easier to convert to polynomial sizes.
>> >
>> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> 
>> Sorry, I should have said: this was also tested by compiling the
>> testsuite before and after the change at -O2 -ftree-vectorize on:
>> 
>> aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
>> arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
>> epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
>> hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
>> i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
>> m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
>> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
>> nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
>> powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf
>
> e500 double (both DFmode and TFmode) was the case that motivated the 
> original creation of subreg_get_info.  I think powerpc-linux-gnuspe 
> --enable-e500-double would be a good case for verifying the patch doesn't 
> change generated code.  (Though when I tried building it from mainline 
> sources last week I got an ICE in LRA building __multc3, so testing it 
> might be problematic at present.)

Thanks for the pointer.  I've now tried comparing the testsuite
output on that combination and there were no differences.  31 tests
triggered an internal compiler error but >17800 compiled successfully,
so hopefully the test was at least somewhat meaningful.

(For the record: I got a build failure due to addsi3 FAILing,
but I hacked around that by converting it to a force_reg/DONE.)

Thanks,
Richard

[PATCH] S390: Lower requirements for successful htm tests.

2016-11-18 Thread Dominik Vogt

The attached patch makes the htm tests on s390 less sensitive to
spurious abort.  Please check the commit comment for details.  The
modified tests have been run once on a zEC12.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog

* gcc.target/s390/htm-builtins-1.c (DEFAULT_MAX_REPETITIONS)
(DEFAULT_REQUIRED_QUORUM, NUM_WARMUP_RUNS): Lower requirements for
successful test.
* gcc.target/s390/htm-builtins-2.c (DEFAULT_MAX_REPETITIONS)
(DEFAULT_REQUIRED_QUORUM): Likewise.
>From cd354b024c5e129575465b9088b0e0d03340c8f0 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 18 Nov 2016 14:06:28 +0100
Subject: [PATCH] S390: Lower requirements for successful htm tests.

Due to spurious aborts, requiring four successful tests out of five fails too
often; accept four out of seven instead.

Reduce number of warmup passes.
---
 gcc/testsuite/gcc.target/s390/htm-builtins-1.c | 6 +++---
 gcc/testsuite/gcc.target/s390/htm-builtins-2.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c 
b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
index c90490f..ff43be9 100644
--- a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
+++ b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c
@@ -13,9 +13,9 @@
 
 /*  local definitions -- */
 
-#define DEFAULT_MAX_REPETITIONS 5
-#define DEFAULT_REQUIRED_QUORUM ((DEFAULT_MAX_REPETITIONS) - 1)
-#define NUM_WARMUP_RUNS 10
+#define DEFAULT_MAX_REPETITIONS 7
+#define DEFAULT_REQUIRED_QUORUM 4
+#define NUM_WARMUP_RUNS 2
 
 /*  local macros --- */
 
diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c 
b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
index 15b0d12..bb9d346 100644
--- a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
+++ b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c
@@ -13,8 +13,8 @@
 
 /*  local definitions -- */
 
-#define DEFAULT_MAX_REPETITIONS 5
-#define DEFAULT_REQUIRED_QUORUM ((DEFAULT_MAX_REPETITIONS) - 1)
+#define DEFAULT_MAX_REPETITIONS 7
+#define DEFAULT_REQUIRED_QUORUM 4
 #define DEFAULT_ABORT_ADDRESS (0x12345678u)
 
 /*  local macros --- */
-- 
2.3.0

[patch,avr] Disallow LDS / STS for .rodata on AVR_TINY.

2016-11-18 Thread Georg-Johann Lay

Currently, Binutils still comes with an AVR_TINY linker description file 
which puts .rodata into RAM so that LDS is applicable for reading such 
objects.


However, as these devices come with a linearised address model, linker 
descriptions which put .rodata into flash are much more reasonable. 
This patch caters for such situations:  As .rodata virtual addresses 
will be offset by 0x4000, respective objects may no more be accessed by LDS.


Moreover, objects with a section attribute won't be accessed by LDS.

Ok for trunk?

Johann

PR target/78093
* config/avr/avr.c (avr_decl_maybe_lds_p): New static function.
(avr_encode_section_info) [TARGET_ABSDATA && AVR_TINY]: Use it.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 242544)
+++ config/avr/avr.c	(working copy)
@@ -10102,6 +10102,33 @@ avr_section_type_flags (tree decl, const
 }
 
 
+/* A helper for the next function.  NODE is a decl that is associated with
+   a symbol.  Return TRUE if the respective object may be accessed by LDS.
+   There migth still be other reasons for why LDS is not appropriate.
+   This function is only useful for AVR_TINY. */
+
+static bool
+avr_decl_maybe_lds_p (tree node)
+{
+  if (!node
+  || TREE_CODE (node) != VAR_DECL
+  || DECL_SECTION_NAME (node) != NULL)
+return false;
+
+  if (TREE_READONLY (node))
+return true;
+
+  // C++ requires peeling arrays.
+
+  do
+node = TREE_TYPE (node);
+  while (ARRAY_TYPE == TREE_CODE (node));
+
+  return (node != error_mark_node
+  && !TYPE_READONLY (node));
+}
+
+
 /* Implement `TARGET_ENCODE_SECTION_INFO'.  */
 
 static void
@@ -10193,7 +10220,8 @@ avr_encode_section_info (tree decl, rtx
   if (avr_decl_absdata_p (decl, DECL_ATTRIBUTES (decl))
   || (TARGET_ABSDATA
   && !progmem_p
-  && !addr_attr)
+  && !addr_attr
+  && avr_decl_maybe_lds_p (decl))
   || (addr_attr
   // If addr_attr is non-null, it has an argument.  Peek into it.
   && TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (addr_attr))) < 0xc0))

Re: [PATCH, vec-tails] Support loop epilogue vectorization

On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is patch for non-masked epilogue vectoriziation.
>
> Bootstrap and regression testing did not show any new failures.
>
> Is it OK for trunk?
>
> Thanks.
> Changelog:
>
> 2016-11-15  Yuri Rumyantsev  
>
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
>
> gcc/testsuite/
>
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>

Hi,

This new test fails on arm-none-eabi (using default cpu/fpu/mode):
  gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-tail-nomask-1.c execution test

It does pass on the same target if configured --with-cpu=cortex-a9.

Christophe



>
> 2016-11-14 20:04 GMT+03:00 Richard Biener :
>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>>  wrote:
>>>Richard,
>>>
>>>I checked one of the tests designed for epilogue vectorization using
>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>
>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>t1.new-nomask.s -fdump-tree-vect-details
>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>4
>>> Without param only 2 loops are vectorized.
>>>
>>>Should I simply add a part of tests related to this feature or I must
>>>delete all not necessary changes also?
>>
>> Please remove all not necessary changes.
>>
>> Richard.
>>
>>>Thanks.
>>>Yuri.
>>>
>>>2016-11-14 16:40 GMT+03:00 Richard Biener :
 On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
>
> In my previous patch I forgot to remove couple lines related to aux
>>>field.
> Here is the correct updated patch.

 Yeah, I noticed.  This patch would be ok for trunk (together with
 necessary parts from 1 and 2) if all not required parts are removed
 (and you'd add the testcases covering non-masked tail vect).

 Thus, can you please produce a single complete patch containing only
 non-masked epilogue vectoriziation?

 Thanks,
 Richard.

> Thanks.
> Yuri.
>
> 2016-11-14 15:51 GMT+03:00 Richard Biener :
> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard,
> >>
> >> I prepare updated 3 patch with passing additional argument to
> >> vect_analyze_loop as you proposed (untested).
> >>
> >> You wrote:
> >> tw, I wonder if you can produce a single patch containing just
> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >> changes only needed by later patches?
> >>
> >> Did you mean that I exclude all support for vectorization
>>>epilogues,
> >> i.e. exclude from 2-nd patch all non-related changes
> >> like
> >>
> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >> index 11863af..32011c1 100644
> >> --- a/gcc/tree-vect-loop.c
> >> +++ b/gcc/tree-vect-loop.c
> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) =

Re: [PATCH] Fix PR78189