Re: [PATCH V2, RFC] Fix PR62147 by passing finiteness information to RTL phase

2019-06-26 Thread Kewen.Lin
Hi all,

I've committed this with one more change.

--- gcc/loop-iv.c   (revision 272731)
+++ gcc/loop-iv.c   (working copy)
@@ -3004,7 +3004,7 @@ find_simple_exit (struct loop *loop, struct niter_
  well.  It results in incorrect predicate information on the exit condition
  expression.  For example, if says [(int) _1 + -8, + , -8] != 0 finite,
  it means _1 can exactly divide -8.  */
-  if (single_exit (loop) && finite_loop_p (loop))
+  if (desc->infinite && single_exit (loop) && finite_loop_p (loop))
 {
   desc->infinite = NULL_RTX;
   if (dump_file)

on 2019/6/26 11:45 AM, Kewen.Lin wrote:
> Hi Jeff,
> 
> on 2019/6/26 5:49 AM, Jeff Law wrote:
>> On 6/25/19 3:41 AM, Kewen.Lin wrote:
>>> Hi Richard,
>>>
>>> Thanks a lot for review comments. 
>>>
>>> on 2019/6/25 3:23 PM, Richard Biener wrote:
 On Tue, 25 Jun 2019, Kewen.Lin wrote:

> Hi all,
>
>
> It's based on two observations:
>   1) the loop structure for one specific loop is shared between 
> middle-end and 
>  back-end.
>   2) for one specific loop, if it's finite then it never becomes infinite
> itself.
>
> As one gcc newbie, I'm not sure whether these two observations are true 
> in all
> cases.  Please feel free to correct me if anything missing.

 I think 2) is not true with -ffinite-loops.
>>>
>>> I just looked at the patch for this option; I don't fully understand how it can
>>> affect
>>> 2).  It's to take one loop as finite with any normal exit, can some loop 
>>> with this
>>> assertion turn into infinite later by some other analysis?
>>>

> btw, I also took a look at how the loop constraint LOOP_C_FINITE is used, 
> I think
> it's not suitable for this purpose, it's mainly set by vectorizer and 
> tell niter 
> and scev to take one loop as finite.  The original patch has the words 
> "constraint flag is mainly set by consumers and affects certain semantics 
> of 
> niter analyzer APIs".
>
> Bootstrapped and regression testing passed on 
> powerpc64le-unknown-linux-gnu.

 Did you consider to simply use finite_loop_p () from doloop.c?  That
 would be a much simpler patch.
>>>
>>> Good suggestion!  I took it for granted that the function can be only 
>>> efficient in
>>> middle-end, but actually some information like bit any_upper_bound could be 
>>> kept to
>>> RTL.
>>>

 For the testcase in question -ffinite-loops would provide this guarantee
 even on RTL, so would the upper bound that may be still set.

 Richard.

>>>
>>> The new version with Richard's suggestion listed below.
>>> Regression testing is ongoing.
>>>
>>>
>>> Thanks,
>>> Kewen
>>>
>>> ---
>>>
>>> gcc/ChangeLog
>>>
>>> 2019-06-25  Kewen Lin  
>>>
>>> PR target/62147
>>> * gcc/loop-iv.c (find_simple_exit): Call finite_loop_p to update 
>>> finiteness.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>> 2019-06-25  Kewen Lin  
>>>
>>> PR target/62147
>>> * gcc.target/powerpc/pr62147.c: New test.
>> This is fine assuming regression testing was OK.
>>
> 
> Thanks Jeff!  Bootstrapped and regression testing passed on 
> powerpc64le-unknown-linux-gnu.
> 
>> One might argue that "finite_loop_p" belongs elsewhere since it's not
>> really querying tree/gimple structures.
> 
> I guess it will do something gimple specific (estimate_numbers_of_iterations) 
> when it can 
> so it was placed there.
> 
> 
> Thanks,
> Kewen
> 
>>
>> jeff
>>
> 
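
The comment in the committed hunk notes that an exit condition such as [(int) _1 + -8, + , -8] != 0 is finite only when the start value is an exact multiple of the step. A minimal sketch of that arithmetic in C follows; `reaches_zero` is a hypothetical helper written for illustration, not GCC code, and it bounds the simulation with `limit` so that it always returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: simulate an induction variable that starts at START,
   advances by STEP, and exits when it equals zero.  Returns true and stores
   the number of steps taken in *ITERS if zero is reached within LIMIT steps;
   returns false otherwise.  */
static bool
reaches_zero (int32_t start, int32_t step, unsigned limit, unsigned *iters)
{
  assert (iters != NULL);

  /* Use unsigned arithmetic so that wrap-around is well defined.  */
  uint32_t iv = (uint32_t) start;
  for (unsigned n = 0; n < limit; n++)
    {
      if (iv == 0)
        {
          *iters = n;
          return true;
        }
      iv += (uint32_t) step;
    }
  return false;
}
```

With a step of -8, a start value of 64 hits zero, while 60 never does: the value stays congruent to 4 modulo 8 even across wrap-around, which is why the "!= 0" predicate alone does not prove finiteness.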



[PATCH v3 2/2] PR c/65403 - Add tests for -Wno-error=

2019-06-26 Thread Alex Henrie
* c-c++-common/pr65403-1.c: New test.
* c-c++-common/pr65403-2.c: New test.
---
 gcc/testsuite/c-c++-common/pr65403-1.c | 10 ++
 gcc/testsuite/c-c++-common/pr65403-2.c | 15 +++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/pr65403-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr65403-2.c

diff --git a/gcc/testsuite/c-c++-common/pr65403-1.c 
b/gcc/testsuite/c-c++-common/pr65403-1.c
new file mode 100644
index 000..fbe004a1f78
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr65403-1.c
@@ -0,0 +1,10 @@
+/* PR c/65403 */
+/* Test an unrecognized -Wno-error option in the absence of any other
+   diagnostics. The -Wno-error option should be ignored. */
+
+/* { dg-options "-Werror -Wno-error=some-future-warning" } */
+
+int main(int argc, char **argv)
+{
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/pr65403-2.c 
b/gcc/testsuite/c-c++-common/pr65403-2.c
new file mode 100644
index 000..8b5faa7270e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr65403-2.c
@@ -0,0 +1,15 @@
+/* PR c/65403 */
+/* Test a warning, treated as an error, that some future -Wno-error option
+   might downgrade back to a warning. The -Wno-error option should produce a
+   warning in this case. */
+
+/* { dg-options "-Wunused-variable -Werror -Wno-error=some-future-warning" } */
+
+int main(int argc, char **argv)
+{
+  int foo; /* { dg-error "unused variable 'foo'" } */
+  return 0;
+}
+
+/* { dg-error "no option '-Wsome-future-warning'" "" { target *-*-* } 0 } */
+/* { dg-message "all warnings being treated as errors" "" { target *-*-* } 0 } 
*/
-- 
2.22.0



[PATCH v3 1/2] PR c/65403 - Ignore -Wno-error=

2019-06-26 Thread Alex Henrie
From: Manuel López-Ibáñez 

* opts-common.c (ignored_wnoerror_options): New global variable.
* opts-global.c (print_ignored_options): Ignore
-Wno-error= except if there are other
diagnostics.
* opts.c (enable_warning_as_error): Record ignored -Wno-error
options.
* opts.h (ignored_wnoerror_options): Declare.
* gcc.dg/Werror-13.c: Don't expect hints for
-Wno-error=.
---
 gcc/opts-common.c|  2 ++
 gcc/opts-global.c| 10 +++---
 gcc/opts.c   | 21 +
 gcc/opts.h   |  2 ++
 gcc/testsuite/gcc.dg/Werror-13.c |  2 +-
 5 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index 660dfe63858..8ceb8461f97 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -26,6 +26,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "spellcheck.h"
 
+vec ignored_wnoerror_options;
+
 static void prune_options (struct cl_decoded_option **, unsigned int *);
 
 /* An option that is undocumented, that takes a joined argument, and
diff --git a/gcc/opts-global.c b/gcc/opts-global.c
index bf4db775928..1d5d4e69dfc 100644
--- a/gcc/opts-global.c
+++ b/gcc/opts-global.c
@@ -132,12 +132,16 @@ print_ignored_options (void)
 {
   while (!ignored_options.is_empty ())
 {
-  const char *opt;
-
-  opt = ignored_options.pop ();
+  const char * opt = ignored_options.pop ();
   warning_at (UNKNOWN_LOCATION, 0,
  "unrecognized command-line option %qs", opt);
 }
+  while (!ignored_wnoerror_options.is_empty ())
+{
+  const char * opt = ignored_wnoerror_options.pop ();
+  warning_at (UNKNOWN_LOCATION, 0,
+ "%<-Wno-error=%s%>: no option %<-W%s%>", opt, opt);
+}
 }
 
 /* Handle an unknown option DECODED, returning true if an error should
diff --git a/gcc/opts.c b/gcc/opts.c
index b38bfb15a56..f31b6aa877e 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -3095,15 +3095,20 @@ enable_warning_as_error (const char *arg, int value, 
unsigned int lang_mask,
   option_index = find_opt (new_option, lang_mask);
   if (option_index == OPT_SPECIAL_unknown)
 {
-  option_proposer op;
-  const char *hint = op.suggest_option (new_option);
-  if (hint)
-   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>;"
- " did you mean %<-%s%>?", value ? "" : "no-",
- arg, new_option, hint);
+  if (value)
+   {
+ option_proposer op;
+ const char *hint = op.suggest_option (new_option);
+ if (hint)
+   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>;"
+ " did you mean %<-%s%>?", value ? "" : "no-",
+ arg, new_option, hint);
+ else
+   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>",
+ value ? "" : "no-", arg, new_option);
+   }
   else
-   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>",
- value ? "" : "no-", arg, new_option);
+   ignored_wnoerror_options.safe_push (arg);
 }
   else if (!(cl_options[option_index].flags & CL_WARNING))
 error_at (loc, "%<-Werror=%s%>: %<-%s%> is not an option that "
diff --git a/gcc/opts.h b/gcc/opts.h
index e5723a946f7..f553e8d00f0 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -460,4 +460,6 @@ extern bool parse_and_check_align_values (const char *flag,
  bool report_error,
  location_t loc);
 
+extern vec ignored_wnoerror_options;
+
 #endif
diff --git a/gcc/testsuite/gcc.dg/Werror-13.c b/gcc/testsuite/gcc.dg/Werror-13.c
index 3a02b7ea2b5..7c2bf6836ed 100644
--- a/gcc/testsuite/gcc.dg/Werror-13.c
+++ b/gcc/testsuite/gcc.dg/Werror-13.c
@@ -5,6 +5,6 @@
 /* { dg-error "'-Werror' is not an option that controls warnings" "" { target 
*-*-* } 0 } */
 /* { dg-error "'-Wfatal-errors' is not an option that controls warnings" "" { 
target *-*-* } 0 } */
 /* { dg-error "'-Werror=vla2': no option '-Wvla2'; did you mean '-Wvla." "" { 
target *-*-* } 0 } */
-/* { dg-error "'-Wno-error=misleading-indentation2': no option 
'-Wmisleading-indentation2'; did you mean '-Wmisleading-indentation'" "" { 
target *-*-* } 0 } */
+/* { dg-warning "'-Wno-error=misleading-indentation2': no option 
'-Wmisleading-indentation2'" "" { target *-*-* } 0 } */
 
 int i;
-- 
2.22.0
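
The behaviour described in the ChangeLog above (record unknown -Wno-error= options, and report them only if there are other diagnostics) can be modelled in a few lines of C. This is a minimal sketch under stated assumptions: `option_state`, `record_unknown_wno_error` and `flush_ignored` are hypothetical names, and a fixed-size array stands in for the vec used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_IGNORED 16

/* Hypothetical container for deferred -Wno-error= arguments.  */
struct option_state
{
  const char *ignored[MAX_IGNORED];
  size_t n_ignored;
};

/* Remember an unrecognized -Wno-error=ARG instead of reporting it now.  */
static void
record_unknown_wno_error (struct option_state *st, const char *arg)
{
  assert (st->n_ignored < MAX_IGNORED);
  st->ignored[st->n_ignored++] = arg;
}

/* Report the remembered options only when other diagnostics were issued.
   Returns the number of warnings written to OUT.  */
static size_t
flush_ignored (struct option_state *st, size_t other_diagnostics, FILE *out)
{
  size_t emitted = 0;

  if (other_diagnostics == 0)
    return 0;

  while (st->n_ignored > 0)
    {
      const char *opt = st->ignored[--st->n_ignored];
      fprintf (out, "warning: '-Wno-error=%s': no option '-W%s'\n", opt, opt);
      emitted++;
    }
  return emitted;
}
```

A clean compile therefore stays silent about a future warning name, which matches what pr65403-1.c in patch 2/2 expects.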



[PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default

2019-06-26 Thread Bill Schmidt
Hi,

We've done some experimenting and realized that the subject option almost
always provides improved performance for Power when the loop unroller is
enabled.  So this patch turns that flag on by default for us.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this OK for trunk?

Thanks!
Bill


2019-06-27  Bill Schmidt  

* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
-fvariable-expansion-in-unroller by default.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 272719)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -3616,6 +3616,11 @@ rs6000_option_override_internal (bool global_init_
   && !global_options_set.x_flag_asynchronous_unwind_tables)
 flag_asynchronous_unwind_tables = 1;
 
+  /* -fvariable-expansion-in-unroller is a win for POWER whenever the
+ loop unroller is active.  It is only checked during unrolling, so
+ we can just set it on by default.  */
+  flag_variable_expansion_in_unroller = 1;
+
   /* Set the pointer size.  */
   if (TARGET_64BIT)
 {
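
For readers unfamiliar with the option: variable expansion lets the unroller split a loop-carried variable such as an accumulator into several independent copies. The following C sketch is a hand-written model of that idea, assuming an unroll factor of two; it only illustrates the transformation and is not what the RTL unroller literally emits.

```c
#include <assert.h>
#include <stddef.h>

/* Plain reduction: each addition depends on the previous one.  */
static long
sum_simple (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Model of unrolling by two with the accumulator expanded into two
   copies, so the two additions in the body are independent.  */
static long
sum_expanded (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long s0 = 0, s1 = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      s0 += a[i];
      s1 += a[i + 1];
    }
  if (i < n)
    s0 += a[i];   /* Leftover element when N is odd.  */
  return s0 + s1;  /* Combine the copies after the loop.  */
}
```

Both functions return the same sum for integer data; the expanded form simply shortens the dependency chain inside the loop.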



Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-26 Thread Martin Sebor

On 6/26/19 4:31 PM, Jeff Law wrote:

On 6/25/19 5:03 PM, Martin Sebor wrote:



The caller ensures that handle_char_store is only called for stores
to arrays (MEM_REF) or single elements as wide as char.

Where?  I don't see it, even after fixing the formatting in
strlen_check_and_optimize_stmt :-)


   gimple *stmt = gsi_stmt (*gsi);

   if (is_gimple_call (stmt))


I think we can agree that we don't have a call, so this is false.


  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
 {
   tree lhs = gimple_assign_lhs (stmt);

This should be true IIUC, so we'll go into its THEN block.



   if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (lhs)))

Should be false.


   else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE 
(lhs)))


Should also be false.


   else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))

Should be true since LHS is a MEM_REF.



{
   tree type = TREE_TYPE (lhs);
   if (TREE_CODE (type) == ARRAY_TYPE)
 type = TREE_TYPE (type);
   if (TREE_CODE (type) == INTEGER_TYPE
   && TYPE_MODE (type) == TYPE_MODE (char_type_node)
   && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
 {
   if (! handle_char_store (gsi))
 return false;
 }
 }

If TYPE is an ARRAY_TYPE, we strip the ARRAY_TYPE.  We then
verify that TYPE is a single character type.  _But_ we stripped away the
ARRAY_TYPE.  So ISTM that we allow either an array of chars or a single
char on the LHS.

So how does this avoid multiple character stores?!?  We could have had
an ARRAY_REF on the LHS and we never check the number of elements in the
array.  There's no check on the RHS either.  So I don't see how we
guarantee that we're dealing with a single character store.

What am I missing here?


Can you show me a test case where it breaks?  If not, I don't know
what you want me to do.  I'll just move on to something else.

Martin







What you describe sounds like

   char a[N];
   *(int*)a = 0x31323300;

which is represented as

   MEM[(int *)] = 825373440;

This would be closer (I realize it's not C):


   char a[N];
   a[0..3] = 0x313233300;




The LHS type of that is int so the function doesn't get called.

I'm concerned about the case where the LHS is an array.


And if the NUL byte in the original was at byte offset 2, then didn't we
just change the length by overwriting where the NUL is?


No, because cmp is the result of compare_nonzero_chars and cmp > 0
means:

   1 if SI is known to start with more than OFF nonzero characters

i.e., the character is being stored before the terminating nul.
This is the basis of the original optimization:

   /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
  and if we aren't storing '\0', we know that the length of the
  string and any other zero terminated string in memory remains
  the same.

But all this is predicated on the assumption that we're dealing with a
single character memory store.  I don't see what enforces that precondition.
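
The precondition under discussion can be illustrated outside the compiler. In this minimal C sketch (both helper names are hypothetical, and `memcpy` stands in for a wider store), storing one nonzero character before the terminating NUL leaves the length unchanged, while a store that is several characters wide can move the NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store a single nonzero character at an offset before the NUL and
   return the resulting length.  The length cannot change.  */
static size_t
len_after_char_store (char *s, size_t off, char c)
{
  assert (c != '\0' && off < strlen (s));
  s[off] = c;
  return strlen (s);
}

/* Store four bytes at once, starting at the first character.  If any
   of them is zero, the length changes.  S must hold at least 4 bytes.  */
static size_t
len_after_wide_store (char *s, const char bytes[4])
{
  memcpy (s, bytes, 4);
  return strlen (s);
}
```

This is why the reasoning quoted from the original optimization only holds when the store is known to be exactly one character wide.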





Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2019 at 6:14 PM Rainer Orth  
wrote:
>
> Hi Hongtao,
>
> > Index: testsuite/lib/target-supports.exp
> > ===
> > --- testsuite/lib/target-supports.exp (revision 272667)
> > +++ testsuite/lib/target-supports.exp (working copy)
> > @@ -7963,6 +7963,20 @@
> >  } "-mavx512bw" ]
> >  }
> >
> > +# Return 1 if avx512vp2intersect instructions can be compiled.
> > +proc check_effective_target_avx512vp2intersect { } {
> > +return [check_no_compiler_messages avx512vp2intersect object {
> > + typedef int __v16si __attribute__ ((__vector_size__ (64)));
> > + typedef short __mmask16;
> > + void
> > + _mm512_2intersect_epi32 (__v16si __A, __v16si __B, __mmask16 *__U,
> > + __mmask16 *__M)
> > + {
> > + __builtin_ia32_2intersectd512 (__U, __M, (__v16si) __A, (__v16si) 
> > __B);
> > + }
> > +} "-mavx512vp2intersect" ]
> > +}
> > +
> >  # Return 1 if avx512ifma instructions can be compiled.
> >  proc check_effective_target_avx512ifma { } {
> >  return [check_no_compiler_messages avx512ifma object {
>
> as usual, the new effective-target keyword needs documenting in
> sourcebuild.texi.
Like this?

Index: ChangeLog
===
--- ChangeLog (revision 272668)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2019-06-27  Hongtao Liu  
+
+ * doc/sourcebuild.texi: Document new effective target keyword
+ avx512vp2intersect.
+
 2019-06-25  Hongtao Liu  
  H.J. Lu  
  Olga Makhotina  
Index: doc/sourcebuild.texi
===
--- doc/sourcebuild.texi (revision 272667)
+++ doc/sourcebuild.texi (working copy)
@@ -2046,6 +2046,9 @@
 @item avx512f_runtime
 Target supports the execution of @code{avx512f} instructions.

+@item avx512vp2intersect
+Target supports the execution of @code{avx512vp2intersect} instructions.
+
 @item cell_hw
 Test system can execute AltiVec and Cell PPU instructions.

Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog (revision 272668)
+++ testsuite/ChangeLog (working copy)
@@ -1,3 +1,11 @@
+2019-06-27  Hongtao Liu  
+
+ * lib/target-supports.exp: Add
+ check_effective_target_avx512vp2intersect.
+ * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Add
+ dg-require-effective-target avx512vp2intersect.
+ * gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c: Ditto.
+
 2019-06-06  Hongtao Liu  
  Olga Makhotina  

Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
===
--- testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
(revision 272668)
+++ testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512vp2intersect" } */
+/* { dg-require-effective-target "avx512vp2intersect" } */

 #define AVX512F
 #include 
Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c
===
--- testsuite/gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c
(revision 272668)
+++ testsuite/gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c
(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512vp2intersect -mavx512vl" } */
+/* { dg-require-effective-target "avx512vp2intersect" } */

 #define AVX512F
 #include 
Index: testsuite/lib/target-supports.exp
===
--- testsuite/lib/target-supports.exp (revision 272667)
+++ testsuite/lib/target-supports.exp (working copy)
@@ -7963,6 +7963,20 @@
 } "-mavx512bw" ]
 }

+# Return 1 if avx512vp2intersect instructions can be compiled.
+proc check_effective_target_avx512vp2intersect { } {
+return [check_no_compiler_messages avx512vp2intersect object {
+ typedef int __v16si __attribute__ ((__vector_size__ (64)));
+ typedef short __mmask16;
+ void
+ _mm512_2intersect_epi32 (__v16si __A, __v16si __B, __mmask16 *__U,
+ __mmask16 *__M)
+ {
+ __builtin_ia32_2intersectd512 (__U, __M, (__v16si) __A, (__v16si) __B);
+ }
+} "-mavx512vp2intersect" ]
+}
+
 # Return 1 if avx512ifma instructions can be compiled.
 proc check_effective_target_avx512ifma { } {
 return [check_no_compiler_messages avx512ifma object {
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

-- 
BR,
Hongtao


RE: Use ODR for canonical types construction in LTO

2019-06-26 Thread JiangNing OS
Hi,

This commit https://gcc.gnu.org/viewcvs/gcc?view=revision=272628 is 
breaking trunk LTO on some real benchmarks, so can it be fixed or reverted? For 
example,

lto1: error: type variant differs by TYPE_CXX_ODR_P
  constant 256>
unit-size  constant 32>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x99943d08
fields 
public unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 7 
structural-equality
pointer_to_this >
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
chain 
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128 offset  bit-offset  context 
 chain >>
reference_to_this  chain >
  constant 256>
unit-size  constant 32>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x99943d08
fields 
public unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 7 
structural-equality
pointer_to_this >
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
chain 
unsigned DI :0:0 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128 offset  bit-offset  context 
 chain >>
pointer_to_this  reference_to_this 
>
lto1: internal compiler error: 'verify_type' failed
0xe33e93 verify_type(tree_node const*)
../../gcc/gcc/tree.c:14655
0x5efc4b lto_fixup_state
../../gcc/gcc/lto/lto-common.c:2429
0x5fc01b lto_fixup_decls
../../gcc/gcc/lto/lto-common.c:2460
0x5fc01b read_cgraph_and_symbols(unsigned int, char const**)
../../gcc/gcc/lto/lto-common.c:2693
0x5ded23 lto_main()
../../gcc/gcc/lto/lto.c:616
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
lto-wrapper: fatal error: /home/amptest/gcc/install_last//bin/g++ returned 1 
exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

Thanks,
-Jiangning

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Christophe Lyon
> Sent: Tuesday, June 25, 2019 8:30 PM
> To: Jan Hubicka 
> Cc: Eric Botcazou ; gcc Patches  patc...@gcc.gnu.org>; Richard Biener ; d...@dcepelik.cz;
> Martin Liška 
> Subject: Re: Use ODR for canonical types construction in LTO
> 
> Hi,
> 
> 
> On Tue, 25 Jun 2019 at 10:20, Jan Hubicka  wrote:
> >
> > > > * gcc-interface/decl.c (gnat_to_gnu_entity): Check that
> > > > type is array or integer prior checking string flag.
> > >
> > > The test for array is superfluous here.
> > >
> > > > * gcc-interface/gigi.h (gnat_signed_type_for,
> > > > maybe_character_value): Likewise.
> > >
> > > Wrong ChangeLog, the first modified function is maybe_character_type.
> > >
> > > I have installed the attached patchlet after testing it on x86-64/Linux.
> > >
> > >
> > >   * gcc-interface/decl.c (gnat_to_gnu_entity): Remove superfluous test
> in
> > >   previous change.
> > >   * gcc-interface/gigi.h (maybe_character_type): Fix formatting.
> > >   (maybe_character_value): Likewise.
> >
> > Thanks a lot. I was not quite sure if ARRAY_TYPEs can happen there and
> > I should have added you to the CC.
> >
> 
> After the main commit (r272628), I have noticed regressions on arm and
> aarch64:
> 
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O0 -flto
> -flto-partition=1to1 -fno-use-linker-plugin  (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O0 -flto
> -flto-partition=none -fuse-linker-plugin (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O0 -flto
> -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O2 -flto
> -flto-partition=1to1 -fno-use-linker-plugin  (internal compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O2 -flto
> -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects (internal 
> compiler
> error)
> g++.dg/lto/pr60336 cp_lto_pr60336_0.o-cp_lto_pr60336_0.o link, -O2 -flto
> -fuse-linker-plugin (internal compiler error)
> g++.dg/torture/pr45843.C   -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  (internal compiler error)
> g++.dg/torture/pr45843.C   -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects  (internal compiler error)
> g++.dg/torture/stackalign/eh-vararg-1.C   -O2 -flto
> -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
> 

Re: [RFA][tree-optimization/90883] Improve DSE to handle redundant calls

2019-06-26 Thread Jeff Law
On 6/26/19 7:14 PM, Bill Schmidt wrote:
> Looks like this patch breaks bootstrap.
> 
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In function
> 'void dse\
> _optimize_redundant_stores(gimple*)':
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:649:46: error:
> ISO C++\
>  forbids converting a string constant to 'char*' [-Werror=write-strings]
>   649 |   delete_dead_or_redundant_assignment (, "redundant");
>   |  ^~~
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:651:40: error:
> ISO C++\
>  forbids converting a string constant to 'char*' [-Werror=write-strings]
>   651 |   delete_dead_or_redundant_call (, "redundant");
>   |    ^~~
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In member
> function 'v\
> oid dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)':
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:979:41: error:
> ISO C++\
>  forbids converting a string constant to 'char*' [-Werror=write-strings]
>   979 | delete_dead_or_redundant_call (gsi, "dead");
>   | ^~
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1007:39: error:
> ISO C+\
> + forbids converting a string constant to 'char*' [-Werror=write-strings]
>  1007 |   delete_dead_or_redundant_call (gsi, "dead");
>   |   ^~
> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1066:49: error:
> ISO C+\
> + forbids converting a string constant to 'char*' [-Werror=write-strings]
>  1066 |   delete_dead_or_redundant_assignment (gsi, "dead");
Strange as I know I've bootstrapped and tested.  I'll take care of it.

Thanks,
jeff
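
The diagnostics quoted above are what -Wwrite-strings reports when a string literal is passed where a plain `char *` is expected. A minimal sketch of the const-correct shape in C, assuming a hypothetical stand-in function `note_removal` (this is not the actual signature from tree-ssa-dse.c):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: the label is only read, so it is taken as
   'const char *'.  String literals can then be passed without
   tripping -Wwrite-strings.  */
static size_t
note_removal (const char *type)
{
  assert (type != NULL);
  return strlen (type);
}
```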


Re: [RFA][tree-optimization/90883] Improve DSE to handle redundant calls

2019-06-26 Thread Bill Schmidt
Looks like this patch breaks bootstrap.

/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In function
'void dse\
_optimize_redundant_stores(gimple*)':
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:649:46: error:
ISO C++\
 forbids converting a string constant to 'char*' [-Werror=write-strings]
  649 |   delete_dead_or_redundant_assignment (, "redundant");
  |  ^~~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:651:40: error:
ISO C++\
 forbids converting a string constant to 'char*' [-Werror=write-strings]
  651 |   delete_dead_or_redundant_call (, "redundant");
  |    ^~~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In member
function 'v\
oid dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)':
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:979:41: error:
ISO C++\
 forbids converting a string constant to 'char*' [-Werror=write-strings]
  979 | delete_dead_or_redundant_call (gsi, "dead");
  | ^~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1007:39: error:
ISO C+\
+ forbids converting a string constant to 'char*' [-Werror=write-strings]
 1007 |   delete_dead_or_redundant_call (gsi, "dead");
  |   ^~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1066:49: error:
ISO C+\
+ forbids converting a string constant to 'char*' [-Werror=write-strings]
 1066 |   delete_dead_or_redundant_assignment (gsi, "dead");
  | ^~

Thanks,
Bill

On 6/26/19 1:13 PM, Jeff Law wrote:
> On 6/26/19 5:53 AM, Richard Biener wrote:
>> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law  wrote:
>>> So based on the conversation in the BZ I cobbled together a patch to
>>> extend tree-ssa-dse.c to also detect redundant stores.
>>>
>>> To be clear, given two stores, the first store is dead if the later
>>> store overwrites all the live bytes set by the first store.   In this
>>> case we delete the first store.  If the first store is partially dead we
>>> may trim it.
>>>
>>> Given two stores, the second store is redundant if it stores _the same
>>> value_ into locations set by the first store.  In this case we delete
>>> the second store.
>>>
>>>
>>> We prefer to remove redundant stores over removing dead or trimming
>>> partially dead stores.
>>>
>>> First, if we detect a redundant store, we can always remove it.  We may
>>> not always be able to trim a partially dead store.  So removing the
>>> redundant store wins in this case.
>>>
>>> But even if the redundant store occurs at the head or tail of the prior
>>> store, removing the redundant store is better than trimming the
>>> partially dead store because we end up with fewer calls to memset with
>>> the same number of total bytes written.
>>>
>>> We only look for redundant stores in a few cases.  The first store must
>>> be a memset, empty constructor or calloc call -- ie things which
>>> initialize multiple memory locations to zero.  Subsequent stores can
>>> occur via memset, empty constructors or simple memory assignments.
>>>
>>> The change to tree-ssa-alias.c deserves a quick note.
>>>
>>> When we're trying to determine if we have a redundant store, we create
>>> an AO_REF for the *second* store, then ask the alias system if the first
>>> store would kill the AO_REF.
>>>
>>> So while normally a calloc wouldn't ever kill anything in normal
>>> execution order, we're not asking about things in execution order.  We
>>> really just want to know if the calloc is going to write into the
>>> entirety of the AO_REF of the subsequent store.  So we compute the size
>>> of the allocation and we know the destination from the LHS of the calloc
>>> call and everything "just works".
>> I see how stmt_kills_ref_p is convenient here and it's the only
>> ref-must-include-other-ref kind of query the oracle supports right now.
>> Note it is not optimized for your particular case querying the same
>> stmt for multiple refs.  It's not refs_must_alias_p that is missing
>> but something stronger ('kills' is also wrong since both refs might
>> be reads), ref_covered_by_ref_p or so.  That said, factoring
>> stmt_kills_ref_p might not be so straight-forward for calls since
>> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,
>> ao_ref_init_loads_from_call?).
>>
>> So I think the tree-ssa-alias.c change is fine but please put a
>> comment before
>>
>> + if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
>> +   {
>> + tree arg0 = gimple_call_arg (stmt, 0);
>>
>> explaining this is used by DSE to detect redundant stores.
> Agreed.  I should have done this given I called it out in the email.
>
> Jeff
>
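
The two kinds of store distinguished in the quoted description can be shown at the source level. This is a minimal sketch with hypothetical function names; whether a given GCC version actually removes these particular calls is not claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Redundant store: the memset writes the same zeros that calloc
   already wrote, so the second store could be deleted.  */
static char *
alloc_zeroed_redundant (size_t n)
{
  char *p = calloc (n, 1);
  if (p)
    memset (p, 0, n);
  return p;
}

/* Dead store: every byte written by the first memset is overwritten
   by the second before being read, so the first could be deleted.  */
static void
fill_dead_then_live (char *buf, size_t n)
{
  assert (buf != NULL);
  memset (buf, 0, n);
  memset (buf, 'x', n);
}
```

In the first function the later store goes away; in the second, the earlier one does. That difference is the one the message draws between "redundant" and "dead".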



Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-26 Thread Jeff Law
On 6/18/19 9:19 PM, Martin Sebor wrote:
> On 6/14/19 2:59 PM, Jeff Law wrote:
[ big snip ]
>> A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
>> need to consider here -- what is and what is not valid gimple.  And it's
>> more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
>> standard practice for if-conversion.
>>
>> Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
>> well, I just shoved a quick assert into optimize_stmt to abort if we
>> encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
>> instantly building libgcc :-)
>>
>> Compile the attached code with -O2 -m32.
> 
> I wasn't saying it's not valid Gimple, just that I hadn't seen it
> come up here despite compiling Glibc and the kernel with the patched
> GCC.  The only codes I saw are these:
> 
>   addr_expr, array_ref, bit_and_expr, component_ref, max_expr,
>   mem_ref, nop_expr, parm_decl, pointer_plus_expr, ssa_name,
>   and var_decl
The only one here that's really surprising is the MAX_EXPR.  But it is
what it is.

> 
> What I was asking for is a test case that makes COND_EXPR come up
> in this context.  But I managed to find one by triggering the ICE
> with GDB.  The file reduced to the following test case:
Sorry.  email can be a tough medium to nail down specific details.

> 
>   extern struct S s;   // S has to be an incomplete type
> 
>   void* f (int n)
>   {
> void *p;
> int x = 0;
> 
> for (int i = n; i >= 0; i--)
>   {
> p = 
> if (p == (void*)-1)
>    x = 1;
> else if (p)
>    return p;
>   }
> 
> return x ? (void*)-1 : 0;
>   }
> 
> and the dump:
> 
>    [local count: 59055800]:
>   # x_10 = PHI <1(5), 0(2)>
>   _5 = x_10 != 0 ? -1B : 0B;
> 
>    [local count: 114863532]:
>   # _3 = PHI <(4), _5(6), (3)>
>   return _3;
> 
> It seems a little odd that the COND_EXPR disappears when either
> of the constant addresses becomes the address of an object (or
> the result of alloca), and also when the number of iterations
> of the loop is constant.  That's probably why it so rarely comes
> up in this context.
Going into phiopt2 we have:

;;   basic block 6, loop depth 0
;;pred:   5
  if (x_1 != 0)
goto ; [71.00%]
  else
goto ; [29.00%]
;;succ:   8
;;7

;;   basic block 7, loop depth 0
;;pred:   6
;;succ:   8

;;   basic block 8, loop depth 0
;;pred:   3
;;7
;;6
  # _3 = PHI <(3), 0B(7), -1B(6)>
  return _3;

The subgraph starting at block #6 is a classic case for turning branchy
code into straightline code using a COND_EXPR on the RHS of an
assignment.  So you end up with something like this:

;;   basic block 6, loop depth 0
;;pred:   5
  _5 = x_1 != 0 ? -1B : 0B;
;;succ:   7

;;   basic block 7, loop depth 0
;;pred:   3
;;6
  # _3 = PHI <(3), _5(6)>
  return _3;


Now for this specific case within phiopt we are limited to cases where
the result is 0/1 or 0/-1.  That's why you get something different when
you exchange one of the constants for the address of an object, or
anything else for that matter.

This is all a bit academic -- the key point is that we can have a
COND_EXPR on the RHS of an assignment.  That's allowed by gimple.

Sadly this is also likely one of the places where target characteristics
come into play -- targets define a BRANCH_COST which can significantly
change the decisions for the initial generation of conditionals.  It's
one of the things that makes writing tests for jump threading, if
conversion and other optimizations so damn painful -- on one target
we'll have a series of conditional jumps, on another we may have a
series of logicals, potentially with COND_EXPRs.


> 
> That said, even though I've seen a few references to COND_EXPR
> in the middle-end I have been assuming that they, in general, do
> get transformed into PHIs.  But checking the internals manual
> I found in Section 12.6.3 this:
> 
>   A C ?: expression is converted into an if statement with each
>   branch assigning to the same temporary. ... The GIMPLE level
>   if-conversion pass re-introduces ?: expression, if appropriate.
>   It is used to vectorize loops with conditions using vector
>   conditional operations.
> 
> This GDB test case is likely the result of this reintroduction.
Nope.  It happens much earlier in the pipeline :-)


>>
>> And in a more general sense, this kind of permissiveness is not future
>> proof.  What happens if someone needs to add another EXPR node that is
>> valid on the RHS where such recursion is undesirable?  How are they
>> supposed to know that we've got this permissive recursive call and that
>> it's going to do the wrong thing?  And if it's an EXPR node with no
>> arguments, then we're going to do a read out of the bounds of the object
>> and all bets are off at that point (yes we have zero operand EXPR nodes,
>> but thankfully I don't 

[PATCH] C++20 constexpr lib part 2/3

2019-06-26 Thread Ed Smith-Rowland via gcc-patches

Implement C++20 p0879 - Constexpr for swap and swap related functions.

This is much smaller than the first but also basically marks swap and 
the algorithms that depend on swap as constexpr.
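
For illustration only (not part of the patch), this is the kind of code
that a constexpr swap makes possible.  The sketch uses hand-written
stand-ins, my_swap and reversed3, which are hypothetical names and not
libstdc++ functions, so that it compiles as C++14 without relying on
C++20 library support:

```cpp
#include <cassert>

// Hypothetical stand-in for a constexpr swap (C++14 relaxed constexpr).
template<typename T>
constexpr void my_swap(T& a, T& b)
{
  T tmp = a;
  a = b;
  b = tmp;
}

struct triple { int v[3]; };

// Reverse three elements; usable in a constant expression because
// my_swap is constexpr.
constexpr triple reversed3(triple t)
{
  my_swap(t.v[0], t.v[2]);
  return t;
}

constexpr triple reversed_value = reversed3(triple{{1, 2, 3}});
static_assert(reversed_value.v[0] == 3 && reversed_value.v[1] == 2
              && reversed_value.v[2] == 1,
              "swap evaluated at compile time");

// The same function still works at run time.
inline void check_reversed3()
{
  triple t = reversed3(triple{{7, 8, 9}});
  assert(t.v[0] == 9 && t.v[1] == 8 && t.v[2] == 7);
}
```

With the patch applied and -std=gnu++2a, the intent is that the same
pattern can be written with std::swap and the standard algorithms.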


It is similarly tested on x86_64-linux:

$ make check -k -j4

$ make check RUNTESTFLAGS=--target_board=unix/-std=gnu++2a -k -j4

OK for trunk (after part 1 is in)?

Ed Smith-Rowland

2019-06-26  Edward Smith-Rowland  <3dw...@verizon.net>

Implement C++20 p0879 - Constexpr for swap and swap related functions.
* include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms):
New macro. (iter_swap, make_heap, next_permutation, partial_sort_copy,
pop_heap, prev_permutation, push_heap, reverse, rotate, sort_heap,
swap, swap_ranges, nth_element, partial_sort, sort): Add constexpr.
* include/bits/move.h (swap): Add constexpr.
* include/bits/stl_algo.h (__move_median_to_first, __reverse, reverse,
__gcd, __rotate, rotate, __partition, __heap_select,
__partial_sort_copy, partial_sort_copy, __unguarded_partition,
__unguarded_partition_pivot, __partial_sort, __introsort_loop, __sort,
__introselect, __chunk_insertion_sort, next_permutation,
prev_permutation, partition, partial_sort, nth_element, sort,
__iter_swap::iter_swap, iter_swap, swap_ranges): Add constexpr.
* include/bits/stl_algobase.h (__iter_swap::iter_swap, iter_swap,
swap_ranges): Add constexpr.
* include/bits/stl_heap.h (__push_heap, push_heap, __adjust_heap,
__pop_heap, pop_heap, __make_heap, make_heap, __sort_heap, sort_heap):
Add constexpr.
* include/std/type_traits (swap): Add constexpr.
* testsuite/25_algorithms/headers/algorithm/synopsis.cc: Add constexpr.
* testsuite/25_algorithms/iter_swap/constexpr.cc: New test.
* testsuite/25_algorithms/make_heap/constexpr.cc: New test.
* testsuite/25_algorithms/next_permutation/constexpr.cc: New test.
* testsuite/25_algorithms/nth_element/constexpr.cc: New test.
* testsuite/25_algorithms/partial_sort/constexpr.cc: New test.
* testsuite/25_algorithms/partial_sort_copy/constexpr.cc: New test.
* testsuite/25_algorithms/partition/constexpr.cc: New test.
* testsuite/25_algorithms/pop_heap/constexpr.cc: New test.
* testsuite/25_algorithms/prev_permutation/constexpr.cc: New test.
* testsuite/25_algorithms/push_heap/constexpr.cc: New test.
* testsuite/25_algorithms/reverse/constexpr.cc: New test.
* testsuite/25_algorithms/rotate/constexpr.cc: New test.
* testsuite/25_algorithms/sort/constexpr.cc: New test.
* testsuite/25_algorithms/sort_heap/constexpr.cc: New test.
* testsuite/25_algorithms/swap/constexpr.cc: New test.
* testsuite/25_algorithms/swap_ranges/constexpr.cc: New test.

diff --git a/libstdc++-v3/include/bits/algorithmfwd.h b/libstdc++-v3/include/bits/algorithmfwd.h
index 14bdad9a61e..28b3388edaf 100644
--- a/libstdc++-v3/include/bits/algorithmfwd.h
+++ b/libstdc++-v3/include/bits/algorithmfwd.h
@@ -193,6 +193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus > 201703L
 #  define __cpp_lib_constexpr_algorithms 201711L
+#  define __cpp_lib_constexpr_swap_algorithms 201712L
 #endif
 
 #if __cplusplus >= 201103L
@@ -377,6 +378,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 iter_swap(_FIter1, _FIter2);
 
@@ -391,10 +393,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 lower_bound(_FIter, _FIter, const _Tp&, _Compare);
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 make_heap(_RAIter, _RAIter);
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 make_heap(_RAIter, _RAIter, _Compare);
 
@@ -478,10 +482,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // mismatch
 
   template
+_GLIBCXX20_CONSTEXPR
 bool
 next_permutation(_BIter, _BIter);
 
   template
+_GLIBCXX20_CONSTEXPR
 bool
 next_permutation(_BIter, _BIter, _Compare);
 
@@ -496,10 +502,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // partial_sort
 
   template
+_GLIBCXX20_CONSTEXPR
 _RAIter
 partial_sort_copy(_IIter, _IIter, _RAIter, _RAIter);
 
   template
+_GLIBCXX20_CONSTEXPR
 _RAIter
 partial_sort_copy(_IIter, _IIter, _RAIter, _RAIter, _Compare);
 
@@ -519,26 +527,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 pop_heap(_RAIter, _RAIter);
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 pop_heap(_RAIter, _RAIter, _Compare);
 
   template
+_GLIBCXX20_CONSTEXPR
 bool
 prev_permutation(_BIter, _BIter);
 
   template
+_GLIBCXX20_CONSTEXPR
 bool
 prev_permutation(_BIter, _BIter, _Compare);
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 push_heap(_RAIter, _RAIter);
 
   template
+_GLIBCXX20_CONSTEXPR
 void
 push_heap(_RAIter, _RAIter, _Compare);
 
@@ -579,6 +593,7 @@ 

[PATCH 1/3] C++20 constexpr lib part 1/3

2019-06-26 Thread Ed Smith-Rowland via gcc-patches

Here is the first of three patches for C++20 constexpr library.

- Implement C++20 p0202 - Add constexpr Modifiers to Functions in
  <algorithm> and <utility> Headers.

- Implement C++20 p1023 - constexpr comparison operators for std::array.

I believe I have answered people's concerns with the last patch attempts
[https://gcc.gnu.org/ml/libstdc++/2019-03/msg00132.html].


The patch is large because of test cases but really just boils down to 
adding constexpr for c++2a.


The patch passes testing for gnu++2a and pre-gnu++2a on x86_64-linux:

$ make check -k -j4

$ make check RUNTESTFLAGS=--target_board=unix/-std=gnu++2a -k -j4

OK for trunk?

Ed Smith-Rowland


2019-06-26  Edward Smith-Rowland  <3dw...@verizon.net>

Implement C++20 p0202 - Add Constexpr Modifiers to Functions
in <algorithm> and <utility> Headers.
Implement C++20 p1023 - constexpr comparison operators for std::array.
* include/bits/algorithmfwd.h (all_of, any_of, binary_search, copy,
copy_backward, copy_if, copy_n, equal_range, fill, find_end,
find_if_not, includes, is_heap, is_heap_until, is_partitioned,
is_permutation, is_sorted, is_sorted_until, iter_swap, lower_bound,
none_of, partition_copy, partition_point, remove, remove_if,
remove_copy, remove_copy_if, replace_copy, replace_copy_if,
reverse_copy, rotate_copy, unique, upper_bound, adjacent_find, count,
count_if, equal, find, find_first_of, find_if, for_each, generate,
generate_n, lexicographical_compare, merge, mismatch, replace,
replace_if, search, search_n, set_difference, set_intersection,
set_symmetric_difference, set_union, transform, unique_copy):
Mark constexpr.
* include/bits/cpp_type_traits.h (__miter_base): Mark constexpr.
* include/bits/predefined_ops.h (_Iter_less_val::operator(),
_Val_less_iter::operator(), _Iter_equal_to_iter::operator(),
_Iter_equal_to_val::operator(), _Iter_equals_val::operator()):
 Use const ref instead of ref arg;
(_Iter_less_val, __iter_less_val, _Val_less_iter, __val_less_iter,
__iter_equal_to_iter, __iter_equal_to_val, __iter_comp_val,
_Iter_comp_val, _Val_comp_iter, __val_comp_iter, __iter_equals_val,
_Iter_equals_iter, __iter_comp_iter, _Iter_pred, __pred_iter,
_Iter_comp_to_val, __iter_comp_val, _Iter_comp_to_iter,
__iter_comp_iter): Mark constexpr.
* include/bits/stl_algo.h (__find_if, __find_if_not, __find_if_not_n,
__search, __search_n_aux, __search_n, __find_end, find_end, all_of,
none_of, any_of, find_if_not, is_partitioned, partition_point,
__remove_copy_if, remove_copy, remove_copy_if, copy_if, __copy_n,
copy_n, partition_copy, __remove_if, remove, remove_if, __adjacent_find,
__unique, unique, __unique_copy, reverse_copy, rotate_copy,
__unguarded_linear_insert, __insertion_sort, __unguarded_insertion_sort,
__final_insertion_sort, lower_bound, __upper_bound, upper_bound,
__equal_range, equal_range, binary_search, __includes, includes,
__next_permutation, __prev_permutation, __replace_copy_if, replace_copy,
replace_copy_if, __count_if, is_sorted, __is_sorted_until,
is_sorted_until, __is_permutation, is_permutation, for_each, find,
find_if, find_first_of, adjacent_find, count, count_if, search,
search_n, transform, replace, replace_if, generate, generate_n,
unique_copy, __merge, merge, __set_union, set_union, __set_intersection,
set_intersection, __set_difference, set_difference,
__set_symmetric_difference, set_symmetric_difference):  Mark constexpr.
* include/bits/stl_algobase.h (__memmove, __memcmp): New maybe constexpr
wrappers around __builtin_memmove and __builtin_memcmp
respectively;
(__niter_base, __niter_wrap, __copy_m, __copy_move_a, __copy_move_a2,
copy, move, __copy_move_b, __copy_move_backward_a,
__copy_move_backward_a2, copy_backward, move_backward, __fill_a, fill,
__fill_n_a, fill_n, equal, __lc_rai::__newlast1, __lc_rai::__cnd2,
__lexicographical_compare_impl, __lexicographical_compare,
__lexicographical_compare::__lc, __lexicographical_compare_aux,
__lower_bound, lower_bound, equal, __equal4, lexicographical_compare,
__mismatch, mismatch, __is_heap_until, __is_heap, is_heap_until,
is_heap): Mark constexpr.
* include/bits/stl_heap.h (__is_heap_until, __is_heap, is_heap_until,
is_heap): Mark constexpr.
* include/bits/stl_iterator.h (__niter_base, __miter_base): Mark 
constexpr.
* include/std/array: Make comparison ops constexpr.
* include/std/utility: Make exchange constexpr.
* testsuite/23_containers/array/tuple_interface/get_neg.cc: Adjust.
* testsuite/23_containers/array/tuple_interface/
tuple_element_neg.cc: Adjust.
* 

[PATCH] Define std::chars_format enumeration type

2019-06-26 Thread Jonathan Wakely

This type isn't used anywhere yet, but will be needed for the
floating-point overloads of to_chars and from_chars.
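
For illustration only, here is how a bitmask enumeration of this shape is
typically queried.  The sketch uses a hypothetical stand-in enum named fmt
and a made-up helper named has, so that it does not depend on a library
that already ships std::chars_format:

```cpp
#include <cassert>

// Hypothetical stand-in mirroring the shape of std::chars_format.
enum class fmt : unsigned
{
  scientific = 1, fixed = 2, hex = 4, general = fixed | scientific
};

constexpr fmt operator|(fmt l, fmt r) noexcept
{ return static_cast<fmt>(static_cast<unsigned>(l) | static_cast<unsigned>(r)); }

constexpr fmt operator&(fmt l, fmt r) noexcept
{ return static_cast<fmt>(static_cast<unsigned>(l) & static_cast<unsigned>(r)); }

// True if every bit of `flag` is also set in `value`.
constexpr bool has(fmt value, fmt flag) noexcept
{ return (value & flag) == flag; }

static_assert(has(fmt::general, fmt::fixed), "general includes fixed");
static_assert(has(fmt::general, fmt::scientific), "general includes scientific");
static_assert(!has(fmt::general, fmt::hex), "general excludes hex");

inline void check_fmt_bitmask()
{
  assert(has(fmt::hex | fmt::fixed, fmt::hex));
  assert(!has(fmt::fixed, fmt::general));
}
```

Because the enum is scoped, the operators have to be defined explicitly;
the built-in integer operators do not apply to it.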

* include/std/charconv (chars_format): Define bitmask type.
* testsuite/20_util/to_chars/chars_format.cc: New test.

Tested x86_64-linux, committed to trunk.


commit 63c78e2e9bf4a99f79f3a34f4e8b35c6139cb866
Author: redi 
Date:   Wed Jun 26 22:54:38 2019 +

Define std::chars_format enumeration type

This type isn't used anywhere yet, but will be needed for the
floating-point overloads of to_chars and from_chars.

* include/std/charconv (chars_format): Define bitmask type.
* testsuite/20_util/to_chars/chars_format.cc: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272718 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 6a3399764ba..53aa63ea277 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -616,6 +616,40 @@ namespace __detail
   return __res;
 }
 
+  /// floating-point format for primitive numerical conversion
+  enum class chars_format
+  {
+scientific = 1, fixed = 2, hex = 4, general = fixed | scientific
+  };
+
+  constexpr chars_format
+  operator|(chars_format __lhs, chars_format __rhs) noexcept
+  { return (chars_format)((unsigned)__lhs | (unsigned)__rhs); }
+
+  constexpr chars_format
+  operator&(chars_format __lhs, chars_format __rhs) noexcept
+  { return (chars_format)((unsigned)__lhs & (unsigned)__rhs); }
+
+  constexpr chars_format
+  operator^(chars_format __lhs, chars_format __rhs) noexcept
+  { return (chars_format)((unsigned)__lhs ^ (unsigned)__rhs); }
+
+  constexpr chars_format
+  operator~(chars_format __fmt) noexcept
+  { return (chars_format)~(unsigned)__fmt; }
+
+  constexpr chars_format&
+  operator|=(chars_format& __lhs, chars_format __rhs) noexcept
+  { return __lhs = __lhs | __rhs; }
+
+  constexpr chars_format&
+  operator&=(chars_format& __lhs, chars_format __rhs) noexcept
+  { return __lhs = __lhs & __rhs; }
+
+  constexpr chars_format&
+  operator^=(chars_format& __lhs, chars_format __rhs) noexcept
+  { return __lhs = __lhs ^ __rhs; }
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // C++14
diff --git a/libstdc++-v3/testsuite/20_util/to_chars/chars_format.cc b/libstdc++-v3/testsuite/20_util/to_chars/chars_format.cc
new file mode 100644
index 000..f343c58b0eb
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/to_chars/chars_format.cc
@@ -0,0 +1,52 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include <charconv>
+
+// C++17 23.2.1 [utility.syn]
+// chars_format is a bitmask type with elements scientific, fixed and hex
+
+using F = std::chars_format;
+const F none = F{};
+const F all = ~none;
+static_assert(std::is_enum_v<F>);
+static_assert((F::scientific & F::fixed) == none);
+static_assert((F::scientific & F::hex) == none);
+static_assert((F::fixed & F::hex) == none);
+static_assert(F::general == (F::fixed | F::scientific));
+static_assert(F::general == (F::fixed ^ F::scientific));
+
+// sanity check operators
+static_assert((F::scientific & F::scientific) == F::scientific);
+static_assert((F::fixed & F::fixed) == F::fixed);
+static_assert((F::hex & F::hex) == F::hex);
+static_assert((F::general & F::general) == F::general);
+static_assert((F::scientific | F::scientific) == F::scientific);
+static_assert((F::fixed | F::fixed) == F::fixed);
+static_assert((F::hex | F::hex) == F::hex);
+static_assert((F::general | F::general) == F::general);
+static_assert((F::scientific ^ F::scientific) == none);
+static_assert((F::fixed ^ F::fixed) == none);
+static_assert((F::hex ^ F::hex) == none);
+static_assert((F::general ^ F::general) == none);
+static_assert((F::fixed & all) == F::fixed);
+static_assert((F::hex & all) == F::hex);
+static_assert((F::general & all) == F::general);
+static_assert(~all == none);


Re: [PATCH] RISC-V: Add -malign-data= option.

2019-06-26 Thread Ilia Diachkov

Hmm, may I suggest use "natural" rather than "abi" and 32bit or 64bit
rather than "word"; it is not obvious what abi means and it is not
obvious what word means here; it could be either 32bit or 64bit
depending on the option.


It's actually worse: in RISC-V "word" always means 32-bit (BITS_PER_WORD
is a GCC name).  "natural" seems like a good term for the "align to the
size of the type".  The RISC-V term for "the width of an integer
register" is "xlen", so I think that's a good bet for the other option.


Also my other suggestion is create a new macro where you pass
riscv_align_data_type == riscv_align_data_type_word for the "(ALIGN) <
BITS_PER_WORD) " check to reduce the code duplication.


Thanks, Andrew and Palmer. I have updated the patch (see attached) with
the new option values "natural" and "xlen". I have also added the macro
RISCV_EXPAND_ALIGNMENT, similar to the one implemented for ARM.
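
For illustration only, the decision the macro makes can be modelled as a
plain function.  The names align_mode, type_kind and expand_alignment are
stand-ins invented for this sketch; it is not GCC code and ignores
everything except the condition itself:

```cpp
#include <cassert>

enum class align_mode { xlen, natural };   // models -malign-data=
enum class type_kind { scalar, array, union_, record };

// Models the RISCV_EXPAND_ALIGNMENT idea: aggregates below word
// alignment are bumped up to the word size, but only in xlen mode.
constexpr unsigned expand_alignment(align_mode mode, type_kind kind,
                                    unsigned align, unsigned bits_per_word)
{
  return (mode == align_mode::xlen
          && align < bits_per_word
          && (kind == type_kind::array
              || kind == type_kind::union_
              || kind == type_kind::record))
         ? bits_per_word : align;
}

inline void check_expand_alignment()
{
  // A char array (8-bit alignment) on a 64-bit target.
  assert(expand_alignment(align_mode::xlen, type_kind::array, 8, 64) == 64);
  assert(expand_alignment(align_mode::natural, type_kind::array, 8, 64) == 8);
  // Scalars are never bumped.
  assert(expand_alignment(align_mode::xlen, type_kind::scalar, 8, 64) == 8);
}
```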


Additionally, has this been tested with "-mstrict-align"?  The generated
code can be awful, but if it's not correct then we should throw an error
on that combination.


I have tested the new option with -mstrict-align and found that the two
do not influence each other. Indeed, -malign-data=xlen with
-mstrict-align works the same way as the current gcc with
-mstrict-align. As for -malign-data=natural with -mstrict-align, I
tested it with dejagnu using a modified target_board which additionally
passes -malign-data=natural (the first run) and -malign-data=natural
with -mstrict-align (the second run). Both runs showed the same result.


Best regards,
Ilia.

gcc/
* config/riscv/riscv-opts.h (enum riscv_align_data): Added.
* config/riscv/riscv.c (riscv_constant_alignment): Use
riscv_align_data_type.
* config/riscv/riscv.h (RISCV_EXPAND_ALIGNMENT): Added.
(DATA_ALIGNMENT): Use RISCV_EXPAND_ALIGNMENT.
(LOCAL_ALIGNMENT): Use RISCV_EXPAND_ALIGNMENT.
* config/riscv/riscv.opt (malign-data): New.
* doc/invoke.texi (RISC-V Options): Document -malign-data=.

---
 gcc/config/riscv/riscv-opts.h |  5 +
 gcc/config/riscv/riscv.c  |  3 ++-
 gcc/config/riscv/riscv.h  | 17 +++--
 gcc/config/riscv/riscv.opt| 14 ++
 gcc/doc/invoke.texi   | 10 +-
 5 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index f3031f2..d00fbe2 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -46,4 +46,9 @@ enum riscv_microarchitecture_type {
 };
 extern enum riscv_microarchitecture_type riscv_microarchitecture;
 
+enum riscv_align_data {
+  riscv_align_data_type_xlen,
+  riscv_align_data_type_natural
+};
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index d61455f..bc457803 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -4904,7 +4904,8 @@ riscv_can_change_mode_class (machine_mode, machine_mode, reg_class_t rclass)
 static HOST_WIDE_INT
 riscv_constant_alignment (const_tree exp, HOST_WIDE_INT align)
 {
-  if (TREE_CODE (exp) == STRING_CST || TREE_CODE (exp) == CONSTRUCTOR)
+  if ((TREE_CODE (exp) == STRING_CST || TREE_CODE (exp) == CONSTRUCTOR)
+  && (riscv_align_data_type == riscv_align_data_type_xlen))
 return MAX (align, BITS_PER_WORD);
   return align;
 }
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 8856cee..2e27e83 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -168,6 +168,13 @@ along with GCC; see the file COPYING3.  If not see
mode that should actually be used.  We allow pairs of registers.  */
 #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode)
 
+/* DATA_ALIGNMENT and LOCAL_ALIGNMENT common definition.  */
+#define RISCV_EXPAND_ALIGNMENT(COND, TYPE, ALIGN)			\
+  (((COND) && ((ALIGN) < BITS_PER_WORD)	\
+&& (TREE_CODE (TYPE) == ARRAY_TYPE	\
+	|| TREE_CODE (TYPE) == UNION_TYPE\
+	|| TREE_CODE (TYPE) == RECORD_TYPE)) ? BITS_PER_WORD : (ALIGN))
+
 /* If defined, a C expression to compute the alignment for a static
variable.  TYPE is the data type, and ALIGN is the alignment that
the object would ordinarily have.  The value of this macro is used
@@ -180,18 +187,16 @@ along with GCC; see the file COPYING3.  If not see
cause character arrays to be word-aligned so that `strcpy' calls
that copy constants to character arrays can be done inline.  */
 
-#define DATA_ALIGNMENT(TYPE, ALIGN)	\
-  ALIGN) < BITS_PER_WORD)		\
-&& (TREE_CODE (TYPE) == ARRAY_TYPE	\
-	|| TREE_CODE (TYPE) == UNION_TYPE\
-	|| TREE_CODE (TYPE) == RECORD_TYPE)) ? BITS_PER_WORD : (ALIGN))
+#define DATA_ALIGNMENT(TYPE, ALIGN)		\
+  RISCV_EXPAND_ALIGNMENT (riscv_align_data_type == riscv_align_data_type_xlen,	\
+			  TYPE, ALIGN)
 
 /* We need this for the same reason as DATA_ALIGNMENT, namely to cause
character arrays to be word-aligned so that `strcpy' 

Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-26 Thread Jeff Law
On 6/25/19 5:03 PM, Martin Sebor wrote:

> 
> The caller ensures that handle_char_store is only called for stores
> to arrays (MEM_REF) or single elements as wide as char.
Where?  I don't see it, even after fixing the formatting in
strlen_check_and_optimize_stmt :-)

>   gimple *stmt = gsi_stmt (*gsi);
> 
>   if (is_gimple_call (stmt))

I think we can agree that we don't have a call, so this is false.

>  else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
> {
>   tree lhs = gimple_assign_lhs (stmt);
This should be true IIUC, so we'll go into its THEN block.


>   if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (lhs)))
Should be false.

>   else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE 
> (lhs)))

Should also be false.

>   else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
Should be true since LHS is a MEM_REF.


>{
>   tree type = TREE_TYPE (lhs);
>   if (TREE_CODE (type) == ARRAY_TYPE)
> type = TREE_TYPE (type);
>   if (TREE_CODE (type) == INTEGER_TYPE
>   && TYPE_MODE (type) == TYPE_MODE (char_type_node)
>   && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
> {
>   if (! handle_char_store (gsi))
> return false;
> }
> }
If TYPE is an ARRAY_TYPE, we strip the ARRAY_TYPE.  We then
verify that TYPE is a single character type.  _But_ we stripped away the
ARRAY_TYPE.  So ISTM that we allow either an array of chars or a single
char on the LHS.

So how does this avoid multiple character stores?!?  We could have had
an ARRAY_REF on the LHS and we never check the number of elements in the
array.  There's no check on the RHS either.  So I don't see how we
guarantee that we're dealing with a single character store.

What am I missing here?
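
For illustration only, the concern can be modelled outside GCC with
ordinary type traits.  This is a loose analogy: passes_char_store_check is
a made-up name, GCC's tree types are not C++ types, and the model does not
distinguish bool from other one-byte integer types.  It only shows that
stripping the array level before testing the element type lets a whole
char array pass the same test as a single char:

```cpp
#include <cassert>
#include <type_traits>

// Models the quoted check: strip one array level, then require a
// char-sized integer element type.
template<typename Lhs>
constexpr bool passes_char_store_check()
{
  using elem = typename std::remove_extent<Lhs>::type;
  return std::is_integral<elem>::value && sizeof(elem) == sizeof(char);
}

static_assert(passes_char_store_check<char>(), "single char store");
static_assert(passes_char_store_check<char[4]>(), "char[4] also passes");
static_assert(!passes_char_store_check<int>(), "int store is rejected");

inline void check_char_store_model()
{
  // Nothing in the check looks at the number of elements.
  assert(passes_char_store_check<char[2]>());
  assert(passes_char_store_check<char[1024]>());
}
```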



> 
> What you describe sounds like
> 
>   char a[N];
>   *(int*)a = 0x31323300;
> 
> which is represented as
> 
>   MEM[(int *)] = 825373440;
This would be closer (I realize it's not C):

  char a[N];
  a[0..3] = 0x31323300;


> 
> The LHS type of that is int so the function doesn't get called.
I'm concerned about the case where the LHS is an array.
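
For illustration only, here is a small sketch of why a store wider than
one character matters for length tracking.  The helper name
strlen_after_wide_store is made up, and the resulting length depends on
byte order: with the value 0x31323300 it should be 0 on a little-endian
host and 3 on a big-endian one, but in both cases it differs from the
original length of 7:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Overwrite the first four characters of a string with a single
// 4-byte store and report the string length afterwards.
inline std::size_t strlen_after_wide_store(std::uint32_t value)
{
  char a[8] = "abcdefg";                  // strlen (a) == 7 before the store
  std::memcpy(a, &value, sizeof value);   // one store, four characters
  return std::strlen(a);
}

inline void check_wide_store()
{
  // One of the four bytes is NUL, so the string gets shorter.
  assert(strlen_after_wide_store(0x31323300u) < 7);
  // No NUL byte among the four, so the length is unchanged.
  assert(strlen_after_wide_store(0x31323334u) == 7);
}
```
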

>> And if the NUL byte in the original was at byte offset 2, then didn't we
>> just change the length by overwriting where the NUL is?
> 
> No, because cmp is the result of compare_nonzero_chars and cmp > 0
> means:
> 
>   1 if SI is known to start with more than OFF nonzero characters
> 
> i.e., the character is being stored before the terminating nul.
> This is the basis of the original optimization:
> 
>   /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
>  and if we aren't storing '\0', we know that the length of the
>  string and any other zero terminated string in memory remains
>  the same.
But all this is predicated on the assumption that we're dealing with a
single character memory store.  I don't see what enforces that precondition.



Re: [PATCH] Fix warnings seen by clang in gcc/symbol-summary.h.

2019-06-26 Thread Jeff Law
On 6/26/19 12:46 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about missing argument to function call and
> unused arguments in symbol-summary.h.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-06-25  Martin Liska  
> 
>   * symbol-summary.h (traverse): Pass
>   argument a to the call of callback.
>   (gt_ggc_mx): Mark arguments as unused.
>   (gt_pch_nx): Likewise.
OK
jeff


Re: [PATCH 31/30] Update documentation for movmem to cpymem change

2019-06-26 Thread Jeff Law
On 6/26/19 2:16 PM, Aaron Sawdey wrote:
> On 6/25/19 4:43 PM, Jeff Law wrote:
>> On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote:
>>> From: Aaron Sawdey 
>>>
>>> * builtins.c (get_memory_rtx): Fix comment.
>>> * optabs.def (movmem_optab): Change to cpymem_optab.
>>> * expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
>>> (emit_block_move_hints): Change movmem to cpymem.
>>> * defaults.h: Change movmem to cpymem.
>>> * targhooks.c (get_move_ratio): Change movmem to cpymem.
>>> (default_use_by_pieces_infrastructure_p): Ditto.
>> So I think you're missing an update to the RTL/MD documentation.  This
>> is also likely to cause problems for any out-of-tree ports, so it's
>> probably worth a mention in the gcc-10 changes, which will need to be
>> created (in CVS no less, ugh).
>>
>> I think the stuff posted to-date is fine, but it shouldn't go in without
>> the corresponding docs and gcc-10 changes updates.
> This would be my proposed patch to update the documentation. I'll also work
> out the entry for the gcc 10 changes and post that for review before
> this all goes in.
> 
> OK for trunk along with the other 30 patches?
> 
> Thanks,
> Aaron
> 
> 
> 
>   * doc/md.texi: Change movmem to cpymem and update description to match.
>   * doc/rtl.texi: Change movmem to cpymem.
>   * target.def (use_by_pieces_infrastructure_p): Change movmem to cpymem.
> * doc/tm.texi: Regenerate.
OK.  The entire kit is OK for the trunk now.

jeff


Re: [PATCH 23/30] Changes to rs6000

2019-06-26 Thread Segher Boessenkool
On Tue, Jun 25, 2019 at 03:22:32PM -0500, acsaw...@linux.ibm.com wrote:
> From: Aaron Sawdey 
> 
>   * config/rs6000/rs6000.md: (movmemsi) Change name to cpymemsi.

This is fine.  Thanks!


Segher


Re: [PATCH] Remove quite obvious dead assignments.

2019-06-26 Thread Jeff Law
On 6/26/19 4:57 AM, Martin Liška wrote:
> Hi.
> 
> I've spent some time with clang-static-analyzer and analyzed the
> warnings it reported.  As always with analyzers, the majority of the
> issues are false positives; however, it caught a couple of real issues:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90973
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90978
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90976
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90975
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90970
> 
> That said, I'm sending a patch that considerably reduces the number of
> dead assignments.  I've chosen to remove only those that are quite
> trivial and that do not span multiple if-else branches.
> 
> I hope the patch will be readable and approved.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm
> also testing that on ppc64 big endian machine.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2019-06-26  Martin Liska  
> 
>   * asan.c (asan_emit_allocas_unpoison): Remove obviously
>   dead assignments.
>   * bt-load.c (move_btr_def): Likewise.
>   * builtins.c (expand_builtin_apply_args_1): Likewise.
>   (expand_builtin_apply): Likewise.
>   * cfgexpand.c (expand_asm_stmt): Likewise.
>   (construct_init_block): Likewise.
>   * cfghooks.c (verify_flow_info): Likewise.
>   * cfgloopmanip.c (remove_path): Likewise.
>   * cfgrtl.c (rtl_verify_bb_layout): Likewise.
>   * cgraph.c (cgraph_node::set_pure_flag): Likewise.
>   * combine.c (simplify_if_then_else): Likewise.
>   * config/i386/i386-expand.c (ix86_expand_rounddf_32): Likewise.
>   * config/i386/i386.c (ix86_setup_incoming_vararg_bounds): Likewise.
>   (choose_basereg): Likewise.
>   (ix86_expand_prologue): Likewise.
>   (ix86_preferred_output_reload_class): Likewise.
>   * cselib.c (cselib_record_sets): Likewise.
>   * df-scan.c (df_scan_alloc): Likewise.
>   * dojump.c (do_jump_by_parts_greater_rtx): Likewise.
>   * early-remat.c (early_remat::record_equiv_candidates): Likewise.
>   * emit-rtl.c (try_split): Likewise.
>   * graphite-scop-detection.c (assign_parameter_index_in_region): 
> Likewise.
>   * ipa-cp.c (cgraph_edge_brings_all_agg_vals_for_node): Likewise.
>   * ira-color.c (setup_profitable_hard_regs): Likewise.
>   * ira.c (rtx_moveable_p): Likewise.
>   * lra-eliminations.c (eliminate_regs_in_insn): Likewise.
>   * read-rtl.c (read_subst_mapping): Likewise.
>   * regrename.c (scan_rtx): Likewise.
>   * reorg.c (fill_slots_from_thread): Likewise.
>   * tree-inline.c (tree_function_versioning): Likewise.
>   * tree-ssa-reassoc.c (optimize_ops_list): Likewise.
>   * tree-ssa-sink.c (statement_sink_location): Likewise.
>   * tree-ssa-threadedge.c (thread_across_edge): Likewise.
>   * tree-vect-loop.c (vect_get_loop_niters): Likewise.
>   (vect_create_epilog_for_reduction): Likewise.
>   * tree-vect-stmts.c (vectorizable_call): Likewise.
>   * tree.c (build_nonstandard_integer_type): Likewise.
> 
> gcc/cp/ChangeLog:
> 
> 2019-06-26  Martin Liska  
> 
>   * class.c (adjust_clone_args): Remove obviously
>   dead assignments.
>   (dump_class_hierarchy_r): Likewise.
>   * decl.c (check_initializer): Likewise.
>   * parser.c (cp_parser_lambda_expression): Likewise.
>   * pt.c (unify_bound_ttp_args): Likewise.
>   (convert_template_argument): Likewise.
>   * rtti.c (build_headof): Likewise.
>   * typeck.c (convert_for_initialization): Likewise.
> 
> libgcc/ChangeLog:
> 
> 2019-06-26  Martin Liska  
> 
>   * libgcov-driver-system.c (gcov_exit_open_gcda_file): Remove obviously
>   dead assignments.
>   * libgcov-util.c: Likewise.
I think you've already received a bit of feedback here, particularly WRT
emit_library_call vs emit_library_call_value.  I think this is fine for
the trunk once you've addressed the comments that have already been made.
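
For illustration only, this is the general shape of a trivially dead
assignment, plus the case where the dead value comes from a call with a
side effect.  All names are hypothetical and nothing here is taken from
the patch; the second pair merely shows why removing such an assignment
can mean keeping the call and dropping only the stored result:

```cpp
#include <cassert>

// Trivially dead: the first value of tmp is overwritten before any read.
inline int scaled_before(int v)
{
  int tmp = v * 2;
  tmp = v * 3;
  return tmp + 1;
}

inline int scaled_after(int v)
{
  int tmp = v * 3;
  return tmp + 1;
}

// Hypothetical helper with a side effect: it bumps a counter.
inline int note_call(int &counter) { return ++counter; }

// Dead assignment whose right-hand side has a side effect.
inline int counted_before(int v, int &counter)
{
  int tmp = note_call(counter);
  tmp = v * 3;
  return tmp;
}

// Only the assignment is removed; the call itself has to stay.
inline int counted_after(int v, int &counter)
{
  note_call(counter);
  return v * 3;
}

inline void check_dead_assignment_removal()
{
  int c1 = 0, c2 = 0;
  assert(scaled_before(5) == scaled_after(5));
  assert(counted_before(5, c1) == counted_after(5, c2));
  assert(c1 == 1 && c2 == 1);
}
```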

jeff



[PATCH 31/30] Update documentation for movmem to cpymem change

2019-06-26 Thread Aaron Sawdey
On 6/25/19 4:43 PM, Jeff Law wrote:
> On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote:
>> From: Aaron Sawdey 
>>
>>  * builtins.c (get_memory_rtx): Fix comment.
>>  * optabs.def (movmem_optab): Change to cpymem_optab.
>>  * expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
>>  (emit_block_move_hints): Change movmem to cpymem.
>>  * defaults.h: Change movmem to cpymem.
>>  * targhooks.c (get_move_ratio): Change movmem to cpymem.
>>  (default_use_by_pieces_infrastructure_p): Ditto.
> So I think you're missing an update to the RTL/MD documentation.  This
> is also likely to cause problems for any out-of-tree ports, so it's
> probably worth a mention in the gcc-10 changes, which will need to be
> created (in CVS no less, ugh).
> 
> I think the stuff posted to-date is fine, but it shouldn't go in without
> the corresponding docs and gcc-10 changes updates.
This would be my proposed patch to update the documentation. I'll also work
out the entry for the gcc 10 changes and post that for review before
this all goes in.

OK for trunk along with the other 30 patches?

Thanks,
Aaron



* doc/md.texi: Change movmem to cpymem and update description to match.
* doc/rtl.texi: Change movmem to cpymem.
* target.def (use_by_pieces_infrastructure_p): Change movmem to cpymem.
* doc/tm.texi: Regenerate.
---
 gcc/doc/md.texi  | 26 ++
 gcc/doc/rtl.texi |  2 +-
 gcc/doc/tm.texi  |  4 ++--
 gcc/target.def   |  4 ++--
 4 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index b45b4be..3f9d545 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6200,13 +6200,13 @@ This pattern is not allowed to @code{FAIL}.
 @item @samp{one_cmpl@var{m}2}
 Store the bitwise-complement of operand 1 into operand 0.

-@cindex @code{movmem@var{m}} instruction pattern
-@item @samp{movmem@var{m}}
-Block move instruction.  The destination and source blocks of memory
+@cindex @code{cpymem@var{m}} instruction pattern
+@item @samp{cpymem@var{m}}
+Block copy instruction.  The destination and source blocks of memory
 are the first two operands, and both are @code{mem:BLK}s with an
 address in mode @code{Pmode}.

-The number of bytes to move is the third operand, in mode @var{m}.
+The number of bytes to copy is the third operand, in mode @var{m}.
 Usually, you specify @code{Pmode} for @var{m}.  However, if you can
 generate better code knowing the range of valid lengths is smaller than
 those representable in a full Pmode pointer, you should provide
@@ -6226,14 +6226,16 @@ in a way that the blocks are not required to be aligned according to it in
 all cases. This expected alignment is also in bytes, just like operand 4.
 Expected size, when unknown, is set to @code{(const_int -1)}.

-Descriptions of multiple @code{movmem@var{m}} patterns can only be
+Descriptions of multiple @code{cpymem@var{m}} patterns can only be
 beneficial if the patterns for smaller modes have fewer restrictions
 on their first, second and fourth operands.  Note that the mode @var{m}
-in @code{movmem@var{m}} does not impose any restriction on the mode of
-individually moved data units in the block.
+in @code{cpymem@var{m}} does not impose any restriction on the mode of
+individually copied data units in the block.

-These patterns need not give special consideration to the possibility
-that the source and destination strings might overlap.
+The @code{cpymem@var{m}} patterns need not give special consideration
+to the possibility that the source and destination strings might
+overlap. These patterns are used to do inline expansion of
+@code{__builtin_memcpy}.

 @cindex @code{movstr} instruction pattern
 @item @samp{movstr}
@@ -6254,7 +6256,7 @@ given as a @code{mem:BLK} whose address is in mode @code{Pmode}.  The
 number of bytes to set is the second operand, in mode @var{m}.  The value to
 initialize the memory with is the third operand. Targets that only support the
 clearing of memory should reject any value that is not the constant 0.  See
-@samp{movmem@var{m}} for a discussion of the choice of mode.
+@samp{cpymem@var{m}} for a discussion of the choice of mode.

 The fourth operand is the known alignment of the destination, in the form
 of a @code{const_int} rtx.  Thus, if the compiler knows that the
@@ -6272,13 +6274,13 @@ Operand 9 is the probable maximal size (i.e.@: we cannot rely on it for
 correctness, but it can be used for choosing proper code sequence for a
 given size).

-The use for multiple @code{setmem@var{m}} is as for @code{movmem@var{m}}.
+The use for multiple @code{setmem@var{m}} is as for @code{cpymem@var{m}}.

 @cindex @code{cmpstrn@var{m}} instruction pattern
 @item @samp{cmpstrn@var{m}}
 String compare instruction, with five operands.  Operand 0 is the output;
 it has mode @var{m}.  The remaining four operands are like the operands
-of @samp{movmem@var{m}}.  The two memory blocks specified are compared
+of 

[PATCH] PR debug/90981 Empty .debug_addr crashes -gdwarf-5 -gsplit-dwarf

2019-06-26 Thread Mark Wielaard
Even if there was no address list, or an empty one, we would try to generate
an index for the .debug_addr section with -gdwarf-5 and -gsplit-dwarf.
The skeleton DIE would also get a (dangling) DW_AT_addr_base in that case.

PR debug/90981
* dwarf2out.c (add_top_level_skeleton_die_attrs): Only add
DW_AT_addr_base if there is actually a .debug_addr section with
addresses.
(output_addr_table): Add DWARF5 table index generation here after
checking there are actually any addresses from...
(dwarf2out_finish): ...here.
* testsuite/g++.dg/pr90981.C: New test.
---
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 7fa8b05..c3c2dbc 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -11196,7 +11196,8 @@ add_top_level_skeleton_die_attrs (dw_die_ref die)
   if (comp_dir != NULL)
 add_skeleton_AT_string (die, DW_AT_comp_dir, comp_dir);
   add_AT_pubnames (die);
-  add_AT_lineptr (die, dwarf_AT (DW_AT_addr_base), debug_addr_section_label);
+  if (addr_index_table != NULL && addr_index_table->size () > 0)
+add_AT_lineptr (die, dwarf_AT (DW_AT_addr_base), debug_addr_section_label);
 }
 
 /* Output skeleton debug sections that point to the dwo file.  */
@@ -29108,6 +29109,30 @@ output_addr_table (void)
 return;
 
   switch_to_section (debug_addr_section);
+  /* GNU DebugFission https://gcc.gnu.org/wiki/DebugFission
+ which GCC uses to implement -gsplit-dwarf as DWARF GNU extension
+ before DWARF5, didn't have a header for .debug_addr units.
+ DWARF5 specifies a small header when address tables are used.  */
+  if (dwarf_version >= 5)
+{
+  unsigned int last_idx = 0;
+  unsigned long addrs_length;
+
+  addr_index_table->traverse_noresize
+(_idx);
+  addrs_length = last_idx * DWARF2_ADDR_SIZE + 4;
+
+  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
+   dw2_asm_output_data (4, 0xffffffff,
+"Escape value for 64-bit DWARF extension");
+  dw2_asm_output_data (DWARF_OFFSET_SIZE, addrs_length,
+  "Length of Address Unit");
+  dw2_asm_output_data (2, 5, "DWARF addr version");
+  dw2_asm_output_data (1, DWARF2_ADDR_SIZE, "Size of Address");
+  dw2_asm_output_data (1, 0, "Size of Segment Descriptor");
+}
+  ASM_OUTPUT_LABEL (asm_out_file, debug_addr_section_label);
+
   addr_index_table
 ->traverse_noresize ();
 }
@@ -31630,30 +31655,6 @@ dwarf2out_finish (const char *filename)
ranges_section_label);
}
 
-  switch_to_section (debug_addr_section);
-  /* GNU DebugFission https://gcc.gnu.org/wiki/DebugFission
-which GCC uses to implement -gsplit-dwarf as DWARF GNU extension
-before DWARF5, didn't have a header for .debug_addr units.
-DWARF5 specifies a small header when address tables are used.  */
-  if (dwarf_version >= 5)
-   {
- unsigned int last_idx = 0;
- unsigned long addrs_length;
-
- addr_index_table->traverse_noresize
-(_idx);
- addrs_length = last_idx * DWARF2_ADDR_SIZE + 4;
-
- if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
-   dw2_asm_output_data (4, 0xffffffff,
-"Escape value for 64-bit DWARF extension");
- dw2_asm_output_data (DWARF_OFFSET_SIZE, addrs_length,
-  "Length of Address Unit");
- dw2_asm_output_data (2, 5, "DWARF addr version");
- dw2_asm_output_data (1, DWARF2_ADDR_SIZE, "Size of Address");
- dw2_asm_output_data (1, 0, "Size of Segment Descriptor");
-   }
-  ASM_OUTPUT_LABEL (asm_out_file, debug_addr_section_label);
   output_addr_table ();
 }
 
diff --git a/gcc/testsuite/g++.dg/pr90981.C b/gcc/testsuite/g++.dg/pr90981.C
new file mode 100644
index 000..4ae871e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr90981.C
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -gdwarf-5 -gsplit-dwarf" } */
+
+/* No addresses in the DWARF, so no .debug_addr section,
+   don't crash trying to generate an addr table index anyway.  */
+
+namespace { struct t {}; }
+t f () { return t (); }
-- 
1.8.3.1
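
For readers following the header logic that the patch moves into output_addr_table, the length arithmetic can be looked at in isolation. What follows is only a minimal sketch, not the dwarf2out.c code: the names debug_addr_unit_length and needs_addr_base are hypothetical, and it assumes, as the patch does, that the unit length covers the 2-byte version, the 1-byte address size and the 1-byte segment selector size (4 bytes in total) plus the address entries themselves.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: compute the unit_length field
   of a DWARF 5 .debug_addr unit holding COUNT addresses of ADDR_SIZE
   bytes each.  The length excludes the initial length field itself but
   covers the version (2 bytes), address size (1 byte) and segment
   selector size (1 byte), mirroring
   "last_idx * DWARF2_ADDR_SIZE + 4" in the patch.  */
unsigned long
debug_addr_unit_length (unsigned int count, unsigned int addr_size)
{
  return (unsigned long) count * addr_size + 4;
}

/* Hypothetical predicate mirroring the new guard in
   add_top_level_skeleton_die_attrs: only reference the section via
   DW_AT_addr_base when the table holds at least one address.  */
int
needs_addr_base (size_t table_size)
{
  return table_size > 0;
}
```

With an empty table the predicate is false, which corresponds to the case in the new test where neither the attribute nor the table index should be emitted.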



Re: [PATCH] Fix misc stuff seen by clang-static-analyzer.

2019-06-26 Thread Jeff Law
On 6/26/19 4:57 AM, Martin Liška wrote:
> Hi.
> 
> This small patch handles miscellaneous clang-static-analyzer issues.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/lto/ChangeLog:
> 
> 2019-06-26  Martin Liska  
> 
>   * lto-dump.c (struct symbol_entry): Add default dtor.
>   (struct variable_entry): Likewise.
>   (struct function_entry): Likewise.
>   (dump_list_functions): Release memory.
>   (dump_list_variables): Likewise.
> 
> libgcc/ChangeLog:
> 
> 2019-06-26  Martin Liska  
> 
>   * libgcov-util.c (gcov_profile_merge): Release allocated
>   memory.
>   (calculate_overlap): Likewise.
OK
jeff


Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-26 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> On Wed, 26 Jun 2019 at 16:05, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Tue, 25 Jun 2019 at 20:05, Richard Sandiford
>> >  wrote:
>> >>
>> >> Prathamesh Kulkarni  writes:
>> >> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni
>> >> >  wrote:
>> >> >>
>> >> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
>> >> >>  wrote:
>> >> >> >
>> >> >> > Prathamesh Kulkarni  writes:
>> >> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
>> >> >> > >if (!def_set)
>> >> >> > >  return false;
>> >> >> > >
>> >> >> > > +  if (reg_prop_only
>> >> >> > > +  && !REG_P (SET_SRC (def_set))
>> >> >> > > +  && !REG_P (SET_DEST (def_set)))
>> >> >> > > +return false;
>> >> >> >
>> >> >> > This should be:
>> >> >> >
>> >> >> >   if (reg_prop_only
>> >> >> >   && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
>> >> >> > return false;
>> >> >> >
>> >> >> > so that we return false if either operand isn't a register.
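
The difference between the two conditions quoted above can be checked with plain booleans. This is a minimal sketch under stated assumptions: src_is_reg and dest_is_reg are hypothetical stand-ins for REG_P (SET_SRC (def_set)) and REG_P (SET_DEST (def_set)), and the function names are invented for illustration.

```c
#include <stdbool.h>

/* Condition as originally posted: rejects the propagation only when
   neither operand is a register.  */
bool
reject_original (bool src_is_reg, bool dest_is_reg)
{
  return !src_is_reg && !dest_is_reg;
}

/* Condition as corrected in the review: rejects the propagation when
   either operand is not a register, so only reg-to-reg copies pass.  */
bool
reject_corrected (bool src_is_reg, bool dest_is_reg)
{
  return !src_is_reg || !dest_is_reg;
}
```

The two versions disagree exactly when one operand is a register and the other is not: the original lets such a set through, the corrected one rejects it.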
>> >> >> Oops, sorry about that  -:(
>> >> >> >
>> >> >> > > +
>> >> >> > > +  /* Allow propagations into a loop only for reg-to-reg copies, since
>> >> >> > > + replacing one register by another shouldn't increase the cost.  */
>> >> >> > > +
>> >> >> > > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
>> >> >> > > +  && !REG_P (SET_SRC (def_set))
>> >> >> > > +  && !REG_P (SET_DEST (def_set)))
>> >> >> > > +return false;
>> >> >> >
>> >> >> > Same here.
>> >> >> >
>> >> >> > OK with that change, thanks.
>> >> >> Thanks for the review, will make the changes and commit the patch
>> >> >> after re-testing.
>> >> > Hi,
>> >> > Testing the patch showed following failures on 32-bit x86:
>> >> >
>> >> >   Executed from: g++.target/i386/i386.exp
>> >> > g++:g++.target/i386/pr88152.C   scan-assembler-not vpcmpgt|vpcmpeq|vpsra
>> >> >   Executed from: gcc.target/i386/i386.exp
>> >> > gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs:
>> >> > gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t ]*\\%eax,[\\t ]*%eax 1
>> >> >
>> >> > The failure of pr88152.C is also seen on x86_64.
>> >> >
>> >> > For pr66768.c and pr90178.c, forwprop replaces a register that is
>> >> > volatile and frame-related, respectively.
>> >> > To avoid that, the attached patch imposes a stronger constraint: src
>> >> > and dest must both be registers without the frame_related or volatil
>> >> > flags set, which is checked in usable_reg_p().
>> >> > This avoids the failures in both cases.
>> >> > Does it look OK?
>> >>
>> >> That's not the reason it's a bad transform.  In both cases we're
>> >> propagating r2 <- r1 even though
>> >>
>> >> (a) r1 dies in the copy and
>> >> (b) fwprop can't replace all uses of r2, because some have multiple
>> >> definitions
>> >>
>> >> This has the effect of making both values live in cases where only one
>> >> was previously.
>> >>
>> >> In the case of pr66768.c, fwprop2 is undoing the effect of
>> >> cse.c:canon_reg, which tries to pick the best register to use
>> >> (see cse.c:make_regs_eqv).  fwprop1 makes the same change,
>> >> and made it even before the patch, but the cse.c choice should win.
>> >>
>> >> A (hopefully) conservative fix would be to propagate the copy only if
>> >> both registers have a single definition, which you can test with:
>> >>
>> >>   (DF_REG_DEF_COUNT (regno) == 1
>> >>&& !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno))
>> >>
>> >> In that case, fwprop should see all uses of the destination, and should
>> >> be able to replace it in all cases with the source.
>> > Ah I see, thanks for the explanation!
>> > The above check works to resolve the failure.
>> > IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ?
>>
>> Right.
>>
>> >> > For g++.target/i386/pr88152.C, the issue is that after the patch,
>> >> > forwprop1 does following propagation (in f10) which wasn't done
>> >> > before:
>> >> >
>> >> > In insn 10, replacing
>> >> >  (unspec:SI [
>> >> > (reg:V2DF 91)
>> >> > ] UNSPEC_MOVMSK)
>> >> >  with (unspec:SI [
>> >> > (subreg:V2DF (reg:V2DI 90) 0)
>> >> > ] UNSPEC_MOVMSK)
>> >> >
>> >> > This later defeats combine because insn 9 gets deleted.
>> >> > Without patch, the following combination takes place:
>> >> >
>> >> > Trying 7 -> 9:
>> >> > 7: r90:V2DI=r89:V2DI>r93:V2DI
>> >> >   REG_DEAD r93:V2DI
>> >> >   REG_DEAD r89:V2DI
>> >> > 9: r91:V2DF=r90:V2DI#0
>> >> >   REG_DEAD r90:V2DI
>> >> > Successfully matched this instruction:
>> >> > (set (subreg:V2DI (reg:V2DF 91) 0)
>> >> > (gt:V2DI (reg:V2DI 89)
>> >> > (reg:V2DI 93)))
>> >> > allowing combination of insns 7 and 9
>> >> >
>> >> > and then:
>> >> > Trying 6, 9 -> 10:
>> >> > 6: r89:V2DI=const_vector
>> >> > 9: r91:V2DF#0=r89:V2DI>r93:V2DI
>> >> >   REG_DEAD 

Re: [RFA][tree-optimization/90883] Improve DSE to handle redundant calls

2019-06-26 Thread Jeff Law
On 6/26/19 5:53 AM, Richard Biener wrote:
> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law  wrote:
>>
>> So based on the conversation in the BZ I cobbled together a patch to
>> extend tree-ssa-dse.c to also detect redundant stores.
>>
>> To be clear, given two stores, the first store is dead if the later
>> store overwrites all the live bytes set by the first store.   In this
>> case we delete the first store.  If the first store is partially dead we
>> may trim it.
>>
>> Given two stores, the second store is redundant if it stores _the same
>> value_ into locations set by the first store.  In this case we delete
>> the second store.
>>
>>
>> We prefer to remove redundant stores over removing dead or trimming
>> partially dead stores.
>>
>> First, if we detect a redundant store, we can always remove it.  We may
>> not always be able to trim a partially dead store.  So removing the
>> redundant store wins in this case.
>>
>> But even if the redundant store occurs at the head or tail of the prior
>> store, removing the redundant store is better than trimming the
>> partially dead store because we end up with fewer calls to memset with
>> the same number of total bytes written.
>>
>> We only look for redundant stores in a few cases.  The first store must
>> be a memset, empty constructor or calloc call -- ie things which
>> initialize multiple memory locations to zero.  Subsequent stores can
>> occur via memset, empty constructors or simple memory assignments.
>>
>> The change to tree-ssa-alias.c deserves a quick note.
>>
>> When we're trying to determine if we have a redundant store, we create
>> an AO_REF for the *second* store, then ask the alias system if the first
>> store would kill the AO_REF.
>>
>> So while normally a calloc wouldn't ever kill anything in normal
>> execution order, we're not asking about things in execution order.  We
>> really just want to know if the calloc is going to write into the
>> entirety of the AO_REF of the subsequent store.  So we compute the size
>> of the allocation and we know the destination from the LHS of the calloc
>> call and everything "just works".
> 
> I see how stmt_kills_ref_p is convenient here and it's the only
> ref-must-include-other-ref kind of query the oracle supports right now.
> Note it is not optimized for your particular case querying the same
> stmt for multiple refs.  It's not refs_must_alias_p that is missing
> but something stronger ('kills' is also wrong since both refs might
> be reads), ref_covered_by_ref_p or so.  That said, factoring
> stmt_kills_ref_p might not be so straight-forward for calls since
> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,
> ao_ref_init_loads_from_call?).
> 
> So I think the tree-ssa-alias.c change is fine but please put a
> comment before
> 
> + if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
> +   {
> + tree arg0 = gimple_call_arg (stmt, 0);
> 
> explaining this is used by DSE to detect redundant stores.
Agreed.  I should have done this given I called it out in the email.

Jeff


Re: [RFA][tree-optimization/90883] Improve DSE to handle redundant calls

2019-06-26 Thread Jeff Law
On 6/26/19 5:53 AM, Richard Biener wrote:
> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law  wrote:
>>
>> So based on the conversation in the BZ I cobbled together a patch to
>> extend tree-ssa-dse.c to also detect redundant stores.
>>
>> To be clear, given two stores, the first store is dead if the later
>> store overwrites all the live bytes set by the first store.   In this
>> case we delete the first store.  If the first store is partially dead we
>> may trim it.
>>
>> Given two stores, the second store is redundant if it stores _the same
>> value_ into locations set by the first store.  In this case we delete
>> the second store.
>>
>>
>> We prefer to remove redundant stores over removing dead or trimming
>> partially dead stores.
>>
>> First, if we detect a redundant store, we can always remove it.  We may
>> not always be able to trim a partially dead store.  So removing the
>> redundant store wins in this case.
>>
>> But even if the redundant store occurs at the head or tail of the prior
>> store, removing the redundant store is better than trimming the
>> partially dead store because we end up with fewer calls to memset with
>> the same number of total bytes written.
>>
>> We only look for redundant stores in a few cases.  The first store must
>> be a memset, empty constructor or calloc call -- ie things which
>> initialize multiple memory locations to zero.  Subsequent stores can
>> occur via memset, empty constructors or simple memory assignments.
>>
>> The change to tree-ssa-alias.c deserves a quick note.
>>
>> When we're trying to determine if we have a redundant store, we create
>> an AO_REF for the *second* store, then ask the alias system if the first
>> store would kill the AO_REF.
>>
>> So while normally a calloc wouldn't ever kill anything in normal
>> execution order, we're not asking about things in execution order.  We
>> really just want to know if the calloc is going to write into the
>> entirety of the AO_REF of the subsequent store.  So we compute the size
>> of the allocation and we know the destination from the LHS of the calloc
>> call and everything "just works".
> 
> I see how stmt_kills_ref_p is convenient here and it's the only
> ref-must-include-other-ref kind of query the oracle supports right now.
> Note it is not optimized for your particular case querying the same
> stmt for multiple refs.  It's not refs_must_alias_p that is missing
> but something stronger ('kills' is also wrong since both refs might
> be reads), ref_covered_by_ref_p or so.  That said, factoring
> stmt_kills_ref_p might not be so straight-forward for calls since
> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,
> ao_ref_init_loads_from_call?).
> 
> So I think the tree-ssa-alias.c change is fine but please put a
> comment before
> 
> + if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
> +   {
> + tree arg0 = gimple_call_arg (stmt, 0);
> 
> explaining this is used by DSE to detect redundant stores.
> 
>> This patch also includes a hunk I apparently left out from yesterday's
>> submission which just adds _CHK cases to all the existing BUILT_IN_MEM*
>> cases.  That's what I get for writing this part first, then adding the
>> _CHK stuff afterwards, then reversing the order of submission.
>>
>> This includes a slightly reduced testcase from the BZ in g++.dg -- it's
>> actually a good way to capture when one empty constructor causes another
>> empty constructor to be redundant.  The gcc.dg cases capture other
>> scenarios.
>>
>> This has been bootstrapped and regression tested on x86-64, i686, ppc64,
>> ppc64le, sparc64 & aarch64.  It's also bootstrapped on various arm
>> targets, alpha, m68k, mips, riscv64, sh4.  It's been built and tested on
>> a variety of *-elf targets as well as various *-linux-gnu targets as
>> crosses.  And just for giggles it was tested before the changes to add
>> the _CHK support, so it works with and without that as well.
>>
>> OK for the trunk?
> 
> I also notice we wouldn't handle
> 
>memset(p, 1, 64);
>memset(p, 1, 32);
or even just *p = 1.

> 
> (non-zero-initializer) or
> 
>   x = {};
>   y = {};
>   x.a = {};
> 
> (intermediate non-aliasing store)
Correct in both cases.  Handling the first wouldn't be terribly hard to
implement, I can probably cobble that together quickly to see if it even
triggers.  The memset-memset case was the least commonly detected IIRC.

The second would likely require more tracking and some of the other
complexities we handle for dead stores.  Possible?  Certainly, not sure
if it's worth the effort.

> 
> Did you see if / how often this triggers on trunk?
They triggered a couple hundred times.  I didn't initially implement
the calloc case, but found it reported in the llvm bug database.  It
triggered more often than I anticipated.
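
To make the distinction between dead and redundant stores concrete, here is a minimal sketch in C of the two situations described in this thread. The function names are hypothetical and the code only illustrates the source patterns; whether a given GCC build actually removes either call depends on the patch under review and on the optimization level.

```c
#include <stdlib.h>
#include <string.h>

/* Dead store: the second memset overwrites every byte written by the
   first before any of them is read, so the first call can be deleted.  */
void
dead_store (char *p)
{
  memset (p, 1, 64);
  memset (p, 2, 64);
}

/* Redundant store: calloc already zeroed these bytes, so the memset
   stores the same value again and the second store can be deleted.  */
char *
redundant_store (void)
{
  char *p = calloc (64, 1);
  if (p)
    memset (p, 0, 32);
  return p;
}
```

In both functions the observable result is the same with or without the removable call, which is what makes the transformation safe.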

Jeff


Re: [PATCH, PPC 2/2] Fix Darwin bootstrap after split of rs6000.c.

2019-06-26 Thread Segher Boessenkool
On Wed, Jun 26, 2019 at 04:58:06PM +0100, Iain Sandoe wrote:
> The recent change in the file layout in rs6000 breaks Darwin bootstrap.
> 
> To fix this we need to make the branch islands (or code) visible between
> both files.  I chose to keep the generation side in rs6000.c and move
> the output routine to rs6000-logue.c, placing a reference to the islands
> vector in rs6000-internal.h.
> 
> bootstrap succeeds for powerpc-linux-gnu
> 
> OK for trunk (assuming bootstrap completes for Darwin)?
> Iain
> 
> 2019-06-26  Iain Sandoe  
> 
>   * config/rs6000/rs6000-internal.h (branch_island): New typedef.
>   (branch_islands): New extern.
>   * config/rs6000/rs6000-logue.c (macho_branch_islands): Moved from
>   * config/rs6000/rs6000.c: .. here.

Looks fine to me.  Okay for trunk.  Thanks :-)


Segher


Re: [PATCH, PPC 1/2] Make sure the common gt-*.h files are built for all sub-targets.

2019-06-26 Thread Segher Boessenkool
Hi Iain,

On Wed, Jun 26, 2019 at 04:57:12PM +0100, Iain Sandoe wrote:
> The new gt-rs6000-logue.h is common to all sub-targets in the port, so
> it needs to be added for them.
> 
> It seems better to place the common target_gtfiles in the powerpc*-*-*
> section, rather than duplicating them in sub-targets.  This would make it
> less likely that a sub-target would be overlooked in any future file
> introductions.

This is fine.  Thanks!


Segher


Re: [gomp4.5] Handle #pragma omp declare target link

2019-06-26 Thread Thomas Schwinge
Hi!

On Mon, 14 Dec 2015 20:17:33 +0300, Ilya Verbin  wrote:
> Here is an updated patch [for "#pragma omp declare target link"]

..., that got committed long ago (trunk r231655), with additional changes
later on.

As has later been filed in PR81689, the test case added
"libgomp.c/target-link-1.c fails for nvptx: #pragma omp target link not
implemented".  Curious, has anybody ever looked into what's going
on/wrong?


Grüße
 Thomas


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-link-1.c
> @@ -0,0 +1,63 @@
> +struct S { int s, t; };
> +
> +int a = 1, b = 1;
> +double c[27];
> +struct S d = { ,  };
> +#pragma omp declare target link (a) to (b) link (c, d)
> +
> +int
> +foo (void)
> +{
> +  return a++ + b++;
> +}
> +
> +int
> +bar (int n)
> +{
> +  int *p1 = &a;
> +  int *p2 = &b;
> +  c[n] += 2.0;
> +  d.s -= 2;
> +  d.t -= 2;
> +  return *p1 + *p2 + d.s + d.t;
> +}
> +
> +#pragma omp declare target (foo, bar)
> +
> +int
> +main ()
> +{
> +  a = b = 2;
> +  d.s = 17;
> +  d.t = 18;
> +
> +  int res, n = 10;
> +  #pragma omp target map (to: a, b, c, d) map (from: res)
> +  {
> +res = foo () + foo ();
> +c[n] = 3.0;
> +res += bar (n);
> +  }
> +
> +  int shared_mem = 0;
> +  #pragma omp target map (alloc: shared_mem)
> +shared_mem = 1;
> +
> +  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
> +  || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
> +__builtin_abort ();
> +
> +  #pragma omp target enter data map (to: c)
> +  #pragma omp target update from (c)
> +  res = (int) (c[n] + 0.5);
> +  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
> +__builtin_abort ();
> +
> +  #pragma omp target map (to: a, b) map (from: res)
> +res = foo ();
> +
> +  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
> +__builtin_abort ();
> +
> +  return 0;
> +}




[Committed] Document revision 272667

2019-06-26 Thread Steve Kargl
When I committed revision 272667 last night, I
forgot to commit the updated ChangeLog entries.
This commit documents the changes.

2016-06-26  Steven G. Kargl  

* ChangeLog: Document revision 272667

2016-06-26  Steven G. Kargl  
 
* testsuite/ChangeLog: Document revision 272667

-- 
Steve


[PATCH, PPC 2/2] Fix Darwin bootstrap after split of rs6000.c.

2019-06-26 Thread Iain Sandoe
The recent change in the file layout in rs6000 breaks Darwin bootstrap.

To fix this we need to make the branch islands (or code) visible between
both files.  I chose to keep the generation side in rs6000.c and move
the output routine to rs6000-logue.c, placing a reference to the islands
vector in rs6000-internal.h.

bootstrap succeeds for powerpc-linux-gnu

OK for trunk (assuming bootstrap completes for Darwin)?
Iain

2019-06-26  Iain Sandoe  

* config/rs6000/rs6000-internal.h (branch_island): New typedef.
(branch_islands): New extern.
* config/rs6000/rs6000-logue.c (macho_branch_islands): Moved from
* config/rs6000/rs6000.c: .. here.
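
The output routine being moved builds the island's target name by either dropping a leading '*' or '&' or prepending the Darwin user-label prefix '_'. The patch does this with strcpy into a fixed 512-byte buffer; the following is a minimal, bounds-checked sketch of the same rule, not the code in the patch. The function name island_target_name and its return convention are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, bounds-checked variant of the name handling in
   macho_branch_islands: a leading '*' or '&' is dropped, otherwise
   '_' is prepended.  Returns 0 on success and -1 if BUF (of SIZE
   bytes) is too small to hold the result.  */
int
island_target_name (const char *name, char *buf, size_t size)
{
  int n;

  if (name[0] == '*' || name[0] == '&')
    n = snprintf (buf, size, "%s", name + 1);
  else
    n = snprintf (buf, size, "_%s", name);

  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Reporting truncation instead of writing past the buffer is a design choice of this sketch only; the patch keeps the original fixed-size buffers unchanged.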

diff --git a/gcc/config/rs6000/rs6000-internal.h b/gcc/config/rs6000/rs6000-internal.h
index 22ebd37..f69fa5d 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -110,5 +110,18 @@ quad_address_offset_p (HOST_WIDE_INT offset)
   return (IN_RANGE (offset, -32768, 32767) && ((offset) & 0xf) == 0);
 }
 
+/* Mach-O (Darwin) support for longcalls, emitted from  rs6000-logue.c.  */
+
+#if TARGET_MACHO
+
+typedef struct branch_island_d {
+  tree function_name;
+  tree label_name;
+  int line_number;
+ } branch_island;
+
+extern vec *branch_islands;
+
+#endif
 
 #endif
diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 607d1ef..3fe6230 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -48,6 +48,10 @@
 #include "params.h"
 #include "alias.h"
 #include "rs6000-internal.h"
+#if TARGET_MACHO
+#include "gstab.h"  /* for N_SLINE */
+#include "dbxout.h" /* dbxout_ */
+#endif
 
 static int rs6000_ra_ever_killed (void);
 static void is_altivec_return_reg (rtx, void *);
@@ -5061,6 +5065,94 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
 }
 }
 
+#if TARGET_MACHO
+
+/* Generate far-jump branch islands for everything recorded in
+   branch_islands.  Invoked immediately after the last instruction of
+   the epilogue has been emitted; the branch islands must be appended
+   to, and contiguous with, the function body.  Mach-O stubs are
+   generated in machopic_output_stub().  */
+
+static void
+macho_branch_islands (void)
+{
+  char tmp_buf[512];
+
+  while (!vec_safe_is_empty (branch_islands))
+{
+  branch_island *bi = &branch_islands->last ();
+  const char *label = IDENTIFIER_POINTER (bi->label_name);
+  const char *name = IDENTIFIER_POINTER (bi->function_name);
+  char name_buf[512];
+  /* Cheap copy of the details from the Darwin ASM_OUTPUT_LABELREF().  */
+  if (name[0] == '*' || name[0] == '&')
+   strcpy (name_buf, name+1);
+  else
+   {
+ name_buf[0] = '_';
+ strcpy (name_buf+1, name);
+   }
+  strcpy (tmp_buf, "\n");
+  strcat (tmp_buf, label);
+#if defined (DBX_DEBUGGING_INFO) || defined (XCOFF_DEBUGGING_INFO)
+  if (write_symbols == DBX_DEBUG || write_symbols == XCOFF_DEBUG)
+   dbxout_stabd (N_SLINE, bi->line_number);
+#endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */
+  if (flag_pic)
+   {
+ if (TARGET_LINK_STACK)
+   {
+ char name[32];
+ get_ppc476_thunk_name (name);
+ strcat (tmp_buf, ":\n\tmflr r0\n\tbl ");
+ strcat (tmp_buf, name);
+ strcat (tmp_buf, "\n");
+ strcat (tmp_buf, label);
+ strcat (tmp_buf, "_pic:\n\tmflr r11\n");
+   }
+ else
+   {
+ strcat (tmp_buf, ":\n\tmflr r0\n\tbcl 20,31,");
+ strcat (tmp_buf, label);
+ strcat (tmp_buf, "_pic\n");
+ strcat (tmp_buf, label);
+ strcat (tmp_buf, "_pic:\n\tmflr r11\n");
+   }
+
+ strcat (tmp_buf, "\taddis r11,r11,ha16(");
+ strcat (tmp_buf, name_buf);
+ strcat (tmp_buf, " - ");
+ strcat (tmp_buf, label);
+ strcat (tmp_buf, "_pic)\n");
+
+ strcat (tmp_buf, "\tmtlr r0\n");
+
+ strcat (tmp_buf, "\taddi r12,r11,lo16(");
+ strcat (tmp_buf, name_buf);
+ strcat (tmp_buf, " - ");
+ strcat (tmp_buf, label);
+ strcat (tmp_buf, "_pic)\n");
+
+ strcat (tmp_buf, "\tmtctr r12\n\tbctr\n");
+   }
+  else
+   {
+ strcat (tmp_buf, ":\n\tlis r12,hi16(");
+ strcat (tmp_buf, name_buf);
+ strcat (tmp_buf, ")\n\tori r12,r12,lo16(");
+ strcat (tmp_buf, name_buf);
+ strcat (tmp_buf, ")\n\tmtctr r12\n\tbctr");
+   }
+  output_asm_insn (tmp_buf, 0);
+#if defined (DBX_DEBUGGING_INFO) || defined (XCOFF_DEBUGGING_INFO)
+  if (write_symbols == DBX_DEBUG || write_symbols == XCOFF_DEBUG)
+   dbxout_stabd (N_SLINE, bi->line_number);
+#endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */
+  branch_islands->pop ();
+}
+}
+#endif
+
 /* Write function epilogue.  */
 
 void
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bcfc881..1837b31 

[PATCH, PPC 1/2] Make sure the common gt-*.h files are built for all sub-targets.

2019-06-26 Thread Iain Sandoe
The new gt-rs6000-logue.h is common to all sub-targets in the port, so
it needs to be added for them.

It seems better to place the common target_gtfiles in the powerpc*-*-*
section, rather than duplicating them in sub-targets.  This would make it
less likely that a sub-target would be overlooked in any future file
introductions.

(could also be done for rs6000-*-*, but this file has already been added to
the sub-targets there)  

bootstrap succeeds for powerpc-linux-gnu

OK for trunk (assuming Darwin bootstrap completes)?
Iain


2019-06-26  Iain Sandoe  

* config.gcc (powerpc*-*-linux*): Move target_gtfiles from here..
(powerpc*-*-*) ... to here.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index c9939b8..062ed8c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -513,6 +513,7 @@ powerpc*-*-*)
;;
esac
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c"
;;
 pru-*-*)
cpu_type=pru
@@ -2693,7 +2694,6 @@ powerpc*-*-linux*)
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="${tmake_file} rs6000/t-fprules rs6000/t-ppccomm"
extra_objs="$extra_objs rs6000-linux.o"
-   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c"
case ${target} in
powerpc*le-*-*)
tm_file="${tm_file} rs6000/sysv4le.h" ;;



Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-26 Thread Andrea Corallo

David Malcolm writes:

> On Tue, 2019-06-25 at 08:11 +, Andrea Corallo wrote:
>> Hi,
>> third version for this patch with the simplified test.
>>
>> make check-jit pass clean
>>
>> Bests
>>   Andrea
>>
>> 2019-06-09  Andrea Corallo  andrea.cora...@arm.com
>>
>> * libgccjit.c (gcc_jit_context_new_binary_op): Check result_type to
>> be a numeric type.
>>
>>
>> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
>>
>> * jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c:
>> New testcase.
>
> Thanks for the updated patch.
>
> This is good for trunk.
>
> (Copying and pasting from my other review): are you working on getting
> SVN commit access, or do you want me to commit your two patches on your
> behalf?
>
> Thanks
> Dave

Hi David,
I can work on getting SVN commit access.
Since a maintainer has to sponsor it, would you mind being the one?

Thanks
  Andrea


[PING] [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-26 Thread Martin Sebor

Ping: did my reply and updated patch resolve your concerns?
  https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01106.html

On 6/18/19 9:19 PM, Martin Sebor wrote:

On 6/14/19 2:59 PM, Jeff Law wrote:

On 6/4/19 1:40 PM, Martin Sebor wrote:

On 6/3/19 5:24 PM, Martin Sebor wrote:

On 5/31/19 2:46 PM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such as strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

void* f (int i)
{
  struct S { int a[2]; } s[2];
  return &s->a[i];
}

[2] The following is not diagnosed even with the patch:

void sink (void*);

void* f (int i)
{
  char a[6];
  char *p = __builtin_strcpy (a, "123");
  sink (p);
  return p;
}

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
address of a local array plus offset

gcc/ChangeLog:

 PR c/71924
 * gimple-ssa-isolate-paths.c (is_addr_local): New function.
 (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
 (find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
 (find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

 PR c/71924
 * gcc.dg/Wreturn-local-addr-2.c: New test.
 * gcc.dg/Walloca-4.c: Prune expected warnings.
 * gcc.dg/pr41551.c: Same.
 * gcc.dg/pr59523.c: Same.
 * gcc.dg/tree-ssa/pr88775-2.c: Same.
 * gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c
b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
*stmt)
 return false;
   }
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+    {
+  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+  if (POINTER_TYPE_P (type))
+    {
+  tree ptr = gimple_assign_rhs1 (def_stmt);
+  return is_addr_local (ptr, ploc, pmaybe, visited);
+    }
+  return false;
+    }

So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

    ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse
on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly rather
than just recurse down through every rhs1 we encounter.


I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do, and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?

A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
need to consider here -- what is and what is not valid gimple.  And it's
more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
standard practice for if-conversion.
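
A toy model of the selective recursion suggested above may help.  It is
only a sketch and does not use GCC's GIMPLE API: the expr type, the code
names and is_addr_local_sketch are all hypothetical.  It recurses only
through an explicit list of codes and, for a conditional, skips the
condition and ands the results for the two arms.

```cpp
// Toy expression model; every name here is hypothetical and stands in
// for the real tree/GIMPLE codes only by analogy.
enum expr_code
{
  ADDR_OF_LOCAL,   // address of an automatic variable
  ADDR_OF_GLOBAL,  // address of a static or global object
  POINTER_PLUS,    // op0 + offset
  NOP,             // conversion of op0
  BIT_AND,         // op0 with some bits masked off
  COND,            // op0 ? op1 : op2
  OTHER            // anything else
};

struct expr
{
  expr_code code;
  const expr *op0;  // pointer operand, or the condition for COND
  const expr *op1;  // COND: value when the condition is true
  const expr *op2;  // COND: value when the condition is false
};

// Return true if E definitely refers to the address of a local object.
inline bool
is_addr_local_sketch (const expr *e)
{
  if (!e)
    return false;

  switch (e->code)
    {
    case ADDR_OF_LOCAL:
      return true;

    case POINTER_PLUS:
    case NOP:
    case BIT_AND:
      // Only the pointer operand matters for these codes.
      return is_addr_local_sketch (e->op0);

    case COND:
      // Never look at the condition (op0); both arms must be local.
      return (is_addr_local_sketch (e->op1)
              && is_addr_local_sketch (e->op2));

    default:
      // Unlisted codes are not recursed through.
      return false;
    }
}
```

Listing the codes explicitly means an unexpected operand, such as the
condition of a conditional, is never mistaken for the pointer being
returned.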

Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
well, I just 

Re: [PATCH] True IPA reimplementation of IPA-SRA

2019-06-26 Thread Martin Jambor
Hi,

On Thu, Jun 13 2019, Jan Hubicka wrote:
> Hi,
> i read all changes except for ipa-sra itself.  Here are some comments,
> I will look at the remaining file next.
>
> Honza
>
>
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 9a19d83fffb..3f838c08e76 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
>   struct GTY(()) cgraph_clone_info
>   {
> +  /* Constants discovered by IPA-CP, i.e. which parameter should be 
> replaced
> + with what.  */
> vec *tree_map;
> -  bitmap args_to_skip;
> -  bitmap combined_args_to_skip;
> +  /* Parameter modification that IPA-SRA decided to perform.  */
> +  ipa_param_adjustments *param_adjustments;
> +  /* Lists of all splits with their offsets for each dummy variables
> + representing a replaced-by-splits parameter.  */
> +  vec *performed_splits;
>
> Please explain what dummy variables are :)

OK, I replaced the comment with:

  /* Lists of dummy-decl and offset pairs representing split formal parameters
 in the caller.  Offsets of all new replacements are enumerated, those
 coming from the same original parameter have the same dummy decl stored
 along with them.

 Dummy decls sit in call statement arguments followed by new parameter
 decls (or their SSA names) in between (caller) clone materialization and
 call redirection.  Redirection then recognizes the dummy variable and
 together with the stored offsets can reconstruct what exactly the new
 parameter decls represent and can leave in place only those that the
 callee expects.  */

>
> +/* Return true if we would like to remove a parameter from NODE when 
> cloning it
> +   with KNOWN_CSTS scalar constants.  */
> +
> +static bool
> +want_remove_some_param_p (cgraph_node *node, vec known_csts)
> +{
> +  auto_vec surviving;
> +  bool filled_vec = false;
> +  ipa_node_params *info = IPA_NODE_REF (node);
> +  int i, count = ipa_get_param_count (info);
> +  for (i = 0; i < count; i++)
>
> vertical space after the declarations :)

OK

>
> -   tree t = known_csts[i];
> -
> -   if (t || !ipa_is_param_used (info, i))
> - bitmap_set_bit (args_to_skip, i);
> +   ipa_adjusted_param *old_adj = &(*old_adjustments->m_adj_params)[i];
> +   if (!node->local.can_change_signature
> +   || old_adj->op != IPA_PARAM_OP_COPY
> +   || (!known_csts[old_adj->base_index]
> +   && ipa_is_param_used (info, old_adj->base_index)))
> + {
> +   ipa_adjusted_param new_adj;
> +   memcpy (&new_adj, old_adj, sizeof (new_adj));
>
> Why this is not *new_adj=*old_adj?

Not sure, I changed it to aggregate copy.

>
> +/* Names of parameters for dumping.  Keep in sync with enum 
> ipa_parm_op.  */
> +
> +static const char *ipa_param_op_names[] = {"IPA_PARAM_OP_UNDEFINED",
> +"IPA_PARAM_OP_COPY",
> +"IPA_PARAM_OP_NEW",
> +"IPA_PARAM_OP_SPLIT"};
>
> Given brave new C++ world, can't we statically assert that size of
> array match the enum?

Hm, I suppose so, I changed it.
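
For illustration, one way such a check could look is sketched below.
The enum name and the trailing IPA_PARAM_OP_COUNT sentinel are
assumptions made for this example; the actual change on the branch may
be written differently.

```cpp
#include <cstddef>

// Hypothetical enum mirroring the operation names in the quoted patch.
// The IPA_PARAM_OP_COUNT sentinel is an assumption of this sketch.
enum ipa_parm_op_sketch
{
  IPA_PARAM_OP_UNDEFINED,
  IPA_PARAM_OP_COPY,
  IPA_PARAM_OP_NEW,
  IPA_PARAM_OP_SPLIT,
  IPA_PARAM_OP_COUNT
};

// Names of operations for dumping, in enum order.
static const char *const ipa_param_op_names[] = {
  "IPA_PARAM_OP_UNDEFINED",
  "IPA_PARAM_OP_COPY",
  "IPA_PARAM_OP_NEW",
  "IPA_PARAM_OP_SPLIT"
};

// Number of entries in the table, computed at compile time.
constexpr std::size_t n_names
  = sizeof (ipa_param_op_names) / sizeof (ipa_param_op_names[0]);

// Compilation fails if an enumerator is added without a matching name.
static_assert (n_names == static_cast<std::size_t> (IPA_PARAM_OP_COUNT),
               "ipa_param_op_names is out of sync with the enum");

// Return the printable name of OP.
inline const char *
param_op_name (ipa_parm_op_sketch op)
{
  return ipa_param_op_names[op];
}
```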

> Also it seems to me that ipa-param-modification would benefit from
> some toplevel comment of what it does and what is the main API how
> to use it.
>

Right, so I wrote that, but it ended up a bit large, have a look at:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/ipa-param-manipulation.h;h=b9da00bb5c9c813a0b16c7ec2f32e0d480f91238;hb=refs/heads/jamborm/ipa-sra


> Also functions like
>
> ipa_fill_vector_with_formal_parms
> ipa_fill_vector_with_formal_parm_types
>
> does not seem very IPA specific

They are basically replacements of currently existing functions:

  vec ipa_get_vector_of_formal_parms (tree fndecl);
  vec ipa_get_vector_of_formal_parm_types (tree fntype);

The difference is that the reimplementations do not directly return a
new heap allocated vector but fill a vector that the caller passes,
which makes it auto_vec friendly.

> and it was not quite obvious from name what it does.  Perhaps it
> should go somewhere to common tree manipulation and have more
> fitting name?

I'm not sure what the problem with the name is, but...

> If it was something like push_function_arg_decls or
> push_function_arg_types it would be more obvious to me what it does
> :)

...if that is the case, OK, why not, I changed them.

As far as their placement is concerned, again, I don't care much.  I
can put them to tree.[hc] if you'd like (I haven't done that
yet).  It just so happens that people only use these when they mess
with parameters, hence they are in ipa-param-manipulation.[hc].

>
> Also I wonder if the code would not be more readable if the functions were
> returning auto_vec references. Then they can be just something like
> "function_arg_types" and the API would be self explanatory.

I probably don't understand.  References to auto_vecs that are in fact
stored where?

> +tree
> +ipa_param_adjustments::adjust_decl (tree orig_decl)
> +{
> +  tree 

[PATCH] Add new helper traits for signed/unsigned integer types

2019-06-26 Thread Jonathan Wakely

Reuse the __is_one_of alias in additional places, and define traits to
check for signed/unsigned integer types so we don't have to duplicate
those checks elsewhere.

The additional overloads for std::byte in <bit> were reviewed by LEWG
and considered undesirable, so this patch removes them.

* include/bits/fs_path.h (path::__is_encoded_char): Use __is_one_of.
* include/std/bit (_If_is_unsigned_integer_type): Remove.
(_If_is_unsigned_integer): Use __is_unsigned_integer.
(rotl(byte, unsigned), rotr(byte, unsigned), countl_zero(byte))
(countl_one(byte), countr_zero(byte), countr_one(byte))
(popcount(byte), ispow2(byte), ceil2(byte), floor2(byte))
(log2p1(byte)): Remove.
* include/std/charconv (__detail::__is_one_of): Move to <type_traits>.
(__detail::__is_int_to_chars_type): Remove.
(__detail::__integer_to_chars_result_type): Use __is_signed_integer
and __is_unsigned_integer.
* include/std/type_traits (__is_one_of): Move here from <charconv>.
(__is_signed_integer, __is_unsigned_integer): New helpers.
* testsuite/26_numerics/bit/bit.pow.two/ceil2.cc: Remove test for
std::byte overload.
* testsuite/26_numerics/bit/bit.pow.two/floor2.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/ispow2.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/log2p1.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countl_one.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countl_zero.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countr_one.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countr_zero.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/popcount.cc: Likewise.
* testsuite/26_numerics/bit/bitops.rot/rotl.cc: Likewise.
* testsuite/26_numerics/bit/bitops.rot/rotr.cc: Likewise.

Tested x86_64-linux, committed to trunk.
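
As a rough illustration of what traits of this kind can look like in
portable C++11, here is a minimal sketch.  The unprefixed names and the
definitions are assumptions made for this example, not the code from
the patch, and the exact set of types the library helpers accept may
differ.

```cpp
#include <type_traits>

// True if T is the same type as one of Types.  Illustrative stand-in for
// the internal helper; the definition is an assumption of this sketch.
template<typename T, typename... Types>
  struct is_one_of : std::false_type { };

template<typename T, typename First, typename... Rest>
  struct is_one_of<T, First, Rest...>
  : std::conditional<std::is_same<T, First>::value,
                     std::true_type,
                     is_one_of<T, Rest...>>::type
  { };

// True for the standard signed integer types, ignoring cv-qualifiers.
template<typename T>
  using is_signed_integer
    = is_one_of<typename std::remove_cv<T>::type,
                signed char, short, int, long, long long>;

// True for the standard unsigned integer types, ignoring cv-qualifiers.
template<typename T>
  using is_unsigned_integer
    = is_one_of<typename std::remove_cv<T>::type,
                unsigned char, unsigned short, unsigned int,
                unsigned long, unsigned long long>;

static_assert (is_signed_integer<int>::value, "int is a signed integer");
static_assert (is_unsigned_integer<const unsigned long>::value,
               "cv-qualifiers are ignored");
static_assert (!is_unsigned_integer<bool>::value, "bool is excluded");
static_assert (!is_signed_integer<char>::value, "plain char is excluded");
```

Listing the types explicitly, rather than using std::is_integral and
std::is_signed, keeps bool and the character types out of the set.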


commit 95d162498f763e8780f188459d878476a6c52aeb
Author: redi 
Date:   Wed Jun 26 14:38:23 2019 +

Add new helper traits for signed/unsigned integer types

Reuse the __is_one_of alias in additional places, and define traits to
check for signed/unsigned integer types so we don't have to duplicate
those checks elsewhere.

The additional overloads for std::byte in <bit> were reviewed by LEWG
and considered undesirable, so this patch removes them.

* include/bits/fs_path.h (path::__is_encoded_char): Use __is_one_of.
* include/std/bit (_If_is_unsigned_integer_type): Remove.
(_If_is_unsigned_integer): Use __is_unsigned_integer.
(rotl(byte, unsigned), rotr(byte, unsigned), countl_zero(byte))
(countl_one(byte), countr_zero(byte), countr_one(byte))
(popcount(byte), ispow2(byte), ceil2(byte), floor2(byte))
(log2p1(byte)): Remove.
* include/std/charconv (__detail::__is_one_of): Move to
<type_traits>.
(__detail::__is_int_to_chars_type): Remove.
(__detail::__integer_to_chars_result_type): Use __is_signed_integer
and __is_unsigned_integer.
* include/std/type_traits (__is_one_of): Move here from <charconv>.
(__is_signed_integer, __is_unsigned_integer): New helpers.
* testsuite/26_numerics/bit/bit.pow.two/ceil2.cc: Remove test for
std::byte overload.
* testsuite/26_numerics/bit/bit.pow.two/floor2.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/ispow2.cc: Likewise.
* testsuite/26_numerics/bit/bit.pow.two/log2p1.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countl_one.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countl_zero.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countr_one.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/countr_zero.cc: Likewise.
* testsuite/26_numerics/bit/bitops.count/popcount.cc: Likewise.
* testsuite/26_numerics/bit/bitops.rot/rotl.cc: Likewise.
* testsuite/26_numerics/bit/bitops.rot/rotr.cc: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272695 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 0a8ab0de2ff..e1083acf30f 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -66,15 +66,16 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   /// A filesystem path.
   class path
   {
-template>
-  using __is_encoded_char
-   = __or_,
+template
+  using __is_encoded_char = __is_one_of,
+   char,
 #ifdef _GLIBCXX_USE_CHAR8_T
-   is_same<_Ch, char8_t>,
+   char8_t,
 #endif
-   is_same<_Ch, wchar_t>,
-   is_same<_Ch, char16_t>,
-   is_same<_Ch, char32_t>>;
+#if _GLIBCXX_USE_WCHAR_T
+   wchar_t,
+#endif
+   char16_t, char32_t>;
 
 

Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-26 Thread Prathamesh Kulkarni
On Wed, 26 Jun 2019 at 16:05, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 25 Jun 2019 at 20:05, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni
> >> >  wrote:
> >> >>
> >> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
> >> >>  wrote:
> >> >> >
> >> >> > Prathamesh Kulkarni  writes:
> >> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
> >> >> > >if (!def_set)
> >> >> > >  return false;
> >> >> > >
> >> >> > > +  if (reg_prop_only
> >> >> > > +  && !REG_P (SET_SRC (def_set))
> >> >> > > +  && !REG_P (SET_DEST (def_set)))
> >> >> > > +return false;
> >> >> >
> >> >> > This should be:
> >> >> >
> >> >> >   if (reg_prop_only
> >> >> >   && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
> >> >> > return false;
> >> >> >
> >> >> > so that we return false if either operand isn't a register.
> >> >> Oops, sorry about that  -:(
> >> >> >
> >> >> > > +
> >> >> > > +  /* Allow propagations into a loop only for reg-to-reg copies, 
> >> >> > > since
> >> >> > > + replacing one register by another shouldn't increase the 
> >> >> > > cost.  */
> >> >> > > +
> >> >> > > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> >> >> > > +  && !REG_P (SET_SRC (def_set))
> >> >> > > +  && !REG_P (SET_DEST (def_set)))
> >> >> > > +return false;
> >> >> >
> >> >> > Same here.
> >> >> >
> >> >> > OK with that change, thanks.
> >> >> Thanks for the review, will make the changes and commit the patch
> >> >> after re-testing.
> >> > Hi,
> >> > Testing the patch showed following failures on 32-bit x86:
> >> >
> >> >   Executed from: g++.target/i386/i386.exp
> >> > g++:g++.target/i386/pr88152.C   scan-assembler-not 
> >> > vpcmpgt|vpcmpeq|vpsra
> >> >   Executed from: gcc.target/i386/i386.exp
> >> > gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs:
> >> > gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t
> >> > ]*\\%eax,[\\t ]*%eax 1
> >> >
> >> > The failure of pr88152.C is also seen on x86_64.
> >> >
> >> > For pr66768.c, and pr90178.c, forwprop replaces register which is
> >> > volatile and frame related respectively.
> >> > To avoid that, the attached patch, makes a stronger constraint that
> >> > src and dest should be a register
> >> > and not have frame_related or volatil flags set, which is checked in
> >> > usable_reg_p().
> >> > Which avoids the failures for both the cases.
> >> > Does it look OK ?
> >>
> >> That's not the reason it's a bad transform.  In both cases we're
> >> propagating r2 <- r1 even though
> >>
> >> (a) r1 dies in the copy and
> >> (b) fwprop can't replace all uses of r2, because some have multiple
> >> definitions
> >>
> >> This has the effect of making both values live in cases where only one
> >> was previously.
> >>
> >> In the case of pr66768.c, fwprop2 is undoing the effect of
> >> cse.c:canon_reg, which tries to pick the best register to use
> >> (see cse.c:make_regs_eqv).  fwprop1 makes the same change,
> >> and made it even before the patch, but the cse.c choice should win.
> >>
> >> A (hopefully) conservative fix would be to propagate the copy only if
> >> both registers have a single definition, which you can test with:
> >>
> >>   (DF_REG_DEF_COUNT (regno) == 1
> >>&& !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno))
> >>
> >> In that case, fwprop should see all uses of the destination, and should
> >> be able to replace it in all cases with the source.
> > Ah I see, thanks for the explanation!
> > The above check works to resolve the failure.
> > IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ?
>
> Right.
>
> >> > For g++.target/i386/pr88152.C, the issue is that after the patch,
> >> > forwprop1 does following propagation (in f10) which wasn't done
> >> > before:
> >> >
> >> > In insn 10, replacing
> >> >  (unspec:SI [
> >> > (reg:V2DF 91)
> >> > ] UNSPEC_MOVMSK)
> >> >  with (unspec:SI [
> >> > (subreg:V2DF (reg:V2DI 90) 0)
> >> > ] UNSPEC_MOVMSK)
> >> >
> >> > This later defeats combine because insn 9 gets deleted.
> >> > Without patch, the following combination takes place:
> >> >
> >> > Trying 7 -> 9:
> >> > 7: r90:V2DI=r89:V2DI>r93:V2DI
> >> >   REG_DEAD r93:V2DI
> >> >   REG_DEAD r89:V2DI
> >> > 9: r91:V2DF=r90:V2DI#0
> >> >   REG_DEAD r90:V2DI
> >> > Successfully matched this instruction:
> >> > (set (subreg:V2DI (reg:V2DF 91) 0)
> >> > (gt:V2DI (reg:V2DI 89)
> >> > (reg:V2DI 93)))
> >> > allowing combination of insns 7 and 9
> >> >
> >> > and then:
> >> > Trying 6, 9 -> 10:
> >> > 6: r89:V2DI=const_vector
> >> > 9: r91:V2DF#0=r89:V2DI>r93:V2DI
> >> >   REG_DEAD r89:V2DI
> >> >   REG_DEAD r93:V2DI
> >> >10: r87:SI=unspec[r91:V2DF] 43
> >> >   REG_DEAD r91:V2DF
> >> > Successfully matched this instruction:
> >> > 

Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-26 Thread David Malcolm
On Tue, 2019-06-25 at 08:11 +, Andrea Corallo wrote:
> Hi,
> third version for this patch with the simplified test.
> 
> make check-jit pass clean
> 
> Bests
>   Andrea
> 
> 2019-06-09  Andrea Corallo  andrea.cora...@arm.com
> 
> * libgccjit.c (gcc_jit_context_new_binary_op): Check result_type to
> be a
> numeric type.
> 
> 
> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
> 
> * jit.dg/test-error-gcc_jit_context_new_binary_op-bad-res-type.c:
> New testcase.

Thanks for the updated patch.

This is good for trunk.

(Copying and pasting from my other review): are you working on getting
SVN commit access, or do you want me to commit your two patches on your
behalf?

Thanks
Dave


Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-26 Thread David Malcolm
On Wed, 2019-06-26 at 11:07 +, Andrea Corallo wrote:
> Hi David,
> thanks for the suggestions.
> Updated version for the bitfield libgccjit support patch here
> addressing comments.
> 
> test-error-gcc_jit_context_new_bitfield-invalid-width.c is reworked
> and now assumes that the long of the compiler compiling the test is
> the same size as the libgccjit long.
> I'm not sure this assumption is sufficient; in case it is not, we have to
> find another way around this.
> 
> Checked with make check-jit runs clean.
> 
> Bests
> 
>   Andrea
> 
> 
> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
> 
> * docs/topics/compatibility.rst (LIBGCCJIT_ABI_12): New ABI tag.
> * docs/topics/types.rst: Add gcc_jit_context_new_bitfield.
> * jit-common.h (namespace recording): Add class bitfield.
> * jit-playback.c:
> (DECL_C_BIT_FIELD, SET_DECL_C_BIT_FIELD): Add macros.
> (playback::context::new_bitfield): New method.
> (playback::compound_type::set_fields): Add bitfield support.
> (playback::lvalue::mark_addressable): Was jit_mark_addressable make
> this
> a method of lvalue plus return a bool to communicate success.
> (playback::lvalue::get_address): Check for jit_mark_addressable
> return
> value.
> * jit-playback.h (new_bitfield): New method.
> (class bitfield): New class.
> (class lvalue): Add jit_mark_addressable method.
> * jit-recording.c (recording::context::new_bitfield): New method.
> (recording::bitfield::replay_into): New method.
> (recording::bitfield::write_to_dump): Likewise.
> (recording::bitfield::make_debug_string): Likewise.
> (recording::bitfield::write_reproducer): Likewise.
> * jit-recording.h (class context): Add new_bitfield method.
> (class field): Make it derivable by class bitfield.
> (class bitfield): Add new class.
> * libgccjit++.h (class context): Add new_bitfield method.
> * libgccjit.c (struct gcc_jit_bitfield): New structure.
> (gcc_jit_context_new_bitfield): New function.
> * libgccjit.h
> (LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield) New macro.
> (gcc_jit_context_new_bitfield): New function.
> * libgccjit.map (LIBGCCJIT_ABI_12) New ABI tag.
> 
> 
> 2019-06-20  Andrea Corallo andrea.cora...@arm.com
> 
> * jit.dg/all-non-failing-tests.h: Add test-accessing-bitfield.c.
> * jit.dg/test-accessing-bitfield.c: New testcase.
> * jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-type.c:
> Likewise.
> * jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c:
> Likewise.
> * jit.dg/test-error-gcc_jit_lvalue_get_address-bitfield.c:
> Likewise.

Thanks for the updated patch.

One last nit:

[...]

> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
> index b74495c..ae8a732 100644
> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c
> @@ -47,6 +47,12 @@ along with GCC; see the file COPYING3.  If not see
>  #include "jit-builtins.h"
>  #include "jit-tempdir.h"
>  
> +/* Compare with gcc/c-family/c-common.h

We should say what to compare it against:
  Compare with "DECL_C_BIT_FIELD" etc in gcc/c-family/c-common.h

> +   This is redefined here to avoid depending from the C frontend.  */

s/from/on/ also, FWIW


> +#define DECL_JIT_BIT_FIELD(NODE) \
> +  (DECL_LANG_FLAG_4 (FIELD_DECL_CHECK (NODE)) == 1)
> +#define SET_DECL_JIT_BIT_FIELD(NODE) \
> +  (DECL_LANG_FLAG_4 (FIELD_DECL_CHECK (NODE)) = 1)

With that change, the patch is good for trunk - thanks for all your
work on this.

Are you working on getting SVN commit access, or do you want me to
commit your two patches on your behalf?

Dave


[PR preprocessor/90927] Fix dependency output

2019-06-26 Thread Nathan Sidwell
This patch fixes PR 90927.  The assert triggers when the user uses both -MT
and -MQ target options in an unfortunate order.  I had thought about 
this and thought users wouldn't use both.  You'd think by now that I've 
learnt the answer to 'Would a user ever do $X?', is 'Yes, of course they 
will'.  Heck, they do things I can't even imagine!


applying to trunk

--
Nathan Sidwell
2019-06-26  Nathan Sidwell  

	libcpp/
	PR preprocessor/90927
	* mkdeps.c (mkdeps::vec::operator[]): Add non-const variant.
	(deps_add_target): Deal with out of order unquoted targets.

	gcc/testsuite/
	* c-c++-common/pr90927.c: New.


Index: libcpp/mkdeps.c
===
--- libcpp/mkdeps.c	(revision 272688)
+++ libcpp/mkdeps.c	(working copy)
@@ -60,4 +60,8 @@ public:
   return ary[ix];
 }
+T &operator[] (unsigned ix)
+{
+  return ary[ix];
+}
 void push (const T )
 {
@@ -236,12 +240,20 @@ void
 deps_add_target (struct mkdeps *d, const char *t, int quote)
 {
-  t = apply_vpath (d, t);
+  t = xstrdup (apply_vpath (d, t));
+
   if (!quote)
 {
-  gcc_assert (d->quote_lwm == d->targets.size ());
+  /* Sometimes unquoted items are added after quoted ones.
+	 Swap out the lowest quoted.  */
+  if (d->quote_lwm != d->targets.size ())
+	{
+	  const char *lowest = d->targets[d->quote_lwm];
+	  d->targets[d->quote_lwm] = t;
+	  t = lowest;
+	}
   d->quote_lwm++;
 }
 
-  d->targets.push (xstrdup (t));
+  d->targets.push (t);
 }
 
Index: gcc/testsuite/c-c++-common/pr90927.c
===
--- gcc/testsuite/c-c++-common/pr90927.c	(revision 0)
+++ gcc/testsuite/c-c++-common/pr90927.c	(working copy)
@@ -0,0 +1,6 @@
+/* { dg-do preprocess } */
+/* { dg-additional-options "-M -MQ b\\\$ob -MT b\\\$ill" } */
+
+int i;
+
+/* { dg-final { scan-file pr90927.i {b\$ill b\$\$ob:} } } */
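
The reordering done by the deps_add_target change above can be modelled
outside libcpp.  The following is a simplified sketch using standard
containers; deps_sketch and add_target are hypothetical names, and the
vpath handling and memory management of the real code are omitted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified model of the target list kept by mkdeps (hypothetical type).
struct deps_sketch
{
  std::vector<std::string> targets;  // unquoted targets first, then quoted
  std::size_t quote_lwm = 0;         // index of the first quoted target
};

// Add target T.  Unquoted targets must stay below quote_lwm even when
// they are added after quoted ones.
inline void
add_target (deps_sketch &d, std::string t, bool quote)
{
  if (!quote)
    {
      // A quoted target already occupies the low-water-mark slot: put the
      // new unquoted target there and re-append the displaced quoted one.
      if (d.quote_lwm != d.targets.size ())
        std::swap (d.targets[d.quote_lwm], t);
      d.quote_lwm++;
    }

  d.targets.push_back (std::move (t));
}
```

Adding a quoted "b$ob" followed by an unquoted "b$ill", the order used
by the new test, leaves the list as "b$ill", "b$ob" with quote_lwm equal
to 1, so only the second entry would be quoted on output.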


[PATCH 2/2] rs6000: Fix rs6000_keep_leaf_when_profiled

2019-06-26 Thread Segher Boessenkool
This function is called from elsewhere, so shouldn't be static.

Committing.


2019-06-26  Segher Boessenkool  

* config/rs6000/rs6000-internal.h (rs6000_keep_leaf_when_profiled): New
declaration.
* config/rs6000/rs6000-logue.c (rs6000_keep_leaf_when_profiled): Remove
"static".
* config/rs6000/rs6000.c (rs6000_keep_leaf_when_profiled): Delete
declaration.

---
 gcc/config/rs6000/rs6000-internal.h | 1 +
 gcc/config/rs6000/rs6000-logue.c| 4 ++--
 gcc/config/rs6000/rs6000.c  | 1 -
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-internal.h 
b/gcc/config/rs6000/rs6000-internal.h
index a1acb66..22ebd37 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -99,6 +99,7 @@ extern bool save_reg_p (int reg);
 extern const char * rs6000_machine_from_flags (void);
 extern void emit_asm_machine (void);
 extern bool rs6000_global_entry_point_prologue_needed_p (void);
+extern bool rs6000_keep_leaf_when_profiled (void);
 
 /* Return true if the OFFSET is valid for the quad address instructions that
use d-form (register + offset) addressing.  */
diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 9df4b5a..adc137b 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -4025,8 +4025,8 @@ rs6000_output_function_prologue (FILE *file)
 
 /* -mprofile-kernel code calls mcount before the function prolog,
so a profiled leaf function should stay a leaf function.  */
-static bool
-rs6000_keep_leaf_when_profiled ()
+bool
+rs6000_keep_leaf_when_profiled (void)
 {
   return TARGET_PROFILE_KERNEL;
 }
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 3fc4029..bcfc881 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1338,7 +1338,6 @@ static bool rs6000_secondary_reload_move (enum 
rs6000_reg_type,
  secondary_reload_info *,
  bool);
 rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
-static bool rs6000_keep_leaf_when_profiled () __attribute__ ((unused));
 static tree rs6000_fold_builtin (tree, int, tree *, bool);
 
 /* Hash table stuff for keeping track of TOC entries.  */
-- 
1.8.3.1



[PATCH 1/2] rs6000: Remove duplicated code

2019-06-26 Thread Segher Boessenkool
A large portion of the code moved from rs6000.c (to rs6000-logue.c)
was accidentally retained.  This fixes it.

Committing.


Segher


2019-06-26  Segher Boessenkool  

* rs6000.c: Fix previous commit, it missed some changes.

---
 gcc/config/rs6000/rs6000.c | 1279 
 1 file changed, 1279 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e52a971..3fc4029 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -23691,1285 +23691,6 @@ get_TOC_alias_set (void)
   return set;
 }
 
-/* This ties together stack memory (MEM with an alias set of frame_alias_set)
-   and the change to the stack pointer.  */
-
-static void
-rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
-{
-  rtvec p;
-  int i;
-  rtx regs[3];
-
-  i = 0;
-  regs[i++] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
-  if (hard_frame_needed)
-regs[i++] = gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
-  if (!(REGNO (fp) == STACK_POINTER_REGNUM
-   || (hard_frame_needed
-   && REGNO (fp) == HARD_FRAME_POINTER_REGNUM)))
-regs[i++] = fp;
-
-  p = rtvec_alloc (i);
-  while (--i >= 0)
-{
-  rtx mem = gen_frame_mem (BLKmode, regs[i]);
-  RTVEC_ELT (p, i) = gen_rtx_SET (mem, const0_rtx);
-}
-
-  emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
-}
-
-/* Allocate SIZE_INT bytes on the stack using a store with update style insn
-   and set the appropriate attributes for the generated insn.  Return the
-   first insn which adjusts the stack pointer or the last insn before
-   the stack adjustment loop. 
-
-   SIZE_INT is used to create the CFI note for the allocation.
-
-   SIZE_RTX is an rtx containing the size of the adjustment.  Note that
-   since stacks grow to lower addresses its runtime value is -SIZE_INT.
-
-   ORIG_SP contains the backchain value that must be stored at *sp.  */
-
-static rtx_insn *
-rs6000_emit_allocate_stack_1 (HOST_WIDE_INT size_int, rtx orig_sp)
-{
-  rtx_insn *insn;
-
-  rtx size_rtx = GEN_INT (-size_int);
-  if (size_int > 32767)
-{
-  rtx tmp_reg = gen_rtx_REG (Pmode, 0);
-  /* Need a note here so that try_split doesn't get confused.  */
-  if (get_last_insn () == NULL_RTX)
-   emit_note (NOTE_INSN_DELETED);
-  insn = emit_move_insn (tmp_reg, size_rtx);
-  try_split (PATTERN (insn), insn, 0);
-  size_rtx = tmp_reg;
-}
-  
-  if (TARGET_32BIT)
-insn = emit_insn (gen_movsi_update_stack (stack_pointer_rtx,
- stack_pointer_rtx,
- size_rtx,
- orig_sp));
-  else
-insn = emit_insn (gen_movdi_update_stack (stack_pointer_rtx,
- stack_pointer_rtx,
- size_rtx,
- orig_sp));
-  rtx par = PATTERN (insn);
-  gcc_assert (GET_CODE (par) == PARALLEL);
-  rtx set = XVECEXP (par, 0, 0);
-  gcc_assert (GET_CODE (set) == SET);
-  rtx mem = SET_DEST (set);
-  gcc_assert (MEM_P (mem));
-  MEM_NOTRAP_P (mem) = 1;
-  set_mem_alias_set (mem, get_frame_alias_set ());
-
-  RTX_FRAME_RELATED_P (insn) = 1;
-  add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-   gen_rtx_SET (stack_pointer_rtx,
-gen_rtx_PLUS (Pmode,
-  stack_pointer_rtx,
-  GEN_INT (-size_int))));
-
-  /* Emit a blockage to ensure the allocation/probing insns are
- not optimized, combined, removed, etc.  Add REG_STACK_CHECK
- note for similar reasons.  */
-  if (flag_stack_clash_protection)
-{
-  add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
-  emit_insn (gen_blockage ());
-}
-
-  return insn;
-}
-
-static HOST_WIDE_INT
-get_stack_clash_protection_probe_interval (void)
-{
-  return (HOST_WIDE_INT_1U
- << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL));
-}
-
-static HOST_WIDE_INT
-get_stack_clash_protection_guard_size (void)
-{
-  return (HOST_WIDE_INT_1U
- << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE));
-}
-
-/* Allocate ORIG_SIZE bytes on the stack and probe the newly
-   allocated space every STACK_CLASH_PROTECTION_PROBE_INTERVAL bytes.
-
-   COPY_REG, if non-null, should contain a copy of the original
-   stack pointer at exit from this function.
-
-   This is subtly different than the Ada probing in that it tries hard to
-   prevent attacks that jump the stack guard.  Thus it is never allowed to
-   allocate more than STACK_CLASH_PROTECTION_PROBE_INTERVAL bytes of stack
-   space without a suitable probe.  */
-static rtx_insn *
-rs6000_emit_probe_stack_range_stack_clash (HOST_WIDE_INT orig_size,
-  rtx copy_reg)
-{
-  rtx orig_sp = copy_reg;
-
-  HOST_WIDE_INT probe_interval = 

Re: [RFA][tree-optimization/90883] Improve DSE to handle redundant calls

2019-06-26 Thread Richard Biener
On Wed, Jun 26, 2019 at 6:17 AM Jeff Law  wrote:
>
> So based on the conversation in the BZ I cobbled together a patch to
> extend tree-ssa-dse.c to also detect redundant stores.
>
> To be clear, given two stores, the first store is dead if the later
> store overwrites all the live bytes set by the first store.   In this
> case we delete the first store.  If the first store is partially dead we
> may trim it.
>
> Given two stores, the second store is redundant if it stores _the same
> value_ into locations set by the first store.  In this case we delete
> the second store.
>
>
> We prefer to remove redundant stores over removing dead or trimming
> partially dead stores.
>
> First, if we detect a redundant store, we can always remove it.  We may
> not always be able to trim a partially dead store.  So removing the
> redundant store wins in this case.
>
> But even if the redundant store occurs at the head or tail of the prior
> store, removing the redundant store is better than trimming the
> partially dead store because we end up with fewer calls to memset with
> the same number of total bytes written.
>
> We only look for redundant stores in a few cases.  The first store must
> be a memset, empty constructor or calloc call -- ie things which
> initialize multiple memory locations to zero.  Subsequent stores can
> occur via memset, empty constructors or simple memory assignments.
>
> The change to tree-ssa-alias.c deserves a quick note.
>
> When we're trying to determine if we have a redundant store, we create
> an AO_REF for the *second* store, then ask the alias system if the first
> store would kill the AO_REF.
>
> So while normally a calloc wouldn't ever kill anything in normal
> execution order, we're not asking about things in execution order.  We
> really just want to know if the calloc is going to write into the
> entirety of the AO_REF of the subsequent store.  So we compute the size
> of the allocation and we know the destination from the LHS of the calloc
> call and everything "just works".
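
To make the two notions above concrete, here is a minimal standalone sketch.
It is not taken from the patch or its testcases: the function names are
hypothetical, and it only shows source-level patterns of the kind the
description covers (a memset that is fully overwritten, and a zeroing memset
after calloc).

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Dead store: every byte written by the first memset is overwritten
// before it is read, so deleting the first call cannot change the result.
int dead_store_example() {
  char buf[16];
  std::memset(buf, 1, sizeof buf);  // dead: fully overwritten below
  std::memset(buf, 2, sizeof buf);
  return buf[7];
}

// Redundant store: calloc already zeroed the block, so the memset stores
// the same value into the same bytes and could be deleted instead.
int redundant_store_example() {
  char *p = static_cast<char *>(std::calloc(16, 1));
  if (!p)
    return -1;
  std::memset(p, 0, 16);  // redundant: these bytes are already zero
  int v = p[7];
  std::free(p);
  return v;
}
```

Whether a given compiler version actually removes either call depends on the
pass and options in use; the sketch only shows that the observable result does
not depend on the removable store.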

I see how stmt_kills_ref_p is convenient here and it's the only
ref-must-include-other-ref kind of query the oracle supports right now.
Note it is not optimized for your particular case querying the same
stmt for multiple refs.  It's not refs_must_alias_p that is missing
but something stronger ('kills' is also wrong since both refs might
be reads), ref_covered_by_ref_p or so.  That said, factoring
stmt_kills_ref_p might not be so straight-forward for calls since
we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,
ao_ref_init_loads_from_call?).
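
The "ref covered by ref" query can be pictured on plain byte ranges.  The
following is a minimal sketch only, assuming both references share a base and
have known constant offsets and sizes; byte_range and
range_covered_by_range_p are hypothetical names for illustration, not part of
GCC's ao_ref API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a memory reference: a byte range measured
// from a base address shared by both references.
struct byte_range {
  std::int64_t offset;
  std::int64_t size;
};

// Return true when every byte of INNER lies within OUTER.  Unknown
// (negative) sizes are treated conservatively as "not covered".
bool range_covered_by_range_p(const byte_range &outer,
                              const byte_range &inner) {
  if (outer.size < 0 || inner.size < 0)
    return false;
  return inner.offset >= outer.offset
         && inner.offset + inner.size <= outer.offset + outer.size;
}
```

Note that the query is direction-agnostic: it says nothing about which access
happens first or whether either one is a store, which is the property the
redundant-store check needs.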

So I think the tree-ssa-alias.c change is fine but please put a
comment before

+ if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
+   {
+ tree arg0 = gimple_call_arg (stmt, 0);

explaining this is used by DSE to detect redundant stores.

> This patch also includes a hunk I apparently left out from yesterday's
> submission which just adds _CHK cases to all the existing BUILT_IN_MEM*
> cases.  That's what I get for writing this part first, then adding the
> _CHK stuff afterwards, then reversing the order of submission.
>
> This includes a slightly reduced testcase from the BZ in g++.dg -- it's
> actually a good way to capture when one empty constructor causes another
> empty constructor to be redundant.  The gcc.dg cases capture other
> scenarios.
>
> This has been bootstrapped and regression tested on x86-64, i686, ppc64,
> ppc64le, sparc64 & aarch64.  It's also bootstrapped on various arm
> targets, alpha, m68k, mips, riscv64, sh4.  It's been built and tested on
> a variety of *-elf targets as well as various *-linux-gnu targets as
> crosses.  And just for giggles it was tested before the changes to add
> the _CHK support, so it works with and without that as well.
>
> OK for the trunk?

I also notice we wouldn't handle

   memset(p, 1, 64);
   memset(p, 1, 32);

(non-zero-initializer) or

  x = {};
  y = {};
  x.a = {};

(intermediate non-aliasing store)

Did you see if / how often this triggers on trunk?

OK.

Thanks,
Richard.

>
> Jeff


Re: [PATCH] Remove quite obvious dead assignments.

2019-06-26 Thread Martin Jambor
Hi,

On Wed, Jun 26 2019, Martin Liška wrote:
> Hi.
>
> I've spent some time with clang-static-analyzer and analyzed the warnings
> it reported.
> As always with analyzers, the majority of the issues are false positives;
> however, it caught a couple of real issues:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90973
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90978
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90976
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90975
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90970
>
> That said, I'm sending a patch that greatly reduces the number of dead
> assignments.
> I've chosen to remove only those that are quite trivial and that do not span
> multiple if-else branches.
>
> I hope the patch will be readable and approved.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm
> also testing that on ppc64 big endian machine.
>
> Ready to be installed?
> Thanks,
> Martin
>
>
> gcc/ChangeLog:
>
> 2019-06-26  Martin Liska  
>
>   * ipa-cp.c (cgraph_edge_brings_all_agg_vals_for_node): Likewise.

The ipa-cp.c hunk is obvious, I'd say.

Thanks,

Martin


> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 69c00a9c5a5..b6e781f7450 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4445,7 +4445,6 @@ static bool
>  cgraph_edge_brings_all_agg_vals_for_node (struct cgraph_edge *cs,
> struct cgraph_node *node)
>  {
> -  struct ipa_node_params *orig_caller_info = IPA_NODE_REF (cs->caller);
>struct ipa_node_params *orig_node_info;
>struct ipa_agg_replacement_value *aggval;
>int i, ec, count;
> @@ -4462,8 +4461,6 @@ cgraph_edge_brings_all_agg_vals_for_node (struct 
> cgraph_edge *cs,
>   return false;
>  
>orig_node_info = IPA_NODE_REF (IPA_NODE_REF (node)->ipcp_orig_node);
> -  if (orig_caller_info->ipcp_orig_node)
> -orig_caller_info = IPA_NODE_REF (orig_caller_info->ipcp_orig_node);
>  
>for (i = 0; i < count; i++)
>  {


Re: [PATCH] Remove quite obvious dead assignments.

2019-06-26 Thread Jakub Jelinek
On Wed, Jun 26, 2019 at 12:57:15PM +0200, Martin Liška wrote:
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1713,8 +1713,8 @@ asan_emit_allocas_unpoison (rtx top, rtx bot, rtx_insn 
> *before)
>rtx ret = init_one_libfunc ("__asan_allocas_unpoison");
>top = convert_memory_address (ptr_mode, top);
>bot = convert_memory_address (ptr_mode, bot);
> -  ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode,
> -  top, ptr_mode, bot, ptr_mode);
> +  emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode,
> +top, ptr_mode, bot, ptr_mode);

If you don't need the return value, you should use emit_library_call,
not emit_library_call_value.

> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -3044,7 +3044,7 @@ expand_asm_stmt (gasm *stmt)
> }
>   }
>  }
> -  unsigned nclobbers = clobber_rvec.length();
> +  unsigned nclobbers;

Can you instead move the nclobbers declaration to the spot where it
is initialized (right now the second time), and add space before (
there too?

> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -16051,8 +16051,6 @@ ix86_expand_rounddf_32 (rtx operand0, rtx operand1)
>mhalf = expand_simple_binop (mode, MINUS, half, one, NULL_RTX,
>  0, OPTAB_DIRECT);
>  
> -  /* Compensate.  */
> -  tmp = gen_reg_rtx (mode);
>/* xa2 = xa2 - (dxa > 0.5 ? 1 : 0) */
>tmp = ix86_expand_sse_compare_mask (UNGT, dxa, half, false);
>emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, one, tmp)));

Is this really desirable?
Just a quick look suggests that perhaps it wants to use one temporary
for the gen_reg_rtx (mode) and use that on the lhs of SET, and another one
as the last operand to AND holding ix86_expand_sse_compare_mask?
Though, the same problem exists in most of the ix86_expand_sse_compare_mask
callers.
While ix86_expand_sse_compare_mask returns a result of gen_reg_rtx too,
I think it is better not to reuse pseudos for different values.
I'd say take out this change and deal with it with i386 maintainers
separately.

> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -3483,8 +3483,7 @@ vectorizable_call (stmt_vec_info stmt_info, 
> gimple_stmt_iterator *gsi,
>   = gimple_build_call_internal_vec (ifn, vargs);
> gimple_call_set_lhs (call, half_res);
> gimple_call_set_nothrow (call, true);
> -   new_stmt_info
> - = vect_finish_stmt_generation (stmt_info, call, gsi);
> +   vect_finish_stmt_generation (stmt_info, call, gsi);
> if ((i & 1) == 0)
>   {
> prev_res = half_res;
> @@ -3583,8 +3582,7 @@ vectorizable_call (stmt_vec_info stmt_info, 
> gimple_stmt_iterator *gsi,
> gcall *call = gimple_build_call_internal_vec (ifn, vargs);
> gimple_call_set_lhs (call, half_res);
> gimple_call_set_nothrow (call, true);
> -   new_stmt_info
> - = vect_finish_stmt_generation (stmt_info, call, gsi);
> +   vect_finish_stmt_generation (stmt_info, call, gsi);
> if ((j & 1) == 0)
>   {
> prev_res = half_res;

This looks wrong.
There should have been
SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt_info);
in between for slp_node, or the usual code like:
  if (cond_for_first)
STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
  else
STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
  prev_stmt_info = new_stmt_info;
otherwise.  In any case, I think this should be dealt with separately.

Jakub


Re: [RFA] Handle _CHK builtins in tree-ssa-dse.c

2019-06-26 Thread Richard Biener
On Tue, Jun 25, 2019 at 2:20 AM Jeff Law  wrote:
>
> These are some minor improvements to tree-ssa-dse, in particular it adds
> handling of the _CHK variants of the supported functions (memcpy,
> memmove, memset).  It's just something I noticed while poking at 90883.
>
> These don't trigger during bootstraps, but do in the testsuite.   The
> tests that were changed were verified to make sure that the
> removal/trimming of *_CHK calls were correct.  For example, some tests
> in the c-torture/builtins directory do things like
>
>   __builtin_memset_chk (...);
>   abort ();
>
> Since the memory locations are never read, DSE just removes the call.
> This happened with enough regularity in the c-torture/execute/builtins
> tests that I changed the .exp driver to add the -fno-tree-dse flag.  In
> the other cases I just changed the affected tests.
>
> Bootstrapped and regression tested  on x86_64, ppc64le, sparc64, others
> will follow (aarch, ppc64, i686, etc).  It's built toolchains &
> libraries and regression tested on a wide variety of other platforms as
> well.
>
>
> OK for the trunk?

OK.

Thanks,
Richard.

> jeff


[PATCH] Fix ICE in PR90982

2019-06-26 Thread Richard Biener


The following works around an issue in IPA SRA which does analysis
twice, once on the original function body and once on the cloned one
when replacing refs.  If both end up not agreeing we end up with
stale unreplaced refs - as in this case where earlier analysis had
access to SSA range info which isn't copied in the process of cloning.

Thus the following patch copies over SSA range info which is a good
idea anyways.

Not sure to what extent the IPA SRA issue will prevail with the
new implementation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-06-26  Richard Biener  

PR ipa/90982
* tree-inline.c (remap_ssa_name): Copy SSA range info.

* g++.dg/torture/pr90982.C: New testcase.

Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c   (revision 272636)
+++ gcc/tree-inline.c   (working copy)
@@ -259,6 +259,11 @@ remap_ssa_name (tree name, copy_body_dat
  struct ptr_info_def *new_pi = get_ptr_info (new_tree);
  new_pi->pt = pi->pt;
}
+  /* So can range-info.  */
+  if (!POINTER_TYPE_P (TREE_TYPE (name))
+ && SSA_NAME_RANGE_INFO (name))
+   duplicate_ssa_name_range_info (new_tree, SSA_NAME_RANGE_TYPE (name),
+  SSA_NAME_RANGE_INFO (name));
   return new_tree;
 }
 
@@ -292,6 +297,11 @@ remap_ssa_name (tree name, copy_body_dat
  struct ptr_info_def *new_pi = get_ptr_info (new_tree);
  new_pi->pt = pi->pt;
}
+  /* So can range-info.  */
+  if (!POINTER_TYPE_P (TREE_TYPE (name))
+ && SSA_NAME_RANGE_INFO (name))
+   duplicate_ssa_name_range_info (new_tree, SSA_NAME_RANGE_TYPE (name),
+  SSA_NAME_RANGE_INFO (name));
   if (SSA_NAME_IS_DEFAULT_DEF (name))
{
  /* By inlining function having uninitialized variable, we might
Index: gcc/testsuite/g++.dg/torture/pr90982.C
===
--- gcc/testsuite/g++.dg/torture/pr90982.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr90982.C  (working copy)
@@ -0,0 +1,23 @@
+// { dg-do compile }
+
+template <int n> struct S
+{
+  long c[n];
+  void f (S d)
+{
+  for (int i = 2;; i++)
+   c[i] &= d.c[i];
+}
+};
+
+template <int n> struct T:S<n>
+{
+  void operator &= (T d)
+{ this -> f (d); }
+};
+
+void g (T<192> &d)
+{
+  T<192> v;
+  d &= v;
+}


[PATCH] Kill lto_bitmap_alloc/free

2019-06-26 Thread Richard Biener


This removes another global obstack and adjusts the single user,
fixing inconsistent freeing on the way.

Bootstrapped/tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-06-26  Richard Biener  

* lto-streamer.h (lto_bitmap_alloc): Remove.
(lto_bitmap_free): Likewise.
* lto-streamer.c (lto_bitmap_alloc): Remove.
(lto_bitmap_free): Likewise.
(lto_obstack): Likewise.
(lto_obstack_initialized): Likewise.
* lto-streamer-out.c (lto_output): Use own obstack for local
bitmap, free it consistently.
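
The lto_output hunks in this patch fold a separate "assert the bit is clear,
then set it" pair into a single gcc_assert (bitmap_set_bit (...)), which
relies on the setter reporting whether the bit changed.  A minimal standalone
sketch of that idiom follows; simple_bitmap is a hypothetical class used only
for illustration and is not GCC's bitmap implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable bit set, standing in for GCC's bitmap.
class simple_bitmap {
public:
  // Set BIT and return true only if it was previously clear, so a caller
  // can check "not seen before" and record it in one step.
  bool set_bit(std::size_t bit) {
    if (bit >= bits_.size())
      bits_.resize(bit + 1, false);
    if (bits_[bit])
      return false;
    bits_[bit] = true;
    return true;
  }

  // Return true if BIT is currently set.
  bool test_bit(std::size_t bit) const {
    return bit < bits_.size() && bits_[bit];
  }

private:
  std::vector<bool> bits_;
};
```

With this shape, a duplicate-detection check becomes a single call whose
return value is asserted, instead of a test followed by a separate update.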

Index: gcc/lto-streamer.c
===
--- gcc/lto-streamer.c  (revision 272636)
+++ gcc/lto-streamer.c  (working copy)
@@ -35,11 +35,6 @@ along with GCC; see the file COPYING3.
 /* Statistics gathered during LTO, WPA and LTRANS.  */
 struct lto_stats_d lto_stats;
 
-/* LTO uses bitmaps with different life-times.  So use a separate
-   obstack for all LTO bitmaps.  */
-static bitmap_obstack lto_obstack;
-static bool lto_obstack_initialized;
-
 const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 /* Set when streaming LTO for offloading compiler.  */
 bool lto_stream_offload_p;
@@ -113,28 +108,6 @@ lto_tag_name (enum LTO_tags tag)
 }
 
 
-/* Allocate a bitmap from heap.  Initializes the LTO obstack if necessary.  */
-
-bitmap
-lto_bitmap_alloc (void)
-{
-  if (!lto_obstack_initialized)
-{
-  bitmap_obstack_initialize (&lto_obstack);
-  lto_obstack_initialized = true;
-}
-  return BITMAP_ALLOC (&lto_obstack);
-}
-
-/* Free bitmap B.  */
-
-void
-lto_bitmap_free (bitmap b)
-{
-  BITMAP_FREE (b);
-}
-
-
 /* Get a section name for a particular type or name.  The NAME field
is only used if SECTION_TYPE is LTO_section_function_body. For all
others it is ignored.  The callee of this function is responsible
Index: gcc/lto-streamer.h
===
--- gcc/lto-streamer.h  (revision 272636)
+++ gcc/lto-streamer.h  (working copy)
@@ -822,8 +822,6 @@ extern void lto_append_block (struct lto
 extern bool lto_stream_offload_p;
 
 extern const char *lto_tag_name (enum LTO_tags);
-extern bitmap lto_bitmap_alloc (void);
-extern void lto_bitmap_free (bitmap);
 extern char *lto_get_section_name (int, const char *, struct 
lto_file_decl_data *);
 extern void print_lto_report (const char *);
 extern void lto_streamer_init (void);
Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 272636)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -2397,13 +2397,17 @@ lto_output (void)
 {
   struct lto_out_decl_state *decl_state;
   bitmap output = NULL;
+  bitmap_obstack output_obstack;
   int i, n_nodes;
   lto_symtab_encoder_t encoder = lto_get_out_decl_state 
()->symtab_node_encoder;
 
   prune_offload_funcs ();
 
   if (flag_checking)
-output = lto_bitmap_alloc ();
+{
+  bitmap_obstack_initialize (&output_obstack);
+  output = BITMAP_ALLOC (&output_obstack);
+}
 
   /* Initialize the streamer.  */
   lto_streamer_init ();
@@ -2419,10 +2423,7 @@ lto_output (void)
  && !node->alias)
{
  if (flag_checking)
-   {
- gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
- bitmap_set_bit (output, DECL_UID (node->decl));
-   }
+   gcc_assert (bitmap_set_bit (output, DECL_UID (node->decl)));
  decl_state = lto_new_out_decl_state ();
  lto_push_out_decl_state (decl_state);
  if (gimple_has_body_p (node->decl)
@@ -2452,10 +2453,7 @@ lto_output (void)
{
  timevar_push (TV_IPA_LTO_CTORS_OUT);
  if (flag_checking)
-   {
- gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
- bitmap_set_bit (output, DECL_UID (node->decl));
-   }
+   gcc_assert (bitmap_set_bit (output, DECL_UID (node->decl)));
  decl_state = lto_new_out_decl_state ();
  lto_push_out_decl_state (decl_state);
  if (DECL_INITIAL (node->decl) != error_mark_node
@@ -2480,9 +2478,11 @@ lto_output (void)
 
   output_offload_tables ();
 
-#if CHECKING_P
-  lto_bitmap_free (output);
-#endif
+  if (flag_checking)
+{
+  BITMAP_FREE (output);
+  bitmap_obstack_release (&output_obstack);
+}
 }
 
 /* Write each node in encoded by ENCODER to OB, as well as those reachable


Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-26 Thread Andrea Corallo
Hi David,
thanks for the suggestions.
Updated version for the bitfield libgccjit support patch here
addressing comments.

test-error-gcc_jit_context_new_bitfield-invalid-width.c is reworked
and now assumes that the long of the compiler compiling the test is
the same size as the libgccjit long.
I'm not sure this assumption is sufficient; if it is not, we will have to
find another way around this.
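
For readers less familiar with the constraints the new entrypoint and its
error tests mirror, here is a minimal sketch of the language-level bit-field
behaviour.  It does not use libgccjit at all; packed_flags and store_mode are
hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Two bit-fields sharing one unsigned int; each declared width must not
// exceed the number of bits in the underlying type.
struct packed_flags {
  unsigned int mode : 3;   // can hold 0..7
  unsigned int level : 5;  // can hold 0..31
};

// Store VALUE into the 3-bit field and read it back.  For an unsigned
// bit-field the stored value is VALUE reduced modulo 2^3.
unsigned int store_mode(unsigned int value) {
  packed_flags f = {0, 0};
  f.mode = value;
  // Note: taking the address of a bit-field (&f.mode) is ill-formed,
  // which is the language rule behind the get_address error test.
  return f.mode;
}
```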

Checked that make check-jit runs clean.

Bests

  Andrea


2019-06-20  Andrea Corallo andrea.cora...@arm.com

* docs/topics/compatibility.rst (LIBGCCJIT_ABI_12): New ABI tag.
* docs/topics/types.rst: Add gcc_jit_context_new_bitfield.
* jit-common.h (namespace recording): Add class bitfield.
* jit-playback.c:
(DECL_C_BIT_FIELD, SET_DECL_C_BIT_FIELD): Add macros.
(playback::context::new_bitfield): New method.
(playback::compound_type::set_fields): Add bitfield support.
(playback::lvalue::mark_addressable): Was jit_mark_addressable make this
a method of lvalue plus return a bool to communicate success.
(playback::lvalue::get_address): Check for jit_mark_addressable return
value.
* jit-playback.h (new_bitfield): New method.
(class bitfield): New class.
(class lvalue): Add jit_mark_addressable method.
* jit-recording.c (recording::context::new_bitfield): New method.
(recording::bitfield::replay_into): New method.
(recording::bitfield::write_to_dump): Likewise.
(recording::bitfield::make_debug_string): Likewise.
(recording::bitfield::write_reproducer): Likewise.
* jit-recording.h (class context): Add new_bitfield method.
(class field): Make it derivable by class bitfield.
(class bitfield): Add new class.
* libgccjit++.h (class context): Add new_bitfield method.
* libgccjit.c (struct gcc_jit_bitfield): New structure.
(gcc_jit_context_new_bitfield): New function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield) New macro.
(gcc_jit_context_new_bitfield): New function.
* libgccjit.map (LIBGCCJIT_ABI_12) New ABI tag.


2019-06-20  Andrea Corallo andrea.cora...@arm.com

* jit.dg/all-non-failing-tests.h: Add test-accessing-bitfield.c.
* jit.dg/test-accessing-bitfield.c: New testcase.
* jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-type.c:
Likewise.
* jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c:
Likewise.
* jit.dg/test-error-gcc_jit_lvalue_get_address-bitfield.c:
Likewise.
diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index abefa56..da64920 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -177,3 +177,8 @@ entrypoints:
 
 ``LIBGCCJIT_ABI_11`` covers the addition of
 :func:`gcc_jit_context_add_driver_option`
+
+``LIBGCCJIT_ABI_12``
+
+``LIBGCCJIT_ABI_12`` covers the addition of
+:func:`gcc_jit_context_new_bitfield`
diff --git a/gcc/jit/docs/topics/types.rst b/gcc/jit/docs/topics/types.rst
index 1d2dcd4..37d9d01 100644
--- a/gcc/jit/docs/topics/types.rst
+++ b/gcc/jit/docs/topics/types.rst
@@ -247,6 +247,30 @@ You can model C `struct` types by creating :c:type:`gcc_jit_struct *` and
underlying string, so it is valid to pass in a pointer to an on-stack
buffer.
 
+.. function:: gcc_jit_field *\
+  gcc_jit_context_new_bitfield (gcc_jit_context *ctxt,\
+gcc_jit_location *loc,\
+gcc_jit_type *type,\
+int width,\
+const char *name)
+
+   Construct a new bit field, with the given type width and name.
+
+   The parameter ``name`` must be non-NULL.  The call takes a copy of the
+   underlying string, so it is valid to pass in a pointer to an on-stack
+   buffer.
+
+   The parameter ``type`` must be an integer type.
+
+   The parameter ``width`` must be a positive integer that does not exceed the
+   size of ``type``.
+
+   This API entrypoint was added in :ref:`LIBGCCJIT_ABI_12`; you can test
+   for its presence using
+   .. code-block:: c
+
+  #ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield
+
 .. function:: gcc_jit_object *\
   gcc_jit_field_as_object (gcc_jit_field *field)
 
diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h
index 1d96cc3..e747d96 100644
--- a/gcc/jit/jit-common.h
+++ b/gcc/jit/jit-common.h
@@ -119,6 +119,7 @@ namespace recording {
 	class union_;
   class vector_type;
 class field;
+  class bitfield;
 class fields;
 class function;
 class block;
diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index b74495c..ae8a732 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -47,6 +47,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-builtins.h"
 #include "jit-tempdir.h"
 
+/* Compare with gcc/c-family/c-common.h
+   This is redefined here to avoid depending from the C frontend.  */
+#define DECL_JIT_BIT_FIELD(NODE) \
+  (DECL_LANG_FLAG_4 (FIELD_DECL_CHECK (NODE)) == 1)

[PATCH] Remove quite obvious dead assignments.

2019-06-26 Thread Martin Liška
Hi.

I've spent some time with clang-static-analyzer and analyzed the warnings it
reported.
As always with analyzers, the majority of the issues are false positives;
however, it caught a couple of real issues:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90973
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90978
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90975
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90970

That said, I'm sending a patch that greatly reduces the number of dead
assignments.
I've chosen to remove only those that are quite trivial and that do not span
multiple if-else branches.

I hope the patch will be readable and approved.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm
also testing that on ppc64 big endian machine.

Ready to be installed?
Thanks,
Martin


gcc/ChangeLog:

2019-06-26  Martin Liska  

* asan.c (asan_emit_allocas_unpoison): Remove obviously
dead assignments.
* bt-load.c (move_btr_def): Likewise.
* builtins.c (expand_builtin_apply_args_1): Likewise.
(expand_builtin_apply): Likewise.
* cfgexpand.c (expand_asm_stmt): Likewise.
(construct_init_block): Likewise.
* cfghooks.c (verify_flow_info): Likewise.
* cfgloopmanip.c (remove_path): Likewise.
* cfgrtl.c (rtl_verify_bb_layout): Likewise.
* cgraph.c (cgraph_node::set_pure_flag): Likewise.
* combine.c (simplify_if_then_else): Likewise.
* config/i386/i386-expand.c (ix86_expand_rounddf_32): Likewise.
* config/i386/i386.c (ix86_setup_incoming_vararg_bounds): Likewise.
(choose_basereg): Likewise.
(ix86_expand_prologue): Likewise.
(ix86_preferred_output_reload_class): Likewise.
* cselib.c (cselib_record_sets): Likewise.
* df-scan.c (df_scan_alloc): Likewise.
* dojump.c (do_jump_by_parts_greater_rtx): Likewise.
* early-remat.c (early_remat::record_equiv_candidates): Likewise.
* emit-rtl.c (try_split): Likewise.
* graphite-scop-detection.c (assign_parameter_index_in_region): 
Likewise.
* ipa-cp.c (cgraph_edge_brings_all_agg_vals_for_node): Likewise.
* ira-color.c (setup_profitable_hard_regs): Likewise.
* ira.c (rtx_moveable_p): Likewise.
* lra-eliminations.c (eliminate_regs_in_insn): Likewise.
* read-rtl.c (read_subst_mapping): Likewise.
* regrename.c (scan_rtx): Likewise.
* reorg.c (fill_slots_from_thread): Likewise.
* tree-inline.c (tree_function_versioning): Likewise.
* tree-ssa-reassoc.c (optimize_ops_list): Likewise.
* tree-ssa-sink.c (statement_sink_location): Likewise.
* tree-ssa-threadedge.c (thread_across_edge): Likewise.
* tree-vect-loop.c (vect_get_loop_niters): Likewise.
(vect_create_epilog_for_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_call): Likewise.
* tree.c (build_nonstandard_integer_type): Likewise.

gcc/cp/ChangeLog:

2019-06-26  Martin Liska  

* class.c (adjust_clone_args): Remove obviously
dead assignments.
(dump_class_hierarchy_r): Likewise.
* decl.c (check_initializer): Likewise.
* parser.c (cp_parser_lambda_expression): Likewise.
* pt.c (unify_bound_ttp_args): Likewise.
(convert_template_argument): Likewise.
* rtti.c (build_headof): Likewise.
* typeck.c (convert_for_initialization): Likewise.

libgcc/ChangeLog:

2019-06-26  Martin Liska  

* libgcov-driver-system.c (gcov_exit_open_gcda_file): Remove obviously
dead assignments.
* libgcov-util.c: Likewise.
---
 gcc/asan.c |  4 ++--
 gcc/bt-load.c  |  1 -
 gcc/builtins.c |  8 ++--
 gcc/cfgexpand.c|  8 
 gcc/cfghooks.c |  2 --
 gcc/cfgloopmanip.c |  1 -
 gcc/cfgrtl.c   |  1 -
 gcc/cgraph.c   |  2 --
 gcc/combine.c  |  1 -
 gcc/config/i386/i386-expand.c  |  2 --
 gcc/config/i386/i386.c | 13 ++---
 gcc/cp/class.c |  4 
 gcc/cp/decl.c  |  2 +-
 gcc/cp/parser.c|  2 +-
 gcc/cp/pt.c|  4 +---
 gcc/cp/rtti.c  |  4 ++--
 gcc/cp/typeck.c|  2 --
 gcc/cselib.c   |  3 +--
 gcc/df-scan.c  |  2 --
 gcc/dojump.c   |  2 --
 gcc/early-remat.c  |  1 -
 gcc/emit-rtl.c |  2 +-
 gcc/graphite-scop-detection.c  |  2 --
 gcc/ipa-cp.c   |  3 ---
 gcc/ira-color.c|  1 -
 gcc/ira.c  |  3 +--
 gcc/lra-eliminations.c |  1 -
 gcc/read-rtl.c |  4 ++--
 gcc/regrename.c|  3 +--
 gcc/reorg.c|  3 +--
 gcc/tree-inline.c  |  4 ++--
 gcc/tree-ssa-reassoc.c   

[PATCH] Fix misc stuff seen by clang-static-analyzer.

2019-06-26 Thread Martin Liška
Hi.

This small patch addresses miscellaneous clang-static-analyzer issues.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/lto/ChangeLog:

2019-06-26  Martin Liska  

* lto-dump.c (struct symbol_entry): Add default dtor.
(struct variable_entry): Likewise.
(struct function_entry): Likewise.
(dump_list_functions): Release memory.
(dump_list_variables): Likewise.

libgcc/ChangeLog:

2019-06-26  Martin Liska  

* libgcov-util.c (gcov_profile_merge): Release allocated
memory.
(calculate_overlap): Likewise.
---
 gcc/lto/lto-dump.c| 19 +--
 libgcc/libgcov-util.c |  5 +
 2 files changed, 22 insertions(+), 2 deletions(-)


diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index 691d109ff34..e9de33cf5d8 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -44,6 +44,9 @@ struct symbol_entry
   symbol_entry (symtab_node *node_): node (node_)
   {}
 
+  virtual ~symbol_entry ()
+  {}
+
   char* get_name () const
   {
 if (flag_lto_dump_demangle)
@@ -72,6 +75,9 @@ struct variable_entry: public symbol_entry
   variable_entry (varpool_node *node_): symbol_entry (node_)
   {}
 
+  virtual ~variable_entry ()
+  {}
+
   virtual size_t get_size () const
   {
 varpool_node *vnode = dyn_cast<varpool_node *> (node);
@@ -99,6 +105,9 @@ struct function_entry: public symbol_entry
   function_entry (cgraph_node *node_): symbol_entry (node_)
   {}
 
+  virtual ~function_entry ()
+  {}
+
   virtual void dump ()
   {
 symbol_entry :: dump ();
@@ -166,7 +175,10 @@ void dump_list_functions (void)
   int i=0;
   symbol_entry* e;
   FOR_EACH_VEC_ELT (v, i, e)
-e->dump ();
+{
+  e->dump ();
+  delete e;
+}
 }
 
 /* Dump list of variables and their details.  */
@@ -194,7 +206,10 @@ void dump_list_variables (void)
   int i=0;
   symbol_entry* e;
   FOR_EACH_VEC_ELT (v, i, e)
-e->dump ();
+{
+  e->dump ();
+  delete e;
+}
 }
 
 /* Dump symbol list.  */
diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index 6a94b80810b..94d4575c929 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -680,6 +680,9 @@ gcov_profile_merge (struct gcov_info *tgt_profile, struct gcov_info *src_profile
   tgt_tail = gi_ptr;
 }
 
+  free (in_src_not_tgt);
+  free (tgt_infos);
+
   return 0;
 }
 
@@ -1279,6 +1282,8 @@ calculate_overlap (struct gcov_info *gcov_list1,
 
 }
 
+  free (all_infos);
+
   if (overlap_obj_level)
 printf("   SUM:%36s  overlap = %6.2f%% (%5.2f%% %5.2f%%)\n",
"", sum_val*100, sum_cum_1*100, sum_cum_2*100);
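
The lto-dump.c part of the patch deletes entries through a base-class
pointer, which is only well-defined when the base destructor is virtual.
A minimal standalone sketch of that pattern follows; the types and the
live_entries counter are hypothetical and exist only to make the behaviour
observable.

```cpp
#include <cassert>
#include <vector>

// Counts objects that have been constructed but not yet destroyed.
static int live_entries = 0;

struct entry {
  entry() { ++live_entries; }
  // Virtual, so that "delete" through an entry* runs the destructor of
  // the dynamic type rather than invoking undefined behaviour.
  virtual ~entry() { --live_entries; }
  virtual int kind() const { return 0; }
};

struct function_like_entry : entry {
  int kind() const override { return 1; }
};

// Visit every element and release it in the same loop, as the patched
// dump_list_* loops do with e->dump () followed by delete e.
int dump_and_release() {
  std::vector<entry *> v;
  v.push_back(new entry());
  v.push_back(new function_like_entry());
  int sum = 0;
  for (entry *e : v) {
    sum += e->kind();
    delete e;
  }
  return sum;
}
```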



[libsanitizer] Fix sanitizer_common/sanitizer_posix_libcdep.cc compilation on Solaris 11.5

2019-06-26 Thread Rainer Orth
A recent Solaris 11.5 Beta build (st_047) introduced MADV_DONTDUMP,
breaking the libsanitizer build.  The fix is already upstream

https://reviews.llvm.org/D62892

and I've now installed it on mainline and the gcc-9 branch after testing
on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2019-06-26  Rainer Orth  

* sanitizer_common/sanitizer_posix_libcdep.cc: Cherry-pick
compiler-rt revision 363778.

# HG changeset patch
# Parent  c1cbe0c8b63f80662191072c1b6dbd599e42af53
Fix sanitizer_common/sanitizer_posix_libcdep.cc compilation on Solaris 11.5

diff --git a/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc b/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc
--- a/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cc
@@ -69,7 +69,7 @@ void ReleaseMemoryPagesToOS(uptr beg, up
 
 bool NoHugePagesInRegion(uptr addr, uptr size) {
 #ifdef MADV_NOHUGEPAGE  // May not be defined on old systems.
-  return madvise((void *)addr, size, MADV_NOHUGEPAGE) == 0;
+  return madvise((char *)addr, size, MADV_NOHUGEPAGE) == 0;
 #else
   return true;
 #endif  // MADV_NOHUGEPAGE
@@ -77,9 +77,9 @@ bool NoHugePagesInRegion(uptr addr, uptr
 
 bool DontDumpShadowMemory(uptr addr, uptr length) {
 #if defined(MADV_DONTDUMP)
-  return madvise((void *)addr, length, MADV_DONTDUMP) == 0;
+  return madvise((char *)addr, length, MADV_DONTDUMP) == 0;
 #elif defined(MADV_NOCORE)
-  return madvise((void *)addr, length, MADV_NOCORE) == 0;
+  return madvise((char *)addr, length, MADV_NOCORE) == 0;
 #else
   return true;
 #endif  // MADV_DONTDUMP


Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-26 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> On Tue, 25 Jun 2019 at 20:05, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni
>> >  wrote:
>> >>
>> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
>> >>  wrote:
>> >> >
>> >> > Prathamesh Kulkarni  writes:
>> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
>> >> > >if (!def_set)
>> >> > >  return false;
>> >> > >
>> >> > > +  if (reg_prop_only
>> >> > > +  && !REG_P (SET_SRC (def_set))
>> >> > > +  && !REG_P (SET_DEST (def_set)))
>> >> > > +return false;
>> >> >
>> >> > This should be:
>> >> >
>> >> >   if (reg_prop_only
>> >> >   && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
>> >> > return false;
>> >> >
>> >> > so that we return false if either operand isn't a register.
>> >> Oops, sorry about that  -:(
>> >> >
>> >> > > +
>> >> > > +  /* Allow propagations into a loop only for reg-to-reg copies, since
>> >> > > + replacing one register by another shouldn't increase the cost.  
>> >> > > */
>> >> > > +
>> >> > > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
>> >> > > +  && !REG_P (SET_SRC (def_set))
>> >> > > +  && !REG_P (SET_DEST (def_set)))
>> >> > > +return false;
>> >> >
>> >> > Same here.
>> >> >
>> >> > OK with that change, thanks.
>> >> Thanks for the review, will make the changes and commit the patch
>> >> after re-testing.
>> > Hi,
>> > Testing the patch showed following failures on 32-bit x86:
>> >
>> >   Executed from: g++.target/i386/i386.exp
>> > g++:g++.target/i386/pr88152.C   scan-assembler-not 
>> > vpcmpgt|vpcmpeq|vpsra
>> >   Executed from: gcc.target/i386/i386.exp
>> > gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs:
>> > gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t
>> > ]*\\%eax,[\\t ]*%eax 1
>> >
>> > The failure of pr88152.C is also seen on x86_64.
>> >
>> > For pr66768.c, and pr90178.c, forwprop replaces register which is
>> > volatile and frame related respectively.
>> > To avoid that, the attached patch, makes a stronger constraint that
>> > src and dest should be a register
>> > and not have frame_related or volatil flags set, which is checked in
>> > usable_reg_p().
>> > Which avoids the failures for both the cases.
>> > Does it look OK ?
>>
>> That's not the reason it's a bad transform.  In both cases we're
>> propagating r2 <- r1 even though
>>
>> (a) r1 dies in the copy and
>> (b) fwprop can't replace all uses of r2, because some have multiple
>> definitions
>>
>> This has the effect of making both values live in cases where only one
>> was previously.
>>
>> In the case of pr66768.c, fwprop2 is undoing the effect of
>> cse.c:canon_reg, which tries to pick the best register to use
>> (see cse.c:make_regs_eqv).  fwprop1 makes the same change,
>> and made it even before the patch, but the cse.c choice should win.
>>
>> A (hopefully) conservative fix would be to propagate the copy only if
>> both registers have a single definition, which you can test with:
>>
>>   (DF_REG_DEF_COUNT (regno) == 1
>>&& !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno))
>>
>> In that case, fwprop should see all uses of the destination, and should
>> be able to replace it in all cases with the source.
> Ah I see, thanks for the explanation!
> The above check works to resolve the failure.
> IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ?

Right.
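
To make the suggested condition concrete, here is a minimal standalone
sketch.  It does not use GCC's DF API: struct toy_reg_info and the toy_*
functions are hypothetical stand-ins, with def_count modelling
DF_REG_DEF_COUNT (regno) and live_at_entry modelling the DF_LR_OUT bit of
the entry block (set when the register may be used uninitialized).

```c
#include <stdbool.h>

/* Hypothetical per-register summary.  In GCC this information comes
   from the DF framework; here it is just a plain struct.  */
struct toy_reg_info
{
  unsigned def_count;   /* models DF_REG_DEF_COUNT (regno) */
  bool live_at_entry;   /* models the DF_LR_OUT bit of the entry block */
};

/* True if the register has exactly one definition and is not live on
   entry to the function, i.e. it is never used uninitialized.  */
static bool
toy_single_def_p (const struct toy_reg_info *r)
{
  return r->def_count == 1 && !r->live_at_entry;
}

/* Allow a reg-to-reg copy to be propagated only if both the source and
   the destination satisfy the single-definition test.  */
static bool
toy_can_propagate_copy_p (const struct toy_reg_info *src,
                          const struct toy_reg_info *dest)
{
  return toy_single_def_p (src) && toy_single_def_p (dest);
}
```

With this test, a destination that has a second definition somewhere
(case (b) above) is rejected, so the copy is left alone and only one of
the two values stays live.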

>> > For g++.target/i386/pr88152.C, the issue is that after the patch,
>> > forwprop1 does following propagation (in f10) which wasn't done
>> > before:
>> >
>> > In insn 10, replacing
>> >  (unspec:SI [
>> > (reg:V2DF 91)
>> > ] UNSPEC_MOVMSK)
>> >  with (unspec:SI [
>> > (subreg:V2DF (reg:V2DI 90) 0)
>> > ] UNSPEC_MOVMSK)
>> >
>> > This later defeats combine because insn 9 gets deleted.
>> > Without patch, the following combination takes place:
>> >
>> > Trying 7 -> 9:
>> > 7: r90:V2DI=r89:V2DI>r93:V2DI
>> >   REG_DEAD r93:V2DI
>> >   REG_DEAD r89:V2DI
>> > 9: r91:V2DF=r90:V2DI#0
>> >   REG_DEAD r90:V2DI
>> > Successfully matched this instruction:
>> > (set (subreg:V2DI (reg:V2DF 91) 0)
>> > (gt:V2DI (reg:V2DI 89)
>> > (reg:V2DI 93)))
>> > allowing combination of insns 7 and 9
>> >
>> > and then:
>> > Trying 6, 9 -> 10:
>> > 6: r89:V2DI=const_vector
>> > 9: r91:V2DF#0=r89:V2DI>r93:V2DI
>> >   REG_DEAD r89:V2DI
>> >   REG_DEAD r93:V2DI
>> >10: r87:SI=unspec[r91:V2DF] 43
>> >   REG_DEAD r91:V2DF
>> > Successfully matched this instruction:
>> > (set (reg:SI 87)
>> > (unspec:SI [
>> > (lt:V2DF (reg:V2DI 93)
>> > (const_vector:V2DI [
>> > (const_int 0 [0]) repeated x2
>> > ]))
>> > ] UNSPEC_MOVMSK))
>>
>> Eh?  lt:*V2DF*?  Does that mean that it's 0 for false and an all-1 NaN
>> for true?
> Well 

Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Jakub Jelinek
On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
> > The patch isn't correct if TARGET_MMX_WITH_SSE, but not TARGET_AVX, because
> > in that case it will push only that 8 and nothing else, while you really
> > want to have 16 and 8 in that order, so that it tries to vectorize first
> > with 16-byte vectors and fall back to 8-byte.  The hook is supposed to
> > either push nothing at all, then only one vector size is tried,
> > one derived from preferred_simd_mode, or push all possible vectorization
> > sizes to be tried.
> 
> Thanks for the explanation and the patch!

It is even documented that way:
 "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
the only one that is worth considering, this hook should add all suitable\n\
vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
If @var{all} is true, add suitable vector sizes even when they are generally\n\
not expected to be worthwhile.\n\

Jakub


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Uros Bizjak
On Wed, Jun 26, 2019 at 10:47 AM Jakub Jelinek  wrote:
>
> On Wed, Jun 26, 2019 at 10:17:26AM +0200, Uros Bizjak wrote:
> > Please note that the patch regresses
> >
> > FAIL: gcc.target/i386/sse2-vect-simd-11.c scan-tree-dump-times vect
> > "vectorized [1-3] loops" 2
> > FAIL: gcc.target/i386/sse2-vect-simd-15.c scan-tree-dump-times vect
> > "vectorized [1-3] loops" 2
> >
> > For some reason, the compiler decides to vectorize with 8-byte
> > vectors, resulting in:
> >
> > missed:   not vectorized: relevant stmt not supported: _8 = (short
> > unsigned int) _4;
> > missed:  bad operation or unsupported loop bound.
> > missed: couldn't vectorize loop
> >
> > However, the unpatched compiler is able to vectorize loop using
> > 16-byte vectors. It looks that the compiler should re-run
> > vectorization with wider vectors, if vectorization with narrower
> > vectors fails. Jakub, Richard, do you have any insight in this issue?
> >
> > 2019-06-26  Uroš Bizjak  
> >
> > * config/i386/i386.c (ix86_autovectorize_vector_sizes):
> > Autovectorize 8-byte vectors for TARGET_MMX_WITH_SSE.
>
> The patch isn't correct if TARGET_MMX_WITH_SSE, but not TARGET_AVX, because
> in that case it will push only that 8 and nothing else, while you really
> want to have 16 and 8 in that order, so that it tries to vectorize first
> with 16-byte vectors and fall back to 8-byte.  The hook is supposed to
> either push nothing at all, then only one vector size is tried,
> one derived from preferred_simd_mode, or push all possible vectorization
> sizes to be tried.

Thanks for the explanation and the patch!

Yes, the patch works OK. I'll regression test it and push it later today.

Thanks,
Uros.

> The following patch fixes the failures:
>
> --- gcc/config/i386/i386.c.jj   2019-06-26 09:15:53.474869259 +0200
> +++ gcc/config/i386/i386.c  2019-06-26 10:42:01.354106012 +0200
> @@ -21401,6 +21401,11 @@ ix86_autovectorize_vector_sizes (vector_
>sizes->safe_push (16);
>sizes->safe_push (32);
>  }
> +  else if (TARGET_MMX_WITH_SSE)
> +sizes->safe_push (16);
> +
> +  if (TARGET_MMX_WITH_SSE)
> +sizes->safe_push (8);
>  }
>
>  /* Implemenation of targetm.vectorize.get_mask_mode.  */
>
>
> Jakub


Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-26 Thread Rainer Orth
Hi Hongtao,

> Index: testsuite/lib/target-supports.exp
> ===
> --- testsuite/lib/target-supports.exp (revision 272667)
> +++ testsuite/lib/target-supports.exp (working copy)
> @@ -7963,6 +7963,20 @@
>  } "-mavx512bw" ]
>  }
>
> +# Return 1 if avx512vp2intersect instructions can be compiled.
> +proc check_effective_target_avx512vp2intersect { } {
> +return [check_no_compiler_messages avx512vp2intersect object {
> + typedef int __v16si __attribute__ ((__vector_size__ (64)));
> + typedef short __mmask16;
> + void
> + _mm512_2intersect_epi32 (__v16si __A, __v16si __B, __mmask16 *__U,
> + __mmask16 *__M)
> + {
> + __builtin_ia32_2intersectd512 (__U, __M, (__v16si) __A, (__v16si) __B);
> + }
> +} "-mavx512vp2intersect" ]
> +}
> +
>  # Return 1 if avx512ifma instructions can be compiled.
>  proc check_effective_target_avx512ifma { } {
>  return [check_no_compiler_messages avx512ifma object {

as usual, the new effective-target keyword needs documenting in
sourcebuild.texi.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [SVE] [fwprop] PR88833 - Redundant moves for WHILELO-based loops

2019-06-26 Thread Prathamesh Kulkarni
On Tue, 25 Jun 2019 at 20:05, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Mon, 24 Jun 2019 at 21:41, Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Mon, 24 Jun 2019 at 19:51, Richard Sandiford
> >>  wrote:
> >> >
> >> > Prathamesh Kulkarni  writes:
> >> > > @@ -1415,6 +1460,19 @@ forward_propagate_into (df_ref use)
> >> > >if (!def_set)
> >> > >  return false;
> >> > >
> >> > > +  if (reg_prop_only
> >> > > +  && !REG_P (SET_SRC (def_set))
> >> > > +  && !REG_P (SET_DEST (def_set)))
> >> > > +return false;
> >> >
> >> > This should be:
> >> >
> >> >   if (reg_prop_only
> >> >   && (!REG_P (SET_SRC (def_set)) || !REG_P (SET_DEST (def_set))))
> >> > return false;
> >> >
> >> > so that we return false if either operand isn't a register.
> >> Oops, sorry about that  -:(
> >> >
> >> > > +
> >> > > +  /* Allow propagations into a loop only for reg-to-reg copies, since
> >> > > + replacing one register by another shouldn't increase the cost.  */
> >> > > +
> >> > > +  if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father
> >> > > +  && !REG_P (SET_SRC (def_set))
> >> > > +  && !REG_P (SET_DEST (def_set)))
> >> > > +return false;
> >> >
> >> > Same here.
> >> >
> >> > OK with that change, thanks.
> >> Thanks for the review, will make the changes and commit the patch
> >> after re-testing.
> > Hi,
> > Testing the patch showed following failures on 32-bit x86:
> >
> >   Executed from: g++.target/i386/i386.exp
> > g++:g++.target/i386/pr88152.C   scan-assembler-not vpcmpgt|vpcmpeq|vpsra
> >   Executed from: gcc.target/i386/i386.exp
> > gcc:gcc.target/i386/pr66768.c scan-assembler add*.[ \t]%gs:
> > gcc:gcc.target/i386/pr90178.c scan-assembler-times xorl[\\t
> > ]*\\%eax,[\\t ]*%eax 1
> >
> > The failure of pr88152.C is also seen on x86_64.
> >
> > For pr66768.c and pr90178.c, forwprop replaces a register that is
> > volatile and frame related, respectively.
> > To avoid that, the attached patch adds a stronger constraint: src and
> > dest must both be registers and must not have the frame_related or
> > volatil flags set, which is checked in usable_reg_p().
> > This avoids the failures in both cases.
> > Does it look OK?
>
> That's not the reason it's a bad transform.  In both cases we're
> propagating r2 <- r1 even though
>
> (a) r1 dies in the copy and
> (b) fwprop can't replace all uses of r2, because some have multiple
> definitions
>
> This has the effect of making both values live in cases where only one
> was previously.
>
> In the case of pr66768.c, fwprop2 is undoing the effect of
> cse.c:canon_reg, which tries to pick the best register to use
> (see cse.c:make_regs_eqv).  fwprop1 makes the same change,
> and made it even before the patch, but the cse.c choice should win.
>
> A (hopefully) conservative fix would be to propagate the copy only if
> both registers have a single definition, which you can test with:
>
>   (DF_REG_DEF_COUNT (regno) == 1
>&& !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (m_fn)), regno))
>
> In that case, fwprop should see all uses of the destination, and should
> be able to replace it in all cases with the source.
Ah I see, thanks for the explanation!
The above check works to resolve the failure.
IIUC, !bitmap_bit_p (...) above checks that reg isn't used uninitialized ?
>
> > For g++.target/i386/pr88152.C, the issue is that after the patch,
> > forwprop1 does following propagation (in f10) which wasn't done
> > before:
> >
> > In insn 10, replacing
> >  (unspec:SI [
> > (reg:V2DF 91)
> > ] UNSPEC_MOVMSK)
> >  with (unspec:SI [
> > (subreg:V2DF (reg:V2DI 90) 0)
> > ] UNSPEC_MOVMSK)
> >
> > This later defeats combine because insn 9 gets deleted.
> > Without patch, the following combination takes place:
> >
> > Trying 7 -> 9:
> > 7: r90:V2DI=r89:V2DI>r93:V2DI
> >   REG_DEAD r93:V2DI
> >   REG_DEAD r89:V2DI
> > 9: r91:V2DF=r90:V2DI#0
> >   REG_DEAD r90:V2DI
> > Successfully matched this instruction:
> > (set (subreg:V2DI (reg:V2DF 91) 0)
> > (gt:V2DI (reg:V2DI 89)
> > (reg:V2DI 93)))
> > allowing combination of insns 7 and 9
> >
> > and then:
> > Trying 6, 9 -> 10:
> > 6: r89:V2DI=const_vector
> > 9: r91:V2DF#0=r89:V2DI>r93:V2DI
> >   REG_DEAD r89:V2DI
> >   REG_DEAD r93:V2DI
> >10: r87:SI=unspec[r91:V2DF] 43
> >   REG_DEAD r91:V2DF
> > Successfully matched this instruction:
> > (set (reg:SI 87)
> > (unspec:SI [
> > (lt:V2DF (reg:V2DI 93)
> > (const_vector:V2DI [
> > (const_int 0 [0]) repeated x2
> > ]))
> > ] UNSPEC_MOVMSK))
>
> Eh?  lt:*V2DF*?  Does that mean that it's 0 for false and an all-1 NaN
> for true?
Well looking at .optimized dump:

  vector(2) long int _2;
  vector(2) double _3;
  int _6;
  signed long _7;
  long int _8;
  signed long _9;
  long int _10;

   [local count: 1073741824]:
 

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2019 at 5:21 PM Martin Liška  wrote:
>
> Hi.
>
> Starting from r272668 I see:
>
> /tmp/ccqxwVjt.s: Assembler messages:
> /tmp/ccqxwVjt.s:22: Error: no such instruction: `vp2intersectq .LC1(%rip),%zmm0,%k0'
> /tmp/ccqxwVjt.s:33: Error: no such instruction: `vp2intersectd .LC3(%rip),%zmm0,%k0'
> compiler exited with status 1
> FAIL: gcc.target/i386/avx512vp2intersect-2intersect-1b.c (test for excess errors)
> Excess errors:
> /tmp/ccqxwVjt.s:22: Error: no such instruction: `vp2intersectq .LC1(%rip),%zmm0,%k0'
> /tmp/ccqxwVjt.s:33: Error: no such instruction: `vp2intersectd .LC3(%rip),%zmm0,%k0'
>
> You'll need a dg-require detection I guess.
Yes, thank you.

>
> Thanks,
> Martin

Patch:
Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
===
--- testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c (revision 272668)
+++ testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512vp2intersect" } */
+/* { dg-require-effective-target "avx512vp2intersect" } */

 #define AVX512F
 #include 
Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c
===
--- testsuite/gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c (revision 272668)
+++ testsuite/gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mavx512vp2intersect -mavx512vl" } */
+/* { dg-require-effective-target "avx512vp2intersect" } */

 #define AVX512F
 #include 
Index: testsuite/lib/target-supports.exp
===
--- testsuite/lib/target-supports.exp (revision 272667)
+++ testsuite/lib/target-supports.exp (working copy)
@@ -7963,6 +7963,20 @@
 } "-mavx512bw" ]
 }

+# Return 1 if avx512vp2intersect instructions can be compiled.
+proc check_effective_target_avx512vp2intersect { } {
+return [check_no_compiler_messages avx512vp2intersect object {
+ typedef int __v16si __attribute__ ((__vector_size__ (64)));
+ typedef short __mmask16;
+ void
+ _mm512_2intersect_epi32 (__v16si __A, __v16si __B, __mmask16 *__U,
+ __mmask16 *__M)
+ {
+ __builtin_ia32_2intersectd512 (__U, __M, (__v16si) __A, (__v16si) __B);
+ }
+} "-mavx512vp2intersect" ]
+}
+
 # Return 1 if avx512ifma instructions can be compiled.
 proc check_effective_target_avx512ifma { } {
 return [check_no_compiler_messages avx512ifma object {



-- 
BR,
Hongtao


[PING][AArch64] Use scvtf fbits option where appropriate

2019-06-26 Thread Joel Hutton
Ping, plus minor rework (mostly non-functional changes)

gcc/ChangeLog:

2019-06-12  Joel Hutton  

 * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow2_recip): New
 prototype.
 * config/aarch64/aarch64.c (aarch64_fpconst_pow2_recip): New function.
 * config/aarch64/aarch64.md (*aarch64_cvtf2_mult): New pattern.
 (*aarch64_cvtf2_mult): New pattern.
 * config/aarch64/constraints.md (Dt): New constraint.
 * config/aarch64/predicates.md (aarch64_fpconst_pow2_recip): New
 predicate.

gcc/testsuite/ChangeLog:

2019-06-12  Joel Hutton  

 * gcc.target/aarch64/fmul_scvtf_1.c: New test.

Bootstrapped and regression tested on aarch64-linux-none target.
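
For readers who want to see what the new helper is expected to accept, the
following is a minimal standalone sketch of the same check on a plain
double.  It is an illustration only: toy_pow2_recip_bits is a hypothetical
name, and it uses frexp from <math.h> instead of GCC's REAL_VALUE_TYPE
helpers.  It returns n when the input is exactly 1/2^n with 1 <= n <= 32,
and -1 otherwise.

```c
#include <math.h>

/* Hypothetical stand-in for aarch64_fpconst_pow2_recip working on a host
   double.  Returns n if X == 1/2^n for some n in [1, 32], else -1.  */
static int
toy_pow2_recip_bits (double x)
{
  int exp;
  double mant;

  /* Reject zero, negative values, NaN and infinity.  */
  if (!(x > 0.0) || isinf (x))
    return -1;

  /* frexp gives x == mant * 2^exp with 0.5 <= mant < 1.  */
  mant = frexp (x, &exp);

  /* X is an exact power of two only if the mantissa is exactly 0.5,
     in which case x == 2^(exp - 1).  */
  if (mant != 0.5)
    return -1;

  /* x == 1/2^n  <=>  n == 1 - exp.  */
  int n = 1 - exp;
  return (n >= 1 && n <= 32) ? n : -1;
}
```

So a multiplier of 0.25 maps to 2 fractional bits, while 1.0, 0.3 and any
negative constant are rejected.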

From e866ce55c9febd92ab8e6314bf79b067085b2d1b Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Wed, 19 Jun 2019 17:24:38 +0100
Subject: [PATCH] SCVTF

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  23 +++
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 .../gcc.target/aarch64/fmul_scvtf_1.c | 140 ++
 6 files changed, 214 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf_1.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1026a44f32b144a6e97398c0659feb..ad1ba458a3fa081d83acf806776e911aa789b5d0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8665274249581f8c404d18ae72e873..d88716576850eedd1070de108da152838c127c36 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,29 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n.  Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer (&r0));
+	if (ret >= 1 && ret <= 32)
+	return ret;
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dabc540065d77d4a7922aeca16a402aa..0ccd5de3d807f079614b0076ac439c1cb8e56ab8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; Equal width integer to fp and multiply combine.
+(define_insn "*aarch64_cvtf2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; Unequal width integer to fp and multiply combine.
+(define_insn "*aarch64_cvtf2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+return "cvtf\t%0, %1, #%2";
+  }
+  [(set_attr "type" "f_cvti2f")]
+)
+
+;; Equal width integer to fp conversion.
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w,w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "w,?r")))]
@@ -6027,6 +6065,7 @@
(set_attr "arch" "simd,fp")]
 )
 
+;; Unequal width integer to fp conversions.
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "r")))]
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 

Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-26 Thread Martin Liška
Hi.

Starting from r272668 I see:

/tmp/ccqxwVjt.s: Assembler messages:
/tmp/ccqxwVjt.s:22: Error: no such instruction: `vp2intersectq .LC1(%rip),%zmm0,%k0'
/tmp/ccqxwVjt.s:33: Error: no such instruction: `vp2intersectd .LC3(%rip),%zmm0,%k0'
compiler exited with status 1
FAIL: gcc.target/i386/avx512vp2intersect-2intersect-1b.c (test for excess errors)
Excess errors:
/tmp/ccqxwVjt.s:22: Error: no such instruction: `vp2intersectq .LC1(%rip),%zmm0,%k0'
/tmp/ccqxwVjt.s:33: Error: no such instruction: `vp2intersectd .LC3(%rip),%zmm0,%k0'

You'll need a dg-require detection I guess.

Thanks,
Martin


Re: [PATCH 05/30] Changes to arm

2019-06-26 Thread Richard Earnshaw
On 25/06/2019 21:22, acsaw...@linux.ibm.com wrote:
> From: Aaron Sawdey 
> 
>   * config/arm/arm-protos.h: Change movmem to cpymem in names.
>   * config/arm/arm.c (arm_movmemqi_unaligned, arm_gen_movmemqi,
>   gen_movmem_ldrd_strd, thumb_expand_movmemqi) Change movmem to cpymem.
>   * config/arm/arm.md (movmemqi): Change movmem to cpymem.

OK.

R.

> ---
>  gcc/config/arm/arm-protos.h |  6 +++---
>  gcc/config/arm/arm.c| 18 +-
>  gcc/config/arm/arm.md   |  8 
>  gcc/config/arm/thumb1.md|  4 ++--
>  4 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 485bc68..bf2bf1c 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -126,8 +126,8 @@ extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
>  extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
>  extern bool gen_operands_ldrd_strd (rtx *, bool, bool, bool);
>  extern bool valid_operands_ldrd_strd (rtx *, bool);
> -extern int arm_gen_movmemqi (rtx *);
> -extern bool gen_movmem_ldrd_strd (rtx *);
> +extern int arm_gen_cpymemqi (rtx *);
> +extern bool gen_cpymem_ldrd_strd (rtx *);
>  extern machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
>  extern machine_mode arm_select_dominance_cc_mode (rtx, rtx,
>  HOST_WIDE_INT);
> @@ -203,7 +203,7 @@ extern void thumb2_final_prescan_insn (rtx_insn *);
>  extern const char *thumb_load_double_from_address (rtx *);
>  extern const char *thumb_output_move_mem_multiple (int, rtx *);
>  extern const char *thumb_call_via_reg (rtx);
> -extern void thumb_expand_movmemqi (rtx *);
> +extern void thumb_expand_cpymemqi (rtx *);
>  extern rtx arm_return_addr (int, rtx);
>  extern void thumb_reload_out_hi (rtx *);
>  extern void thumb_set_return_address (rtx, rtx);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index e3e71ea..820502a 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -14385,7 +14385,7 @@ arm_block_move_unaligned_loop (rtx dest, rtx src, HOST_WIDE_INT length,
> core type, optimize_size setting, etc.  */
>  
>  static int
> -arm_movmemqi_unaligned (rtx *operands)
> +arm_cpymemqi_unaligned (rtx *operands)
>  {
>HOST_WIDE_INT length = INTVAL (operands[2]);
>
> @@ -14422,7 +14422,7 @@ arm_movmemqi_unaligned (rtx *operands)
>  }
>  
>  int
> -arm_gen_movmemqi (rtx *operands)
> +arm_gen_cpymemqi (rtx *operands)
>  {
>HOST_WIDE_INT in_words_to_go, out_words_to_go, last_bytes;
>HOST_WIDE_INT srcoffset, dstoffset;
> @@ -14436,7 +14436,7 @@ arm_gen_movmemqi (rtx *operands)
>  return 0;
>  
>if (unaligned_access && (INTVAL (operands[3]) & 3) != 0)
> -return arm_movmemqi_unaligned (operands);
> +return arm_cpymemqi_unaligned (operands);
>  
>if (INTVAL (operands[3]) & 3)
>  return 0;
> @@ -14570,7 +14570,7 @@ arm_gen_movmemqi (rtx *operands)
>return 1;
>  }
>  
> -/* Helper for gen_movmem_ldrd_strd. Increase the address of memory rtx
> +/* Helper for gen_cpymem_ldrd_strd. Increase the address of memory rtx
>  by mode size.  */
>  inline static rtx
>  next_consecutive_mem (rtx mem)
> @@ -14585,7 +14585,7 @@ next_consecutive_mem (rtx mem)
>  /* Copy using LDRD/STRD instructions whenever possible.
> Returns true upon success. */
>  bool
> -gen_movmem_ldrd_strd (rtx *operands)
> +gen_cpymem_ldrd_strd (rtx *operands)
>  {
>unsigned HOST_WIDE_INT len;
>HOST_WIDE_INT align;
> @@ -14629,7 +14629,7 @@ gen_movmem_ldrd_strd (rtx *operands)
>  
>/* If we cannot generate any LDRD/STRD, try to generate LDM/STM.  */
>if (!(dst_aligned || src_aligned))
> -return arm_gen_movmemqi (operands);
> +return arm_gen_cpymemqi (operands);
>  
>/* If the either src or dst is unaligned we'll be accessing it as pairs
>   of unaligned SImode accesses.  Otherwise we can generate DImode
> @@ -26395,7 +26395,7 @@ thumb_call_via_reg (rtx reg)
>  
>  /* Routines for generating rtl.  */
>  void
> -thumb_expand_movmemqi (rtx *operands)
> +thumb_expand_cpymemqi (rtx *operands)
>  {
>rtx out = copy_to_mode_reg (SImode, XEXP (operands[0], 0));
>rtx in  = copy_to_mode_reg (SImode, XEXP (operands[1], 0));
> @@ -26404,13 +26404,13 @@ thumb_expand_movmemqi (rtx *operands)
>  
>while (len >= 12)
>  {
> -  emit_insn (gen_movmem12b (out, in, out, in));
> +  emit_insn (gen_cpymem12b (out, in, out, in));
>len -= 12;
>  }
>  
>if (len >= 8)
>  {
> -  emit_insn (gen_movmem8b (out, in, out, in));
> +  emit_insn (gen_cpymem8b (out, in, out, in));
>len -= 8;
>  }
>  
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index ae58217..a7fa410 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -7250,7 +7250,7 @@
>  ;; We could let this apply for blocks of less than this, but it clobbers so
>  ;; many registers that there is 

Re: [PATCH 02/30] Changes for aarch64

2019-06-26 Thread Richard Earnshaw (lists)
On 25/06/2019 21:22, acsaw...@linux.ibm.com wrote:
> From: Aaron Sawdey 
> 
>   * config/aarch64/aarch64-protos.h: Change movmem to cpymem.
>   * config/aarch64/aarch64.c (aarch64_expand_movmem): Change movmem
>   to cpymem.
>   * config/aarch64/aarch64.h: Change movmem to cpymem.
>   * config/aarch64/aarch64.md (movmemdi): Change name to cpymemdi.

OK.

R.

> ---
>  gcc/config/aarch64/aarch64-protos.h | 4 ++--
>  gcc/config/aarch64/aarch64.c| 4 ++--
>  gcc/config/aarch64/aarch64.h| 2 +-
>  gcc/config/aarch64/aarch64.md   | 6 +++---
>  4 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 4b20796..e2f4cc1 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -424,12 +424,12 @@ bool aarch64_constant_address_p (rtx);
>  bool aarch64_emit_approx_div (rtx, rtx, rtx);
>  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
>  void aarch64_expand_call (rtx, rtx, bool);
> -bool aarch64_expand_movmem (rtx *);
> +bool aarch64_expand_cpymem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_float_const_rtx_p (rtx);
>  bool aarch64_function_arg_regno_p (unsigned);
>  bool aarch64_fusion_enabled_p (enum aarch64_fusion_pairs);
> -bool aarch64_gen_movmemqi (rtx *);
> +bool aarch64_gen_cpymemqi (rtx *);
>  bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
>  bool aarch64_is_extend_from_extract (scalar_int_mode, rtx, rtx);
>  bool aarch64_is_long_call_p (rtx);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 285ae1c..5a923ca 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -17386,11 +17386,11 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
>*dst = aarch64_progress_pointer (*dst);
>  }
>  
> -/* Expand movmem, as if from a __builtin_memcpy.  Return true if
> +/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
> we succeed, otherwise return false.  */
>  
>  bool
> -aarch64_expand_movmem (rtx *operands)
> +aarch64_expand_cpymem (rtx *operands)
>  {
>int n, mode_bits;
>rtx dst = operands[0];
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bf06caa..92e38a8 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -855,7 +855,7 @@ typedef struct
>  /* MOVE_RATIO dictates when we will use the move_by_pieces infrastructure.
> move_by_pieces will continually copy the largest safe chunks.  So a
> 7-byte copy is a 4-byte + 2-byte + byte copy.  This proves inefficient
> -   for both size and speed of copy, so we will instead use the "movmem"
> +   for both size and speed of copy, so we will instead use the "cpymem"
> standard name to implement the copy.  This logic does not apply when
> targeting -mstrict-align, so keep a sensible default in that case.  */
>  #define MOVE_RATIO(speed) \
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 91e46cf..7026b3a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1375,17 +1375,17 @@
>  
>  ;; 0 is dst
>  ;; 1 is src
> -;; 2 is size of move in bytes
> +;; 2 is size of copy in bytes
>  ;; 3 is alignment
>  
> -(define_expand "movmemdi"
> +(define_expand "cpymemdi"
>[(match_operand:BLK 0 "memory_operand")
> (match_operand:BLK 1 "memory_operand")
> (match_operand:DI 2 "immediate_operand")
> (match_operand:DI 3 "immediate_operand")]
> "!STRICT_ALIGNMENT"
>  {
> -  if (aarch64_expand_movmem (operands))
> +  if (aarch64_expand_cpymem (operands))
>  DONE;
>FAIL;
>  }
> 



Re: [PATCH 01/30] Changes to machine independent code

2019-06-26 Thread Richard Earnshaw
On 26/06/2019 09:36, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote:
>>> From: Aaron Sawdey 
>>>
>>> * builtins.c (get_memory_rtx): Fix comment.
>>> * optabs.def (movmem_optab): Change to cpymem_optab.
>>> * expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
>>> (emit_block_move_hints): Change movmem to cpymem.
>>> * defaults.h: Change movmem to cpymem.
>>> * targhooks.c (get_move_ratio): Change movmem to cpymem.
>>> (default_use_by_pieces_infrastructure_p): Ditto.
>> So I think you're missing an update to the RTL/MD documentation.  This
>> is also likely to cause problems for any out-of-tree ports, so it's
>> probably worth a mention in the gcc-10 changes, which will need to be
>> created (in CVS no less, ugh).
> 
> Mentioning changes like this could give a false sense that the release
> notes are the best place to look.  I can think of quite a few changes
> in this line that don't get mentioned in release notes. :-)
> 
> Diffing the texi files is probably more reliable (but would still miss
> things like Wilco's recent builtin_setjmp/longjmp change, which could
> also be relevant to out-of-tree ports).
> 
> Richard
> 

Simply renaming an API is generally fine.  Out of tree ports will fail
to build and the fixes are generally simple at that point.

Adding a new API with the old name is where the problem usually lies,
since now the OOT port will still build but have changed semantics.

R.


Re: [PATCH] Define midpoint and lerp functions for C++20 (P0811R3)

2019-06-26 Thread Jonathan Wakely

On 25/06/19 21:55 +0200, Rainer Orth wrote:

Hi Jonathan,


Doh, I looked in  and saw that we get std::abs(double) from the
Solaris headers, and then forgot and used it anyway.

I'll replace that right away, thanks.


Should be fixed at r272653.

Tested x86_64-linux, committed to trunk.


it did indeed.

Thanks a lot for the super-quick fix.


I already had the fix in a branch, so should never have committed the
broken version in the first place!



Re: [PATCH] Move rust_{is_mangled,demangle_sym} to a private libiberty header.

2019-06-26 Thread Eduard-Mihai Burtescu
Bootstrapped and tested on x86_64-unknown-linux-gnu.

(Apologies for the delay: while I was able to run the libiberty tests back
when I submitted the patch, I wanted to make sure I could run the whole GCC
testsuite, especially for more significant future contributions, so I had to
wait until I had the time to troubleshoot the NixOS support for GCC's make
check.)

Thanks,
- Eddy B.


On Mon, Jun 3, 2019, at 7:23 AM, Ian Lance Taylor wrote:
> On Sat, Jun 1, 2019 at 7:15 AM Eduard-Mihai Burtescu  wrote:
> >
> > 2019-06-01 Eduard-Mihai Burtescu 
> > include/ChangeLog:
> > * demangle.h (rust_is_mangled): Move to libiberty/rust-demangle.h.
> > (rust_demangle_sym): Move to libiberty/rust-demangle.h.
> > libiberty/ChangeLog:
> > * cplus-dem.c: Include rust-demangle.h.
> > * rust-demangle.c: Include rust-demangle.h.
> > * rust-demangle.h: New file.
> 
> This is OK if it bootstraps and tests pass.
> 
> Thanks.
> 
> Ian
> 


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Jakub Jelinek
On Wed, Jun 26, 2019 at 10:17:26AM +0200, Uros Bizjak wrote:
> Please note that the patch regresses
> 
> FAIL: gcc.target/i386/sse2-vect-simd-11.c scan-tree-dump-times vect
> "vectorized [1-3] loops" 2
> FAIL: gcc.target/i386/sse2-vect-simd-15.c scan-tree-dump-times vect
> "vectorized [1-3] loops" 2
> 
> For some reason, the compiler decides to vectorize with 8-byte
> vectors, resulting in:
> 
> missed:   not vectorized: relevant stmt not supported: _8 = (short
> unsigned int) _4;
> missed:  bad operation or unsupported loop bound.
> missed: couldn't vectorize loop
> 
> However, the unpatched compiler is able to vectorize loop using
> 16-byte vectors. It looks that the compiler should re-run
> vectorization with wider vectors, if vectorization with narrower
> vectors fails. Jakub, Richard, do you have any insight in this issue?
> 
> 2019-06-26  Uroš Bizjak  
> 
> * config/i386/i386.c (ix86_autovectorize_vector_sizes):
> Autovectorize 8-byte vectors for TARGET_MMX_WITH_SSE.

The patch isn't correct if TARGET_MMX_WITH_SSE is set but TARGET_AVX is not,
because in that case it will push only 8 and nothing else, while you really
want to have 16 and 8 in that order, so that it tries to vectorize first
with 16-byte vectors and falls back to 8-byte.  The hook is supposed to
either push nothing at all, in which case only one vector size is tried,
the one derived from preferred_simd_mode, or push all possible vectorization
sizes to be tried.

The following patch fixes the failures:

--- gcc/config/i386/i386.c.jj   2019-06-26 09:15:53.474869259 +0200
+++ gcc/config/i386/i386.c  2019-06-26 10:42:01.354106012 +0200
@@ -21401,6 +21401,11 @@ ix86_autovectorize_vector_sizes (vector_
   sizes->safe_push (16);
   sizes->safe_push (32);
 }
+  else if (TARGET_MMX_WITH_SSE)
+sizes->safe_push (16);
+
+  if (TARGET_MMX_WITH_SSE)
+sizes->safe_push (8);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */


Jakub


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Uros Bizjak
On Wed, Jun 26, 2019 at 10:36 AM Richard Biener  wrote:
>
> On June 26, 2019 10:25:44 AM GMT+02:00, Uros Bizjak  wrote:
> >On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak  wrote:
> >>
> >> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
> >> able to auto-vectorize:
> >
> >On a related note, following slightly changed testcase:
> >
> >void
> >foo (char *restrict r, char *restrict a)
> >{
> >  for (int i = 0; i < 24; i++)
> >    r[i] += a[i];
> >}
> >
> >compiles to:
> >
> >foo:
> >        vmovdqu (%rdi), %xmm1
> >        vpaddb  (%rsi), %xmm1, %xmm0
> >        movzbl  16(%rsi), %eax
> >        addb    %al, 16(%rdi)
> >        vmovups %xmm0, (%rdi)
> >        movzbl  17(%rsi), %eax
> >        addb    %al, 17(%rdi)
> >        movzbl  18(%rsi), %eax
> >        addb    %al, 18(%rdi)
> >        movzbl  19(%rsi), %eax
> >        addb    %al, 19(%rdi)
> >        movzbl  20(%rsi), %eax
> >        addb    %al, 20(%rdi)
> >        movzbl  21(%rsi), %eax
> >        addb    %al, 21(%rdi)
> >        movzbl  22(%rsi), %eax
> >        addb    %al, 22(%rdi)
> >        movzbl  23(%rsi), %eax
> >        addb    %al, 23(%rdi)
> >        ret
> >
> >One would expect that the remaining 8-byte array would also get
> >vectorized, resulting in one 16-byte operation and one 8-byte
> >operation.
>
> Try --param vect-epilogue-nomask=1 (or so).

Yes, this (--param vect-epilogues-nomask=1) works!

foo:
        movdqu  (%rdi), %xmm0
        movdqu  (%rsi), %xmm2
        movq    16(%rsi), %xmm1
        paddb   %xmm2, %xmm0
        movups  %xmm0, (%rdi)
        movq    16(%rdi), %xmm0
        paddb   %xmm1, %xmm0
        movq    %xmm0, 16(%rdi)
        ret

Thanks,
Uros.


Re: [PATCH 0/3] RFC: Let debug stmts influence codegen@-Og

2019-06-26 Thread Jonathan Wakely

On 23/06/19 14:51 +0100, Richard Sandiford wrote:

Also, the new mode is mostly orthogonal to the optimisation level
(although it would in effect disable optimisations like loop
vectorisation, until we have a way of representing debug info for
vectorised loops).  The third patch therefore adds an -O1g option
that optimises more heavily than -Og but provides a better debug
experience than -O1.


I think it would be confusing to have -O and -O1 mean the same, but
-Og and -O1g mean something different.

Maybe another name could avoid that, e.g. appending +g to signify the
new modes, so -O+g and -O1+g would mean the same thing.


I think -O2g would make sense too, and would be a viable option
for people who want to deploy relatively heavily optimised binaries
without compromising the debug experience too much.


Which would be -O2+g using the naming scheme above.

If the mode is orthogonal to optimisation level I think this is
clearer, because you can have +g appended to any level, -Os+g, maybe
even -Og+g ;-)



Re: [PATCH 01/30] Changes to machine independent code

2019-06-26 Thread Richard Sandiford
Jeff Law  writes:
> On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote:
>> From: Aaron Sawdey 
>> 
>>  * builtins.c (get_memory_rtx): Fix comment.
>>  * optabs.def (movmem_optab): Change to cpymem_optab.
>>  * expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
>>  (emit_block_move_hints): Change movmem to cpymem.
>>  * defaults.h: Change movmem to cpymem.
>>  * targhooks.c (get_move_ratio): Change movmem to cpymem.
>>  (default_use_by_pieces_infrastructure_p): Ditto.
> So I think you're missing an update to the RTL/MD documentation.  This
> is also likely to cause problems for any out-of-tree ports, so it's
> probably worth a mention in the gcc-10 changes, which will need to be
> created (in CVS no less, ugh).

Mentioning changes like this could give a false sense that the release
notes are the best place to look.  I can think of quite a few changes
in this line that don't get mentioned in release notes. :-)

Diffing the texi files is probably more reliable (but would still miss
things like Wilco's recent builtin_setjmp/longjmp change, which could
also be relevant to out-of-tree ports).

Richard


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Richard Biener
On June 26, 2019 10:25:44 AM GMT+02:00, Uros Bizjak  wrote:
>On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak  wrote:
>>
>> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
>> able to auto-vectorize:
>
>On a related note, following slightly changed testcase:
>
>void
>foo (char *restrict r, char *restrict a)
>{
>  for (int i = 0; i < 24; i++)
>    r[i] += a[i];
>}
>
>compiles to:
>
>foo:
>        vmovdqu (%rdi), %xmm1
>        vpaddb  (%rsi), %xmm1, %xmm0
>        movzbl  16(%rsi), %eax
>        addb    %al, 16(%rdi)
>        vmovups %xmm0, (%rdi)
>        movzbl  17(%rsi), %eax
>        addb    %al, 17(%rdi)
>        movzbl  18(%rsi), %eax
>        addb    %al, 18(%rdi)
>        movzbl  19(%rsi), %eax
>        addb    %al, 19(%rdi)
>        movzbl  20(%rsi), %eax
>        addb    %al, 20(%rdi)
>        movzbl  21(%rsi), %eax
>        addb    %al, 21(%rdi)
>        movzbl  22(%rsi), %eax
>        addb    %al, 22(%rdi)
>        movzbl  23(%rsi), %eax
>        addb    %al, 23(%rdi)
>        ret
>
>One would expect that the remaining 8-byte array would also get
>vectorized, resulting in one 16-byte operation and one 8-byte
>operation.

Try --param vect-epilogue-nomask=1 (or so).

Richard. 

>Uros.



Re: [PATCH] Move rust_{is_mangled,demangle_sym} to a private libiberty header.

2019-06-26 Thread Eduard-Mihai Burtescu
Hi Mark,

Valgrind is definitely on my upstreaming list, alongside GDB, LLDB and Linux 
perf.

You can see the preliminary version here: 
https://gist.github.com/eddyb/c41a69378750a433767cf53fe2316768 (do not use it 
yet, I still want to tweak it a bit more before upstreaming it, soon, and I 
want it to go through the GCC/GDB review process).
You'll be able to either modify it, to replace the malloc/realloc/free calls, 
or use rust_demangle_with_callback and use your own buffer (or directly print 
the demangling, if that's all you need).

Feel free to contact me outside of this list, at this email or on IRC (eddyb on 
Freenode and OFTC), if you want to further discuss the details of upstreaming 
the new demangler to Valgrind.

- Eddy B.


On Tue, Jun 4, 2019, at 11:29 AM, Mark Wielaard wrote:
> On Sat, 2019-06-01 at 17:14 +0300, Eduard-Mihai Burtescu wrote:
> > When libiberty/rust-demangle.c was initially added, its two exports,
> > rust_is_mangled and rust_demangle_sym, made it to include/demangle.h.
> > However, these two functions are merely implementation details of
> > cplus_demangle and rust_demangle, only the latter should be public.
> > 
> > This is becoming a problem, because the new Rust mangling scheme
> > does not fit this "postprocess after C++ demangling" API at all,
> > so rust_demangle_sym would forever be stuck supporting only the
> > legacy mangling, whereas rust_demangle can easily handle both
> > (the new version of which I plan to upstream soon).
> > 
> > I'm hoping that libiberty doesn't have strict backwards-compat
> > requirements, so that we can hide these two functions.
> > Also, as far as I'm aware, nobody is using them in the wild.
> 
> valgrind uses an embedded copy of the libiberty demangler (slightly
> changed to use valgrind's internal memory allocation scheme) which does
> use these functions directly:
> 
> https://sourceware.org/git/?p=valgrind.git;a=blob;f=coregrind/m_demangle/demangle.c;hb=HEAD#l153
> But we could of course just include the "private" header instead, when
> we next sync up with libiberty.
> 
> We use these functions directly precisely because the rust demangling
> scheme is (currently) based on top of the traditional _Z C++ demangling
> scheme and we know that it will be done "in place". If there is a new
> Rust demangling scheme that doesn't have that property we'll have to
> adapt to a different demangling scheme in the future. Any help with
> that appreciated. valgrind has been useful for combined c/c++/rust
> programs.
> 
> Cheers,
> 
> Mark
> 


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Richard Biener
On June 26, 2019 10:17:26 AM GMT+02:00, Uros Bizjak  wrote:
>Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
>able to auto-vectorize:
>
>void
>foo (char *restrict r, char *restrict a)
>{
>  for (int i = 0; i < 8; i++)
>    r[i] += a[i];
>}
>
>Attached patch enables the conversion and produces:
>
>foo:
>        movq    (%rdi), %xmm1
>        movq    (%rsi), %xmm0
>        paddb   %xmm1, %xmm0
>        movq    %xmm0, (%rdi)
>        ret
>
>Please note that the patch regresses
>
>FAIL: gcc.target/i386/sse2-vect-simd-11.c scan-tree-dump-times vect
>"vectorized [1-3] loops" 2
>FAIL: gcc.target/i386/sse2-vect-simd-15.c scan-tree-dump-times vect
>"vectorized [1-3] loops" 2
>
>For some reason, the compiler decides to vectorize with 8-byte
>vectors, resulting in:
>
>missed:   not vectorized: relevant stmt not supported: _8 = (short
>unsigned int) _4;
>missed:  bad operation or unsupported loop bound.
>missed: couldn't vectorize loop
>
>However, the unpatched compiler is able to vectorize loop using
>16-byte vectors. It looks that the compiler should re-run
>vectorization with wider vectors, if vectorization with narrower
>vectors fails. Jakub, Richard, do you have any insight in this issue?

Double check the ordering of the vector size pushes - it should already iterate 
but first successful wins. 

Richard. 

>2019-06-26  Uroš Bizjak  
>
>* config/i386/i386.c (ix86_autovectorize_vector_sizes):
>Autovectorize 8-byte vectors for TARGET_MMX_WITH_SSE.
>
>testsuite/ChangeLog:
>
>2019-06-26  Uroš Bizjak  
>
>* lib/target-supports.exp (available_vector_sizes)
><[istarget i?86-*-*] || [istarget x86_64-*-*]>: Add
>64-bit vectors for !ia32.
>
>The patch was bootstrapped and regression tested on x86_64-linux-gnu
>{,-m32}.
>
>Uros.



Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Uros Bizjak
On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak  wrote:
>
> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
> able to auto-vectorize:

On a related note, following slightly changed testcase:

void
foo (char *restrict r, char *restrict a)
{
  for (int i = 0; i < 24; i++)
    r[i] += a[i];
}

compiles to:

foo:
        vmovdqu (%rdi), %xmm1
        vpaddb  (%rsi), %xmm1, %xmm0
        movzbl  16(%rsi), %eax
        addb    %al, 16(%rdi)
        vmovups %xmm0, (%rdi)
        movzbl  17(%rsi), %eax
        addb    %al, 17(%rdi)
        movzbl  18(%rsi), %eax
        addb    %al, 18(%rdi)
        movzbl  19(%rsi), %eax
        addb    %al, 19(%rdi)
        movzbl  20(%rsi), %eax
        addb    %al, 20(%rdi)
        movzbl  21(%rsi), %eax
        addb    %al, 21(%rdi)
        movzbl  22(%rsi), %eax
        addb    %al, 22(%rdi)
        movzbl  23(%rsi), %eax
        addb    %al, 23(%rdi)
        ret

One would expect that the remaining 8-byte array would also get
vectorized, resulting in one 16-byte operation and one 8-byte
operation.

Uros.


Re: [PATCH] Fix AVX512* wrong-code due to *_vinsert_0 (PR target/90991)

2019-06-26 Thread Uros Bizjak
On Wed, Jun 26, 2019 at 10:11 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase is miscompiled starting with my PR85480 change.
> While it is perfectly fine to use "xm" or "vm" constraints for the source
> operand when the other operand is "C", we rely on the AVX/AVX512 behavior
> that most 128-bit or 256-bit vector instructions clear the upper bits of the
> 512-bit vectors, but if the source is in memory, we need to check if it is
> aligned or maybe misaligned and use corresponding aligned or unaligned loads
> accordingly, rather than always aligned ones.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk and 9.2?
>
> Note, I have a follow-up patch to improve avx_vec_concat, just will
> need to test it.
>
> 2019-06-26  Jakub Jelinek  
>
> PR target/90991
> * config/i386/sse.md
> (*_vinsert_0): Use vmovupd,
> vmovups, vmovdqu, vmovdqu32 or vmovdqu64 instead of the aligned
> insns if operands[2] is misaligned_operand.
>
> * gcc.target/i386/avx512dq-pr90991-1.c: New test.

OK, looks even obvious to me.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-06-21 08:43:17.734263742 +0200
> +++ gcc/config/i386/sse.md  2019-06-25 22:36:12.955476294 +0200
> @@ -13744,15 +13744,29 @@ (define_insn "*_vinsertswitch (mode)
>  {
>  case E_V8DFmode:
> -  return "vmovapd\t{%2, %x0|%x0, %2}";
> +  if (misaligned_operand (operands[2], mode))
> +   return "vmovupd\t{%2, %x0|%x0, %2}";
> +  else
> +   return "vmovapd\t{%2, %x0|%x0, %2}";
>  case E_V16SFmode:
> -  return "vmovaps\t{%2, %x0|%x0, %2}";
> +  if (misaligned_operand (operands[2], mode))
> +   return "vmovups\t{%2, %x0|%x0, %2}";
> +  else
> +   return "vmovaps\t{%2, %x0|%x0, %2}";
>  case E_V8DImode:
> -  return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
> -   : "vmovdqa\t{%2, %x0|%x0, %2}";
> +  if (misaligned_operand (operands[2], mode))
> +   return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
> + : "vmovdqu\t{%2, %x0|%x0, %2}";
> +  else
> +   return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
> + : "vmovdqa\t{%2, %x0|%x0, %2}";
>  case E_V16SImode:
> -  return which_alternative == 2 ? "vmovdqa32\t{%2, %x0|%x0, %2}"
> -   : "vmovdqa\t{%2, %x0|%x0, %2}";
> +  if (misaligned_operand (operands[2], mode))
> +   return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
> + : "vmovdqu\t{%2, %x0|%x0, %2}";
> +  else
> +   return which_alternative == 2 ? "vmovdqa32\t{%2, %x0|%x0, %2}"
> + : "vmovdqa\t{%2, %x0|%x0, %2}";
>  default:
>gcc_unreachable ();
>  }
> --- gcc/testsuite/gcc.target/i386/avx512dq-pr90991-1.c.jj   2019-06-25 
> 23:17:44.831272146 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512dq-pr90991-1.c  2019-06-25 
> 23:27:27.732357035 +0200
> @@ -0,0 +1,47 @@
> +/* PR target/90991 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512dq -masm=att" } */
> +/* { dg-final { scan-assembler-times "vmovaps\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 
> 1 } } */
> +/* { dg-final { scan-assembler-times "vmovapd\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 
> 1 } } */
> +/* { dg-final { scan-assembler-times "vmovdqa\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 
> 1 } } */
> +/* { dg-final { scan-assembler-times "vmovups\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 
> 1 } } */
> +/* { dg-final { scan-assembler-times "vmovupd\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 
> 1 } } */
> +/* { dg-final { scan-assembler-times "vmovdqu\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 
> 1 } } */
> +
> +#include 
> +
> +__m512
> +f1 (void *a)
> +{
> +  return _mm512_insertf32x4 (_mm512_set1_ps (0.0f), _mm_load_ps (a), 0);
> +}
> +
> +__m512d
> +f2 (void *a)
> +{
> +  return _mm512_insertf64x2 (_mm512_set1_pd (0.0), _mm_load_pd (a), 0);
> +}
> +
> +__m512i
> +f3 (void *a)
> +{
> +  return _mm512_inserti32x4 (_mm512_set1_epi32 (0), _mm_load_si128 (a), 0);
> +}
> +
> +__m512
> +f4 (void *a)
> +{
> +  return _mm512_insertf32x4 (_mm512_set1_ps (0.0f), _mm_loadu_ps (a), 0);
> +}
> +
> +__m512d
> +f5 (void *a)
> +{
> +  return _mm512_insertf64x2 (_mm512_set1_pd (0.0), _mm_loadu_pd (a), 0);
> +}
> +
> +__m512i
> +f6 (void *a)
> +{
> +  return _mm512_inserti32x4 (_mm512_set1_epi32 (0), _mm_loadu_si128 (a), 0);
> +}
>
> Jakub


[RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-26 Thread Uros Bizjak
Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
able to auto-vectorize:

void
foo (char *restrict r, char *restrict a)
{
  for (int i = 0; i < 8; i++)
    r[i] += a[i];
}

Attached patch enables the conversion and produces:

foo:
        movq    (%rdi), %xmm1
        movq    (%rsi), %xmm0
        paddb   %xmm1, %xmm0
        movq    %xmm0, (%rdi)
        ret

Please note that the patch regresses

FAIL: gcc.target/i386/sse2-vect-simd-11.c scan-tree-dump-times vect
"vectorized [1-3] loops" 2
FAIL: gcc.target/i386/sse2-vect-simd-15.c scan-tree-dump-times vect
"vectorized [1-3] loops" 2

For some reason, the compiler decides to vectorize with 8-byte
vectors, resulting in:

missed:   not vectorized: relevant stmt not supported: _8 = (short
unsigned int) _4;
missed:  bad operation or unsupported loop bound.
missed: couldn't vectorize loop

However, the unpatched compiler is able to vectorize loop using
16-byte vectors. It looks that the compiler should re-run
vectorization with wider vectors, if vectorization with narrower
vectors fails. Jakub, Richard, do you have any insight in this issue?

2019-06-26  Uroš Bizjak  

* config/i386/i386.c (ix86_autovectorize_vector_sizes):
Autovectorize 8-byte vectors for TARGET_MMX_WITH_SSE.

testsuite/ChangeLog:

2019-06-26  Uroš Bizjak  

* lib/target-supports.exp (available_vector_sizes)
<[istarget i?86-*-*] || [istarget x86_64-*-*]>: Add
64-bit vectors for !ia32.

The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1ca1712183dc..24bd0896f137 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21401,6 +21401,9 @@ ix86_autovectorize_vector_sizes (vector_sizes *sizes, 
bool all)
   sizes->safe_push (16);
   sizes->safe_push (32);
 }
+
+  if (TARGET_MMX_WITH_SSE)
+sizes->safe_push (8);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 1d4aaa2a87ec..285c32f8cebb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6603,9 +6603,14 @@ proc available_vector_sizes { } {
 } elseif { [istarget arm*-*-*]
&& [check_effective_target_arm_neon_ok] } {
lappend result 128 64
-} elseif { (([istarget i?86-*-*] || [istarget x86_64-*-*])
-&& ([check_avx_available] && ![check_prefer_avx128])) } {
-   lappend result 256 128
+} elseif { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
+   if { [check_avx_available] && ![check_prefer_avx128] } {
+   lappend result 256
+   }
+   lappend result 128
+   if { ![is-effective-target ia32] } {
+   lappend result 64
+   }
 } elseif { [istarget sparc*-*-*] } {
lappend result 64
 } else {


[PATCH] Fix AVX512* wrong-code due to *_vinsert_0 (PR target/90991)

2019-06-26 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled starting with my PR85480 change.
While it is perfectly fine to use "xm" or "vm" constraints for the source
operand when the other operand is "C", we rely on the AVX/AVX512 behavior
that most 128-bit or 256-bit vector instructions clear the upper bits of the
512-bit vectors, but if the source is in memory, we need to check if it is
aligned or maybe misaligned and use corresponding aligned or unaligned loads
accordingly, rather than always aligned ones.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk and 9.2?

Note, I have a follow-up patch to improve avx_vec_concat, just will
need to test it.

2019-06-26  Jakub Jelinek  

PR target/90991
* config/i386/sse.md
(*_vinsert_0): Use vmovupd,
vmovups, vmovdqu, vmovdqu32 or vmovdqu64 instead of the aligned
insns if operands[2] is misaligned_operand.

* gcc.target/i386/avx512dq-pr90991-1.c: New test.

--- gcc/config/i386/sse.md.jj   2019-06-21 08:43:17.734263742 +0200
+++ gcc/config/i386/sse.md  2019-06-25 22:36:12.955476294 +0200
@@ -13744,15 +13744,29 @@ (define_insn "*_vinsertmode)
 {
 case E_V8DFmode:
-  return "vmovapd\t{%2, %x0|%x0, %2}";
+  if (misaligned_operand (operands[2], mode))
+   return "vmovupd\t{%2, %x0|%x0, %2}";
+  else
+   return "vmovapd\t{%2, %x0|%x0, %2}";
 case E_V16SFmode:
-  return "vmovaps\t{%2, %x0|%x0, %2}";
+  if (misaligned_operand (operands[2], mode))
+   return "vmovups\t{%2, %x0|%x0, %2}";
+  else
+   return "vmovaps\t{%2, %x0|%x0, %2}";
 case E_V8DImode:
-  return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
-   : "vmovdqa\t{%2, %x0|%x0, %2}";
+  if (misaligned_operand (operands[2], mode))
+   return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
+ : "vmovdqu\t{%2, %x0|%x0, %2}";
+  else
+   return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
+ : "vmovdqa\t{%2, %x0|%x0, %2}";
 case E_V16SImode:
-  return which_alternative == 2 ? "vmovdqa32\t{%2, %x0|%x0, %2}"
-   : "vmovdqa\t{%2, %x0|%x0, %2}";
+  if (misaligned_operand (operands[2], mode))
+   return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
+ : "vmovdqu\t{%2, %x0|%x0, %2}";
+  else
+   return which_alternative == 2 ? "vmovdqa32\t{%2, %x0|%x0, %2}"
+ : "vmovdqa\t{%2, %x0|%x0, %2}";
 default:
   gcc_unreachable ();
 }
--- gcc/testsuite/gcc.target/i386/avx512dq-pr90991-1.c.jj   2019-06-25 
23:17:44.831272146 +0200
+++ gcc/testsuite/gcc.target/i386/avx512dq-pr90991-1.c  2019-06-25 
23:27:27.732357035 +0200
@@ -0,0 +1,47 @@
+/* PR target/90991 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512dq -masm=att" } */
+/* { dg-final { scan-assembler-times "vmovaps\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 1 
} } */
+/* { dg-final { scan-assembler-times "vmovapd\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 1 
} } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 1 
} } */
+/* { dg-final { scan-assembler-times "vmovups\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 1 
} } */
+/* { dg-final { scan-assembler-times "vmovupd\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 1 
} } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \t]\+\\(\[^\n\r]*\\), %xmm0" 1 
} } */
+
+#include 
+
+__m512
+f1 (void *a)
+{
+  return _mm512_insertf32x4 (_mm512_set1_ps (0.0f), _mm_load_ps (a), 0);
+}
+
+__m512d
+f2 (void *a)
+{
+  return _mm512_insertf64x2 (_mm512_set1_pd (0.0), _mm_load_pd (a), 0);
+}
+
+__m512i
+f3 (void *a)
+{
+  return _mm512_inserti32x4 (_mm512_set1_epi32 (0), _mm_load_si128 (a), 0);
+}
+
+__m512
+f4 (void *a)
+{
+  return _mm512_insertf32x4 (_mm512_set1_ps (0.0f), _mm_loadu_ps (a), 0);
+}
+
+__m512d
+f5 (void *a)
+{
+  return _mm512_insertf64x2 (_mm512_set1_pd (0.0), _mm_loadu_pd (a), 0);
+}
+
+__m512i
+f6 (void *a)
+{
+  return _mm512_inserti32x4 (_mm512_set1_epi32 (0), _mm_loadu_si128 (a), 0);
+}

Jakub


Re: [PATCH] RISC-V: Add -malign-data= option.

2019-06-26 Thread Palmer Dabbelt

On Tue, 25 Jun 2019 15:58:53 PDT (-0700), pins...@gmail.com wrote:

On Tue, Jun 25, 2019 at 3:46 PM Ilia Diachkov
 wrote:


Hello,

This patch adds a new machine-specific option, -malign-data={word,abi}, to
the RISC-V port. The option controls the alignment of global variables and
constants of array/record/union types. The default value
(-malign-data=word) keeps the existing alignment computation. The other
value (-malign-data=abi) gives data its natural alignment. It avoids
extra padding between data objects to reduce code size. The measured code size
reduction is about 0.4% at -Os on the EEMBC automotive 1.1 tests and the
SPEC2006 C/C++ benchmarks. The patch was tested in riscv-gnu-toolchain
with DejaGnu.

Please check the patch into the trunk.


Hmm, may I suggest using "natural" rather than "abi" and 32bit or 64bit
rather than "word"; it is not obvious what abi means and it is not
obvious what word means here; it could be either 32bit or 64bit
depending on the option.


It's actually worse: in RISC-V "word" always means 32-bit (BITS_PER_WORD is a
GCC name).  "natural" seems like a good term for "align to the size of the
type".  The RISC-V term for "the width of an integer register" is "xlen", so I
think that's a good bet for the other option.


Also, my other suggestion is to create a new macro where you pass
riscv_align_data_type == riscv_align_data_type_word for the "(ALIGN) <
BITS_PER_WORD) " check to reduce the code duplication.


Additionally, has this been tested with "-mstrict-align"?  The generated code
can be awful, but if it's not correct then we should throw an error on that
combination.



Thanks,
Andrew Pinski



Best regards,
Ilia.

gcc/
* config/riscv/riscv-opts.h (struct riscv_align_data): Added.
* config/riscv/riscv.c (riscv_constant_alignment): Use
riscv_align_data_type.
* config/riscv/riscv.h (DATA_ALIGNMENT): Use riscv_align_data_type.
(LOCAL_ALIGNMENT): Set to old DATA_ALIGNMENT value.
* config/riscv/riscv.opt (malign-data): New.
* doc/invoke.texi (RISC-V Options): Document -malign-data=.


Re: [PATCH] Add to same comdate group only if set (PR middle-end/90899)

2019-06-26 Thread Martin Liška
PING^1

On 6/18/19 10:58 AM, Martin Liška wrote:
> Hi.
> 
> The patch is quite obvious: it does the same as we already do in
> other IPA passes.
> 
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-06-18  Martin Liska  
> 
>   PR middle-end/90899
>   * multiple_target.c (create_dispatcher_calls): Add to comdat
>   group only if set for ifunc.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-06-18  Martin Liska  
> 
>   PR middle-end/90899
>   * gcc.target/i386/pr90899.c: New test.
> ---
>  gcc/multiple_target.c   | 3 ++-
>  gcc/testsuite/gcc.target/i386/pr90899.c | 6 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90899.c
> 
> 



Re: [PATCH] Fix PR90914

2019-06-26 Thread Martin Liška
Hi.

Just for the record, the patch is responsible for a significant
debug info growth for 434.zeusmp with -O2 -flto and other options:

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=227.75.4

Thanks,
Martin


Re: [PATCH 12/30] Changes to i386

2019-06-26 Thread Uros Bizjak
On Tue, Jun 25, 2019 at 10:27 PM  wrote:
>
> From: Aaron Sawdey 
>
> * config/i386/i386-expand.c (expand_set_or_movmem_via_loop,
> expand_set_or_movmem_via_rep, expand_movmem_epilogue,
> expand_setmem_epilogue_via_loop, expand_set_or_cpymem_prologue,
> expand_small_cpymem_or_setmem,
> expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves,
> expand_set_or_cpymem_constant_prologue,
> ix86_expand_set_or_cpymem): Change movmem to cpymem.
> * config/i386/i386-protos.h: Change movmem to cpymem.
> * config/i386/i386.h: Change movmem to cpymem in comment.
> * config/i386/i386.md (movmem): Change name to cpymem.
> (setmem): Change expansion function name.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.c | 36 ++--
>  gcc/config/i386/i386-protos.h |  2 +-
>  gcc/config/i386/i386.h|  2 +-
>  gcc/config/i386/i386.md   |  6 +++---
>  4 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 72be1df..ae1fe2a9 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -5801,7 +5801,7 @@ counter_mode (rtx count_exp)
>
>
>  static void
> -expand_set_or_movmem_via_loop (rtx destmem, rtx srcmem,
> +expand_set_or_cpymem_via_loop (rtx destmem, rtx srcmem,
>rtx destptr, rtx srcptr, rtx value,
>rtx count, machine_mode mode, int unroll,
>int expected_size, bool issetmem)
> @@ -5954,7 +5954,7 @@ scale_counter (rtx countreg, int scale)
> Other arguments have same meaning as for previous function.  */
>
>  static void
> -expand_set_or_movmem_via_rep (rtx destmem, rtx srcmem,
> +expand_set_or_cpymem_via_rep (rtx destmem, rtx srcmem,
>rtx destptr, rtx srcptr, rtx value, rtx orig_value,
>rtx count,
>machine_mode mode, bool issetmem)
> @@ -6121,7 +6121,7 @@ ix86_expand_aligntest (rtx variable, int value, bool 
> epilogue)
>  /* Output code to copy at most count & (max_size - 1) bytes from SRC to 
> DEST.  */
>
>  static void
> -expand_movmem_epilogue (rtx destmem, rtx srcmem,
> +expand_cpymem_epilogue (rtx destmem, rtx srcmem,
> rtx destptr, rtx srcptr, rtx count, int max_size)
>  {
>rtx src, dest;
> @@ -6146,7 +6146,7 @@ expand_movmem_epilogue (rtx destmem, rtx srcmem,
>  {
>count = expand_simple_binop (GET_MODE (count), AND, count, GEN_INT 
> (max_size - 1),
> count, 1, OPTAB_DIRECT);
> -  expand_set_or_movmem_via_loop (destmem, srcmem, destptr, srcptr, NULL,
> +  expand_set_or_cpymem_via_loop (destmem, srcmem, destptr, srcptr, NULL,
>  count, QImode, 1, 4, false);
>return;
>  }
> @@ -6295,7 +6295,7 @@ expand_setmem_epilogue_via_loop (rtx destmem, rtx 
> destptr, rtx value,
>  {
>count = expand_simple_binop (counter_mode (count), AND, count,
>GEN_INT (max_size - 1), count, 1, 
> OPTAB_DIRECT);
> -  expand_set_or_movmem_via_loop (destmem, NULL, destptr, NULL,
> +  expand_set_or_cpymem_via_loop (destmem, NULL, destptr, NULL,
>  gen_lowpart (QImode, value), count, QImode,
>  1, max_size / 2, true);
>  }
> @@ -6416,7 +6416,7 @@ ix86_adjust_counter (rtx countreg, HOST_WIDE_INT value)
> Return value is updated DESTMEM.  */
>
>  static rtx
> -expand_set_or_movmem_prologue (rtx destmem, rtx srcmem,
> +expand_set_or_cpymem_prologue (rtx destmem, rtx srcmem,
>   rtx destptr, rtx srcptr, rtx value,
>   rtx vec_value, rtx count, int align,
>   int desired_alignment, bool issetmem)
> @@ -6449,7 +6449,7 @@ expand_set_or_movmem_prologue (rtx destmem, rtx srcmem,
> or setmem sequence that is valid for SIZE..2*SIZE-1 bytes
> and jump to DONE_LABEL.  */
>  static void
> -expand_small_movmem_or_setmem (rtx destmem, rtx srcmem,
> +expand_small_cpymem_or_setmem (rtx destmem, rtx srcmem,
>rtx destptr, rtx srcptr,
>rtx value, rtx vec_value,
>rtx count, int size,
> @@ -6575,7 +6575,7 @@ expand_small_movmem_or_setmem (rtx destmem, rtx srcmem,
> done_label:
>*/
>  static void
> -expand_set_or_movmem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx 
> srcmem,
> +expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx 
> srcmem,
> rtx *destptr, rtx 
> *srcptr,
> machine_mode mode,
> rtx 

[PATCH] doc: Fix opindex for -W options

2019-06-26 Thread Segher Boessenkool
@opindex -Wxxx is wrong; it should be @opindex Wxxx.

Committing as trivial and obvious.


2019-06-26  Segher Boessenkool  

* doc/invoke.texi (Warning Options): Fix some @opindex syntax.

---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7c09680..20f9e69 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6146,8 +6146,8 @@ false positives and is deactivated by default.
 
 @item -Wattribute-alias=@var{n}
 @itemx -Wno-attribute-alias
-@opindex -Wattribute-alias
-@opindex -Wno-attribute-alias
+@opindex Wattribute-alias
+@opindex Wno-attribute-alias
 Warn about declarations using the @code{alias} and similar attributes whose
 target is incompatible with the type of the alias.
 @xref{Function Attributes,,Declaring Attributes of Functions}.
-- 
1.8.3.1



[PATCH] Fix warnings seen by clang in gcc/symbol-summary.h.

2019-06-26 Thread Martin Liška
Hi.

The patch adds a missing argument to a function call and marks unused
arguments in symbol-summary.h.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-25  Martin Liska  

* symbol-summary.h (traverse): Pass
argument a to the call of callback.
(gt_ggc_mx): Mark arguments as unused.
(gt_pch_nx): Likewise.
---
 gcc/symbol-summary.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)


diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
index 0219f3a81ea..e90d4481a10 100644
--- a/gcc/symbol-summary.h
+++ b/gcc/symbol-summary.h
@@ -362,7 +362,7 @@ public:
   {
 for (unsigned i = 0; i < m_vector->length (); i++)
   if ((*m_vector[i]) != NULL)
-	f ((*m_vector)[i]);
+	f ((*m_vector)[i], a);
   }
 
   /* Getter for summary callgraph node pointer.  If a summary for a node
@@ -846,7 +846,7 @@ public:
   {
 for (unsigned i = 0; i < m_vector->length (); i++)
   if ((*m_vector[i]) != NULL)
-	f ((*m_vector)[i]);
+	f ((*m_vector)[i], a);
   }
 
   /* Getter for summary callgraph edge pointer.
@@ -966,21 +966,21 @@ fast_call_summary::is_ggc ()
 
 template 
 void
-gt_ggc_mx (fast_call_summary* const )
+gt_ggc_mx (fast_call_summary* const  ATTRIBUTE_UNUSED)
 {
 }
 
 template 
 void
-gt_pch_nx (fast_call_summary* const )
+gt_pch_nx (fast_call_summary* const  ATTRIBUTE_UNUSED)
 {
 }
 
 template 
 void
-gt_pch_nx (fast_call_summary* const& summary,
-	   gt_pointer_operator op,
-	   void *cookie)
+gt_pch_nx (fast_call_summary* const& summary ATTRIBUTE_UNUSED,
+	   gt_pointer_operator op ATTRIBUTE_UNUSED,
+	   void *cookie ATTRIBUTE_UNUSED)
 {
 }