Re: Committed: gcc.misc-tests/outputs.exp (outest): Fix typo "is_target"

2021-02-15 Thread Bernd Edlinger
Oops,

thanks for fixing this problem.

To my excuse I would like to note,
that the script error does not happen on x86_64-pc-linux-gnu,
probably it would only happen when a file is left over.

Since usually this is never executed because $outs is empty:

foreach f $outs {
file delete $f
# collect2 may create .cdtor* files in -save-temps link tests,
# ??? without regard to aux output naming conventions.
# Limit this exception to targets that define EH_FRAME_THROUGH_COLLECT2.
if { !(([istarget powerpc*-*-aix*] || [is_target hppa*-*-hpux*])
   && ([string match "*.cdtor.*" $f]
   || [string match "*.gcc_args" $f])) } {
lappend outb $f
}
}

Can you say which target was this, where you found this bug?
Which file was in $outs?

I am also a bit surprised that a script error in one test aborts
the whole gcc.target tests.  How can that be?

Thanks
Bernd.


On 2/16/21 2:33 AM, Hans-Peter Nilsson wrote:
> Committed as obvious.  Please be more careful; this typo
> should have been obvious in "make check" output as below.
> 
> Commit-log:
> ---
> Fix typo for istarget in "is_target hppa*-*-hpux*", yielding
> an error running the test-suite for any target not matching
> powerpc*-*-aix* (presumably, by code inspection), aborting
> the check-gcc (check-gcc-c) regression test run some 3000
> tests before the last one, missing e.g. all gcc.target
> tests like so:
> 
> -
> ...
> Running /x/gcc/gcc/testsuite/gcc.misc-tests/outputs.exp ...
> ERROR: (DejaGnu) proc "is_target hppa*-*-hpux*" does not exist.
> The error code is TCL LOOKUP COMMAND is_target
> The info on the error is:
> invalid command name "is_target"
> while executing
> "::tcl_unknown is_target hppa*-*-hpux*"
> ("uplevel" body line 1)
> invoked from within
> "uplevel 1 ::tcl_unknown $args"
> 
>   === gcc Summary ===
> ...
> -
> 
> gcc/testsuite:
>   * gcc.misc-tests/outputs.exp (outest): Fix typo "is_target".
> ---
>  gcc/testsuite/gcc.misc-tests/outputs.exp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp 
> b/gcc/testsuite/gcc.misc-tests/outputs.exp
> index d5a9709910c2..4d904bde31d5 100644
> --- a/gcc/testsuite/gcc.misc-tests/outputs.exp
> +++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
> @@ -192,7 +192,7 @@ proc outest { test sources opts dirs outputs } {
>   # collect2 may create .cdtor* files in -save-temps link tests,
>   # ??? without regard to aux output naming conventions.
>   # Limit this exception to targets that define EH_FRAME_THROUGH_COLLECT2.
> - if { !(([istarget powerpc*-*-aix*] || [is_target hppa*-*-hpux*])
> + if { !(([istarget powerpc*-*-aix*] || [istarget hppa*-*-hpux*])
>  && ([string match "*.cdtor.*" $f]
>  || [string match "*.gcc_args" $f])) } {
>   lappend outb $f
> 


Re: [RFC] test builtin ratio for loop distribution

2021-02-15 Thread Alexandre Oliva
On Feb 12, 2021, Richard Biener  wrote:

>> +  if (TREE_CODE (mem) == SSA_NAME)
>> +if (ptr_info_def *pi = get_ptr_info (mem))
>> +  {
>> +   unsigned al = get_pointer_alignment (builtin->dst_base);
>> +   if (al > pi->align || pi->misalign)

> We still might prefer pi->align == 64 and pi->misalign == 32 over al == 16
> so maybe factor that in, too.

Ugh, apologies, I somehow posted an incorrect and outdated version of
the patch.  The improved (propagates both alignments) and fixed (divides
by BITS_PER_UNIT, fixing a regression in gfortran.dg/sms-2.f90) had
this alternate hunk as the only difference:

@@ -1155,6 +1156,16 @@ generate_memset_builtin (class loop *loop, partition 
*partition)
   mem = force_gimple_operand_gsi (, mem, true, NULL_TREE,
  false, GSI_CONTINUE_LINKING);
 
+  if (TREE_CODE (mem) == SSA_NAME)
+if (ptr_info_def *pi = get_ptr_info (mem))
+  {
+   unsigned al;
+   unsigned HOST_WIDE_INT misal;
+   if (get_pointer_alignment_1 (builtin->dst_base, , ))
+ set_ptr_info_alignment (pi, al / BITS_PER_UNIT,
+ misal / BITS_PER_UNIT);
+  }
+
   /* This exactly matches the pattern recognition in classify_partition.  */
   val = gimple_assign_rhs1 (DR_STMT (builtin->dst_dr));
   /* Handle constants like 0x15151515 and similarly



> So I wonder whether we should instead re-run CCP after loop opts which
> computes nonzero bits as well instead of the above "hack".  Would
> nonzero bits be ready to compute in the above way from loop distribution?
> That is, you can do set_nonzero_bits just like you did
> set_ptr_info_alignment ...

> Since CCP also performs copy propagation an obvious candidate would be
> to replace the last pass_copy_prop with pass_ccp (with a comment noting
> to compute up-to-date nonzero bits info).

I'll look into these possibilities.


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-02-15 Thread Xi Ruoyao via Gcc-patches
On 2021-02-15 16:16 -0700, Jeff Law wrote:
> 
> 
> On 2/12/21 7:17 AM, Xi Ruoyao wrote:
> > On 2021-01-11 01:01 +0800, Xi Ruoyao wrote:
> > > Hi Jeff and Jakub,
> > > 
> > > On 2021-01-04 14:19 -0700, Jeff Law wrote:
> > > > On 1/4/21 2:00 PM, Jakub Jelinek wrote:
> > > > > On Mon, Jan 04, 2021 at 01:51:59PM -0700, Jeff Law via Gcc-patches
> > > > > wrote:
> > > > > > > Sorry, I forgot to include the ChangeLog:
> > > > > > > 
> > > > > > >     gcc/ChangeLog:
> > > > > > >     
> > > > > > >     2020-12-31  Xi Ruoyao 
> > > > > > >     
> > > > > > >     PR target/98491
> > > > > > >     * config/mips/mips.c (mips_symbol_insns): Do not use
> > > > > > >   MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
> > > > > > So I absolutely agree the current code is wrong as it does an out of
> > > > > > bounds array access.
> > > > > > 
> > > > > > 
> > > > > > Would it be better to instead to change MSA_SUPPORTED_MODE_P to
> > > > > > evaluate
> > > > > > to zero if MODE is MAX_MACHINE_MODE?  That would protect all the 
> > > > > > uses
> > > > > > of
> > > > > > MSA_SUPPORTED_MODE_P.    Something like this perhaps?
> > > > > But MAX_MACHINE_MODE is the one past last valid mode, I'm not aware of
> > > > > any target that would protect all macros that deal with modes that 
> > > > > way.
> > > > > 
> > > > > So, perhaps best would be stop using the MAX_MACHINE_MODE as magic 
> > > > > value
> > > > > for that function and instead use say VOIDmode that shouldn't normally
> > > > > appear either?
> > > > I think we have to allow VOIDmode because constants don't necessarily
> > > > have modes.   And I certainly agree that using MAX_MACHINE_MODE like
> > > > this is ugly and error prone (as we can see from the BZ).
> > > > 
> > > > I also couldn't convince myself that the code and comments were actually
> > > > consistent, particularly for MSA targets which the comment claims can
> > > > never handle constants for ld/st (and thus should be returning 0 for
> > > > MAX_MACHINE_MODE).  Though maybe mips_symbol_insns_1 ultimately handles
> > > > that correctly.
> > > > 
> > > > 
> > > > > But I don't really see anything wrong on the mips_symbol_insns above
> > > > > change either.
> > > > Me neither.  I'm just questioning if bullet-proofing in the
> > > > MSA_SUPPORTED_MODE_P would be a better option.  While I've worked in the
> > > > MIPS port in the past, I don't really have any significannt experience
> > > > with the MSA support.
> > > I can't understand the comment either.  To me it looks like it's possible 
> > > to
> > > remove this "if (MSA_SUPPORTED_P (mode)) return 0;"
> > > 
> > > CC Robert to get some help.
> > Happy new lunar year folks.
> > 
> > I found a newer email address of Robert.  Hope it is still being used.
> > 
> > Could someone update MAINTAINERS file by the way?
> If you have an updated email address, I can reach out to Robert and see
> if he wants his entry updated or removed.

 His latest reply in gcc mail lists used robert.sucha...@mips.com.  But when I
sent mail to it, the mail was just rejected with "access denied".  Google told
me Office 365 mail service (used by mips.com) rejects mail to deleted accounts
with "access denied". So I'm not sure if this email address is invalid again, or
Office 365 just dislikes me...
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Committed: gcc.misc-tests/outputs.exp (outest): Fix typo "is_target"

2021-02-15 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious.  Please be more careful; this typo
should have been obvious in "make check" output as below.

Commit-log:
---
Fix typo for istarget in "is_target hppa*-*-hpux*", yielding
an error running the test-suite for any target not matching
powerpc*-*-aix* (presumably, by code inspection), aborting
the check-gcc (check-gcc-c) regression test run some 3000
tests before the last one, missing e.g. all gcc.target
tests like so:

-
...
Running /x/gcc/gcc/testsuite/gcc.misc-tests/outputs.exp ...
ERROR: (DejaGnu) proc "is_target hppa*-*-hpux*" does not exist.
The error code is TCL LOOKUP COMMAND is_target
The info on the error is:
invalid command name "is_target"
while executing
"::tcl_unknown is_target hppa*-*-hpux*"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::tcl_unknown $args"

=== gcc Summary ===
...
-

gcc/testsuite:
* gcc.misc-tests/outputs.exp (outest): Fix typo "is_target".
---
 gcc/testsuite/gcc.misc-tests/outputs.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp 
b/gcc/testsuite/gcc.misc-tests/outputs.exp
index d5a9709910c2..4d904bde31d5 100644
--- a/gcc/testsuite/gcc.misc-tests/outputs.exp
+++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -192,7 +192,7 @@ proc outest { test sources opts dirs outputs } {
# collect2 may create .cdtor* files in -save-temps link tests,
# ??? without regard to aux output naming conventions.
# Limit this exception to targets that define EH_FRAME_THROUGH_COLLECT2.
-   if { !(([istarget powerpc*-*-aix*] || [is_target hppa*-*-hpux*])
+   if { !(([istarget powerpc*-*-aix*] || [istarget hppa*-*-hpux*])
   && ([string match "*.cdtor.*" $f]
   || [string match "*.gcc_args" $f])) } {
lappend outb $f
-- 
2.11.0

brgds, H-P


Re: [PATCH] PR98096: inline-asm: Take inout operands into account for access to labels by names.

2021-02-15 Thread Jeff Law via Gcc-patches



On 2/4/21 2:29 PM, Vladimir Makarov wrote:
> The following patch solves
>
>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096
>
> The patch is for a new GCC extension -- asm goto with output reloads.
>
> GCC splits inout operands (with constraint "+") into output and new
> matched input operands during gimplfication.  Addressing input or
> output operands by name or number is not a problem as the new input
> operands are added at the end of existing input operands.
>
> However it became a problem for labels in asm goto with output
> reloads.  Addressing labels should take into account the added matched
> input operands.  The patch solves the problem.
>
> The patch was successfully bootstrapped and tested on x86-64.
>
> Is it ok to commit into the trunk?
Yes.  I think one can easily argue the inconsistent operand numbering is
a diagnostic regression.

jeff



Re: [PATCH] [X86] Fold more shuffle builtins to VEC_PERM_EXPR.

2021-02-15 Thread Jeff Law via Gcc-patches



On 12/16/20 3:41 AM, Hongtao Liu via Gcc-patches wrote:
> On Tue, Dec 15, 2020 at 7:11 PM Jakub Jelinek  wrote:
>> On Tue, Dec 15, 2020 at 06:10:57PM +0800, Hongtao Liu via Gcc-patches wrote:
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -18187,21 +18187,67 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator 
>>> *gsi)
>>>   }
>>>break;
>>>
>>> +case IX86_BUILTIN_SHUFPD512:
>>> +case IX86_BUILTIN_SHUFPS512:
>>> +  if (n_args > 2)
>>> + {
>>> +   /* This is masked shuffle.  Only optimize if the mask is all ones.  
>>> */
>>> +   tree argl = gimple_call_arg (stmt, n_args - 1);
>>> +   arg0 = gimple_call_arg (stmt, 0);
>>> +   if (!tree_fits_uhwi_p (argl))
>>> + break;
>>> +   unsigned HOST_WIDE_INT mask = tree_to_uhwi (argl);
>>> +   unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> I think it would be better not to mix the argl and arg0 stuff.
>> So e.g. do
>>   arg0 = gimple_call_arg (stmt, 0);
>>   unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> first and then the argl stuff, or vice versa.
>> Furthermore, you don't really care about the upper bits of argl,
>> so why don't punt just if (TREE_CODE (argl) != INTEGER_CST)
>> and use mask = TREE_LOW_CST (argl);
>> ?
>>
> Yes, and for maintenance convenience, i put these code into a new
> function which can be also called by masked shift
> @@ -18128,17 +18142,10 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>gcc_assert (n_args >= 2);
>arg0 = gimple_call_arg (stmt, 0);
>arg1 = gimple_call_arg (stmt, 1);
> -  if (n_args > 2)
> - {
> -   /* This is masked shift.  Only optimize if the mask is all ones.  */
> -   tree argl = gimple_call_arg (stmt, n_args - 1);
> -   if (!tree_fits_uhwi_p (argl))
> - break;
> -   unsigned HOST_WIDE_INT mask = tree_to_uhwi (argl);
> -   unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> -   if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
> - break;
> - }
> +  /* For masked shift, only optimize if the mask is all ones.  */
> +  if (n_args > 2
> +   && !ix86_masked_all_ones (arg0, gimple_call_arg (stmt, n_args - 1)))
> + break;
>
>
>>> +   if ((mask | (HOST_WIDE_INT_M1U << elems)) != HOST_WIDE_INT_M1U)
>>> + break;
>>> + }
>>> +  /* Fall thru.  */
>>>  case IX86_BUILTIN_SHUFPD:
>>> +case IX86_BUILTIN_SHUFPD256:
>>> +case IX86_BUILTIN_SHUFPS:
>>> +case IX86_BUILTIN_SHUFPS256:
>>>arg2 = gimple_call_arg (stmt, 2);
>>>if (TREE_CODE (arg2) == INTEGER_CST)
>>>   {
>>> -   location_t loc = gimple_location (stmt);
>>> -   unsigned HOST_WIDE_INT imask = TREE_INT_CST_LOW (arg2);
>>> arg0 = gimple_call_arg (stmt, 0);
>>> +   unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>>> +   machine_mode imode = GET_MODE_INNER (TYPE_MODE (TREE_TYPE (arg0)));
>>> +   unsigned HOST_WIDE_INT imask = TREE_INT_CST_LOW (arg2);
>>> +
>>> +   /* Check valid imm, refer to gcc.target/i386/testimm-10.c.  */
>>> +   if (imask > 255
>>> +   || (imask >= HOST_WIDE_INT_1U << elems
>>> +   && imode == E_DFmode))
>>> + return false;
>> Why is this extra checking done only for DFmode and not for SFmode?
> Oh, yes, delete extra checking, the instruction would ignore high bits
> for 128/256-bit DFmode version.
>>> +   tree itype = imode == E_DFmode
>>> + ? long_long_integer_type_node : integer_type_node;
>> Formatting.  Should be e.g.
>>   tree itype
>> = (imode == E_DFmode
>>? long_long_integer_type_node : integer_type_node);
>> or
>>   tree itype = (imode == E_DFmode ? long_long_integer_type_node
>>   : integer_type_node);
>> but the ? which is part of the imode == E_DFmode ... subexpression
>> can't just be below something unrelated.
>>
> Changed.
>>> +   if (imode == E_DFmode)
>>> + sel_idx = (i & 1) * elems
>>> +   + (i >> 1 << 1) + ((imask & 1 << i) >> i);
>> Again, formatting.  Plus, i >> 1 << 1 looks too ugly/unreadable,
>> if you mean i & ~1, write it like that, it is up to the compiler to emit
>> it like i >> 1 << 1 if that is the best implementation.
>>
> Changed.
>>> +   else
>>> + {
>>> +   /* Imm[7:0](if VL > 128, also use Imm[7:0]) provide 4 select
>>> +  controls for each element of the destination.  */
>>> +   unsigned j = i % 4;
>>> +   sel_idx = ((i & 2) >> 1) * elems
>>> + + (i >> 2 << 2) + ((imask & 3 << j << j) >> j >> j);
>> Ditto.
>>
> Changed.
>> Jakub
>>
> Update patch
>
> --
> BR,
> Hongtao
>
> 0001-X86-Fold-more-shuffle-builtins-to-VEC_PERM_EXPR.patch
>
> From 1cfec402ffa25375c88fa38e783d203401f38c5e Mon Sep 17 00:00:00 2001
> From: liuhongt 
> Date: Fri, 11 Dec 2020 19:02:43 +0800
> Subject: 

Re: [PATCH] Make switchconv smarter.

2021-02-15 Thread Jeff Law via Gcc-patches



On 12/18/20 6:19 AM, Martin Liška wrote:
> The patch covers 2 cases mentioned in the PR.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> PR tree-optimization/94779
> * tree-switch-conversion.c (switch_conversion::switch_conversion):
> Initialize m_default_unreachable, m_missing_return_in_default
> and m_inbound_check_needed.
> (switch_conversion::collect): Start with first edge also
> if m_default_unreachable or m_missing_return_in_default.
> (switch_conversion::check_all_empty_except_final): Ignore
> default edge for case where m_default_unreachable is true.
> (switch_conversion::phi_leads_to_return): New.
> (switch_conversion::build_one_array): Drop boundary
> check for linear transformation where
> m_missing_return_in_default is true
> (switch_conversion::analyze_default_bb): New.
> (switch_conversion::gen_inbound_check): Generate if TRUE when
> m_default_unreachable or we don't need boundary check.
> (switch_conversion::expand): Do transformation as we can't be
> sure that the switch will be expanded with JT.
> * tree-switch-conversion.h: Declare new functions and
> variables.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/94779
> * gcc.dg/tree-ssa/cswtch-6.c: New test.
> * gcc.dg/tree-ssa/cswtch-7.c: New test.
Given this is "just" a missed optimization, are we deferring to gcc-12?

jeff



Re: [PATCH] don't enable DWARF5 by default on Windows (PR98860)

2021-02-15 Thread Jeff Law via Gcc-patches



On 2/9/21 2:07 AM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Feb 09, 2021 at 07:47:00AM +0100, Richard Biener via Gcc-patches 
> wrote:
>> On February 8, 2021 10:44:26 PM GMT+01:00, Mikael Pettersson via Gcc-patches 
>>  wrote:
>>> PR98860 is a gcc-11 regression where bootstrap fails on Windows since
>>> the switch to enable DWARF5 by default. The symptoms are that
>>> executables generated by the stage1 compiler fail to run with "Exec
>>> format error", which confuses subsequent configure steps and causes
>>> hard errors. This happens even with the very latest binutils master.
>>>
>>> Fixed by updating SUBTARGET_OVERRIDE_OPTIONS to set dwarf_version to 4
>>> unless the user explicitly requested another version. I see some other
>>> targets did the same.
>>>
>>> Tested on Cygwin64 and mingw-w64.
>>>
>>> Ok for master?
>> It might be better to expose the default via configure (I've heard desires 
>> to control that elsewhere). 
> But also it would be better to see some analysis what exactly and why
> doesn't work.
> What program emits that Exec format error message and why.
Yea.  It seems quite odd that a change in debug format triggers an exec
format error.   I'd want to have a better understanding why before
acking this patch as well.

jeff



[PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints

2021-02-15 Thread Neven Sajko via Gcc-patches
There is a long-standing, but undocumented GCC inline assembly feature
that's part of the extended asm GCC extension to C and C++: extended
asm empty input constraints.

Although I don't really use extended asm much, and I never contributed
to GCC before; I tried to document the feature as far as I understand
it. I ran make html to check that the changed Texinfo is well formed.

FTR, empty input constraints have been mentioned on the GCC mailing
lists, e.g.:
https://gcc.gnu.org/pipermail/gcc-help/2015-June/124410.html

I release this contribution into the public domain.

Neven Sajko

gcc/ChangeLog:

* doc/md.texi: Document extended asm empty input constraints

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e3686dbfe..deccfd38a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1131,7 +1131,102 @@ the addressing register.
 @subsection Simple Constraints
 @cindex simple constraints

-The simplest kind of constraint is a string full of letters, each of
+An input constraint is allowed to be an empty string, in which case it is
+called an empty input constraint. (When an empty input constraint is used,
+the assembler template will most probably also be empty. I.e., the @code{asm}
+declaration need not contain actual assembly code.) An empty input
+constraint can be used to create an artificial dependency on a C or C++
+variable (the variable that appears in the expression associated with the
+constraint) without incurring unnecessary costs to performance.
+
+An example of where such behavior may be useful is for preventing compiler
+optimizations like dead store elimination or hoisting code outside a loop for
+certain pieces of C or C++ code. Specific applications may include direct
+interaction with hardware features; or things like testing, fuzzing and
+benchmarking.
+
+Here's a simple C++20 program that is not useful in practice but demonstrates
+relevant behavior; store it as a file called asm.cc:
+
+@verbatim
+#include 
+
+int
+main() {
+// Greater than or equal to zero.
+constexpr int asmV = ASM_V;
+
+// The exact content of v is irrelevant for
+// this example.
+std::vector v{7, 6, 9, 3, 2, 0};
+
+for (int i{0}; i < (1 << 28); i++) {
+for (int j{0}; j < 6; j++) {
+// The exact operation on the contents
+// of v is not relevant for this
+// example.
+v[j]++;
+
+if constexpr (1 <= asmV) {
+asm volatile ("" :: ""(v.size()));
+for (auto x: v) {
+asm volatile ("" :: ""(x));
+}
+}
+if constexpr (2 <= asmV) {
+asm volatile ("" :: ""(v.size()));
+for (auto x: v) {
+asm volatile ("" :: ""(x));
+}
+}
+if constexpr (3 <= asmV) {
+asm volatile ("" :: ""(v.size()));
+for (auto x: v) {
+asm volatile ("" :: ""(x));
+}
+}
+}
+}
+
+return 0;
+}
+@end verbatim
+
+Compile with, e.g., the following command (with @code{XXX} equal to @code{0},
+@code{1}, @code{2}, and @code{3}).
+
+@verbatim
+g++ -std=c++20 -O3 -flto -march=native -D ASM_V=XXX -o XXX asm.cc
+@end verbatim
+
+Firstly, for @code{XXX} equal to @code{0}; all of the @code{asm} declarations
+are dead code, thus formally the contents of @var{v} are not observable,
+thus the program consists almost entirely of code that may be eliminated by a
+(valid) compiler. While this usually aligns with what the programming user
+wants, sometimes we might want to, e.g., measure how long does it take for
+some piece of code to execute, even if we aren't interested in its results
+(or already know what its results must be). Such is the case in, e.g.,
+benchmarking.
+
+Secondly, for @code{XXX} equal to @code{1}; only the first part with
+@code{asm} declarations (the body of the first @code{if} statement) is
+effective, and because of it the preceding code can not be eliminated,
+because the @code{asm} declarations depend on @var{v} and its contents as
+input operands. The same effect would exist with a nonempty input constraint
+in place of the empty input constraints, but probably with additional
+unnecessary code generation and diminished performance. The innermost loop
+should not cause any code to be generated, because the input constraint is
+empty.
+
+Thirdly, for @code{XXX} equal to @code{2} or @code{3}; assuming the
required compiler
+optimizations are successful, the generated code should be the same as for
+@code{XXX} equal to @code{1}. This is again because of the empty
input constraint
+preventing unnecessary code generation (a nonempty input constraint would
+probably require that the compiler store values into either registers or
+memory, even though the assembler template is empty).
+
+The simplest kind of constraint, apart from the empty constraint,
+is a string full of letters, 

Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-02-15 Thread Jeff Law via Gcc-patches



On 2/12/21 7:17 AM, Xi Ruoyao wrote:
> On 2021-01-11 01:01 +0800, Xi Ruoyao wrote:
>> Hi Jeff and Jakub,
>>
>> On 2021-01-04 14:19 -0700, Jeff Law wrote:
>>> On 1/4/21 2:00 PM, Jakub Jelinek wrote:
 On Mon, Jan 04, 2021 at 01:51:59PM -0700, Jeff Law via Gcc-patches wrote:
>> Sorry, I forgot to include the ChangeLog:
>>
>>     gcc/ChangeLog:
>>     
>>     2020-12-31  Xi Ruoyao 
>>     
>>     PR target/98491
>>     * config/mips/mips.c (mips_symbol_insns): Do not use
>>   MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
> So I absolutely agree the current code is wrong as it does an out of
> bounds array access.
>
>
> Would it be better to instead to change MSA_SUPPORTED_MODE_P to evaluate
> to zero if MODE is MAX_MACHINE_MODE?  That would protect all the uses of
> MSA_SUPPORTED_MODE_P.    Something like this perhaps?
 But MAX_MACHINE_MODE is the one past last valid mode, I'm not aware of
 any target that would protect all macros that deal with modes that way.

 So, perhaps best would be stop using the MAX_MACHINE_MODE as magic value
 for that function and instead use say VOIDmode that shouldn't normally
 appear either?
>>> I think we have to allow VOIDmode because constants don't necessarily
>>> have modes.   And I certainly agree that using MAX_MACHINE_MODE like
>>> this is ugly and error prone (as we can see from the BZ).
>>>
>>> I also couldn't convince myself that the code and comments were actually
>>> consistent, particularly for MSA targets which the comment claims can
>>> never handle constants for ld/st (and thus should be returning 0 for
>>> MAX_MACHINE_MODE).  Though maybe mips_symbol_insns_1 ultimately handles
>>> that correctly.
>>>
>>>
 But I don't really see anything wrong on the mips_symbol_insns above
 change either.
>>> Me neither.  I'm just questioning if bullet-proofing in the
>>> MSA_SUPPORTED_MODE_P would be a better option.  While I've worked in the
>>> MIPS port in the past, I don't really have any significannt experience
>>> with the MSA support.
>> I can't understand the comment either.  To me it looks like it's possible to
>> remove this "if (MSA_SUPPORTED_P (mode)) return 0;"
>>
>> CC Robert to get some help.
> Happy new lunar year folks.
>
> I found a newer email address of Robert.  Hope it is still being used.
>
> Could someone update MAINTAINERS file by the way?
If you have an updated email address, I can reach out to Robert and see
if he wants his entry updated or removed.

Thanks,
jeff



[PATCH] Add -fgnu-retain/-fno-gnu-retain

2021-02-15 Thread H.J. Lu via Gcc-patches
When building Linux kernel, ld in bninutils 2.36 with GCC 11 generates
thousands of

ld: warning: orphan section `.data.event_initcall_finish' from `init/main.o' 
being placed in section `.data.event_initcall_finish'
ld: warning: orphan section `.data.event_initcall_start' from `init/main.o' 
being placed in section `.data.event_initcall_start'
ld: warning: orphan section `.data.event_initcall_level' from `init/main.o' 
being placed in section `.data.event_initcall_level'

Since these sections are marked with SHF_GNU_RETAIN, they are placed in
separate sections.  They become orphan sections since they aren't expected
in the Linux kernel linker script. But orphan sections normally don't work
well with the Linux kernel linker script and the resulting kernel crashed.

Add -fgnu-retain to disable SHF_GNU_RETAIN for Linux kernel build with
-fno-gnu-retain.

gcc/

PR target/99113
* common.opt: Add -fgnu-retain.
* toplev.c (process_options): Update flag_gnu_retain from
SUPPORTS_SHF_GNU_RETAIN.
* varasm.c (get_section): Replace SUPPORTS_SHF_GNU_RETAIN with
flag_gnu_retain.
(resolve_unique_section): Likewise.
(get_variable_section): Likewise.
(switch_to_section): Likewise.
* doc/invoke.texi: Document -fgnu-retain/-fno-gnu-retain.

gcc/testsuite/

* c-c++-common/pr99113.c: New test.
---
 gcc/common.opt   | 4 
 gcc/doc/invoke.texi  | 8 
 gcc/testsuite/c-c++-common/pr99113.c | 7 +++
 gcc/toplev.c | 9 +
 gcc/varasm.c | 8 
 5 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr99113.c

diff --git a/gcc/common.opt b/gcc/common.opt
index c75dd36843e..912e879e49d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1666,6 +1666,10 @@ floop-unroll-and-jam
 Common Var(flag_unroll_jam) Optimization
 Perform unroll-and-jam on loops.
 
+fgnu-retain
+Common Var(flag_gnu_retain) Init(-1)
+Use SHF_GNU_RETAIN if supported by the assembler and the linker.
+
 fgnu-tm
 Common Var(flag_tm)
 Enable support for GNU transactional memory.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e8baa545eee..20962a8749e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16168,6 +16168,14 @@ DSOs; if your program relies on reinitialization of a 
DSO via
 @code{dlclose} and @code{dlopen}, you can use
 @option{-fno-gnu-unique}.
 
+@item -fno-gnu-retain
+@opindex fno-gnu-retain
+@opindex fgnu-retain
+On systems with recent GNU assembler and linker, the compiler places
+preserved symbols in separate SHF_GNU_RETAIN sections.  If your program,
+like Linux kernel, doesn't work with such symbol placement, you can use
+@option{-fno-gnu-retain} to disable SHF_GNU_RETAIN sections.
+
 @item -fpcc-struct-return
 @opindex fpcc-struct-return
 Return ``short'' @code{struct} and @code{union} values in memory like
diff --git a/gcc/testsuite/c-c++-common/pr99113.c 
b/gcc/testsuite/c-c++-common/pr99113.c
new file mode 100644
index 000..d3c830247a2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr99113.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2 -fno-gnu-retain " } */
+
+static int xyzzy __attribute__((__used__)) = 1; 
+
+/* { dg-final { scan-assembler "xyzzy" } } */
+/* { dg-final { scan-assembler-not "\.data.*,\"awR\"" { target 
R_flag_in_section } } } */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index d8cc254adef..a1b18f5338c 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1761,6 +1761,15 @@ process_options (void)
   if (flag_large_source_files)
 line_table->default_range_bits = 0;
 
+  if (flag_gnu_retain == -1)
+flag_gnu_retain = SUPPORTS_SHF_GNU_RETAIN;
+  else if (flag_gnu_retain > 0 && !SUPPORTS_SHF_GNU_RETAIN)
+{
+  warning_at (UNKNOWN_LOCATION, 0, "%qs is not supported for this target",
+ "-fgnu-retain");
+  flag_gnu_retain = 0;
+}
+
   /* Please don't change global_options after this point, those changes won't
  be reflected in optimization_{default,current}_node.  */
 }
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 29478ab0d8d..4e0e30abee5 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -297,7 +297,7 @@ get_section (const char *name, unsigned int flags, tree 
decl,
   slot = section_htab->find_slot_with_hash (name, htab_hash_string (name),
INSERT);
   flags |= SECTION_NAMED;
-  if (SUPPORTS_SHF_GNU_RETAIN
+  if (flag_gnu_retain
   && decl != nullptr
   && DECL_P (decl)
   && DECL_PRESERVE_P (decl))
@@ -487,7 +487,7 @@ resolve_unique_section (tree decl, int reloc 
ATTRIBUTE_UNUSED,
   if (DECL_SECTION_NAME (decl) == NULL
   && targetm_common.have_named_sections
   && (flag_function_or_data_sections
- || (SUPPORTS_SHF_GNU_RETAIN && DECL_PRESERVE_P (decl))
+ || (flag_gnu_retain && DECL_PRESERVE_P (decl))
  || DECL_COMDAT_GROUP 

[PATCH] c++: Fix CTAD from single-element initializer list [PR99103]

2021-02-15 Thread Patrick Palka via Gcc-patches
When determining whether to rule out initializer-list constructors
during CTAD with a single-element initializer list, the element type's
cv-qualifiers should be irrelevant.  This patch fixes this by making
is_spec_or_derived strip cv-qualifiers from the supplied expression
type.

In passing, I noticed in maybe_aggr_guide we were calling
is_spec_or_derived with swapped arguments.  This led us to prefer the
aggregate deduction candidate over copying deduction in the second
testcase below with -std=c++20.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/99103
* pt.c (is_spec_or_derived): Drop cv-qualifiers from 'etype'.
(maybe_aggr_guide): Fix order of arguments to
is_spec_or_derived.

gcc/testsuite/ChangeLog:

PR c++/99103
* g++.dg/cpp1z/class-deduction79.C: New test.
* g++.dg/cpp1z/class-deduction80.C: New test.
---
 gcc/cp/pt.c|  3 ++-
 gcc/testsuite/g++.dg/cpp1z/class-deduction79.C | 10 ++
 gcc/testsuite/g++.dg/cpp1z/class-deduction80.C |  9 +
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction79.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction80.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5102bf02d0f..1acb5b3a097 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28829,6 +28829,7 @@ is_spec_or_derived (tree etype, tree tmpl)
   if (!etype || !CLASS_TYPE_P (etype))
 return false;
 
+  etype = cv_unqualified (etype);
   tree type = TREE_TYPE (tmpl);
   tree tparms = (INNERMOST_TEMPLATE_PARMS
 (DECL_TEMPLATE_PARMS (tmpl)));
@@ -28859,7 +28860,7 @@ maybe_aggr_guide (tree tmpl, tree init, vec 
*args)
   if (args->length() == 1)
 {
   tree val = (*args)[0];
-  if (is_spec_or_derived (tmpl, TREE_TYPE (val)))
+  if (is_spec_or_derived (TREE_TYPE (val), tmpl))
return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction79.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction79.C
new file mode 100644
index 000..ad0ba9bb8f5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction79.C
@@ -0,0 +1,10 @@
+// PR c++/99103
+// { dg-do compile { target c++17 } }
+#include 
+
+template 
+struct S { S(std::initializer_list); };
+
+extern const S x;
+using T = decltype(S{x});
+using T = S; // not S>
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction80.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction80.C
new file mode 100644
index 000..3dd7cb5890b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction80.C
@@ -0,0 +1,9 @@
+// PR c++/99103
+// { dg-do compile { target c++17 } }
+
+template  struct X { T a; };
+template  struct Y : X {};
+
+extern const Y y;
+using T = decltype(X{y});
+using T = X; // not X>
-- 
2.30.1.489.g328c109303



Re: [PATCH] aarch64: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Richard Sandiford via Gcc-patches
Maya Rashish via Gcc-patches  writes:
> Some subtargets don't provide the canonical function names as
> the symbol name in C libraries, and libcalls will only work if
> the builtins are patched to emit the correct library name.
>
> For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
> and the patching is done via netbsd_patch_builtin.
>
> With this change, libgfortran.so is correctly built with a
> reference to __c99_cabsl, instead of "cabsl" which is not defined.
>
> gcc/ChangeLog:
>   * config/aarch64/aarch64.c (aarch64_init_builtins):
>   Call SUBTARGET_INIT_BUILTINS.

Pushed to trunk, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 146ed8c1b69..6fda6bca2a5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13492,6 +13492,9 @@ aarch64_init_builtins ()
>  {
>aarch64_general_init_builtins ();
>aarch64_sve::init_builtins ();
> +#ifdef SUBTARGET_INIT_BUILTINS
> +  SUBTARGET_INIT_BUILTINS;
> +#endif
>  }
>  
>  /* Implement TARGET_FOLD_BUILTIN.  */


[PATCH] aarch64: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Maya Rashish via Gcc-patches
Some subtargets don't provide the canonical function names as
the symbol name in C libraries, and libcalls will only work if
the builtins are patched to emit the correct library name.

For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
and the patching is done via netbsd_patch_builtin.

With this change, libgfortran.so is correctly built with a
reference to __c99_cabsl, instead of "cabsl" which is not defined.

gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_init_builtins):
Call SUBTARGET_INIT_BUILTINS.
---
 gcc/config/aarch64/aarch64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 146ed8c1b69..6fda6bca2a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13492,6 +13492,9 @@ aarch64_init_builtins ()
 {
   aarch64_general_init_builtins ();
   aarch64_sve::init_builtins ();
+#ifdef SUBTARGET_INIT_BUILTINS
+  SUBTARGET_INIT_BUILTINS;
+#endif
 }
 
 /* Implement TARGET_FOLD_BUILTIN.  */
-- 
2.28.0



Re: rtl-optimization: Fix uninitialized use of opaque mode variable ICE [PR98872]

2021-02-15 Thread Peter Bergner via Gcc-patches
On 2/15/21 6:25 AM, Richard Sandiford wrote:
> Peter Bergner  writes:
>> 2021-02-12  Peter Bergner  
>>
>> gcc/
>>  PR rtl-optimization/98872
>>  * init-regs.c (initialize_uninitialized_regs): Skip initialization
>>  if CONST0_RTX is NULL.
>>
>> gcc/testsuite/
>>  PR rtl-optimization/98872
>>  * gcc.target/powerpc/pr98872.c: New test.
> 
> OK, thanks.

Ok, patch pushed to mainline.  Thanks!

Peter



[PATCH] sparc: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Maya Rashish via Gcc-patches
Some subtargets don't provide the canonical function names as
the symbol name in C libraries, and libcalls will only work if
the builtins are patched to emit the correct library name.

For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
and the patching is done via netbsd_patch_builtin.

With this change, libgfortran.so is correctly built with a
reference to __c99_cabsl, instead of "cabsl" which is not defined.

gcc/ChangeLog:
* config/sparc/sparc.c (sparc_init_builtins):
Call SUBTARGET_INIT_BUILTINS.
---
 gcc/config/sparc/sparc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index f3557936114..ce75e1f8cc0 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -10962,6 +10962,10 @@ sparc_init_builtins (void)
 
   if (TARGET_VIS)
 sparc_vis_init_builtins ();
+
+#ifdef SUBTARGET_INIT_BUILTINS
+  SUBTARGET_INIT_BUILTINS;
+#endif
 }
 
 /* Create builtin functions for FPU instructions.  */
-- 
2.28.0



[committed] libstdc++: Fix __thread_yield for non-gthreads targets

2021-02-15 Thread Jonathan Wakely via Gcc-patches
The __gthread_yield() function is only defined for gthreads targets, so
check _GLIBCXX_HAS_GTHREADS before using it.

Also reorder __thread_relax and __thread_yield so that the former can
use the latter instead of repeating the same preprocessor checks.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__thread_yield()): Check
_GLIBCXX_HAS_GTHREADS before using __gthread_yield.
(__thread_relax()): Use __thread_yield() instead of repeating
the preprocessor checks for __gthread_yield.

Tested x86_64-linux. Committed to trunk.

commit cc9a0a3d79d6abb08753a819c9ea21a25015e962
Author: Jonathan Wakely 
Date:   Mon Feb 15 15:35:55 2021

libstdc++: Fix __thread_yield for non-gthreads targets

The __gthread_yield() function is only defined for gthreads targets, so
check _GLIBCXX_HAS_GTHREADS before using it.

Also reorder __thread_relax and __thread_yield so that the former can
use the latter instead of repeating the same preprocessor checks.

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__thread_yield()): Check
_GLIBCXX_HAS_GTHREADS before using __gthread_yield.
(__thread_relax()): Use __thread_yield() instead of repeating
the preprocessor checks for __gthread_yield.

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 1a0f0943ebd..37085ae8e50 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -213,24 +213,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { _M_w._M_do_wait(_M_version); }
 };
 
-inline void
-__thread_relax() noexcept
-{
-#if defined __i386__ || defined __x86_64__
-  __builtin_ia32_pause();
-#elif defined _GLIBCXX_USE_SCHED_YIELD
-  __gthread_yield();
-#endif
-}
-
 inline void
 __thread_yield() noexcept
 {
-#if defined _GLIBCXX_USE_SCHED_YIELD
+#if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
  __gthread_yield();
 #endif
 }
 
+inline void
+__thread_relax() noexcept
+{
+#if defined __i386__ || defined __x86_64__
+  __builtin_ia32_pause();
+#else
+  __gthread_yield();
+#endif
+}
   } // namespace __detail
 
   template


Re: [PATCH][AArch64] Leveraging the use of STP instruction for vec_duplicate

2021-02-15 Thread Richard Sandiford via Gcc-patches
Hi Victor,

Thanks for the patch.  I have a couple of very minor comments below,
but otherwise it looks good to go.  However, it will need to wait for
stage 1 to open, unless it fixes a regression.

Victor Do Nascimento via Gcc-patches  writes:
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 68baf416045..4623cbb95f4 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -205,6 +205,16 @@
>[(set_attr "type" "neon_stp")]
>  )
>  
> +(define_insn "aarch64_simd_stp"
> +[(set (match_operand:VP_2E 0 "aarch64_mem_pair_operand" "=Ump,Ump")
> +  (vec_duplicate:VP_2E (match_operand: 1 "register_operand" 
> "w,r")))]
> +"TARGET_SIMD"
> +"@
> + stp\\t%1, %1, %z0
> + stp\\t%1, %1, %z0"
> +[(set_attr "type" "neon_stp, store_")]

Minor formatting nit, but: these patterns are generally indented
by 2 spaces rather than 4 (at least in config/aarch64).  Also,
it would be good if the (vec_duplicate:…) lined up with the
(match_operand:…)

I think the type for the first alternative should be neon_stp
rather than neon_stp.  The instruction only ever stores S or D
registers, whereas neon_stp_q is for storing Q registers.

Thanks,
Richard


[committed] libstdc++: Add missing return and use reserved name

2021-02-15 Thread Jonathan Wakely via Gcc-patches
The once_flag::_M_activate() function is only ever called immediately
after a call to once_flag::_M_passive(), and so in the non-gthreads case
it is impossible for _M_passive() to be true in the body of
_M_activate(). Add a check for it anyway, to avoid warnings about
missing return.

Also replace a non-reserved name with a reserved one.

libstdc++-v3/ChangeLog:

* include/std/mutex (once_flag::_M_activate()): Add explicit
return statement for passive case.
(once_flag::_M_finish(bool)): Use reserved name for parameter.

Tested x86_64-linux. Committed to trunk.

commit d27153f038c2f39ed1b7e6ba9dab59f88b8ca245
Author: Jonathan Wakely 
Date:   Mon Feb 15 14:00:36 2021

libstdc++: Add missing return and use reserved name

The once_flag::_M_activate() function is only ever called immediately
after a call to once_flag::_M_passive(), and so in the non-gthreads case
it is impossible for _M_passive() to be true in the body of
_M_activate(). Add a check for it anyway, to avoid warnings about
missing return.

Also replace a non-reserved name with a reserved one.

libstdc++-v3/ChangeLog:

* include/std/mutex (once_flag::_M_activate()): Add explicit
return statement for passive case.
(once_flag::_M_finish(bool)): Use reserved name for parameter.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 25e580cf0fc..f96c48e88ec 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -706,6 +706,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bool _M_activate();
 
 // Must be called to complete an active execution.
+// The argument is true if the active execution was a returning execution,
+// false if it was an exceptional execution.
 void _M_finish(bool __returning) noexcept;
 
 // RAII helper to call _M_finish.
@@ -742,18 +744,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   inline bool
   once_flag::_M_activate()
   {
-if (_M_once == _Bits::_Init)
+if (_M_once == _Bits::_Init) [[__likely__]]
   {
_M_once = _Bits::_Active;
return true;
   }
-else if (!_M_passive())
+else if (_M_passive()) // Caller should have checked this already.
+  return false;
+else
   __throw_system_error(EDEADLK);
   }
 
   inline void
-  once_flag::_M_finish(bool returning) noexcept
-  { _M_once = returning ? _Bits::_Done : _Bits::_Init; }
+  once_flag::_M_finish(bool __returning) noexcept
+  { _M_once = __returning ? _Bits::_Done : _Bits::_Init; }
 
 #elif defined _GLIBCXX_HAVE_LINUX_FUTEX
 


Re: [PATCH 2/2] sparc: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Eric Botcazou
> Some subtargets don't provide the canonical function names as
> the symbol name in C libraries, and libcalls will only work if
> the builtins are patched to emit the correct library name.
> 
> For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
> and the patching is done via netbsd_patch_builtin.
> 
> With this change, libgfortran.so is correctly built with a
> reference to __c99_cabsl, instead of "cabsl" which is not defined.

The change is OK modulo a couple of nits:

> gcc/ChangeLog:
> * config/sparc/sparc.c
>   (sparc_init_builtins): Call SUBTARGET_INIT_BUILTINS.

The name of the function on the first line.

> ---
>  gcc/config/sparc/sparc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
> index f3557936114..fe6475f2520 100644
> --- a/gcc/config/sparc/sparc.c
> +++ b/gcc/config/sparc/sparc.c
> @@ -10962,6 +10962,9 @@ sparc_init_builtins (void)
> 
>if (TARGET_VIS)
>  sparc_vis_init_builtins ();
> +#ifdef SUBTARGET_INIT_BUILTINS
> +  SUBTARGET_INIT_BUILTINS;
> +#endif
>  }

Missing blank line before the change.

-- 
Eric Botcazou




Re: [PATCH][pushed] Add 2 missing Param keywords.

2021-02-15 Thread Richard Biener via Gcc-patches
On Mon, Feb 15, 2021 at 3:09 PM Martin Liška  wrote:
>
> Pushed as obvious.

Oops.  Can we somehow make this more magic?

>
> Martin
>
> gcc/ChangeLog:
>
> * params.opt: Add 2 missing Param keywords.
> ---
>   gcc/params.opt | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/params.opt b/gcc/params.opt
> index c633648d047..a4e1ac0e88e 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -682,11 +682,11 @@ Common Joined UInteger Var(param_max_stores_to_sink) 
> Init(2) Param Optimization
>   Maximum number of conditional store pairs that can be sunk.
>
>   -param=max-store-chains-to-track=
> -Common Joined UInteger Var(param_max_store_chains_to_track) Init(64) 
> IntegerRange(1, 65536)
> +Common Joined UInteger Var(param_max_store_chains_to_track) Init(64) 
> IntegerRange(1, 65536) Param
>   Maximum number of store chains to track at the same time in the store 
> merging pass.
>
>   -param=max-stores-to-track=
> -Common Joined UInteger Var(param_max_stores_to_track) Init(1024) 
> IntegerRange(2, 1048576)
> +Common Joined UInteger Var(param_max_stores_to_track) Init(1024) 
> IntegerRange(2, 1048576) Param
>   Maximum number of store chains to track at the same time in the store 
> merging pass.
>
>   -param=max-tail-merge-comparisons=
> --
> 2.30.0
>


Re: [PATCH] Fix producer string memory leaks

2021-02-15 Thread Richard Biener via Gcc-patches
On Mon, Feb 15, 2021 at 2:46 PM Martin Liška  wrote:
>
> On 2/12/21 5:56 PM, Martin Sebor wrote:
> > On 2/12/21 2:09 AM, Richard Biener via Gcc-patches wrote:
> >> On Thu, Feb 11, 2021 at 6:41 PM Martin Liška  wrote:
> >>>
> >>> Hello.
> >>>
> >>> This fixes 2 memory leaks I noticed.
> >>>
> >>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>>
> >>> Ready to be installed?
> >>
> >> OK.
> >>
> >>> Thanks,
> >>> Martin
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>  * opts-common.c (decode_cmdline_option): Release werror_arg.
> >>>  * opts.c (gen_producer_string): Release output of
> >>>  gen_command_line_string.
> >>> ---
> >>>gcc/opts-common.c | 1 +
> >>>gcc/opts.c| 7 +--
> >>>2 files changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/gcc/opts-common.c b/gcc/opts-common.c
> >>> index 6cb5602896d..5e10edeb4cf 100644
> >>> --- a/gcc/opts-common.c
> >>> +++ b/gcc/opts-common.c
> >>> @@ -766,6 +766,7 @@ decode_cmdline_option (const char *const *argv, 
> >>> unsigned int lang_mask,
> >>>  werror_arg[0] = 'W';
> >>>
> >>>  size_t warning_index = find_opt (werror_arg, lang_mask);
> >>> +  free (werror_arg);
> >
> > Sorry to butt in here, but since we're having a discussion on this
> > same subject in another review of a fix for a memory leak, I thought
> > I'd pipe up: I would suggest a better (more in line with C++ and more
> > future-proof) solution is to replace the call to xstrdup (and the need
> > to subsequently call free) with std::string.
>
> Hello.
>
> To be honest, I like the suggested approach using std::string. On the other 
> hand,
> I don't want to mix existing C API (char *) with std::string.

That's one of the main problems.

> I'm sending a patch that fixed the remaining leaks.

OK.

> >
> >>>  if (warning_index != OPT_SPECIAL_unknown)
> >>>  {
> >>>const struct cl_option *warning_option
> >>> diff --git a/gcc/opts.c b/gcc/opts.c
> >>> index fc5f61e13cc..24bb64198c8 100644
> >>> --- a/gcc/opts.c
> >>> +++ b/gcc/opts.c
> >>> @@ -3401,8 +3401,11 @@ char *
> >>>gen_producer_string (const char *language_string, cl_decoded_option 
> >>> *options,
> >>>   unsigned int options_count)
> >>>{
> >>> -  return concat (language_string, " ", version_string, " ",
> >>> -gen_command_line_string (options, options_count), NULL);
> >>> +  char *cmdline = gen_command_line_string (options, options_count);
> >>> +  char *combined = concat (language_string, " ", version_string, " ",
> >>> +  cmdline, NULL);
> >>> +  free (cmdline);
> >>> +  return combined;
> >>>}
> >
> > Here, changing gen_command_line_string() to return std::string instead
> > of a raw pointer would similarly avoid having to remember to free
> > the pointer (and forgetting).  The function has two other callers,
> > both in gcc/toplev.c, and both also leak the memory for the same
> > reason as here.
>
> Btw. have we made a general consensus that using std::string is fine in the
> GCC internals?

No, we didn't.  But if Martin likes RAII adding sth like

template 
class auto_free
{
   auto_free (T *ptr) : m_ptr (ptr) {};
  ~auto_free () { m_ptr->~T (); free (m_ptr); }
  T  *m_ptr;
};

with appropriate move CTORs to support returning this
should be straight-forward.  You should then be able to
change the return type from char * to auto_free or so.

But then there's the issue of introducing lifetime bugs because
you definitely need to have the pointer escape at points like
the printf ...

Richard.

> Martin
>
> >
> > Martin
> >
> >>>
> >>>#if CHECKING_P
> >>> --
> >>> 2.30.0
> >>>
> >
>


[PATCH][pushed] Add 2 missing Param keywords.

2021-02-15 Thread Martin Liška

Pushed as obvious.

Martin

gcc/ChangeLog:

* params.opt: Add 2 missing Param keywords.
---
 gcc/params.opt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index c633648d047..a4e1ac0e88e 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -682,11 +682,11 @@ Common Joined UInteger Var(param_max_stores_to_sink) 
Init(2) Param Optimization
 Maximum number of conditional store pairs that can be sunk.
 
 -param=max-store-chains-to-track=

-Common Joined UInteger Var(param_max_store_chains_to_track) Init(64) 
IntegerRange(1, 65536)
+Common Joined UInteger Var(param_max_store_chains_to_track) Init(64) 
IntegerRange(1, 65536) Param
 Maximum number of store chains to track at the same time in the store merging 
pass.
 
 -param=max-stores-to-track=

-Common Joined UInteger Var(param_max_stores_to_track) Init(1024) 
IntegerRange(2, 1048576)
+Common Joined UInteger Var(param_max_stores_to_track) Init(1024) 
IntegerRange(2, 1048576) Param
 Maximum number of store chains to track at the same time in the store merging 
pass.
 
 -param=max-tail-merge-comparisons=

--
2.30.0



Re: [PATCH] Fix producer string memory leaks

2021-02-15 Thread Martin Liška

On 2/12/21 5:56 PM, Martin Sebor wrote:

On 2/12/21 2:09 AM, Richard Biener via Gcc-patches wrote:

On Thu, Feb 11, 2021 at 6:41 PM Martin Liška  wrote:


Hello.

This fixes 2 memory leaks I noticed.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


OK.


Thanks,
Martin

gcc/ChangeLog:

 * opts-common.c (decode_cmdline_option): Release werror_arg.
 * opts.c (gen_producer_string): Release output of
 gen_command_line_string.
---
   gcc/opts-common.c | 1 +
   gcc/opts.c    | 7 +--
   2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index 6cb5602896d..5e10edeb4cf 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -766,6 +766,7 @@ decode_cmdline_option (const char *const *argv, unsigned 
int lang_mask,
 werror_arg[0] = 'W';

 size_t warning_index = find_opt (werror_arg, lang_mask);
+  free (werror_arg);


Sorry to butt in here, but since we're having a discussion on this
same subject in another review of a fix for a memory leak, I thought
I'd pipe up: I would suggest a better (more in line with C++ and more
future-proof) solution is to replace the call to xstrdup (and the need
to subsequently call free) with std::string.


Hello.

To be honest, I like the suggested approach using std::string. On the other 
hand,
I don't want to mix existing C API (char *) with std::string.

I'm sending a patch that fixed the remaining leaks.




 if (warning_index != OPT_SPECIAL_unknown)
 {
   const struct cl_option *warning_option
diff --git a/gcc/opts.c b/gcc/opts.c
index fc5f61e13cc..24bb64198c8 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -3401,8 +3401,11 @@ char *
   gen_producer_string (const char *language_string, cl_decoded_option *options,
  unsigned int options_count)
   {
-  return concat (language_string, " ", version_string, " ",
-    gen_command_line_string (options, options_count), NULL);
+  char *cmdline = gen_command_line_string (options, options_count);
+  char *combined = concat (language_string, " ", version_string, " ",
+  cmdline, NULL);
+  free (cmdline);
+  return combined;
   }


Here, changing gen_command_line_string() to return std::string instead
of a raw pointer would similarly avoid having to remember to free
the pointer (and forgetting).  The function has two other callers,
both in gcc/toplev.c, and both also leak the memory for the same
reason as here.


Btw. have we made a general consensus that using std::string is fine in the
GCC internals?

Martin



Martin



   #if CHECKING_P
--
2.30.0





>From 8c3c6812cb094888696302001c114dc11cfa2694 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 15 Feb 2021 11:28:19 +0100
Subject: [PATCH] Fix 2 more leaks related to gen_command_line_string.

gcc/ChangeLog:

	* toplev.c (init_asm_output): Free output of
	gen_command_line_string function.
	(process_options): Likewise.
---
 gcc/toplev.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 05bd449eafc..d8cc254adef 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -748,9 +748,10 @@ init_asm_output (const char *name)
 	  print_version (asm_out_file, ASM_COMMENT_START, true);
 	  fputs (ASM_COMMENT_START, asm_out_file);
 	  fputs (" options passed: ", asm_out_file);
-	  fputs (gen_command_line_string (save_decoded_options,
-	  save_decoded_options_count),
-		 asm_out_file);
+	  char *cmdline = gen_command_line_string (save_decoded_options,
+		   save_decoded_options_count);
+	  fputs (cmdline, asm_out_file);
+	  free (cmdline);
 	  fputc ('\n', asm_out_file);
 	}
 }
@@ -1384,8 +1385,11 @@ process_options (void)
   if (!quiet_flag)
 	{
 	  fputs ("options passed: ", stderr);
-	  fputs (gen_command_line_string (save_decoded_options,
-	  save_decoded_options_count), stderr);
+	  char *cmdline = gen_command_line_string (save_decoded_options,
+		   save_decoded_options_count);
+
+	  fputs (cmdline, stderr);
+	  free (cmdline);
 	  fputc ('\n', stderr);
 	}
 }
-- 
2.30.0



[PATCH] PING^2 Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-02-15 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Best regards,
Ilya



Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-15 Thread Anthony Sharp via Gcc-patches
Scan the fields of BINFO for an exact match of DECL.  If found, return
   DECL.  Otherwise, return NULL_TREE.  DECL is really the type "tree". */

Should say

Scan the fields of BINFO for an exact match of DECL.  If found, return
   the field.  Otherwise, return NULL_TREE.  DECL is really the type "tree". */

On Mon, 15 Feb 2021 at 12:45, Anthony Sharp  wrote:
>
> Hi all,
>
> This overloaded functions issue has been a pain in the ass but I have
> found a solution that seems to work. I could really do with some
> feedback on it though since I'm not sure if there's a better way to do
> it that I am missing.
>
> Here's the problem. When a using statement causes an access failure,
> we need to find out WHICH using statement caused the access failure
> (this information is not given to enforce_access; it merely gets the
> ORIGINAL decl [i.e. the one that the using statement inherited]). To
> make matters worse, a USING_DECL does not seem to store any
> information about where it "came from", e.g. there's no ORIGINAL_DECL
> (USING_DECL) macro or anything like that. DECL_INITIAL doesn't seem to
> help (and is probably totally not related to the issue).
>
> So we need to do a long-winded lookup.
>
> First, we iterate through the USING_DECLs in the class where access
> failed (in the context of a parent that has caused the access failure
> because it has private access). For normal field variables, we COULD
> simply do a name lookup on the USING_DECL and compare it to the
> original DECL. That would be easy, since there can only be one
> variable field with the same name.
>
> However, that doesn't work for overloaded functions. Name lookup would
> return multiple results, making comparison meaningless.
>
> What we need is therefore not a name lookup, but a decl lookup. It
> sounds stupid, because what that basically means is looking for an
> exact match of a decl, which is intuitively stupid, because why would
> you look for an exact match of something you already have? But
> actually we can take advantage of a quirk that USING_DECLs have, which
> is that, when stripped, cp_tree_equal (stripped_using_decl,
> original_decl) returns true, even through stripped_using_decl and
> original_decl exist on different lines and in different places. In
> other words, a USING_DECL is the same as the original DECL it
> inherits, even though they are in different places (Unless I am just
> being really dumb?).
>
> Anyways, to summarise ...
>
> 1) We iterate through USING_DECLs in the class that caused a private
> access failure.
> 2) For each USING_DECL, we find the DECL that USING_DECL inherited.
>   2.1) To do this, we iterate through all fields of all base classes
> (possibly slow? but it is just diagnostics code afterall,
>  so idk if that's a big deal)
>   2.2)  We compare each FIELD against the USING_DECL. if equal, then
> we return FIELD.
> 3) if the DECL that USING_DECL inherited is equal to the diagnostic
> decl we were given in enforce_access, we return USING_DECL as being
> the source of the problem.
>
> In a perfect world, I guess the USING_DECL would store somewhere what
> it inherited as it was parsed. But I'm not sure if that's practical to
> do and I'm not sure it would be worth the effort considering it would
> probably be used only for this niche scenario. Donno.
>
> I have not regression tested this, but it does seem to work on the
> test case at least (which I've also included). Also please ignore
> formatting issues ATM.
>
> If you think this is a reasonable way to do it then I will tidy up the
> code, test it and make a patch and send it over. If anyone has any
> better ideas (probably), then please let me know. I did try the name
> lookup as Jason suggested but, as I say, in the case of overloaded
> functions it returns multiple values, so it didn't help in determining
> what DECL a USING_DECL inherited.
>
> BTW, I will eventually put find_decl_using_decl_inherits and
> lookup_decl_exact_match in search.c.
>
> Just for proof, here's the output from the testcase (hopefully it
> formats this correctly):
>
> /home/anthony/Desktop/using9.C: In member function ‘void C::f()’:
> /home/anthony/Desktop/using9.C:34:11: error: ‘int A1::comma()’ is
> private within this context
>34 | comma();
>   |   ^
> /home/anthony/Desktop/using9.C:22:24: note: declared private here
>22 |   using A2::comma, A1::comma;  // { dg-message "declared" }
>   |   ^
> /home/anthony/Desktop/using9.C:35:14: error: ‘int A1::separate()’ is
> private within this context
>35 | separate(); // { dg-error "private" }
>   | ^
> /home/anthony/Desktop/using9.C:25:13: note: declared private here
>25 |   using A1::separate; // { dg-message "declared" }
>   |  ^~~~
> /home/anthony/Desktop/using9.C:36:5: error: ‘int A2::alone’ is private
> within this context
>36 | alone = 5; // { dg-error "private" }
> 

Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-15 Thread Anthony Sharp via Gcc-patches
Hi all,

This overloaded functions issue has been a pain in the ass but I have
found a solution that seems to work. I could really do with some
feedback on it though since I'm not sure if there's a better way to do
it that I am missing.

Here's the problem. When a using statement causes an access failure,
we need to find out WHICH using statement caused the access failure
(this information is not given to enforce_access; it merely gets the
ORIGINAL decl [i.e. the one that the using statement inherited]). To
make matters worse, a USING_DECL does not seem to store any
information about where it "came from", e.g. there's no ORIGINAL_DECL
(USING_DECL) macro or anything like that. DECL_INITIAL doesn't seem to
help (and is probably totally not related to the issue).

So we need to do a long-winded lookup.

First, we iterate through the USING_DECLs in the class where access
failed (in the context of a parent that has caused the access failure
because it has private access). For normal field variables, we COULD
simply do a name lookup on the USING_DECL and compare it to the
original DECL. That would be easy, since there can only be one
variable field with the same name.

However, that doesn't work for overloaded functions. Name lookup would
return multiple results, making comparison meaningless.

What we need is therefore not a name lookup, but a decl lookup. It
sounds stupid, because what that basically means is looking for an
exact match of a decl, which is intuitively stupid, because why would
you look for an exact match of something you already have? But
actually we can take advantage of a quirk that USING_DECLs have, which
is that, when stripped, cp_tree_equal (stripped_using_decl,
original_decl) returns true, even through stripped_using_decl and
original_decl exist on different lines and in different places. In
other words, a USING_DECL is the same as the original DECL it
inherits, even though they are in different places (Unless I am just
being really dumb?).

Anyways, to summarise ...

1) We iterate through USING_DECLs in the class that caused a private
access failure.
2) For each USING_DECL, we find the DECL that USING_DECL inherited.
  2.1) To do this, we iterate through all fields of all base classes
(possibly slow? but it is just diagnostics code afterall,
 so idk if that's a big deal)
  2.2)  We compare each FIELD against the USING_DECL. if equal, then
we return FIELD.
3) if the DECL that USING_DECL inherited is equal to the diagnostic
decl we were given in enforce_access, we return USING_DECL as being
the source of the problem.

In a perfect world, I guess the USING_DECL would store somewhere what
it inherited as it was parsed. But I'm not sure if that's practical to
do and I'm not sure it would be worth the effort considering it would
probably be used only for this niche scenario. Donno.

I have not regression tested this, but it does seem to work on the
test case at least (which I've also included). Also please ignore
formatting issues ATM.

If you think this is a reasonable way to do it then I will tidy up the
code, test it and make a patch and send it over. If anyone has any
better ideas (probably), then please let me know. I did try the name
lookup as Jason suggested but, as I say, in the case of overloaded
functions it returns multiple values, so it didn't help in determining
what DECL a USING_DECL inherited.

BTW, I will eventually put find_decl_using_decl_inherits and
lookup_decl_exact_match in search.c.

Just for proof, here's the output from the testcase (hopefully it
formats this correctly):

/home/anthony/Desktop/using9.C: In member function ‘void C::f()’:
/home/anthony/Desktop/using9.C:34:11: error: ‘int A1::comma()’ is
private within this context
   34 | comma();
  |   ^
/home/anthony/Desktop/using9.C:22:24: note: declared private here
   22 |   using A2::comma, A1::comma;  // { dg-message "declared" }
  |   ^
/home/anthony/Desktop/using9.C:35:14: error: ‘int A1::separate()’ is
private within this context
   35 | separate(); // { dg-error "private" }
  | ^
/home/anthony/Desktop/using9.C:25:13: note: declared private here
   25 |   using A1::separate; // { dg-message "declared" }
  |  ^~~~
/home/anthony/Desktop/using9.C:36:5: error: ‘int A2::alone’ is private
within this context
   36 | alone = 5; // { dg-error "private" }
  |   ^
/home/anthony/Desktop/using9.C:27:13: note: declared private here
   27 |   using A2::alone;
  |^

Actual code attached.

Anthony


On Tue, 9 Feb 2021 at 20:07, Jason Merrill  wrote:
>
> On 2/9/21 6:18 AM, Anthony Sharp wrote:
> > The parens are to help the editor line up the last line properly.  If
> > the subexpression fits on one line, the parens aren't needed.
> >
> >
> > A okay.
> >
> > No; "compile" means translate from C++ to assembly, "assemble" means
> > that, 

Re: [PATCH] df: Record all definitions in DF_LR_BB_INFO->def [PR98863]

2021-02-15 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Thu, Feb 11, 2021 at 03:03:38PM +, Richard Sandiford via Gcc-patches 
> wrote:
>> gcc/
>>  * df-problems.c (df_lr_bb_local_compute): Treat partial definitions
>>  as read-modify operations.
>> 
>> gcc/testsuite/
>>  * gcc.dg/rtl/aarch64/multi-subreg-1.c: New test.
>
> The test fails everywhere but on aarch64.
>
> Fixed thusly, tested on x86_64-linux -m32/-m64, committed to trunk as
> obvious:

Gah, sorry for forgetting to check that.  Thanks for the fix.

Richard

>
> 2021-02-13  Jakub Jelinek  
>
>   * gcc.dg/rtl/aarch64/multi-subreg-1.c: Add dg-do compile directive
>   and restrict the test to aarch64-*-* target only.
>
> --- gcc/testsuite/gcc.dg/rtl/aarch64/multi-subreg-1.c.jj  2021-02-12 
> 23:57:30.594140834 +0100
> +++ gcc/testsuite/gcc.dg/rtl/aarch64/multi-subreg-1.c 2021-02-13 
> 00:01:04.935749889 +0100
> @@ -1,3 +1,4 @@
> +/* { dg-do compile { target aarch64-*-* } } */
>  /* { dg-additional-options "-O -fdump-rtl-cse1-all" } */
>  
>  __int128 __RTL (startwith ("vregs")) foo (void)
>
>
>   Jakub


Re: [PATCH 1/2] aarch64: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Richard Sandiford via Gcc-patches
Maya Rashish via Gcc-patches  writes:
> Some subtargets don't provide the canonical function names as
> the symbol name in C libraries, and libcalls will only work if
> the builtins are patched to emit the correct library name.
>
> For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
> and the patching is done via netbsd_patch_builtin.
>
> With this change, libgfortran.so is correctly built with a
> reference to __c99_cabsl, instead of "cabsl" which is not defined.
>
> gcc/ChangeLog:
>   * config/aarch64/aarch64-builtins.c
> (aarch64_general_init_builtins): Call SUBTARGET_INIT_BUILTINS.

I think it might be better to do this at the end of the parent function
aarch64_init_builtins instead, if that works.  Although no SVE built-in
functions currently have linkage, the current structure allows more
functions with linkage to be added outside aarch64-builtins.c.

Thanks,
Richard

> ---
>  gcc/config/aarch64/aarch64-builtins.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 25ab866ccd4..df13c9be051 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -1459,6 +1459,10 @@ aarch64_general_init_builtins (void)
>  
>if (TARGET_MEMTAG)
>  aarch64_init_memtag_builtins ();
> +
> +#ifdef SUBTARGET_INIT_BUILTINS
> +  SUBTARGET_INIT_BUILTINS;
> +#endif
>  }
>  
>  /* Implement TARGET_BUILTIN_DECL for the AARCH64_BUILTIN_GENERAL group.  */


Re: rtl-optimization: Fix uninitialized use of opaque mode variable ICE [PR98872]

2021-02-15 Thread Richard Sandiford via Gcc-patches
Peter Bergner  writes:
> 2021-02-12  Peter Bergner  
>
> gcc/
>   PR rtl-optimization/98872
>   * init-regs.c (initialize_uninitialized_regs): Skip initialization
>   if CONST0_RTX is NULL.
>
> gcc/testsuite/
>   PR rtl-optimization/98872
>   * gcc.target/powerpc/pr98872.c: New test.

OK, thanks.

Richard


Re: [PATCH] rtl-ssa: Reduce the amount of temporary memory needed [PR98863]

2021-02-15 Thread Richard Biener via Gcc-patches
On Fri, Feb 12, 2021 at 6:03 PM Richard Sandiford via Gcc-patches
 wrote:
>
> The rtl-ssa code uses an on-the-side IL and needs to build that IL
> for each block and RTL insn.  I'd originally not used the classical
> dominance frontier method for placing phis on the basis that it seemed
> like more work in this context: we're having to visit everything in
> an RPO walk anyway, so for non-backedge cases we can tell immediately
> whether a phi node is needed.  We then speculatively created phis for
> registers that are live across backedges and simplified them later.
> This avoided having to walk most of the IL twice (once to build the
> initial IL, and once to link uses to phis).
>
> However, as shown in PR98863, this leads to excessive temporary
> memory in extreme cases, since we had to record the value of
> every live register on exit from every block.  In that PR,
> there were many registers that were live (but unused) across
> a large region of code.
>
> This patch does use the classical approach to placing phis, but tries
> to use the existing DF defs information to avoid two walks of the IL.
> We still use the previous approach for memory, since there is no
> up-front information to indicate whether a block defines memory or not.
> However, since memory is just treated as a single unified thing
> (like for gimple vops), memory doesn't suffer from the same
> scalability problems as registers.
>
> With this change, fwprop no longer seems to be a memory-hog outlier
> in the PR: the maximum RSS is similar with and without fwprop.
>
> The PR also shows the problems inherent in using bitmap operations
> involving the live-in and live-out sets, which in the testcase are
> very large.  I've therefore tried to reduce those operations to the
> bare minimum.
>
> The patch also includes other compile-time optimisations motivated
> by the PR; see the changelog for details.
>
> I tried adding:
>
> for (int i = 0; i < 200; ++i)
>   {
> crtl->ssa = new rtl_ssa::function_info (cfun);
> delete crtl->ssa;
>   }
>
> to fwprop.c to stress the code.  fwprop then took 35% of the compile
> time for the problematic partition in the PR (measured on a release
> build).  fwprop takes less than .5% of the compile time when running
> normally.
>
> The command:
>
>   git diff 0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \
> patch -p1 -R
>
> still gives a working compiler that uses the old fwprop.c.  The compile
> time with that version is very similar.
>
> For a more reasonable testcase like optabs.ii at -O, I saw a 6.7%
> compile time regression with the loop above added (i.e. creating
> the info 201 times per pass instead of once per pass).  That goes
> down to 4.8% with -O -g.  I can't measure a significant difference
> with a normal compiler (no 200-iteration loop).
>
> So I think that (as expected) the patch does make things a bit
> slower in the normal case.  But like Richi says, peak memory usage
> is harder for users to work around than slighter slower compile times.
>
> The patch is going to be a nightmare to review, sorry.
>
> Tested on aarch64-linux-gnu, x86_&4-linux-gnu, armeb-eabi and
> powerpc64-linux-gnu.  OK to install?

So I've stared at this for quite a while now and I can't find obvious flaws
but details probably escape me since I'm not familiar with the RTL SSA
code otherwise.

I'd say let's go with it.

Thus OK, and thanks for the attention.

Richard.

> If this seems too risky, I wouldn't mind reverting to the old fwprop.c
> for GCC 11 and reinstating the current version for GCC 12.  However,
> I'd still like to apply some version of the patch for GCC 11 in order
> to support aarch64-cc-fusion.cc.
>
> Richard
>
>
> gcc/
> PR rtl-optimization/98863
> * rtl-ssa/functions.h (function_info::bb_live_out_info): Delete.
> (function_info::build_info): Turn into a declaration, moving the
> definition to internals.h.
> (function_info::bb_walker): Declare.
> (function_info::create_reg_use): Likewise.
> (function_info::calculate_potential_phi_regs): Take a build_info
> parameter.
> (function_info::place_phis, function_info::create_ebbs): Declare.
> (function_info::calculate_ebb_live_in_for_debug): Likewise.
> (function_info::populate_backedge_phis): Delete.
> (function_info::start_block, function_info::end_block): Declare.
> (function_info::populate_phi_inputs): Delete.
> (function_info::m_potential_phi_regs): Move information to build_info.
> * rtl-ssa/internals.h: New file.
> (function_info::bb_phi_info): New class.
> (function_info::build_info): Moved from functions.h.
> Add a constructor and destructor.
> (function_info::build_info::ebb_use): Delete.
> (function_info::build_info::ebb_def): Likewise.
> (function_info::build_info::bb_live_out): Likewise.
> 

Re: [[PATCH] 1/3] aarch64: Sanitize access to cfun in aarch64_declare_function_name()

2021-02-15 Thread Richard Earnshaw via Gcc-patches
On 09/12/2020 17:21, Christoph Müllner wrote:
> From: Christoph Muellner 
> 
> It is possible to call aarch64_declare_function_name() and
> have cfun not set. Let's sanitize the access to this variable.
> 
> gcc/
> 
> * config/aarch64/aarch64.c (aarch64_declare_function_name):
> Santize access to cfun.
> ---
>  gcc/config/aarch64/aarch64.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 67ffba02d3e..264ccb8beb2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -19317,7 +19317,8 @@ aarch64_declare_function_name (FILE *stream, const 
> char* name,
>ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
>ASM_OUTPUT_LABEL (stream, name);
>  
> -  cfun->machine->label_is_assembled = true;
> +  if (cfun)
> +cfun->machine->label_is_assembled = true;
>  }
>  
>  /* Implement PRINT_PATCHABLE_FUNCTION_ENTRY.  Check if the patch area is 
> after
> 

Do you have a simple testcase that demonstrates this?  If so, we likely
have a regression here that will not only need fixing, but back-porting
as well.

R.


Fix cast in df_worklist_dataflow_doublequeue

2021-02-15 Thread Eric Botcazou
The existing cast to float gives weird results in the RTL dump files on x86 
when the compiler is configured --with-fpmath=sse.

Bootstrapped on x86/Linux, applied on the mainline as obvious.


2021-02-15  Eric Botcazou 

* df-core.c (df_worklist_dataflow_doublequeue): Use proper cast.

-- 
Eric Botcazoudiff --git a/gcc/df-core.c b/gcc/df-core.c
index 8536e5e1f51..b4407eec460 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -1064,7 +1064,7 @@ df_worklist_dataflow_doublequeue (struct dataflow *dataflow,
 	 " n_basic_blocks %d n_edges %d"
 	 " count %d (%5.2g)\n",
 	 n_basic_blocks_for_fn (cfun), n_edges_for_fn (cfun),
-	 dcount, dcount / (float)n_basic_blocks_for_fn (cfun));
+	 dcount, dcount / (double)n_basic_blocks_for_fn (cfun));
 }
 
 /* Worklist-based dataflow solver. It uses sbitmap as a worklist,


Re: [PATCH] match.pd: Fix up A % (cast) (pow2cst << B) simplification [PR99079]

2021-02-15 Thread Richard Biener
On Sat, 13 Feb 2021, Jakub Jelinek wrote:

> Hi!
> 
> The (mod @0 (convert?@3 (power_of_two_cand@1 @2))) simplification
> uses tree_nop_conversion_p (type, TREE_TYPE (@3)) condition, but I believe
> it doesn't check what it was meant to check.  On convert?@3
> TREE_TYPE (@3) is not the type of what it has been converted from, but
> what it has been converted to, which needs to be (because it is operand
> of normal binary operation) equal or compatible to type of the modulo
> result and first operand - type.
> I could fix that by using && tree_nop_conversion_p (type, TREE_TYPE (@1))
> and be done with it, but actually most of the non-nop conversions are IMHO
> ok and so we would regress those optimizations.
> In particular, if we have say narrowing conversions (foo5 and foo6 in
> the new testcase), I think we are fine, either the shift of the power of two
> constant after narrowing conversion is still that power of two (or negation
> of that) and then it will still work, or the result of narrowing conversion
> is 0 and then we would have UB which we can ignore.
> Similarly, widening conversions where the shift result is unsigned are fine,
> or even widening conversions where the shift result is signed, but we sign
> extend to a signed wider divisor, the problematic case of INT_MIN will
> become x % (long long) INT_MIN and we can still optimize that to
> x & (long long) INT_MAX.
> What doesn't work is the case in the pr99079.c testcase, widening conversion
> of a signed shift result to wider unsigned divisor, where if the shift
> is negative, we end up with x % (unsigned long long) INT_MIN which is
> x % 0x8000ULL where the divisor is not a power of two and
> we can't optimize that to x & 0x7fffULL.
> 
> So, the patch rejects only the single problematic case.
> 
> Furthermore, when the shift result is signed, we were introducing UB into
> a program which previously didn't have one (well, left shift into the sign
> bit is UB in some language/version pairs, but it is definitely valid in
> C++20 - wonder if I shouldn't move the gcc.c-torture/execute/pr99079.c
> testcase to g++.dg/torture/pr99079.C and use -std=c++20), by adding that
> subtraction of 1, x % (1 << 31) in C++20 is well defined, but
> x & ((1 << 31) - 1) triggers UB on the subtraction.
> So, the patch performs the subtraction in the unsigned type if it isn't
> wrapping.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2021-02-13  Jakub Jelinek  
> 
>   PR tree-optimization/99079
>   * match.pd (A % (pow2pcst << N) -> A & ((pow2pcst << N) - 1)): Remove
>   useless tree_nop_conversion_p (type, TREE_TYPE (@3)) check.  Instead
>   require both type and TREE_TYPE (@1) to be integral types and either
>   type having smaller or equal precision, or TREE_TYPE (@1) being
>   unsigned type, or type being signed type.  If TREE_TYPE (@1)
>   doesn't have wrapping overflow, perform the subtraction of one in
>   unsigned type.
> 
>   * gcc.dg/fold-modpow2-2.c: New test.
>   * gcc.c-torture/execute/pr99079.c: New test.
> 
> --- gcc/match.pd.jj   2021-01-28 16:11:30.0 +0100
> +++ gcc/match.pd  2021-02-12 20:17:26.656857183 +0100
> @@ -619,12 +619,23 @@ (define_operator_list COND_TERNARY
>  (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2),
> 1); }))
>   (simplify
> -  (mod @0 (convert?@3 (power_of_two_cand@1 @2)))
> -  (if ((TYPE_UNSIGNED (type)
> - || tree_expr_nonnegative_p (@0))
> - && tree_nop_conversion_p (type, TREE_TYPE (@3))
> - && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
> -   (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); 
> }))
> +  (mod @0 (convert? (power_of_two_cand@1 @2)))
> +  (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
> +   /* Allow any integral conversions of the divisor, except
> +   conversion from narrower signed to wider unsigned type
> +   where if @1 would be negative power of two, the divisor
> +   would not be a power of two.  */
> +   && INTEGRAL_TYPE_P (type)
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +   && (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@1))
> +|| TYPE_UNSIGNED (TREE_TYPE (@1))
> +|| !TYPE_UNSIGNED (type))
> +   && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
> +   (with { tree utype = TREE_TYPE (@1);
> +if (!TYPE_OVERFLOW_WRAPS (utype))
> +  utype = unsigned_type_for (utype); }
> +(bit_and @0 (convert (minus (convert:utype @1)
> + { build_one_cst (utype); })))
>  
>  /* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFF.  */
>  (simplify
> --- gcc/testsuite/gcc.dg/fold-modpow2-2.c.jj  2021-02-12 19:36:45.833237766 
> +0100
> +++ gcc/testsuite/gcc.dg/fold-modpow2-2.c 2021-02-12 20:03:20.413315445 
> +0100
> @@ -0,0 +1,47 @@
> +/* PR