Re: [PATCH] middle-end/114579 - speed up add_scope_conflicts

2024-04-04 Thread Michael Matz
Hello,

On Thu, 4 Apr 2024, Richard Biener wrote:

> I have reworded the comment before the all-to-all conflict recording
> since it seemed to be confusing and missing the point - but maybe I
> am also missing something here.

The point of the comment was to explain how this differs from building a 
conflict graph for named objects without indirect accesses, i.e. how e.g. 
a conflict graph for register allocation is done.  There conflicts are 
added at the points where objects are defined (between that def and all 
currently live objects), and _only_ there.  I.e. not at CFG merge points 
and not at uses.  So something like so:

  foreach INSN in block backwards:
    foreach DEF in INSN:
      foreach O in active:
        add_conflict (DEF, O);
      remove (DEF from active);

That's for the more usual backwards direction, but it would be similar 
for the forward one.  The point is that this is the only place where 
conflicts are added, not at CFG splits/merges.

The above relies on being able to see all places where the objects are 
written.  With indirect accesses that's not the case anymore:

   ptr = cond ? &foo : &bar;  // MENTION foo, bar
   *ptr = 42;                 // no DEF of a named object
   escape (ptr);
   CLOBBER (foo)
   CLOBBER (bar)

We somewhere need to record the conflict between foo and bar.  
Conservatively we did so at the first real instruction of a block.  As a 
minor optimization we also start a block in a mode where we just record 
mentions, and switch to also recording conflicts at the first real insn. 
(An empty block, or one with just PHIs, does not cause conflicts in itself.)

This need for an N-to-N conflict generation at some point is the major 
difference between building the conflict graph for named vs. 
potentially indirect objects, and the comment tried to explain that 
from the perspective of someone familiar with conflict graph building 
in regalloc :-)

Where exactly these N-to-N conflicts are generated doesn't matter so much, 
as long as everything is there.  The current way was the most obvious one.

The observation that these N-to-N conflicts only need to be done on the 
commonly disjoint sets of the inputs (which is trivially empty for a 
single predecessor) is correct, i.e. only doing it with more than a single 
predecessor makes sense.  This requires adding the N-to-Ns always, even 
in mostly empty blocks, lest they be missed in the successor block, as 
you are doing.

So, fine with me, also with the changed comment if you think it explains 
stuff better :)


Ciao,
Michael.

> 
> Nevertheless for the testcase in the PR the compile-time spend in
> add_scope_conflicts at -O1 drops from previously 67s (39%) to 10s (9%).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?
> 
> Thanks,
> Richard.
> 
>   PR middle-end/114579
>   * cfgexpand.cc (add_scope_conflicts_1): Record all-to-all
>   conflicts only when there's a CFG merge but for all CFG merges.
> ---
>  gcc/cfgexpand.cc | 46 +-
>  1 file changed, 33 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index eef565eddb5..fa48a4c633f 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -640,21 +640,26 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, 
> bool for_conflict)
>   {
> if (for_conflict && visit == visit_op)
>   {
> -   /* If this is the first real instruction in this BB we need
> -  to add conflicts for everything live at this point now.
> -  Unlike classical liveness for named objects we can't
> -  rely on seeing a def/use of the names we're interested in.
> -  There might merely be indirect loads/stores.  We'd not add any
> -  conflicts for such partitions.  */
> +   /* When we are inheriting live variables from our predecessors
> +  through a CFG merge we might not see an actual mention of
> +  the variables to record the appropriate conflict as defs/uses
> +  might be through indirect stores/loads.  For this reason
> +  we have to make sure each live variable conflicts with
> +  each other.  When there's just a single predecessor the
> +  set of conflicts is already up-to-date.
> +  We perform this delayed at the first real instruction to
> +  allow clobbers starting this block to remove variables from
> +  the set of live variables.  */
> bitmap_iterator bi;
> unsigned i;
> -   EXECUTE_IF_SET_IN_BITMAP (work, 0, i, bi)
> - {
> -   class stack_var *a = &stack_vars[i];
> -   if (!a->conflicts)
> - a->conflicts = BITMAP_ALLOC (&stack_var_bitmap_obstack);
> -   bitmap_ior_into (a->conflicts, work);
> - }
> +   if (EDGE_COUNT (bb->preds) > 1)
> + EXECUTE_IF_SET_IN_BITMAP 

Re: [PATCH] middle-end/114480 - IDF compute is slow

2024-03-27 Thread Michael Matz
Hey,

On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> >gcc_checking_assert (bb_index
> >< (unsigned) last_basic_block_for_fn (cfun));
> >  
> > -  EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], phi_insertion_points,
> > - 0, i, bi)
> > -   {
> > +  EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
> > +   if (bitmap_set_bit (phi_insertion_points, i))
> >   bitmap_set_bit (work_set, i);
> > - bitmap_set_bit (phi_insertion_points, i);
> > -   }
> >  }
> 
> I don't understand why the above is better.
> Wouldn't it be best to do
>   bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
>phi_insertion_points);
>   bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
> ?

I had the same hunch, but:

1) One would have to make work_set be non-tree-view again (which with the 
current structure is a wash anyway, and that makes sense as accesses to 
work_set aren't heavily random here).

2) But doing that and using bitmap_ior.._into is still measurably slower: 
on a reduced testcase with -O0 -fno-checking, proposed structure 
(tree-view or not-tree-view workset doesn't matter):

 tree SSA rewrite   :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 12%)   27M (  8%)

with non-tree-view, and your suggestion:

 tree SSA rewrite   :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 12%)   27M (  8%)

I can only speculate that the usually extreme sparsity of the bitmaps in 
question make the setup costs of the two bitmap_ior calls actually more 
expensive than the often skipped second call to bitmap_set_bit in Richi's 
proposed structure.  (That or cache effects)


Ciao,
Michael.


Re: [PATCH] i386: For noreturn functions save at least the bp register if it is used [PR114116]

2024-02-29 Thread Michael Matz
Hello,

On Tue, 27 Feb 2024, Jakub Jelinek wrote:

> On Tue, Feb 27, 2024 at 10:13:14AM +0100, Jakub Jelinek wrote:
> > For __libc_start_main, glibc surely just could use no_callee_saved_registers
> > attribute, because that is typically the outermost frame in backtrace,
> > there is no need to save those there.
> > And for kernel if it really wants it and nothing will use the backtraces,
> > perhaps the patch wouldn't need to be reverted completely but just guarded
> > the implicit no_callee_saved_registers treatment of noreturn
> > functions on -mcmodel=kernel or -fno-asynchronous-unwind-tables.
> 
> Guarding on -fno-asynchronous-unwind-tables isn't a good idea,
> with just -g we emit in that case unwind info in .debug_frame section
> and even that shouldn't break, and we shouldn't generate different code for
> -g vs. -g0.
> The problem with the changes is that it breaks the unwinding and debugging
> experience not just in the functions on which the optimization triggers,
> but on all functions in the backtrace as well.
> 
> So, IMHO either revert the changes altogether, or guard on -mcmodel=kernel
> (but talk to kernel people on linux-toolchains if that is what they actually
> want).

What is the underlying real purpose of the changes anyway?  It's a 
nano-optimization: for functions to be called multiple times 
they must either return or be recursive.  The latter is not very likely, 
so a noreturn function is called only once in the vast majority of cases.  
Any optimization that diddles with the frame setup code for functions 
called only once seems to be ... well, not so very useful, especially so 
when it impacts anything that is actually useful, like debugging.

I definitely think this shouldn't be done by default.


Ciao,
Michael.


Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Michael Matz
Hello,

On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> > Will update the patch, I think any improvement should be done
> > to get_range_pos_neg (it's a bit odd in behavior for unsigned
> > but I have only signed things incoming).
> 
> For unsigned if it always returned 1, it would be quite useless, there would
> be no point for the caller to call it in that case.

Which seems to make sense for a function called ...pos_neg on unsigned 
types.  I would expect calling it to be useless and always return "yep, 
non-negative, why did you ask?".


Ciao,
Michael.


Re: [PATCH] c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]

2024-02-22 Thread Michael Matz
Hi,

On Thu, 22 Feb 2024, Jakub Jelinek wrote:

> > Hmm, shouldn't you be able to use (nonexistence of) the PREV_WHITE flag on 
> > the second COLON token to see that it's indeed a '::' without intervening 
> > whitespace?  Instead of setting a new flag on the first COLON token?
> > 
> > I.e. something like this:
> > 
> >if (c_parser_next_token_is (parser, CPP_SCOPE)
> > -  || (loose_scope_p
> > - && c_parser_next_token_is (parser, CPP_COLON)
> >   && c_parser_peek_2nd_token (parser)->type == CPP_COLON))
> > + && !(c_parser_peek_2nd_token (parser)->flags & PREV_WHITE)))
> > 
> > ?
> 
> That doesn't seem to work.

Too bad then.  I had hoped it would make the code easier without changes 
to c-lex.  Well, then ... it was worth a try, I'll crouch back under my 
stone :)


Ciao,
Michael.


Re: [PATCH] c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]

2024-02-22 Thread Michael Matz
Hello,

On Thu, 22 Feb 2024, Jakub Jelinek wrote:

> So, the following patch adds a flag during preprocessing at the point
> where we normally create CPP_SCOPE tokens out of 2 consecutive colons
> on the first CPP_COLON to mark the consecutive case (as we are tight
> on the bits, I've reused the PURE_ZERO flag, which is used just by the
> C++ FE and only ever set (both C and C++) on CPP_NUMBER tokens, this
> new flag has the same value and is only ever used on CPP_COLON tokens)

Hmm, shouldn't you be able to use (nonexistence of) the PREV_WHITE flag on 
the second COLON token to see that it's indeed a '::' without intervening 
whitespace?  Instead of setting a new flag on the first COLON token?

I.e. something like this:

   if (c_parser_next_token_is (parser, CPP_SCOPE)
-  || (loose_scope_p
- && c_parser_next_token_is (parser, CPP_COLON)
  && c_parser_peek_2nd_token (parser)->type == CPP_COLON))
+ && !(c_parser_peek_2nd_token (parser)->flags & PREV_WHITE)))

?


Ciao,
Michael.


Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-17 Thread Michael Matz
Hello,

On Wed, 17 Jan 2024, Ajit Agarwal wrote:

> > first is even, since OOmode is only ok for even vsx register and its
> > size makes it take two consecutive vsx registers.
> > 
> > Hi Peter, is my understanding correct?
> > 
> 
> I tried all the combinations in the past; the RA is not allocating 
> sequential registers. I don't see any such code in the RA that generates 
> sequential registers.

See HARD_REGNO_NREGS.  If you form a pseudo of a mode that's larger than a 
native-sized hardreg (and the target is correctly set up) then the RA will 
allocate the correct number of hardregs (consecutively) for this pseudo.  
This is what Kewen was referring to by mentioning OOmode for the new 
hypothetical pseudo.  The individual parts of such a pseudo will then need 
to be accessed via subregs.

So, when you work before RA you simply will transform this (I'm going to 
use SImode and DImode for demonstration):

   (set (reg:SI x) (mem:SI (addr)))
   (set (reg:SI y) (mem:SI (addr+4)))
   ...
   ( ...use1... (reg:SI x))
   ( ...use2... (reg:SI y))

into this:

   (set (reg:DI z) (mem:DI (addr)))
   ...
   ( ...use1... (subreg:SI (reg:DI z) 0))
   ( ...use2... (subreg:SI (reg:DI z) 4))

For this to work the target needs to accept the (subreg...) in certain 
operands of instruction patterns, which I assume was what Kewen also 
referred to.  The register allocator will then assign hardregs X and X+1 
to the pseudo-reg 'z'.  (Assuming that DImode is okay for hardreg X, and 
HARD_REGNO_NREGS says that it needs two hardregs to hold DImode).

It will also replace the subregs by their appropriate concrete hardreg.

It seems your problems stem from trying to place your new pass somewhere 
within the register-allocation pipeline, rather than simply completely 
before.


Ciao,
Michael.


Re: [PATCH] Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).

2023-10-10 Thread Michael Matz


On Tue, 10 Oct 2023, Roger Sayle wrote:

> 
> This patch is the middle-end piece of an improvement to PRs 101955 and
> 106245, that adds a missing simplification to the RTL optimizers.
> This transformation is to simplify (char)(x << 7) != 0 as x & 1.

Random observation:

So, why restrict to shifts of LEN-1 and mask 1?  It's always the case that
(type-of-LEN)(x << S) != 0  ===  (x & ((1 << (LEN - S)) - 1)) != 0.

E.g. (char)(x << 5) != 0  ===  (x & 7) != 0.

(Eventually the mask will be a constant that's too costly to compute if S 
is target-dependently too small, but all else being equal avoiding shifts 
seems sensible)


Ciao,
Michael.


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Michael Matz via Gcc-patches
Hey,

On Thu, 10 Aug 2023, Martin Uecker wrote:

> > offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
> > 
> > With GCC, offset(struct foo_flex,t[0]) == 6, which is also correct. 
> 
> This formula might be considered incorrect / dangerous because
> it might allocate less storage than sizeof(struct foo_flex). 

Oh indeed.  I hadn't even considered that.  That could be "fixed" with 
another max(the above, sizeof(struct foo_flex)), but that starts to become 
silly when the obvious choice works fine.


Ciao,
Michael.


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> > So, should the equivalent FAM struct also have this sizeof()?  If no: 
> > there should be a good argument why it shouldn't be similar to the non-FAM 
> > one.
> 
> The sizeof() of a structure with FAM is defined as: (after I searched online,
>  I think that the one in Wikipedia is the most reasonable one):
> https://en.wikipedia.org/wiki/Flexible_array_member

Well, wikipedia has its uses.  Here we are in language lawyering land 
together with a discussion what makes most sense in many circumstances.  
FWIW, in this case I think the cited text from wikipedia is correct in the 
sense of "not wrong" but not helpful in the sense of "good advice".

> By definition, the sizeof() of a struct with FAM might not be the same 
> as the non-FAM one. i.e, for the following two structures, one with FAM, 
> the other with fixed array:
> 
> struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> struct foo_fix {int a; short b; char t[3]; } 
> 
> With current GCC:
> sizeof(foo_flex) == 8
> sizeof(foo_fix) == 12
> 
> I think that the current behavior of sizeof for structure with FAM in 
> GCC is correct.

It is, yes.

> The major issue is what was pointed out by Martin in the previous email:
> 
> Whether using the following formula is correct to compute the 
> allocation?
> 
> sizeof(struct foo_flex) + N * sizeof(foo->t);
> 
> As pointed out  in the wikipedia, the value computed by this formula might
> be bigger than the actual size since “sizeof(struct foo_flex)” might include 
> paddings that are used as part of the array.

That doesn't make the formula incorrect, but rather conservatively 
correct.  If you don't want to be conservative, then yes, you can use a 
different formula if you happen to know the layout rules your compiler at 
hand uses (or the ones prescribed by an ABI, if it does that).  I think 
it would be bad advice to the general population to advertise this scheme 
as better.

> So the more accurate formula should be
> 
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);

"* sizeof(foo->t[0])", but yes.

> For the question, whether the compiler needs to allocate paddings after 
> the FAM field, I don’t know the answer, and it’s not specified in the 
> standard either. Does it matter?

It matters for two things:

1) Abstract reasons: is there as reason to deviate 
from the normal rules?  If not: it shouldn't deviate.  Future 
extensibility: while it right now is not possible to form an array 
of FAM-structs (in C!), who's to say that it may not eventually be added?  
It seems a natural enough extension of an extension, and while it has 
certain implementation problems (the "real" size of the elements needs to 
be computable, and hence be part of the array type) those could be 
overcome.  At that point you _really_ want to have the elements aligned 
naturally, be compatible with sizeof, and be the same as an individual 
object.

2) Practical reasons: codegeneration works better if the accessible sizes 
of objects are a multiple of their alignment, as often you have 
instructions that can move around alignment-sized blobs (say, words).  If 
you don't pad out objects you have to be careful to use only byte accesses 
when you get to the end of an object.

Let me ask the question in the opposite way: what would you _gain_ by not 
allocating padding?  And is that enough to deviate from the most obvious 
choices?  (Do note that e.g. global or local FAM-typed objects don't exist 
in isolation, and their surrounding objects have their own alignment 
rules, which often will lead to tail padding anyway, even if you would use 
the non-conservative size calculation; the same applies for malloc'ed 
objects).

> > Note that if one choses to allocate less space than sizeof implies that 
> > this will have quite some consequences for code generation, in that 
> > sometimes the instruction sequences (e.g. for copying) need to be careful 
> > to never access tail padding that should be there in array context, but 
> > isn't there in single-object context.  I think this alone should make it 
> > clear that it's advisable that sizeof() and allocated size agree.
> 
> Sizeof by definition is return the size of the TYPE, not the size of the 
> allocated object.

Sure.  Outside special cases like FAM it's the same, though.  And there 
sizeof does include tail padding.

> > And then the next question is what __builtin_object_size should do with 
> > these: should it return the size with or without padding at end (i.e. 
> > could/should it return 9 even if sizeof is 12).  I can see arguments for 
> > both.
> 
> Currently, GCC’s __builtin_object_size use the following formula to 
> compute the object size for The structure with FAM:
> 
> offset(struct foo_flex, t[0]) + N * sizeof(foo->t);
> 
> I think it’s correct.

See above.  It's non-conservatively correct.  And that may be the right 
choice for this builtin, considering its intended use-cases (strict 

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> Although this is an old FAM related issue that does not relate to my current 
> patch 
> (and might need to be resolved in a separate patch).  I think that it’s 
> necessary to have
> more discussion on this old issue and resolve it. 
> 
> The first thing that I’d like to confirm is:
> 
> What the exact memory layout for the following structure x?
> 
> struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> 
> And the key that is confusing me is, where should the field “t” start? 
> 
> A.  Starting at offset 8 as the following:
> 
> a 4-bytes
> b 2-bytes
> padding   2-bytes
> t 3-bytes

Why should there be padding before 't'?  It's a char array (FAM or not), 
so it can be (and should be) placed directly after 'b'.  So ...

> B. Starting at offset 6 as the following:
> 
> a 4-bytes
> b 2-bytes
> t 3-bytes

... this is the correct layout, when seen in isolation.  The discussion 
revolves around what should come after 't': if it's a non-FAM struct (with 
t[3]), then it's clear that there needs to be padding after it, so to pad 
out the whole struct to be 12 bytes long (for sizeof() purpose), as 
required by its alignment (due to the int field 'a').

So, should the equivalent FAM struct also have this sizeof()?  If no: 
there should be a good argument why it shouldn't be similar to the non-FAM 
one.

Then there is an argument that the compiler would be fine, when allocating 
a single object of such type (not as part of an array!), to only reserve 9 
bytes of space for the FAM-struct.  Then the question is: should it also 
do that for a non-FAM struct (based on the argument that the padding 
behind 't' isn't accessible, and hence doesn't need to be alloced).  I 
think it would be quite unexpected for the compiler to actually reserve 
less space than sizeof() implies, so I personally don't buy that argument.  
For FAM or non-FAM structs.

Note that if one choses to allocate less space than sizeof implies that 
this will have quite some consequences for code generation, in that 
sometimes the instruction sequences (e.g. for copying) need to be careful 
to never access tail padding that should be there in array context, but 
isn't there in single-object context.  I think this alone should make it 
clear that it's advisable that sizeof() and allocated size agree.

As in: I think sizeof for both structs should return 12, and 12 bytes 
should be reserved for objects of such types.

And then the next question is what __builtin_object_size should do with 
these: should it return the size with or without padding at end (i.e. 
could/should it return 9 even if sizeof is 12).  I can see arguments for 
both.


Ciao,
Michael.


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:

> > The question is whether you want to mandate the 16-bit floating point
> > extensions.  You might get better adoption if you stay compatible with 
> > shipping
> > CPUs.  Furthermore, the 256-bit tuning apparently benefits current Intel 
> > CPUs,
> > even though they can do 512-bit vectors.
> > 
> > (The thread subject is a bit misleading for this sub-topic, by the way.)
> > 
> > Thanks,
> > Florian
> 
> Since 256bit and 512bit are diverged from AVX10.1 and will continue in 
> the future AVX10 versions, I think it's hard to keep a single version 
> number to cover both and increase monotonically. Hence I'd like to 
> suggest x86-64-v5 for 512bit and x86-64-v5-256 for 256bit, and so on.

The raison d'etre for the x86-64-vX scheme is to make life sensible as 
distributor.  That goal can only be achieved if this scheme contains only 
a few components that have a simple relationship.  That basically means: 
one dimension only.  If you now add a second dimension (with and without 
-512) we have to add another one if Intel (or whomever else) next does a 
marketing stunt for feature "foobar" and end up with x86-64-v6, 
x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar, 
x86-64-v6-1024-foobar.

In short: no.

It isn't the right time anyway to assign meaning to x86-64-v5, as it 
wasn't the right time for assigning x86-64-v4 (as we now see).  These are 
supposed to reflect generally useful feature sets actually shipped in 
generally available CPUs in the market, and be vendor independent.  As 
such it's much too early to define v5 based purely on text documents.


Ciao,
Michael.


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Aug 2023, Martin Uecker via Gcc-patches wrote:

> There at least three different size expression which could
> make sense. Consider
> 
> struct foo { int a; short b; char t[]; }; 
> 
> Most people seem to use
> 
> sizeof(struct foo) + N * sizeof(foo->t);
> 
> which for N == 3 yields 11 bytes on x86-64 because the formula
> adds the padding of the original struct. There is an example
> in the  C standard that uses this formula.
> 
> 
> But the minimum size of an object which stores N elements is
> 
> max(sizeof (struct s), offsetof(struct s, t[n]))
> 
> which is 9 bytes. 

But should it really?  Struct sizes should usually be a multiple of a 
struct's alignment, and if int is 4-aligned only size 12 would be correct. 
I don't see why one should deviate from that general rule for sizes of 
FAM structs.  (I realize that the above is not about sizeof(), but rather 
bdos/bos, but I think it's fairly useful that both agree when possible).

Say, if you were to allocate an array of such struct foos, each having 3 
elements in foo.t[].  You need to add 12 bytes to go to the next array 
element, not 9.  (Of course arrays of FAM-structs are somewhat meh, but 
still).  It's true that you would be allowed to rely on only 9 bytes of 
those 12 bytes (the rest being padding), so perhaps it's really the right 
answer for __bos, but, hmm, 12 seems to be "more correct" to my guts :-)


Ciao,
Michael.


Re: [RFC] GCC Security policy

2023-08-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Aug 2023, Jakub Jelinek via Gcc-patches wrote:

> What I'd really like to avoid is having all compiler bugs (primarily ICEs)
> considered to be security bugs (e.g. DoS category), it would be terrible to
> release every week a new compiler because of the "security" issues.
> Running compiler on untrusted sources can trigger ICEs (which we want to fix
> but there will always be some), or run into some compile time and/or compile
> memory issue (we have various quadratic or worse spots), compiler stack
> limits (deeply nested stuff e.g. during parsing but other areas as well).
> So, people running fuzzers and reporting issues is great, but if they'd get
> a CVE assigned for each ice-on-invalid-code, ice-on-valid-code,
> each compile-time-hog and each memory-hog, that wouldn't be useful.

This!  Double-this!

FWIW, the binutils security policy, and by extension the proposed GCC 
policy David posted, handles this.  (To me this is the most important 
aspect of such policy, having been on the receiving end of such nonsense 
on the binutils side).

> Runtime libraries or security issues in the code we generate for valid
> sources are of course a different thing.

Generate or otherwise provide for consumption.  E.g. a bug with security 
consequences in the runtime libs (either in source form (templates) or as 
executable code, but with the problem being in e.g. libgcc sources 
(unwinder!)) needs proper handling, similar to how glibc is handled.


Ciao,
Michael.


Re: _BitInt vs. _Atomic

2023-08-02 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 1 Aug 2023, Joseph Myers wrote:

> > Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
> > defined in terms of the == operator (obviously applied recursively 
> > member-wise for structs) and simple-assignment that wouldn't be a problem.  
> 
> It also wouldn't work for floating point, where I think clearly the atomic 
> operations should consider positive and negative zero as different, and 
> should consider different DFP quantum exponents for the same real number 
> as different - but should also consider the same NaN (same payload, same 
> choice of quiet / signaling) as being the same.

That is all true.  But the current wording can't work either.  It happily 
requires copying around memory between types of different representations 
and sizes, it makes padding observable behaviour, and due to that makes 
basic algebraic guarantees not be observed (after two values are checked 
equal with the predicates of the algebra, they are not then in fact equal 
with predicates of the same algebra).


Ciao,
Michael.


Re: _BitInt vs. _Atomic

2023-08-01 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 31 Jul 2023, Martin Uecker wrote:

> >  Say you have a loop like so:
> > 
> > _Atomic T obj;
> > ...
> > T expected1, expected2, newval;
> > newval = ...;
> > expected1 = ...;
> > do {
> >   expected2 = expected1;
> >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> > break;
> >   expected1 = expected2;
> > } while (1);
> > 
> > As written this looks of course stupid, and you may say "don't do that", 
> > but internally the copies might result from temporaries (compiler 
> > generated or wrapper function arguments, or suchlike). 
> >  Now, while 
> > expected2 will contain the copied padding bits after the cmpxchg the 
> > copies to and from expected1 will possibly destroy them.  Either way I 
> > don't see why the above loop should be out-of-spec, so I can write it and 
> > expect it to proceed eventually (certainly when the _strong variant is 
> > used).  Any argument that would declare the above loop out-of-spec I would 
> > consider a defect in the spec.
> 
> It is "out-of-spec" for C in the sense that it can not be
> expected to work with the semantics as specified in the C standard.

(I call that a defect.  See below)

> In practice, what the semantics specified using memcpy/memcmp
> allow one to do is to also apply atomic operations on non-atomic 
> types.  This is not guaranteed to work by the C standard, but
> in practice  people often have to do this.  For example, nobody
> is going to copy a 256 GB numerical array with non-atomic types
> into another data structure with atomic versions of the same
> type just so that you can apply atomic operations on it.
> So one simply does an unsafe cast and hopes the compiler does
> not break this.
> 
> If the non-atomic struct now has non-zero values in the padding, 
> and the compiler would clear those automatically for "expected", 
> you would create the problem of an infinite loop (this time 
> for real).

Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
defined in terms of the == operator (obviously applied recursively 
member-wise for structs) and simple-assignment that wouldn't be a problem.  
In addition that would get rid of all discussion of what happens or 
doesn't happen with padding.  Introducing reliance on padding bits (which 
IMHO goes against some fundamental ideas of the C standard) has 
far-reaching consequences, see below.  The current definition of the 
atomic_cmpxchg is also inconsistent with the rest of the standard:

We have:

  ... (C is non-atomic variant of A) ...
  _Bool atomic_compare_exchange_strong(volatile A *object,
   C *expected, C desired);
  ... (is equivalent to atomic variant of:) 
  if (memcmp(object, expected, sizeof (*object)) == 0)
{ memcpy(object, &desired, sizeof (*object)); return true; }
  else
{ memcpy(expected, object, sizeof (*object)); return false; }

But we also have:

  The size, representation, and alignment of an atomic type need not be 
  the same as those of the corresponding unqualified type.

  (with later text only suggesting that at least for atomic integer 
  types these please be the same.  But here we aren't talking about
  integer types even.)

So, already the 'memcmp(object, expected, sizeof (*object))' may be 
undefined.  sizeof(*object) need not be the same as sizeof(*expected).
In particular the memcpy in the else branch might clobber memory outside 
*expected.

That alone should be sufficient to show that defining this all in terms of 
memcpy/memcmp is a bad idea.  But it also has other 
consequences: you can't copy (simple-assign) or compare (== operator) 
atomic values anymore reliably and expect the atomic_cmpxchg to work.  My 
example from earlier shows that you can't copy them, a similar one can be 
constructed for breaking ==.

But it goes further: you can also construct an example that shows an 
internal inconsistency just with using atomic_cmpxchg (of course, assume 
all this to run without concurrent accesses to the respective objects):

  _Atomic T obj;
  ...
  T expected, newval;
  expected = ...;
  newval = expected + 1; // just to make it different
  atomic_store (&obj, expected);
  if (atomic_cmpxchg_strong (&obj, &expected, newval)) {
    /* Now we have: obj == newval.
       Do we also have memcmp(&obj, &newval, sizeof(obj)) == 0? */
    if (!atomic_cmpxchg_strong (&obj, &newval, expected)) {
      /* No, we can't rely on that!  */
      error("what's going on?");
    }
  } else {
    /* May happen, padding of expected may not be the same
       as in obj, even after atomic_store.  */
    error("WTH? a compare after a store doesn't even work?");
  }

So, even though cmpxchg is defined in terms of memcpy/memcmp, we still 
can't rely on anything after it succeeded (or failed).  Simply because the 
by-value passing of the 'desired' argument will have unknown padding 
(within the implementation of cmpxchg) that isn't necessarily the same as 
the newval object.

Now, about your suggestion of clearing or ignoring the padding bits at 
specific 

Re: _BitInt vs. _Atomic

2023-07-31 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 28 Jul 2023, Martin Uecker wrote:

> > > Sorry, somehow I must be missing something here.
> > > 
> > > If you add something you would create a new value and this may (in
> > > an object) have random new padding.  But the "expected" value should
> > > be updated by a failed atomic_compare_exchange cycle and then have
> > > same padding as the value stored in the atomic. So the next cycle
> > > should succeed.  The user would not change the representation of
> > > the "expected" value but create a new value for another object
> > > by adding something.
> > 
> > You're right that it would pass the expected value not something after an
> > operation on it usually.  But still, expected type will be something like
> > _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> > atomic_compare_exchange copies back on failure is guaranteed to have the
> > padding bits preserved.
> 
> For atomic_load in C a value is returned. A value does not care about
> padding and when stored into a new object can produce new and different
> padding.  
> 
> But for atomic_compare_exchange the memory content is copied into 
> an object passed by pointer, so here the C standard requires
> that the padding is preserved. It explicitly states that the effect
> is like:
> 
> if (memcmp(object, expected, sizeof(*object)) == 0)
>   memcpy(object, &desired, sizeof(*object));
> else
>   memcpy(expected, object, sizeof(*object));
> 
> > It is true that if it is larger than 16 bytes the libatomic 
> > atomic_compare_exchange will memcpy the value back which copies the 
> > padding bits, but is there a guarantee that the user code doesn't 
> > actually copy that value further into some other variable?  
> 
> I do not think it would be surprising for C user when
> the next atomic_compare_exchange fails in this case.

But that is a problem (the same one the cited C++ paper tries to resolve, 
IIUC).  Say you have a loop like so:

_Atomic T obj;
...
T expected1, expected2, newval;
newval = ...;
expected1 = ...;
do {
  expected2 = expected1;
  if (atomic_compare_exchange_weak(&obj, &expected2, newval))
    break;
  expected1 = expected2;
} while (1);

As written this looks of course stupid, and you may say "don't do that", 
but internally the copies might result from temporaries (compiler 
generated or wrapper function arguments, or suchlike).  Now, while 
expected2 will contain the copied padding bits after the cmpxchg the 
copies to and from expected1 will possibly destroy them.  Either way I 
don't see why the above loop should be out-of-spec, so I can write it and 
expect it to proceed eventually (certainly when the _strong variant is 
used).  Any argument that would declare the above loop out-of-spec I would 
consider a defect in the spec.

It's never a good idea to introduce reliance on padding bits.  Exactly 
because you can trivially destroy them with simple value copies.

> > Anyway, for smaller or equal to 16 (or 8) bytes if 
> > atomic_compare_exchange is emitted inline I don't see what would 
> > preserve the bits.
> 
> This then seems to be incorrect for C.

Or the spec is.


Ciao,
Michael.


Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 17 Jul 2023, Richard Sandiford via Gcc-patches wrote:

> >> There are some existing attributes that similarly affect semantics
> >> in ways that cannot be ignored.  vector_size is one obvious example.
> >> But that doesn't make it a good thing. :)
> >...
> > If you had added __arm(bar(args)) instead of __arm_bar(args) you would only
> > need one additional keyword - we could set aside a similar one for each
> > target then.  I realize that double-nesting of arguments might prove a bit
> > challenging but still.
> 
> Yeah, that would work.

So, essentially you want unignorable attributes, right?  Then implement 
exactly that: add one new keyword "__known_attribute__" (invent a better 
name, maybe :) ), semantics exactly as with __attribute__ (including using 
the same underlying lists in our data structures), with only one single 
deviation: instead of the warning you give an error for unhandled 
attributes.  Done.

(Old compilers will barf on the unknown new keyword, new compilers will 
error on unknown values used within such an attribute list.)
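To make the proposal concrete (hypothetical syntax: neither the keyword nor the `arm_bar` attribute exists, the names are placeholders):

```c
/* Today: an unknown attribute is merely warned about and ignored.  */
void f (void) __attribute__ ((arm_bar (42)));

/* Proposed: same syntax and same internal attribute lists, but an
   attribute that no handler claims is a hard error instead.  */
void g (void) __known_attribute__ ((arm_bar (42)));
```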

> > In any case I also think that attributes are what you want and their 
> > ugliness/issues are not worse than the ugliness/issues of the keyword 
> > approach IMHO.
> 
> I guess the ugliness of keywords is the double underscore? What are the 
> issues with the keyword approach though?

There are _always_ problems with new keywords, the more new keywords the 
more problems :-)  Is the keyword context-sensitive or not?  What about 
existing system headers that use it right now?  Is it recognized in 
free-standing or not?  Is it only recognized for some targets?  Is it 
recognized only for certain configurations of the target?

So, let's define one new mechanism, for all targets, all configs, and all 
language standards.  Let's use __attribute__ with a twist :)


Ciao,
Michael.


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hey,

On Tue, 11 Jul 2023, Alexander Monakov via Gcc-patches wrote:

> > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> > > 
> > > This is the weak/active form; I'd suggest "preserve_high_sse".
> > 
> > But it preserves only the low parts :-)  You swapped the two in your 
> > mind when writing the reply?
> 
> Ahhh. By "high SSE" I mean the high-numbered SSE regs, i.e. xmm8-15, not
> the higher halves of (unspecified subset of) SSE regs.

Ah, gotcha :-)  It just shows that all these names are confusing.  Maybe 
I'll just go with "attribute1" and "attribute2" and rely on docu.  (SCNR)


Ciao,
Michael.


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 10 Jul 2023, Alexander Monakov wrote:

> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":

I want to provide both.  One of them allows more arbitrary function 
definitions, the other allows more registers (or parts of them) to be 
preserved.  I feel both have their place.

> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?

Yeah, mask registers.  I'm still pondering this.  We would need to split 
the 8 maskregs into two parts.  Hmm.

> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.

Yeah, needs to be kept in mind indeed.  Anything coming in between the 
caller and a so-marked callee needs to preserve things.

> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
> 
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?

I hoped early on that the generic code that prohibits sibcalls between 
call sites of too "different" ABIs would deal with this, and then forgot 
to check.  Turns out you had a good hunch here, it actually does a 
sibcall, destroying the guarantees.  Thanks! :)

> > Carefully note that this is only possible for the SSE2 registers, as 
> > other parts of them would need instructions that are only optional.
> 
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?

Hmm.  I feel the best answer here is "that should error out".  I'll add a 
test and adjust patch if necessary.

> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
> 
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?

Makes a good testcase as well.  My take: it's acceptable with the 
only-sse2-preserved attribute (xmm15 will in this case be saved/restored), 
and should be an error with the everything-preserved attribute (maybe we 
can make an exception as here we only specify an XMM reg, instead of 
larger parts).

> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> 
> This is the weak/active form; I'd suggest "preserve_high_sse".

But it preserves only the low parts :-)  You swapped the two in your 
mind when writing the reply?

> > I would welcome any comments, about the names, the approach, the attempt
> > at documenting the intricacies of these attributes and anything.
> 
> I hope the new attributes are supposed to be usable with function 
> pointers? From the code it looks that way, but the documentation doesn't 
> promise that.

Yes, like all ABI-influencing attributes they _have_ to be part of the 
function's type (and hence transfer to function pointers), with appropriate 
incompatible-conversion errors and warnings at the appropriate places.  (I 
know that this isn't always the way we're dealing with ABI-influencing 
attributes, which often refer to a decl only.  All those are actual bugs.)

And yes, I will adjust the docu to be explicit about this.


Ciao,
Michael.


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > > > When a function doesn't contain calls to
> > > > unknown functions we can be a bit more lenient: we can make it so that
> > > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > > > necessary.
> 
> One may also take into account that first 8 registers are cheaper to
> encode than the later 8, so perhaps we may want to choose range that
> contains both.

There is actually none in the low range that's usable.  xmm0/1 are used 
for return values and xmm2-7 are used for argument passing.  Arguments are 
by default callee clobbered, and we do not want to change this (or limit 
the number of register arguments for the alternate ABI).


Ciao,
Michael.


[x86-64] RFC: Add nosse abi attribute

2023-07-10 Thread Michael Matz via Gcc-patches
Hello,

the ELF psABI for x86-64 doesn't have any callee-saved SSE
registers (there were actual reasons for that, but those don't
matter anymore).  This starts to hurt some uses, as it means that
as soon as you have a call (say to memmove/memcpy, even if
implicit as libcall) in a loop that manipulates floating point
or vector data you get saves/restores around those calls.

But in reality many functions can be written such that they only need
to clobber a subset of the 16 XMM registers (or do the save/restore
themselves in the code paths that need them, hello memcpy again).
So we want to introduce a way to specify this, via an ABI attribute
that basically says "doesn't clobber the high XMM regs".

I've opted to do only the obvious: do something special only for
xmm8 to xmm15, without a way to specify the clobber set in more detail.
I think such a half/half split is reasonable, and as I don't want to
change the argument passing anyway (whose regs are always clobbered)
there isn't that much wiggle room left.

I chose to make it possible to write function definitions with that
attribute with GCC adding the necessary callee save/restore code in
the xlogue itself.  Carefully note that this is only possible for
the SSE2 registers, as other parts of them would need instructions
that are only optional.  When a function doesn't contain calls to
unknown functions we can be a bit more lenient: we can make it so that
GCC simply doesn't touch xmm8-15 at all, then no save/restore is
necessary.  If a function contains calls then GCC can't know which
parts of the XMM regset are clobbered by that; it may be parts
which don't even exist yet (say, until avx2048 comes out), so we must
restrict ourselves to only save/restore the SSE2 parts and then of course
can only claim to not clobber those parts.

To that end I introduce actually two related attributes (for naming
see below):
* nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
* noanysseclobber: claims (and ensures) that nothing of any of the
  registers overlapping xmm8-15 is clobbered (not even future, as of
  yet unknown, parts)
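A usage sketch of the weaker form (hypothetical: the attribute name is this RFC's placeholder and `log_sample` is an invented function):

```c
/* Callee promises (and GCC ensures) that xmm8-15 survive the call.  */
extern void log_sample (double) __attribute__ ((nosseclobber));

double sum (const double *a, long n)
{
  double s = 0;
  for (long i = 0; i < n; i++)
    {
      s += a[i];
      /* The caller may keep values live in xmm8-15 across this call
         without emitting save/restore code around it.  */
      log_sample (a[i]);
    }
  return s;
}
```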

Ensuring the first is simple: potentially add saves/restores in the xlogue
(e.g. when xmm8 is either used explicitly or implicitly by a call).
Ensuring the second comes with more: we must also ensure that no
functions are called that don't guarantee the same thing (in addition
to just removing all xmm8-15 parts altogether from the available
registers).

See also the added testcases for what I intended to support.

I chose to use the new target independent function-abi facility for
this.  I need some adjustments in generic code:
* the "default_abi" is actually more like a "current" abi: it happily
  changes its contents according to conditional_register_usage,
  and other code assumes that such changes do propagate.
  But if that conditional_register_usage is actually done because the current
  function is of a different ABI, then we must not change default_abi.
* in insn_callee_abi we do look at a potential fndecl for a call
  insn (only set when -fipa-ra), but that doesn't work for calls through
  pointers and (as said) is optional.  So, also always look at the
  called functions type (it's always recorded in the MEM_EXPR for
  non-libcalls), before asking the target.
  (The function-abi accessors working on trees were already doing that;
  it's just the RTL accessor that missed this.)

Accordingly I also implement some more target hooks for function-abi.
With that it's possible to also move the other ABI-influencing code
of i386 to function-abi (ms_abi and friends).  I have not done so for
this patch.

Regarding the names of the attributes: gah!  I've left them at
my mediocre attempts of names in order to hopefully get input on better
names :-)

I would welcome any comments, about the names, the approach, the attempt
at documenting the intricacies of these attributes and anything.

FWIW, this particular patch was regstrapped on x86-64-linux
with trunk from a week ago (and sniff-tested on current trunk).


Ciao,
Michael.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 37cb5a0dcc4..92358f4ac41 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3244,6 +3244,16 @@ ix86_set_indirect_branch_type (tree fndecl)
 }
 }
 
+unsigned
+ix86_fntype_to_abi_id (const_tree fntype)
+{
+  if (lookup_attribute ("nosseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_LESS_SSE;
+  if (lookup_attribute ("noanysseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_NO_SSE;
+  return ABI_DEFAULT;
+}
+
 /* Establish appropriate back-end context for processing the function
FNDECL.  The argument might be NULL to indicate processing at top
level, outside of any function scope.  */
@@ -3311,6 +3321,12 @@ ix86_set_current_function (tree fndecl)
   else
TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
 }
+
+  unsigned prev_abi_id = 0;
+  if 

Re: [PATCH] Add targetm.libm_function_max_error

2023-04-27 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 27 Apr 2023, Jakub Jelinek wrote:

> The first really large error I see is for sinl with
> x/2gx 
> 0x748160ed90d9425b 0xefd8b811d6293294
> i.e.
> 1.5926552660973502228303666578452949e+253
> with most significant double being
> 1.5926552660973502e+253
> and low double
> -5.9963639272208416e+230

Such large numbers will always be a problem with the range-reduction step 
in sin/cos.  With double-double the possible mantissa length can be much 
larger than 106 bits, and the range reduction needs to be precise to at 
least that many bits to give anything sensible.

Realistically I think you can only expect reasonably exact results for 
double-double on operations that require range-reductions for
(a) "smallish" values.  Where the low double is (say) <= 2^16 * PI, or
(b) where the low double is consecutive to the high double, i.e. the
overall mantissa size (including the implicit zeros in the middle) is 
less than 107 (or whatever the current precision for the 
range-reduction step on large values is)

> given is
> -0.4025472157704263326278375983156912
> and expected (mpfr computed)
> -0.46994008859023245970759964236618727
> But if I try on x86_64:
> #define _GNU_SOURCE
> #include 
> 
> int
> main ()
> {
>   _Float128 f, f2, f3, f4;
>   double d, d2;
>   f = 1.5926552660973502228303666578452949e+253f128;
>   d = 1.5926552660973502e+253;
>   f2 = d;
>   f2 += -5.9963639272208416e+230;
>   f3 = sinf128 (f);
>   f4 = sinf128 (f2);
>   d2 = sin (d);
>   return 0;
> }
> where I think f2 is what matches most closely the 106 bit precision value,
> (gdb) p f
> $7 = 1.5926552660973502228303666578452949e+253
> (gdb) p f2
> $8 = 1.59265526609735022283036665784527174e+253
> (gdb) p f3
> $9 = -0.277062522218693980443596385112227247
> (gdb) p f4
> $10 = -0.402547215770426332627837598315693221
> and f4 is much closer to the given than to expected.

Sure, but that's because f2 is only "close" to the double-double exact 
value of (1.5926552660973502e+253 + -5.9963639272208416e+230) relatively, 
not absolutely.  As you already wrote the mantissa to represent the exact 
value (which double-double and mpfr can!) is larger than 106 bits.  The 
necessary error of rounding to represent it in f128 is small in ULPs, but 
very very large in magnitude.  Large magnitude changes of input value to 
sin/cos essentially put the value into completely different quadrants and 
positions within those quadrants, and hence the result under such rounding 
in input can be _wildly_ off.

E.g. imagine a double-double representing (2^107 * PI + PI/2) exactly 
(assume PI is the 53-bit representation of pi, that's why I say 
"exactly").  The correct result of sin() on that is 1.  The result on the 
nearest f128 input value (2^107 * PI) will be 0.  So you really can't 
compare f128 arithmetic with double-double one when the mantissas are too 
far away.

So, maybe you want to only let your tester test "good" double-double 
values, i.e. those that can be interpreted as an about-106-bit number where 
(high-exp - low-exp) <= about 53.

(Or just not care about the similarities of cos() on double-double to a 
random number generator :) )


Ciao,
Michael.


Re: [PATCH] Add targetm.libm_function_max_error

2023-04-26 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 26 Apr 2023, Jakub Jelinek via Gcc-patches wrote:

> For glibc I've gathered data from:
> 4) using attached ulp-tester.c (how to invoke in file comment; tested
>both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
>modes, plus on x86_64 float/double sin/cos using libmvec - see
>attached libmvec-wrapper.c as well)

That ulp-tester.c file as attached here is not testing what you think it 
does.  (1) It doesn't compile as it doesn't #define the TESTS macro in the 
!LIBMVEC_TEST case, and (2) it almost never progresses 'm', the status 
variable used before the random numbers start, to beyond 1: you start with 
nextafter(0.0, 1.0), which is the smallest subnormal number (with an ERANGE 
error, but that's ignored), and you test for equality with THIS_MIN, the 
smallest normal (!) number, until you start incrementing 'm'.

From subnormal smallest to normal smallest takes 1<

Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 18 Jan 2023, Jakub Jelinek wrote:

> > > > > Partly OT, what is riscv not defaulting that on as well?  Does it have
> > > > > usable unwind info even without that option, something else?
> > > > 
> > > > The RISC-V ABI does not address this, AFAICS.
> > > 
> > > And neither do many other ABIs, still we default there to
> > > -fasynchronous-unwind-tables because we've decided it is a good idea.
> > 
> > That might or might not be, but in the context of this thread that's 
> > immaterial.  Doing the same as the other archs will then simply hide the 
> > problem on risc-v as well, instead of fixing it.
> 
> Yeah, that is why I've mentioned "Partly OT".  We want this bug to be fixed
> (but the fix is not what has been posted but rather decide what we want to
> ask there; if it is at the end of compilation, whether it is at least one
> function with that flag has been compiled, or all functions have been with
> that flag, something else),

The answer to this should be guided by normal use cases.  The normal use 
case is that a whole input file is compiled with or without 
-funwind-tables, and not that individual functions change this.  So any 
solution in which that usecase doesn't work is not a solution.

The purest solution is to emit unwind tables for all functions that 
request it into .eh_frame and for those that don't request it put 
into .debug_frame (if also -g is on).  If that requires enabling 
unwind-tables globally first (i.e. if even just one input function 
requests it) then that is what needs to be done.  (this seems to be the 
problem currently, that the unwind-table activation on a per-function 
basis comes too late).

The easier solution might be to make -funwind-tables also be a global 
special-cased option for LTO (so that the usual use-case above works); 
that would trade one bug for another, but I'd say the current bug is more 
serious than the one that would be introduced.

> and IMHO riscv should switch to
> -fasynchronous-unwind-tables by default.

I don't disagree, as long as that doesn't lead to the bug being ignored :)


Ciao,
Michael.


Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 18 Jan 2023, Jakub Jelinek wrote:

> On Wed, Jan 18, 2023 at 04:09:08PM +0100, Andreas Schwab wrote:
> > On Jan 18 2023, Jakub Jelinek wrote:
> > 
> > > Partly OT, what is riscv not defaulting that on as well?  Does it have
> > > usable unwind info even without that option, something else?
> > 
> > The RISC-V ABI does not address this, AFAICS.
> 
> And neither do many other ABIs, still we default there to
> -fasynchronous-unwind-tables because we've decided it is a good idea.

That might or might not be, but in the context of this thread that's 
immaterial.  Doing the same as the other archs will then simply hide the 
problem on risc-v as well, instead of fixing it.


Ciao,
Michael.


Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Michael Matz via Gcc-patches
On Wed, 18 Jan 2023, Andreas Schwab via Gcc-patches wrote:

> No unwind tables are generated, as if -funwind-tables is ignored.  If
> LTO is disabled, everything works as expected.

On RISC-V, btw. (which, unlike i386, aarch64, s390 and rs6000, does not 
effectively enable -funwind-tables always via either backend or 
common/config/$arch/ code, which is the reason why the problem can't be 
seen there).  It's an interaction with -g:

The problem (with cross compiler here, but shouldn't matter):

% riscv64-gcc -g -flto -funwind-tables -fPIC -c hello.c
% riscv64-gcc -shared hello.o
% readelf -wF a.out
... empty .eh_frame ...
Contents of the .debug_frame section:
... content ...

Using a compiler for any of the above archs makes this work (working means 
that the unwind info is placed into .eh_frame, _not_ into .debug_frame).  
Not using -g makes it work.  Adding -funwind-tables to the link command 
makes it work.  Adding -fexceptions to the compile command makes it work.  
Not using LTO makes it work.

So, it's quite clear that the option merging algorithm related to all this 
is somewhat broken, the global (or per function, or whatever) 
-funwind-tables option from hello.o doesn't make it correctly into the 
output (when -g is there).  Adding -fexceptions makes it work because then 
the functions will have personalities and on LTO read-in _that_ will 
implicitly enable funwind-tables again (which should have been enabled 
already by the option read-in).

As written initially the other archs are working because they all have 
various ways of essentially setting flag_unwind_tables always:

i386 via common/config/i386/i386-common.cc
   opts->x_flag_asynchronous_unwind_tables = 2;
  config/i386/i386-options.cc
 if (opts->x_flag_asynchronous_unwind_tables == 2)
   opts->x_flag_unwind_tables
 = opts->x_flag_asynchronous_unwind_tables = 1;

rs6000 via common/config/rs6000/rs6000-common.cc
   #ifdef OBJECT_FORMAT_ELF
 opts->x_flag_asynchronous_unwind_tables = 1;
   #endif
  (which ultimately also enabled flag_unwind_tables)

s390 via common/config/s390/s390-common.cc
opts->x_flag_asynchronous_unwind_tables = 1;

aarch64 via common/config/aarch64/aarch64-common.cc
  #if (TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1)
{ OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
{ OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
  #endif

  (the #define here is set for aarch64*-*-linux* )

So the problem cannot be readily demonstrated on these architectures.  
risc-v has no such code (and doesn't need to).


Ciao,
Michael.


Re: [PATCH 1/3] STABS: remove -gstabs and -gxcoff functionality

2022-11-10 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 10 Nov 2022, Martin Liška wrote:

> > These changes are part of
> > commit r13-2361-g7e0db0cdf01e9c885a29cb37415f5bc00d90c029
> > "STABS: remove -gstabs and -gxcoff functionality".  What this does is
> > remove these identifiers from "poisoning":
> > 
> >  /* As the last action in this file, we poison the identifiers that
> > shouldn't be used.
> >  [...]
> >  /* Other obsolete target macros, or macros that used to be in target
> > headers and were not used, and may be obsolete or may never have
> > been used.  */
> >   #pragma GCC poison [...]
> > 
> > Shouldn't these identifiers actually stay (so that any accidental future
> > use gets flagged, as I understand this machinery), and instead more
> > identifiers be added potentially: those where their definition/use got
> > removed with "STABS: remove -gstabs and -gxcoff functionality"?  (I've
> > not checked.)
> 
> Well, the identifiers are not used any longer, so I don't think we should
> poison them. Or do I miss something?

It's the very nature of poisoned identifiers that they aren't used (every 
use would get flagged as error).  The point of poisoning them is to avoid 
future new uses to creep in (e.g. via mislead back- or forward-ports, 
which can for instance happen easily with backend macros when an 
out-of-tree port is eventually tried to be integrated).  Hence, generally 
the list of those identifiers is only extended, never reduced.  (There may 
be exceptions of course)


Ciao,
Michael.


Re: [PATCH] sphinx: support Sphinx in lib*/Makefile.am.

2022-11-10 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 10 Nov 2022, Martin Liška wrote:

> This is a patch which adds support for Sphinx in lib*/Makefile.am where
> I wrongly modified Makefile.in that are generated.
> 
> One thing that's missing is that the generated Makefile.in does not
> contain 'install-info-am' target and thus the created info files
> are not installed with 'make install'. Does anybody know?

The whole generation/processing of '*info*' targets (and dvi,pdf,ps,html 
targets) is triggered by the presence of a 'TEXINFO' primary 
(here in the 'info_TEXINFO' variable), which you removed.  As the sphinx 
result is not appropriate for either TEXINFO or MANS primaries (the only 
ones in automake related specifically to documentation), you probably want 
to include them in the DATA primary.  For backward compatibility you might 
want to add your own {un,}install-info-am targets depending on 
{un,}install-data-am then, though I'm not sure why one would need one.

I currently don't quite see how you make the Sphinx results be installed 
at all, AFAICS there's no mention of them in any of the automake 
variables.  You have to list something somewhere (as said, probably in 
DATA) to enable automake to generate the usual set of Makefile targets.

(beware: I'm not an automake expert, so the above might turn out to be 
misleading advice :-) )


Ciao,
Michael.


> 
> Thanks,
> Martin
> 
> ---
>  libgomp/Makefile.am   |  27 ++-
>  libgomp/Makefile.in   | 275 +++---
>  libgomp/testsuite/Makefile.in |   3 +
>  libitm/Makefile.am|  26 ++-
>  libitm/Makefile.in| 278 ++
>  libitm/testsuite/Makefile.in  |   3 +
>  libquadmath/Makefile.am   |  37 ++--
>  libquadmath/Makefile.in   | 307 +++---
>  8 files changed, 208 insertions(+), 748 deletions(-)
> 
> diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
> index 428f7a9dab5..ab5e86b0f98 100644
> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am
> @@ -11,6 +11,8 @@ config_path = @config_path@
>  search_path = $(addprefix $(top_srcdir)/config/, $(config_path))
> $(top_srcdir) \
> $(top_srcdir)/../include
>  +abs_doc_builddir = @abs_top_builddir@/doc
> +
>  fincludedir =
> $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
>  libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
>  @@ -100,18 +102,6 @@ fortran.o: libgomp_f.h
>  env.lo: libgomp_f.h
>  env.o: libgomp_f.h
>  -
> -# Automake Documentation:
> -# If your package has Texinfo files in many directories, you can use the
> -# variable TEXINFO_TEX to tell Automake where to find the canonical
> -# `texinfo.tex' for your package. The value of this variable should be
> -# the relative path from the current `Makefile.am' to `texinfo.tex'.
> -TEXINFO_TEX   = ../gcc/doc/include/texinfo.tex
> -
> -# Defines info, dvi, pdf and html targets
> -MAKEINFOFLAGS = -I $(srcdir)/../gcc/doc/include
> -info_TEXINFOS = libgomp.texi
> -
>  # AM_CONDITIONAL on configure option --generated-files-in-srcdir
>  if GENINSRC
>  STAMP_GENINSRC = stamp-geninsrc
> @@ -127,7 +117,7 @@ STAMP_BUILD_INFO =
>  endif
>   -all-local: $(STAMP_GENINSRC)
> +all-local: $(STAMP_GENINSRC) $(STAMP_BUILD_INFO)
>   stamp-geninsrc: libgomp.info
>   cp -p $(top_builddir)/libgomp.info $(srcdir)/libgomp.info
> @@ -135,8 +125,15 @@ stamp-geninsrc: libgomp.info
>   libgomp.info: $(STAMP_BUILD_INFO)
>  -stamp-build-info: libgomp.texi
> - $(MAKEINFO) $(AM_MAKEINFOFLAGS) $(MAKEINFOFLAGS) -I $(srcdir) -o
> libgomp.info $(srcdir)/libgomp.texi
> +RST_FILES:=$(shell find $(srcdir) -name *.rst)
> +SPHINX_CONFIG_FILES:=$(srcdir)/doc/conf.py $(srcdir)/../doc/baseconf.py
> +SPHINX_FILES:=$(RST_FILES) $(SPHINX_CONFIG_FILES)
> +
> +stamp-build-info: $(SPHINX_FILES)
> + + if [ x$(HAS_SPHINX_BUILD) = xhas-sphinx-build ]; then \
> +   make -C $(srcdir)/../doc info SOURCEDIR=$(abs_srcdir)/doc
> BUILDDIR=$(abs_doc_builddir)/info SPHINXBUILD=$(SPHINX_BUILD); \
> +   cp ./doc/info/texinfo/libgomp.info libgomp.info; \
> + else true; fi
>   @touch $@
>   diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
> index 814ccd13dc0..4d0f2184e95 100644
> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -177,7 +177,7 @@ am__uninstall_files_from_dir = { \
>  || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
>   $(am__cd) "$$dir" && rm -f $$files; }; \
>}
> -am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
> +am__installdirs = "$(DESTDIR)$(toolexeclibdir)" \
>   "$(DESTDIR)$(fincludedir)" "$(DESTDIR)$(libsubincludedir)" \
>   "$(DESTDIR)$(toolexeclibdir)"
>  LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
> @@ -269,16 +269,9 @@ am__v_FCLD_0 = @echo "  FCLD" $@;
>  am__v_FCLD_1 =
>  SOURCES = $(libgomp_plugin_gcn_la_SOURCES) \
>   $(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
> -AM_V_DVIPS = $(am__v_DVIPS_@AM_V@)
> -am__v_DVIPS_ = 

Re: [RFC] docs: remove documentation for unsupported releases

2022-11-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Nov 2022, Martin Liška wrote:

> I think we should remove documentation for unsupported GCC releases
> as it's indexed by the Google engine. A second reason is that the page is
> so long that one can't easily jump to the "Current Development" documentation.
> 
> Thoughts?

I think everything that's on the web server (even the 2.95 docu) has to be 
reachable via a (reasonable) set of clicks from the main index.html.  It 
doesn't need to be _directly_ reachable from the main index.html, though.

Also, you simply might want to move the "Current Development" section 
to the front if it's currently too cumbersome to reach.

(E.g. one could move the index of the obsolete versions to a different web 
page, make that one un-indexable by google, but leave a trace to that one 
from the main index.html).


Ciao,
Michael.


> Martin
> 
> ---
>  htdocs/onlinedocs/index.html | 1294 --
>  1 file changed, 1294 deletions(-)
> 
> diff --git a/htdocs/onlinedocs/index.html b/htdocs/onlinedocs/index.html
> index cfa8bf5a..138dde95 100644
> --- a/htdocs/onlinedocs/index.html
> +++ b/htdocs/onlinedocs/index.html
> @@ -255,1300 +255,6 @@
>   href="https://gcc.gnu.org/onlinedocs/gcc-10.4.0/docs-sources.tar.gz;>Texinfo
>   sources of all the GCC 10.4 manuals
>
> -
> -  GCC 9.5 manuals:
> -  
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc/;>GCC
> - 9.5 Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc.ps.gz;>PostScript or  - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran/;>GCC
> - 9.5 GNU Fortran Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran.ps.gz;>PostScript 
> or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp/;>GCC
> - 9.5 CPP Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp.ps.gz;>PostScript or  - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm/;>GCC
> - 9.5 GNAT Reference Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm.ps.gz;>PostScript 
> or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn/;>GCC
> - 9.5 GNAT User's Guide ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn.ps.gz;>PostScript 
> or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn-html.tar.gz;>an
> - HTML tarball)
> - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++/manual/;>GCC
> - 9.5 Standard C++ Library Manual  ( - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-manual.pdf.gz;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-manual.xml.gz;>XML
>  or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-manual-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++/api/;>GCC
> - 9.5 Standard C++ Library Reference Manual  ( - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api.pdf.gz;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api.xml.gz;>XML 
> GPL or
> -  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api-gfdl.xml.gz;>XML 
> GFDL or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo/;>GCCGO 9.5 
> Manual ( -   href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo.pdf;>also in
> -   PDF or  -   
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo.ps.gz;>PostScript or 
>  -   
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo-html.tar.gz;>an
> -   HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libgomp/;>GCC 9.5
> - GNU Offloading and Multi Processing Runtime Library Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libgomp.pdf;>also in
> - PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libgomp.ps.gz;>PostScript 
> or  -

Re: Remove support for Intel MIC offloading (was: [PATCH] Remove dead code.)

2022-10-20 Thread Michael Matz via Gcc-patches
Hey,

On Thu, 20 Oct 2022, Thomas Schwinge wrote:

> This had been done in
> wwwdocs commit 5c7ecfb5627e412a3d142d8dc212f4cd39b3b73f
> "Document deprecation of OpenMP MIC offloading in GCC 12".
> 
> I'm sad about this, because -- in theory -- such a plugin is very useful
> for offloading simulation/debugging (separate host/device memory spaces,
> allow sanitizers to run on offloaded code

Yeah, I think that's a _very_ useful feature, but indeed ...

> (like LLVM a while ago
> implemented), and so on), but all that doesn't help -- in practice -- if
> nobody is maintaining that code.

... it should then be maintained properly.  Maybe the 
MIC-specifics could be removed from the code, and it could be transformed 
into a "null"-offload target, as an example and testing vehicle (implying 
that such a new liboffloadmic^H^H^Hnull would have its upstream in the GCC 
repo).  Alas, if no one is going to do that work, removing it is the right 
choice.


Ciao,
Michael.


Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Oct 2022, Jørgen Kvalsvik wrote:

> I apologise for the confusion. The diff there is not a part of the 
> change itself (note the indentation) but rather a way to reproduce,

Ah!  Thanks, that explains it, sorry for adding confusion on top :-)


Ciao,
Michael.


Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Oct 2022, Jørgen Kvalsvik via Gcc-patches wrote:

> The coverage support will under some conditions decide to split edges to
> accurately report coverage. By running the test suite with/without this
> edge splitting a small diff shows up, addressed by this patch, which
> should catch future regressions.
> 
> Removing the edge splitting:
> 
> $ diff --git a/gcc/profile.cc b/gcc/profile.cc
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
> Don't do that when the locuses match, so
> if (blah) goto something;
> is not computed twice.  */
> - if (last
> - && gimple_has_location (last)
> - && !RESERVED_LOCATION_P (e->goto_locus)
> - && !single_succ_p (bb)
> - && (LOCATION_FILE (e->goto_locus)
> - != LOCATION_FILE (gimple_location (last))
> - || (LOCATION_LINE (e->goto_locus)
> - != LOCATION_LINE (gimple_location (last)
> -   {
> - basic_block new_bb = split_edge (e);
> - edge ne = single_succ_edge (new_bb);
> - ne->goto_locus = e->goto_locus;
> -   }
> +

Assuming this is correct (I really can't say) then the comment needs 
adjustments.  It specifically talks about this very code you remove.


Ciao,
Michael.


Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-20 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 20 Sep 2022, Aldy Hernandez wrote:

> > FWIW, in IEEE, 'abs' (like 'copy', 'copysign' and 'negate') are not
> > arithmetic, they are quiet-computational.  Hence they don't raise
> > anything, not even for sNaNs; they copy the input bits and appropriately
> > modify the bit pattern according to the specification (i.e. fiddle the
> > sign bit).
> >
> > That also means that a predicate like negative_p(x) that would be
> > implemented ala
> >
> >   copysign(1.0, x) < 0.0
> 
> I suppose this means -0.0 is not considered negative,

It would be considered negative if the predicate is implemented like 
above:
   copysign(1.0, -0.0) == -1.0

But really, that depends on what _our_ definition of negative_p is 
supposed to be.  I think the most reasonable definition is indeed similar 
to above, which in turn is equivalent to simply looking at the sign bit 
(which is what copysign() does), i.e. ...

> though it has
> the signbit set?  FWIW, on real_value's real_isneg() returns true for
> -0.0 because it only looks at the sign.

... this seems the sensible thing.  I just wanted to argue the case that 
set_negative (or the like) which "sets" the sign bit does not make the 
nan-ness go away.  They are orthogonal.

> > deal with NaNs just fine and is required to correctly capture the sign of
> > 'x'.  If frange::set_nonnegative is supposed to be used in such contexts
> > (and I think it's a good idea if that were the case), then set_nonnegative
> > does _not_ imply no-NaN.
> >
> > In particular I would assume that, given a VARYING frange FR,
> > FR.set_nonnegative() would result in an frange {[+0.0,+inf],+nan}.
> 
> That was my understanding as well, and what my original patch did.
> But again, I'm just the messenger.

Ah, I obviously haven't followed the thread carefully then.  If that's 
what it was doing then IMO it was the right thing.


Ciao,
Michael.


Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 19 Sep 2022, Richard Biener via Gcc-patches wrote:

> > but I guess it's good we do the right thing for correctness sake, and
> > if it ever gets used by someone else.
> >
> > >
> > > That said, 'set_nonnegative' could be interpreted to be without
> > > NaNs?
> >
> > Sounds good to me.  How's this?
> 
> Hmm, I merely had lots of questions above so I think to answer them
> we at least should document what 'set_nonnegative' implies in the
> abstract vrange class.
> 
> It's probably safer to keep NaN in.  For example unfolded copysign (x, 1.)
> will return true for x == -NaN but the result will be a NaN.

FWIW, in IEEE, 'abs' (like 'copy', 'copysign' and 'negate') are not 
arithmetic, they are quiet-computational.  Hence they don't raise 
anything, not even for sNaNs; they copy the input bits and appropriately 
modify the bit pattern according to the specification (i.e. fiddle the 
sign bit).

That also means that a predicate like negative_p(x) that would be 
implemented ala

  copysign(1.0, x) < 0.0

deal with NaNs just fine and is required to correctly capture the sign of 
'x'.  If frange::set_nonnegative is supposed to be used in such contexts 
(and I think it's a good idea if that were the case), then set_nonnegative 
does _not_ imply no-NaN.

In particular I would assume that, given a VARYING frange FR, 
FR.set_nonnegative() would result in an frange {[+0.0,+inf],+nan}.


Ciao,
Michael.


Re: [PATCH 2/3] rename DBX_REGISTER_NUMBER to DEBUGGER_REGISTER_NUMBER

2022-09-01 Thread Michael Matz via Gcc-patches
Hello,

okay, I'll bite :)  DBG_REGISTER_NUMBER?  DEBUGGER_REGNO?


Ciao,
Michael.


Re: [PATCH] match.pd: Add bitwise and pattern [PR106243]

2022-08-04 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 3 Aug 2022, Jeff Law via Gcc-patches wrote:

> > .optimized dump shows:
> >_1 = ABS_EXPR ;
> >_3 = _1 & 1;
> >return _3;
> > 
> > altho combine simplifies it to x & 1 on RTL, resulting in code-gen:
> > f1:
> >  and w0, w0, 1
> >  ret
> Doesn't the abs(x) & mask simplify to x & mask for any mask where the sign bit
> of x is off

No.  Only the lowest bit remains the same between x and -x; all others 
might or might not be inverted (first counterexample: x=-3, mask=3).


Ciao,
Michael.


Re: [PATCH] Add GIMPLE switch support to loop unswitching

2022-05-25 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 25 May 2022, Richard Biener via Gcc-patches wrote:

> I guess we might want to (warn about labels in the "toplevel"
> scope in switch statements).  So warn about
> 
> switch (x)
> {
> case 1:
> bar:
> };

That style is actually used quite often in GCC itself.  Sometimes with 
statements between 'case 1:' and 'bar:'.

> Maybe we just want to warn when the label is otherwise unused?

We do, with -Wunused-label (part of -Wall).  The testcases simply aren't 
compiled with that.


Ciao,
Michael.


Re: [x86 PATCH] Avoid andn and generate shorter not;and with -Oz.

2022-04-13 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 13 Apr 2022, Roger Sayle wrote:

> The x86 instruction encoding for SImode andn is longer than the
> equivalent notl/andl sequence when the source for the not operand
> is the same register as the destination.

_And_ when no REX prefixes are necessary for the notl,andn, which they are 
if the respective registers are %r8 or beyond.  As you seem to be fine 
with saving just a byte you ought to test that as well to not waste one 
again :-)


Ciao,
Michael.


Re: Consider 'TDF_UID', 'TDF_NOUID' in 'print_node_brief', 'print_node'

2022-02-10 Thread Michael Matz via Gcc-patches
Hi,

On Thu, 10 Feb 2022, Richard Biener via Gcc-patches wrote:

> On Wed, Feb 9, 2022 at 2:21 PM Thomas Schwinge  
> wrote:
> >
> > Hi!
> >
> > OK to push (now, or in next development stage 1?) the attached
> > "Consider 'TDF_UID', 'TDF_NOUID' in 'print_node_brief', 'print_node'",
> > or should that be done differently -- or, per the current state (why?)
> > not at all?
> >
> > This does work for my current debugging task, but I've not yet run
> > 'make check' in case anything needs to be adjusted there.
> 
> Hmm, I wonder if we shouldn't simply dump DECL_UID as
> 
>  'uid NNN'

Yes, much better in line with the normal dump_tree output.


Ciao,
Michael.

> 
> somewhere.  For example after or before DECL_NAME?
> 
> >
> > Grüße
> >  Thomas
> >
> >
> 


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Feb 2022, Richard Biener wrote:

> > That breaks down when a birth is there (because it was otherwise 
> > reachable) but not on the taken path:
> > 
> >   if (nevertrue)
> > goto start;
> >   goto forw;
> >   start:
> >   {
> > int i;
> > printf("not really reachable, but unknowingly so\n");
> >   forw:
> > i = 1;
> >   }
> 
> I think to cause breakage you need a use of 'i' on the side-entry
> path that is not reachable from the path with the birth.  I guess sth like
> 
>if (nevertrue)
>  goto start;
>goto forw;
>start:
>{
>  int i = 0;
>  printf("not really reachable, but unknowingly so\n");
>  goto common;
>forw:
>  i = 1;
>common:
>  foo ();
>}
> 
> if we'd have a variable that's live only on the side-entry path
> then it could share the stack slot with 'i' this way, breaking
> things (now we don't move CLOBBERs so it isn't easy to construct
> such case).  The present dataflow would, for the above, indeed
> compute 'i' not live in the forw: block.

Yes, now that we have established (in the subthread with Joseph) that the 
value becomes indeterminate at decls we only need to care for not sharing 
storage invalidly, so yeah, some changes in the conflict computation still 
are needed.

> > Except for switches side-entries are really really seldom, so we might 
> > usefully get away with that latter solution.  And for switches it might be 
> > okay to put the births at the block containing the switch (if it itself 
> > doesn't have side entries, and the switch block doesn't have side 
> > entries except the case labels).
> > 
> > If the birth is moved to outward blocks it might be best if also the 
> > dealloc/death clobbers are moved to it, otherwise there might be paths 
> > containing a birth but no death.
> > 
> > The less precise you get with those births the more non-sharing you'll 
> > get, but that's the price.
> 
> Yes, sure.  I don't see a good way to place births during gimplification
> then.

Well, for each BIND you can compute whether there are side entries at all; 
then, when lowering a BIND, you put the births into the innermost 
containing BIND that doesn't have side-entries, instead of into the 
current BIND.

> The end clobbers rely on our EH lowering machinery.  For the entries we 
> might be able to compute GIMPLE BIND transitions during BIND lowering if 
> we associate labels with BINDs.  There should be a single fallthru into 
> the BIND at this point.  We could put a flag on the goto destination 
> labels whether they are reached from an outer BIND.
> 
>  goto inner;
>  {
>{
> int i;
>  {
>int j;
> inner:
>goto middle;
>  }
> middle:
>}
>  }
> 
> Since an entry CLOBBER is also a clobber we have to insert birth
> clobbers for all decls of all the binds inbetwee the goto source
> and the destination.  So for the above case on the edge to
> inner: have births for i and j and at the edge to middle we'd
> have none.

Correct, that's basically the most precise scheme, it's what I called 
try-finally topside-down ("always-before"? :) ).  (You have to care for 
computed goto! i.e. BINDs containing address-taken labels, which make 
things even uglier)  I think the easier way to deal with the above is to 
notice that the inner BIND has a side entry and conceptually move the 
decls outwards to BINDs that don't have such (or bind-crossing gotos), 
i.e. do as if it were:

  int i;  // moved
  int j;  // moved
  goto inner;
  {
{
  {
  inner:
goto middle;
  }
  middle:
}
  }

> Requires some kind of back-mapping from label to goto sources
> that are possibly problematic.  One issue is that GIMPLE
> lowering re-builds the BLOCK tree (for whatever reason...),
> so I'm not sure if we need to do it before that (for correctness).
> 
> Does that make sense?

I honestly can't say for 100% :-)  It sounds like it makes sense, yes.


Ciao,
Michael.


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-09 Thread Michael Matz via Gcc-patches
Hey,

On Tue, 8 Feb 2022, Joseph Myers wrote:

> On Tue, 8 Feb 2022, Richard Biener via Gcc-patches wrote:
> 
> > which I think is OK?  That is, when the abstract machine
> > arrives at 'int i;' then the previous content in 'i' goes
> > away?  Or would
> 
> Yes, that's correct.  "If an initialization is specified for the object, 
> it is performed each time the declaration or compound literal is reached 
> in the execution of the block; otherwise, the value becomes indeterminate 
> each time the declaration is reached.".

Okay, that makes things easier then.  We can put the birth 
clobbers at their point of declaration; just the storage associated with a 
decl needs to survive for the whole block.  We still need to make sure 
that side entries skipping declarations correctly "allocate" such storage 
(by introducing proper conflicts with other objects), but at least values 
don't need to survive decls.


Ciao,
Michael.


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Feb 2022, Richard Biener wrote:

> > int state = 2, *p, camefrom1 = 0;
> > for (;;) switch (state) {
> >   case 1: 
> >   case 2: ;
> > int i;
> > if (state != 1) { p =  i = 0; }
> > if (state == 1) { (*p)++; return *p; }
> > state = 1;
> > continue;
> > }
> > 
> > Note how i is initialized during state 2, and needs to be kept initialized 
> > during state 1, so there must not be a CLOBBER (birth or other) at the 
> > point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
> > for 'i' is placed with the statement associated with 'case 2' label, 
> > putting a CLOBBER there would be the wrong thing.  If the decl had an 
> > initializer it might be harmless, as it would be overwritten at that 
> > place, but even so, in this case there is no initializer.  Hmm.
> 
> You get after gimplification:
> 
>   state = 2;
>   camefrom1 = 0;
>   :
>   switch (state) , case 1: , case 2: >
>   {
> int i;
> 
> try
>   {
> i = {CLOBBER(birth)};  /// ignore, should go away
> :
> :
> i = {CLOBBER(birth)};

I think this clobber here would be the problem, because ...

> which I think is OK?  That is, when the abstract machine
> arrives at 'int i;' then the previous content in 'i' goes
> away?  Or would
> 
> int foo()
> {
>goto ick;
> use:
>int i, *p;
>return *p;
> ick:
>i = 1;
>p = 
>goto use;
> 
> }
> 
> require us to return 1?

... I think this is exactly the logical consequence of lifetime of 'i' 
being the whole block.  We need to return 1. (Joseph: correct me on that! 
:) )  That's what I was trying to get at with my switch example as well.

> With the current patch 'i' is clobbered before the return.
> 
> > Another complication arises from simple forward jumps:
> > 
> >   goto forw;
> >   {
> > int i;
> > printf("not reachable\n");
> >   forw:
> > i = 1;
> >   }
> > 
> > Despite the jump skipping the unreachable head of the BLOCK, 'i' needs to 
> > be considered birthed at the label.  (In a way the placement of births 
> > exactly mirrors the placements of deaths (of course), every path from 
> > outside a BLOCK to inside needs to birth-clobber all variables (in C), 
> > like every path leaving needs to kill them.  It's just that we have a 
> > convenient construct for the latter (try-finally), but not for the former)
> 
> The situation with an unreachable birth is handled conservatively
> since we consider a variable without a (visible at RTL expansion time)
> birth to conflict with all other variables.

That breaks down when a birth is there (because it was otherwise 
reachable) but not on the taken path:

  if (nevertrue)
goto start;
  goto forw;
  start:
  {
int i;
printf("not really reachable, but unknowingly so\n");
  forw:
i = 1;
  }

> I don't see a good way to have a birth magically appear at 'forw' 
> without trying to argue that the following stmt is the first mentioning 
> the variable.

That's what my 'Hmm' alluded to :)  The only correct and precise way I see 
is to implement something similar to try-finally, but topside-down.  An 
easier but less precise way is to place the births at the (start of) 
innermost block containing the decl _and all jumps into the block_.  Even 
less precise, but perhaps even easier, is to place the births for decls in 
blocks with side-entries into the function scope (and for blocks without 
side entries at their start).

Except for switches side-entries are really really seldom, so we might 
usefully get away with that latter solution.  And for switches it might be 
okay to put the births at the block containing the switch (if it itself 
doesn't have side entries, and the switch block doesn't have side 
entries except the case labels).

If the birth is moved to outward blocks it might be best if also the 
dealloc/death clobbers are moved to it, otherwise there might be paths 
containing a birth but no death.

The less precise you get with those births the more non-sharing you'll 
get, but that's the price.


Ciao,
Michael.


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Feb 2022, Richard Biener wrote:

> int foo(int always_true_at_runtime)
> {
>   {
> int *p;
> if (always_true_at_runtime)
>   goto after;
> lab:
> return *p;
> after:
> int i = 0;
> p = 
> if (always_true_at_runtime)
>   goto lab;
>   }
>   return 0;
> }
> 
> For the implementation I considered starting lifetime at a DECL_EXPR
> if it exists and otherwise at the start of the BIND_EXPR.  Note the
> complication is that control flow has to reach the lifetime-start
> clobber before the first access.  But that might also save us here,
> since for the example above 'i' would be live via the backedge
> from goto lab.
> 
> That said, the existing complication is that when the gimplifier
> visits the BIND_EXPR it has no way to know whether there will be
> a following DECL_EXPR or not (and the gimplifier notes there are
> cases a DECL_EXPR variable is not in a BIND_EXPR).  My plan is to
> compute this during one of the body walks the gimplifier performs
> before gimplifying.
> 
> Another complication is that at least the C frontend + gimplifier
> combination for
> 
>   switch (i)
>   {
>   case 1:
> int i;
> break;
>   }
> 
> will end up having the BIND_EXPR mentioning 'i' containing the
> case label, so just placing the birth at the BIND will make it
> unreachable as
> 
>   i = BIRTH;
> case_label_for_1:
>   ...

I think anything that needs to happen (conceptually) during the jump from 
switch to case-label needs to happen right before the jump, that might 
mean creating a new fake BLOCK that contains just the jump and the 
associated births.

> conveniently at least the C frontend has a DECL_EXPR for 'i'
> which avoids this and I did not find a way (yet) in the gimplifier
> to rectify this when gimplifying switches.

In C a 'case' label is nothing else than a normal label, it doesn't start 
it's own block or the like.  So also for that one the births would need to 
be at the start of their containing blocks.

> So there's still work to do but I think that starting the lifetime at a 
> DECL_EXPR if it exists is the way to go?

Depends on where the DECL_EXPR is exactly placed, but probably it wouldn't 
be okay.  You can't place the birth at the fall-through path, because this 
needs to work (basically your jump example above rewritten as switch):

int state = 2, *p, camefrom1 = 0;
for (;;) switch (state) {
  case 1: 
  case 2: ;
int i;
if (state != 1) { p =  i = 0; }
if (state == 1) { (*p)++; return *p; }
state = 1;
continue;
}

Note how i is initialized during state 2, and needs to be kept initialized 
during state 1, so there must not be a CLOBBER (birth or other) at the 
point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
for 'i' is placed with the statement associated with 'case 2' label, 
putting a CLOBBER there would be the wrong thing.  If the decl had an 
initializer it might be harmless, as it would be overwritten at that 
place, but even so, in this case there is no initializer.  Hmm.

Another complication arises from simple forward jumps:

  goto forw;
  {
int i;
printf("not reachable\n");
  forw:
i = 1;
  }

Despite the jump skipping the unreachable head of the BLOCK, 'i' needs to 
be considered birthed at the label.  (In a way the placement of births 
exactly mirrors the placements of deaths (of course), every path from 
outside a BLOCK to inside needs to birth-clobber all variables (in C), 
like every path leaving needs to kill them.  It's just that we have a 
convenient construct for the latter (try-finally), but not for the former)

Ciao,
Michael.


Re: [PATCH] libgcc: Actually force divide by zero

2022-02-03 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 3 Feb 2022, Richard Biener via Gcc-patches wrote:

> /* Preserve explicit divisions by 0: the C++ front-end wants to detect
>undefined behavior in constexpr evaluation, and assuming that the 
> division
>traps enables better optimizations than these anyway.  */
> (for div (trunc_div ceil_div floor_div round_div exact_div)
>  /* 0 / X is always zero.  */
>  (simplify
>   (div integer_zerop@0 @1)
>   /* But not for 0 / 0 so that we can get the proper warnings and errors.  
> */
>   (if (!integer_zerop (@1))
>@0))
> 
> it suggests we want to preserve all X / 0 when the 0 is literal but
> I think we can go a bit further and require 0/0 to not unnecessarily
> restrict other special cases.

Just remember that 0/0 is completely different from X/0 (with X != 0): the 
latter is a sensible limit, the former is just nonsense.  There's a 
reason why one is a NaN and the other Inf in floating point.  So it does 
make sense to differ between both on integer side as well.

(I'm of two minds on the usefulness of "1/x -> 0" vs. its effect on 
trapping behaviour)


Ciao,
Michael.

> 
> Comments on the libgcc case below
> 
> > 2022-02-03  Jakub Jelinek  
> > 
> > * libgcc2.c (__udivmoddi4): Add optimization barriers to actually
> > ensure division by zero.
> > 
> > --- libgcc/libgcc2.c.jj 2022-01-11 23:11:23.810270199 +0100
> > +++ libgcc/libgcc2.c2022-02-03 09:24:14.513682731 +0100
> > @@ -1022,8 +1022,13 @@ __udivmoddi4 (UDWtype n, UDWtype d, UDWt
> > {
> >   /* qq = NN / 0d */
> >  
> > - if (d0 == 0)
> > -   d0 = 1 / d0;/* Divide intentionally by zero.  */
> > + if (__builtin_expect (d0 == 0, 0))
> > +   {
> > + UWtype one = 1;
> > + __asm ("" : "+g" (one));
> > + __asm ("" : "+g" (d0));
> > + d0 = one / d0;/* Divide intentionally by zero.  */
> > +   }
> 
> I'm not sure why we even bother - division or modulo by zero is
> undefined behavior and we are not emulating CPU behavior because
> the wide instructions emulated here do not actually exist.  That
> gives us the freedom of choosing the implementation defined
> behavior.
> 
> That said, _if_ we choose to "care" I'd rather choose to
> define the implementation to use the trap mechanism the
> target provides and thus use __builtin_trap ().  That then
> at least traps reliably, compared to the division by zero
> which doesn't do that on all targets.
> 
> So I'm not sure there's anything to fix besides eventually
> just deleting the d0 == 0 special case?
> 
> Richard.
> 


Re: [PATCH] Add CLOBBER_MARKS_EOL to mark storage end-of-life clobbers

2022-02-03 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 3 Feb 2022, Richard Biener via Gcc-patches wrote:

> > > It indeed did occur to me as well, so as we're two now I tried to 
> > > see how it looks like.  It does like the following (didn't bother to 
> > > replace all build_clobber calls but defaulted the parameter - there 
> > > are too many).
> > 
> > Except for this part I like it (I wouldn't call ca. 25 calls too 
> > many).  I often find optional arguments to be a long term maintenance 
> > burden when reading code.
> 
> But I think in this case it makes clear that the remaining callers are 
> un-audited

My pedantic answer to that would be that to make something clear you want 
to be explicit, and being explicit means writing something, not 
leaving something out, so you'd add another "CLOBBER_DONTKNOW" and use 
that for the unaudited calls.  Pedantic, as I said :)

But, of course, you built the shed, so paint it green, all right :)


Ciao,
Michael.


Re: [PATCH] Add CLOBBER_MARKS_EOL to mark storage end-of-life clobbers

2022-02-03 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 3 Feb 2022, Richard Biener wrote:

> > > This adds a flag to CONSTRUCTOR nodes indicating that for clobbers this 
> > > marks the end-of-life of storage as opposed to just ending the lifetime 
> > > of the object that occupied it.  The dangling pointer diagnostic uses 
> > > CLOBBERs but is confused by those emitted by the C++ frontend, for 
> > > example, which emits them for the second purpose at the start of CTORs.  
> > > The issue is also apparent for aarch64 in PR104092.
> > > 
> > > Distinguishing the two cases is also necessary for the PR90348 fix.
> > 
> > (Just FWIW: I agree with the plan to have different types of CLOBBERs, in 
> > particular those for marking births)
> > 
> > A style nit:
> > 
> > > tree clobber = build_clobber (TREE_TYPE (t));
> > > +   CLOBBER_MARKS_EOL (clobber) = 1;
> > 
> > I think it really were better if build_clobber() gained an additional 
> > argument (non-optional!) that sets the type of it.
> 
> It indeed did occur to me as well, so as we're two now I tried to see
> what it looks like.  It looks like the following (I didn't bother to
> replace all build_clobber calls but defaulted the parameter - there
> are too many).

Except for this part I like it (I wouldn't call ca. 25 calls too 
many).  I often find optional arguments to be a long term maintenance 
burden when reading code.

(And yeah, enum good; putting the enum into tree_base.u, if it works, 
otherwise use your bit shuffling)


Ciao,
Michael.


Re: [PATCH] Add CLOBBER_MARKS_EOL to mark storage end-of-life clobbers

2022-02-02 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 2 Feb 2022, Richard Biener via Gcc-patches wrote:

> This adds a flag to CONSTRUCTOR nodes indicating that for clobbers this 
> marks the end-of-life of storage as opposed to just ending the lifetime 
> of the object that occupied it.  The dangling pointer diagnostic uses 
> CLOBBERs but is confused by those emitted by the C++ frontend, for 
> example, which emits them for the second purpose at the start of CTORs.  
> The issue is also apparent for aarch64 in PR104092.
> 
> Distinguishing the two cases is also necessary for the PR90348 fix.

(Just FWIW: I agree with the plan to have different types of CLOBBERs, in 
particular those for marking births)

A style nit:

> tree clobber = build_clobber (TREE_TYPE (t));
> +   CLOBBER_MARKS_EOL (clobber) = 1;

I think it really were better if build_clobber() gained an additional 
argument (non-optional!) that sets the type of it.


Ciao,
Michael.


Re: [PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2022-01-10 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 20 Dec 2021, Uros Bizjak wrote:

> > Thanks.
> > I see nobody commented on Micha's post there.
> >
> > Here is a patch that implements it in GCC, i.e. C++ doesn't change ABI (at 
> > least
> > not from the past few releases) and C does for GCC:
> >
> > 2021-12-15  Jakub Jelinek  
> >
> > PR target/102024
> > * config/i386/i386.c (classify_argument): Add zero_width_bitfields
> > argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
> > always ignore them, when seeing other zero sized bitfields, either
> > set zero_width_bitfields to 1 and ignore it or if equal to 2 process
> > it.  Pass it to recursive calls.  Add wrapper
> > with old arguments and diagnose ABI differences for C structures
> > with zero width bitfields.  Formatting fixes.
> >
> > * gcc.target/i386/pr102024.c: New test.
> > * g++.target/i386/pr102024.C: New test.
> 
> Please get a signoff on the ABI change (perhaps HJ can help here),
> I'll approve the implementation after that.

Christmas got in the way, but I just merged the proposed change 
(zero-width bit-fields -> NO_CLASS) into the psABI.


Ciao,
Michael.

> 
> Uros.
> 
> >
> > --- gcc/config/i386/i386.c.jj   2021-12-10 17:00:06.024369219 +0100
> > +++ gcc/config/i386/i386.c  2021-12-15 15:04:49.245148023 +0100
> > @@ -2065,7 +2065,8 @@ merge_classes (enum x86_64_reg_class cla
> >
> >  static int
> >  classify_argument (machine_mode mode, const_tree type,
> > -  enum x86_64_reg_class classes[MAX_CLASSES], int 
> > bit_offset)
> > +  enum x86_64_reg_class classes[MAX_CLASSES], int 
> > bit_offset,
> > +  int &zero_width_bitfields)
> >  {
> >HOST_WIDE_INT bytes
> >  = mode == BLKmode ? int_size_in_bytes (type) : (int) GET_MODE_SIZE 
> > (mode);
> > @@ -2123,6 +2124,16 @@ classify_argument (machine_mode mode, co
> >  misaligned integers.  */
> >   if (DECL_BIT_FIELD (field))
> > {
> > + if (integer_zerop (DECL_SIZE (field)))
> > +   {
> > + if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
> > +   continue;
> > + if (zero_width_bitfields != 2)
> > +   {
> > + zero_width_bitfields = 1;
> > + continue;
> > +   }
> > +   }
> >   for (i = (int_bit_position (field)
> > + (bit_offset % 64)) / 8 / 8;
> >i < ((int_bit_position (field) + (bit_offset % 
> > 64))
> > @@ -2160,7 +2171,8 @@ classify_argument (machine_mode mode, co
> >   num = classify_argument (TYPE_MODE (type), type,
> >subclasses,
> >(int_bit_position (field)
> > -   + bit_offset) % 512);
> > +   + bit_offset) % 512,
> > +  zero_width_bitfields);
> >   if (!num)
> > return 0;
> >   pos = (int_bit_position (field)
> > @@ -2178,7 +2190,8 @@ classify_argument (machine_mode mode, co
> >   {
> > int num;
> > num = classify_argument (TYPE_MODE (TREE_TYPE (type)),
> > -TREE_TYPE (type), subclasses, 
> > bit_offset);
> > +TREE_TYPE (type), subclasses, 
> > bit_offset,
> > +zero_width_bitfields);
> > if (!num)
> >   return 0;
> >
> > @@ -2211,7 +2224,7 @@ classify_argument (machine_mode mode, co
> >
> >   num = classify_argument (TYPE_MODE (TREE_TYPE (field)),
> >TREE_TYPE (field), subclasses,
> > -  bit_offset);
> > +  bit_offset, 
> > zero_width_bitfields);
> >   if (!num)
> > return 0;
> >   for (i = 0; i < num && i < words; i++)
> > @@ -2231,7 +2244,7 @@ classify_argument (machine_mode mode, co
> >  X86_64_SSEUP_CLASS, everything should be passed in
> >  memory.  */
> >   if (classes[0] != X86_64_SSE_CLASS)
> > - return 0;
> > +   return 0;
> >
> >   for (i = 1; i < words; i++)
> > if (classes[i] != X86_64_SSEUP_CLASS)
> > @@ -2257,8 +2270,8 @@ classify_argument (machine_mode mode, co
> >   classes[i] = X86_64_SSE_CLASS;
> > }
> >
> > - /*  If X86_64_X87UP_CLASS isn't preceded by X86_64_X87_CLASS,
> > -  everything should be passed in 

Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions

2021-11-25 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 25 Nov 2021, Richard Biener via Gcc-patches wrote:

> It seems to be a style to place gcc_unreachable () after a
> switch that handles all cases with every case returning.
> Those are unreachable (well, yes!), so they will be elided
> at CFG construction time and the middle-end will place
> another __builtin_unreachable "after" them to note the
> path doesn't lead to a return when the function is not declared
> void.
> 
> So IMHO those explicit gcc_unreachable () serve no purpose,
> if they could be replaced by a comment.

Never document in comments what you can document in code (IMO).  I think 
the code as-is clearly documents the invariants and expectations and 
removing the gcc_unreachable() leads to worse sources.

Can't you simply exempt unreachable __builtin_unreachable () from the 
warning?  It seems an obvious thing that the warning should _not_ warn 
about; after all, quite clearly, the author is aware of that being 
unreachable, it says so, right there.
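For reference, a minimal C sketch of the style being defended (gcc_unreachable is stood in for by a plain abort; in GCC it is a macro that aborts with a message):

```c
#include <stdlib.h>

/* Stand-in for GCC's gcc_unreachable (), which aborts with a message.  */
#define gcc_unreachable() abort ()

enum mode { MODE_A, MODE_B };

/* The style under discussion: every case returns, and the trailing
   gcc_unreachable () documents the invariant in code rather than in a
   comment - falling out of the switch would be a bug.  */
static int
mode_cost (enum mode m)
{
  switch (m)
    {
    case MODE_A:
      return 1;
    case MODE_B:
      return 2;
    }
  gcc_unreachable ();
}
```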


Ciao,
Michael.


Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code

2021-11-25 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 25 Nov 2021, Richard Biener wrote:

> > Yes, that's definitely the case - I was too lazy to re-use the old
> > option name here.  But I don't have a good name at hand, maybe clang
> > has an option covering the cases I'm thinking about.

As you asked: I already have difficulty describing the exact semantics 
of the warning in sentences, so I don't find a good name either :-)

> > Btw, the diagnostic spotted qsort_chk doing
> > 
> > if (CMP (i1, i2))
> >   break;
> > else if (CMP (i2, i1))
> >   return ERR2 (i1, i2);
> > 
> > where ERR2 expands to a call to a noreturn void "returning"
> > qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
> > a bug but somewhat difficult to avoid the diagnostic for.  I suppose
> > the pointless 'return' is to make it more visible that the loop
> > terminates here (albeit we don't return normally).

Tough one.  You could also disable the warning when the fallthrough 
doesn't exist because of a non-returning call.  If it's supposed to find 
obvious programming mistakes it might make sense to regard all function 
calls the same, as they look, i.e. as function calls that can return.  
Or it might make sense not to do that for programmers who happen to know 
about non-returning functions.  :-/

> It also finds this strange code in label_rtx_for_bb:

So the warning is definitely useful!

> indeed the loop looks pointless.  Unless the DECL_NONLOCAL case was 
> meant to continue;

It's been like that since it was introduced in 2007.  It's an invariant that 
DECL_NONLOCAL labels are first in a BB and are not followed by normal 
labels, so a 'continue' wouldn't change anything; the loop is useless.


Ciao,
Michael.


Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code

2021-11-24 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 24 Nov 2021, Richard Biener wrote:

> >> +/* Unreachable code in if (0) block.  */
> >> +void baz(int *p)
> >> +{
> >> +   if (0)
> >> + {
> >> +return;  /* { dg-bogus "not reachable" } */
> >
> >Hmm?  Why are you explicitly saying that warning here would be bogus? 
> 
> Because I don't think we want to warn here. Such code is common from 
> template instantiation or macro expansion.

Like all code with a (const-propagated) explicit 'if (0)', which is of 
course the reason why -Wunreachable-code is a challenge.  IOW: I could 
accept your argument but then wonder why you want to warn about the second 
statement of the guarded block.  The situation was:

  if (0) {
return;  // (1) don't warn here?
whatever++;  // (2) but warn here?
  }

That seems even more confusing.  So you don't want to warn about 
unreachable code (the 'return') but you do want to warn about unreachable 
code within unreachable code (point (2) is unreachable because of the 
if(0) and because of the return).  If your worry is macro/template 
expansion resulting in if(0)'s then I don't see why you would only disable 
warnings for some of the statements therein.

It seems we are actually interested in code unreachable via fallthrough or 
labels, not in all unreachable code, so maybe the warning is mis-named.

Btw. what does the code now do about this situation:

  if (0) {
something++;  // 1
return;   // 2
somethingelse++;  // 3
  }

does it warn at (1) or not?  (I assume it unconditionally warns at (3))


Ciao,
Michael.


Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code

2021-11-24 Thread Michael Matz via Gcc-patches
Hello,

> +/* Unreachable code in if (0) block.  */
> +void baz(int *p)
> +{
> +   if (0)
> + {
> +return;  /* { dg-bogus "not reachable" } */

Hmm?  Why are you explicitly saying that warning here would be bogus?  It 
quite clearly _is_ unreachable, so warning there makes sense.  Maybe you 
want an XFAILed dg-warning if your current implementation fails to warn, 
and a further XFAILed dg-bogus on the next line?

(Or at the very least a comment in the test case that this is actually not 
what we really want, but rather what current GCCs produce)


Ciao,
Michael.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-18 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 18 Oct 2021, Richard Sandiford wrote:

> > (It's a really cute hack that works as a micro optimization, the question 
> > is, do we really need to go there already, are all other less hacky 
> > approaches not bringing similar improvements?  The cuter the hacks the 
> > less often they pay off in the long run of production software :) )
> 
> FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
> for block_symbol, I like the approach of concatenating/combining structures
> based on flags.

The problem is that if you unset the flag you can't free the (now useless) 
storage.  What's worse is that you can't even reuse it anymore, because 
you lost the knowledge that it exists (except if you want to use another 
flag to note that).  It's of course obvious, but it helps to spell that 
out if we want to argue about ...

> The main tree and rtl types have too much baggage and

... baggage.  What you actually gain by associating different info pieces 
by address (e.g. concatenate allocations) is that you don't need to refer 
to one from the other; that's the space you save, not anything inherent 
in the structures (which remain to have the members they would have 
anyway).  So, you basically trade one pointer (or index), which would 
possibly be optional, with address association and inflexibility (with the 
impossibility to manage both pieces individually: you can't free the 
second piece, and you can't add the second piece post-allocation).  It 
might be a good trade off sometimes, but in the abstract it's not a good 
design.

Regarding trees and space: to make something a tree you need 8 bytes and 
get a number of flags, and an arbitrary 4-byte blob in return.  I don't 
see that as much baggage.  We could reduce it further by splitting the 
arbitrary union and the tree_code+flags parts.  Especially for things 
referred to from tree_exp it makes sense to try making them trees 
themselves.

> so I think there are some things that are better represented outside
> of them.
> 
> I suppose cselib VALUE rtxes are also similar, although they're more
> of a special case, since cselib data doesn't survive between passes.


Ciao,
Michael.


Re: [PATCH] Allow different vector types for stmt groups

2021-10-14 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 14 Oct 2021, Richard Biener via Gcc-patches wrote:

> > I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
> > 433.milc benchmark when compiled with -Ofast -march=native -flto to this
> > commit.  See also:
> > 
> >   
> > https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0=289.70.0;
> > 
> > I am not sure if a bugzilla bug is in order because I cannot reproduce
> > the regression neither on an AMD zen3 machine nor on Intel CascadeLake,
> 
> It's for sure worth a PR for tracking purposes.  But I've not been
> very successful in identifying regression causes on Zen2 - what perf
> points to is usually exactly the same assembly in both base and peak :/

Well, in this case it's at least a fairly impressive (40%) difference in 
samples for add_force_to_mom* :

> > BEFORE:
> > 18.47%105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
> > add_force_to_mom
> > 
> > AFTER:
> > 24.04%149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
> > add_force_to_mom

(as the change was in the vectorizer, this hopefully isn't due to different 
inlining decisions).


Ciao,
Michael.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-14 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 14 Oct 2021, Richard Biener wrote:

> > So, at _this_ write-through of the email I think I like the above idea 
> > best: make ao_ref be a tree (at least its storage, because it currently 
> > is a one-member-function class), make ao_ref.volatile_p be 
> > tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
> > sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
> > 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
> > TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
> > tcc_reference tree.
> 
> Hmm.  I'm not sure that's really something I like - it's especially
> quite some heavy lifting while at the same time lacking true boldness
> as to changing the representation of memory refs ;)

Well, it would at least enable such changes later in an orderly fashion.

> That said - I've prototyped the TREE_ASM_WRITTEN way now because it's 
> even simpler than the original TREE_AOREFWRAP approach, see below.
> 
> Note that I'm not embedding it into the tree structure, I'm merely
> using the same allocation to store two objects, the outermost ref
> and the ao_ref associated with it.  Quote:
> 
> +  size_t length = tree_code_size (TREE_CODE (lhs));
> +  if (!TREE_ASM_WRITTEN (lhs))
> +{
> +  tree alt_lhs
> +   = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
> +  memcpy (alt_lhs, lhs, length);
> +  TREE_ASM_WRITTEN (alt_lhs) = 1;
> +  *ref = new ((char *)alt_lhs + length) ao_ref;

You need to ensure that alt_lhs+length is properly aligned for ao_ref, but 
yeah, for a hack that works.  If you really want to go that way you need 
good comments about this hack.  It's really somewhat worrisome that the 
size of the allocation depends on a bit in tree_base.

(It's a really cute hack that works as a micro optimization, the question 
is, do we really need to go there already, are all other less hacky 
approaches not bringing similar improvements?  The cuter the hacks the 
less often they pay off in the long run of production software :) )


Ciao,
Michael.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-13 Thread Michael Matz via Gcc-patches
Hello,

[this is the fourth attempt to write a comment/review/opinion for this 
ao_ref-in-tcc_reference, please accept some possible incoherence]

On Tue, 12 Oct 2021, Richard Biener via Gcc-patches wrote:

> This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
> which we can use to wrap a reference tree, recording the ao_ref
> associated with it.  That comes in handy when trying to optimize
> the constant factor involved with alias stmt walking (or alias
> queries in general) where there's parts that are linear in the
> reference expression complexity, namely get_ref_base_and_extent,
> which shows up usually high on profiles.

So, generally I like storing things into the IL that are impossible to 
(re-)discover.  Remembering things that are merely slowish to rediscover 
is less clearly a win: the increase in IL memory use, and potentially the 
necessary pointer chasing, might just trade one clearly measurable slow 
point (the rediscovery computation) for many slow points all over the 
place (the pointer chasing/cache effects).  Here ...

> The following patch is minimal as to make tree-ssa.exp=ssa-fre-*
> not ICE and make the testcases from PR28071 and PR39326 compile
> successfully at -O1 (both testcases show a moderately high
> load on alias stmt walking around 25%, resp. 34%).  With the
> patch which makes use of the cache only from stmt_may_clobber_ref_p
> for now the compile-time improves by 7%, resp. 19% which means
> overall the idea might be worth pursuing.

... you seem to have a point, though.  Also, I am of the opinion that our 
gimple memrefs could be even fatter (and also encode things like 
multi-dimensional accesses, either right from the source code or 
discovered by index analysis), and your idea goes into that direction.  
So, yay for encoding memref properties into the IL, even though here it's 
only a cache.  You solved the necessary invalidation already.  Perhaps 
only partly, that will be seen once exposed to the wild.

So, the only question seems to be how to encode it: either by ...

> transparent by instead of wrapping the refs with another tree

... this (wrapping) ...

> to reallocate the outermost handled_component_p (and only those),

... or that (aggregation).  There is a third possibility if you want this 
only in the gimple world (which is the case): encode it not in the trees 
but in the gimple statements.  This sort of works easily for everything 
except calls.  I will not consider this variant, nor the side table 
implementation.

While writing this email I switched between more liking one or the other, 
multiple times.  So, I'll write down some basic facts/requirements:

1) You basically want to add stuff to an existing structure:
(a) by wrapping: to work seamlessly the outer tree should have similar 
enough properties to the inner tree (e.g. also be tcc_reference) to be 
used interchangeably in most code, except that which needs to look at 
the added stuff.
(b) by aggregating the stuff into the existing structure itself: if you 
need both structs (with and without stuff) the pure thing to do is to 
actually create two structs, once with, once without stuff.
2) the added stuff is optional
3) we have multiple things (all tcc_reference) to which to add stuff
4) all tcc_reference are tree_exp, which is variable number of operands,
   which constrain things we can do naturally (e.g. we can't add stuff 
   after tree_exp, except by constraining the number of operands)

Considering this it seems that aggregation is worse: you basically double 
the number of structure types (at least conceptually, if you go with your 
bit-idea).  So, some idea of wrapping seems more natural.

(I think your idea of aggregation but going with a bit flag to indicate if 
this tcc_reference is or isn't annotated, and therefore has things 
allocated after the variable number of operands, is a terrible hack)

There is another possibility doing something like your bit-flag 
aggregation but with fewer hackery: if ao_ref would be a tree it could be 
a normal operand of a tree_exp (and hence tcc_reference), just that the 
number of operands then would vary depending on if it's annotated or not.

Making ao_ref into a tree would also enable the use of ANNOTATE_EXPR for a 
generic wrapping tree.  (Currently it's used only in very specific cases, 
so ANNOTATE_EXPR handling would need to be extended all over, and as it's 
not a tcc_reference it would probably rather mean to introduce a new 
ANNOTATE_REF).

Anyway, with this:

struct tree_ref_annotation {
  struct tree_base base;
  struct ao_ref ao_ref;
};

DEFTREECODE(TREE_MEM_ANNO, "mem_anno", tcc_exceptional, 0);

you could then add

DEFTREECODE(MEM_REF_A, "mem_ref_a", tcc_reference, 3);

where TREE_OPERAND(memref, 2) would then be a TREE_MEM_ANNO.  If we were 
to add one operand slot to each tcc_reference we could even do without new 
tree codes: the existence of an ao_ref would simply be indicated by 
TREE_OPERAND(ref, position) 

Re: [PATCH] tree-optimization: [PR102622]: wrong code due to signed one bit integer and "a?-1:0"

2021-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Sat, 9 Oct 2021, apinski--- via Gcc-patches wrote:

> +  (lshift (convert (convert:boolean_type_node @0)) { shift; })))
> +/* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
> +   here as the powerof2cst case above will handle that case correctly.  
> */

Well, but the QoI will improve quite a bit when you just do the check, 
instead of relying on order of patterns.  It's not slow or harmful to 
check and will make the order irrelevant, which, given the number of 
patterns we already have, is a good thing.  (It will also be smaller to 
check than to document why the check isn't needed :-) )
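For what it's worth, the identity behind the transformation can be shown in plain C: for a boolean-valued a (0 or 1), "a ? -1 : 0" equals "-a" in a signed integer type. The match.pd pattern performs this fold on GIMPLE; this sketch merely demonstrates the arithmetic.

```c
/* Two forms of the same computation for a in {0, 1}.  */
static int
cond_form (int a)
{
  return a ? -1 : 0;
}

static int
neg_form (int a)
{
  return -a;
}
```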


Ciao,
Michael.


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-05 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 5 Oct 2021, Richard Biener wrote:

> > First notice that this doesn't have an empty latch block to start with 
> > (in fact, there is no separate latch block at all), so the loop 
> > optimizers haven't been initialized for simple latches at this point.  
> > Still, I would say that even then we want to be careful with the latch 
> > edge, as putting code onto it will most probably create a problem 
> > downstream once we _do_ want to intialize the loop optimizers for 
> > simple latches.  So, that we are careful here is okay, we are just too 
> > careful in this specific case.
> 
> Not sure if the argument about empty or not empty latches is important...

In this case it's not (as there are no separate latches anyway), but 
generally a latch that is already non-empty (i.e. problematic) only 
becomes more non-empty, so doing the threading doesn't introduce that 
specific problem.

> > > There's a pretty obvious jump thread in there.  3->4->5.
> > >
> > > We used to allow this, but it's currently being rejected and I'm not
> > > sure it should.
> > >
> > > In simplest terms we're threading from inside the loop, through the
> > > latch to a point outside the loop.  At first glance, that seems safe.
> 
> All threadings that start inside the loop and end outside of it are OK
> in principle.  Even if the path crosses the latch or the header.
> 
> We _might_ end up creating a loop with multiple exits though.

And entries (and hence irreducible loops)!  That's why I also added the 
piece about "all of B2..Bn-1 don't contain jumps back into the loop".  
I'm not sure if candidate paths going into our threader can have this 
problem, but if they have then you also don't want to do the block 
copying.

> > (a) while the thread is through the latch block, it's _not_ through the
> > latch edge (which is 4->3).  That would create the problem, so here
> > we're fine.
> > (b) even if it were through the latch edge, and would be followed by
> > to-be-copied instructions (and hence fill the latch edge) the ultimate
> > end of the path is outside the loop, so all the edges and blocks that
> > would now carry instructions would also be outside the loop, and hence
> > be no problem
> 
> Yep.
> 
> > Now, capture this precisely ;-)
> 
> I tried to capture this with
> 
> +  // The path should either start and end in the same loop or exit the
> +  // loop it starts in but never enter a loop.  This also catches
> +  // creating irreducible loops, not only rotation.
> +  if (entry->src->loop_father != exit->dest->loop_father
> +  && !flow_loop_nested_p (exit->src->loop_father,
> + entry->dest->loop_father))
> +{
> +  cancel_thread (&path, "Path rotates loop");
> +  return true;
> +}
> 
> it's not really necessary to have (simple) latches to argue about this
> I think.

Correct.  I don't think the above captures the re-entry problem (when 
intermediary blocks contain jumps inside the loop), the one I'm not sure 
we even have, but otherwise I think it does capture the (a) and (b) parts.


Ciao,
Michael.


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-04 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 4 Oct 2021, Jeff Law wrote:

> And just in case it got lost.  Here's the analysis on 960218-1 on visium:
> 
> We've got this going into ethread:
> 
> int f (int x)
> {
>   int D.1420;
>   int a;
> 
> ;;   basic block 2, loop depth 0, maybe hot
> ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   ENTRY (FALLTHRU,EXECUTABLE)
>   a_4 = ~x_3(D);
>   goto ; [INV]
> ;;    succ:   4 (FALLTHRU,EXECUTABLE)
> 
> ;;   basic block 3, loop depth 1, maybe hot
> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   4 (TRUE_VALUE,EXECUTABLE)
>   gl = a_1;
> ;;    succ:   4 (FALLTHRU,DFS_BACK,EXECUTABLE)
> 
> ;;   basic block 4, loop depth 1, maybe hot
> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   2 (FALLTHRU,EXECUTABLE)
> ;;    3 (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # a_1 = PHI 
>   if (a_1 != 0)
>     goto ; [INV]
>   else
>     goto ; [INV]
> ;;    succ:   3 (TRUE_VALUE,EXECUTABLE)
> ;;    5 (FALSE_VALUE,EXECUTABLE)
> 
> ;;   basic block 5, loop depth 0, maybe hot
> ;;    prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   4 (FALSE_VALUE,EXECUTABLE)
>   return;
> ;;    succ:   EXIT
> 
> }

First notice that this doesn't have an empty latch block to start with 
(in fact, there is no separate latch block at all), so the loop optimizers 
haven't been initialized for simple latches at this point.  Still, I would 
say that even then we want to be careful with the latch edge, as putting 
code onto it will most probably create a problem downstream once we _do_ 
want to intialize the loop optimizers for simple latches.  So, that we are 
careful here is okay, we are just too careful in this specific case.

> There's a pretty obvious jump thread in there.  3->4->5.
> 
> We used to allow this, but it's currently being rejected and I'm not 
> sure it should.
> 
> In simplest terms we're threading from inside the loop, through the 
> latch to a point outside the loop.  At first glance, that seems safe.

Is at least the unrestricted jump threader (the one after loop optimizers) 
picking this up?

Independent of that, I agree that this specific instance of threading 
through the latch block, even though followed by to-be-copied 
instructions, is okay.  We need to determine precisely when that is okay, 
though.  In this case there are several reasons I could see why this is 
so:
(a) while the thread is through the latch block, it's _not_ through the
latch edge (which is 4->3).  That would create the problem, so here
we're fine.
(b) even if it were through the latch edge, and would be followed by 
to-be-copied instructions (and hence fill the latch edge) the ultimate 
end of the path is outside the loop, so all the edges and blocks that 
would now carry instructions would also be outside the loop, and hence 
be no problem

Now, capture this precisely ;-)

I think something like so: we have a candidate path

  S -> L -> B1 (-> B2 ... Bn)

(Start, Latch, Blocks 1 to n following the latch).  (I think in our 
notation that means that the jump in L is redirected to Bn, and all code 
from B1..Bn be copied, right?  Do we even support multiple blocks after 
the to-be-redirected jump?)

Now if L is a latch block, but L->B1 is not a latch/back edge: no 
restrictions apply.

Otherwise, L->B1 is a latch/back edge (that means B1 is a loop header): if 
all of B2..Bn-1 don't contain jumps back into the loop, and Bn is outside 
the loop, then the thread is okay as well.  (B2..Bn-1 can be inside the 
loop, as long as they don't contain jumps back into the loop, after 
copying by the threader, they don't create problems: their copies will be 
placed outside the loop and won't generate side entries back into the 
loop; the copied latch edge will not be a latch edge anymore, but a loop 
exit edge).
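The rule above can be sketched in code roughly as follows; the data structure and names are illustrative stand-ins, not GCC's real CFG representation.

```c
#include <stdbool.h>

/* Sketch of the acceptance rule for a candidate path
   S -> L -> B1 (-> B2 ... Bn).  */
struct blk
{
  bool jumps_back_into_loop;  /* block branches back into the loop */
};

/* crosses_latch_edge: is L->B1 the latch/back edge (B1 the header)?
   mid/nmid: the intermediate blocks B2..Bn-1.
   bn_outside_loop: is the final block Bn outside the loop?  */
static bool
thread_path_ok (bool crosses_latch_edge, const struct blk *mid, int nmid,
                bool bn_outside_loop)
{
  if (!crosses_latch_edge)
    return true;              /* case (a): latch block, but not latch edge */
  if (!bn_outside_loop)
    return false;             /* copies would fill the latch edge in-loop */
  for (int i = 0; i < nmid; i++)
    if (mid[i].jumps_back_into_loop)
      return false;           /* copying would create a side entry */
  return true;                /* case (b) */
}
```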

It's quite possible that the full generality above is not necessary.  
It's probably enough to just not deal with the (b) case above (and 
continue to reject it), and simply only accept the candidate path if none 
of the involved edges is a latch/back edge (despite one block being the 
latch block).  Especially so because I'm not convinced that I handled 
the idea of case (b) correctly above ;-)


Ciao,
Michael.


Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 16 Sep 2021, Richard Biener via Gcc-patches wrote:

> > Typically for the native_interpret/native_encode we punt if 
> > BITS_PER_UNIT != 8 || CHAR_BIT != 8 because nobody had the energy to 
> > deal with the weird platforms (especially if we have currently none, I 
> > believe dsp16xx that had 16-bit bytes had been removed in 4.0 and c4x 
> > that had 32-bit bytes had been removed in 4.3) - dunno if the 
> > DEFERRED_INIT etc. code has those guards or not and it clearly 
> > documents that this code is not ready for other configurations. A byte 
> > is not necessarily 8 bits, that is just the most common size for it, 
> > and TYPE_SIZE_UNIT is number of BITS_PER_UNIT bit units.
> 
> OK, I'll do s/8/BITS_PER_UNIT/ - I also see that we have 
> int_size_in_bytes returning TYPE_SIZE_UNIT and that TYPE_SIZE_UNIT is 
documented to yield the type size in 'bytes'.

For better or worse GCCs meaning of 'byte' is really 'unit'; I guess most 
introductions of the term 'byte' in comments and function names really 
came from either carelessness or from people who knew this fact and 
thought it obvious that 'byte' of course is the same as 'unit', but not 
the same as octet.

FWIW: (for GCC) both mean the smallest naturally addressable piece of 
memory (i.e. what you get when you increase an address by 'one'), and that 
is not necessarily 8 bit (but anything else is bit-rotten of course).

As modern use of 'byte' tends to actually mean octet, but old use of byte 
(and use in GCC) means unit, we probably should avoid the term byte 
altogether in GCC.  But that ship has sailed :-/

> I do believe that we should officially declare hosts with CHAR_BIT != 8 
> as unsupported and as you say support for targets with BITS_PER_UNIT != 
> 8 is likely bit-rotten.

(And characters are still something else entirely, except on those couple 
platforms where chars, units and octets happen to be the same :) )  
But yes.


Ciao,
Michael.


Re: Regression with recent change

2021-09-14 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 13 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

The testcase still tests what it's supposed to test with ...

> > typedef unsigned short uint16_t;
> >
> > uint16_t a, b;
> >
> > int *j_global;
> > uint16_t f(void)
> > {
> >int c, **p;
> >short d = 2, e = 4;
> >

... "c = a;" added here (i.e. it still hangs before the pr55107 change).

> >for (;; b++)
> >  {
> >int *j = j_global, k = 0;
> >
> >for (; *j; j++)
> >   {
> > for(; c; c++)
> >   for(; k < 1; k++)
> > {
> >   short *f = &d;
> >
> >   if(b)
> > return *f;
> > }
> >   }
> >
> >if(!c)
> >   d *= e;
> >
> >a = d;
> >if ((a ? b = 0 : (**p ? : 1) != (d != 1 ? 1 : (b = 0))) != ((k ? a
> : 0)
> > < (a * (c = k
> >   **p = 0;
> >  }
> > }
> >
> 
> Thanks for getting rid of the noise here.
> 
> I've simplified the above to show what's going on in the warning on
> nds32-elf:
> 
> int george, *global;
> int stuff(), readme();
> 
> int
> f (void)
> {
>int store;
> 
>for (;;)
>  {
>int k = 0;
> 
>while (global)
> {
>   for (; store; ++store)

Yeah, that seems a correct warning, your 'store' (the 'c' in the original 
testcase) is really used uninitialized (when 'global' aka '*j_global' is 
non-zero).  Sorry for not noticing earlier, I was only getting rid of 
warnings, not carefully looking at the testcase itself.  I think with 
above initialization of 'c' it's conforming, and still a correct test for 
the original bug.

> This looks like a latent bug.  For that matter, the above snippet warns 
> with -fdisable-tree-thread2, even on x86-64 (and before my patch).


Ciao,
Michael.


Re: Regression with recent change

2021-09-13 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 13 Sep 2021, Jeff Law via Gcc-patches wrote:

> > So it looks like there's some undefined behavior going on, even before
> > my patch.  I'd like to get some feedback, because this is usually the
> > type of problems I see in the presence of a smarter threader... things
> > get shuffled around, problematic code gets isolated, and warning
> > passes have an easier time (or sometimes harder time) diagnosing
> > things.
> The original issue was PRE hanging, so I'd lean towards keeping the test as-is
> and instead twiddling any warning flags we can to make the diagnostics go
> away.

Or use this changed test avoiding the issues that I see with -W -Wall on 
this testcase.  I've verified that it still hangs before r194358 and is 
fixed by that revision.

Generally I think our testsuite, even for ICEs or these kinds of hangs, 
should make an effort to use conforming code if at all possible.  Here it 
is possible.

(I don't know if the new threader causes additional warnings, of course, 
but at least the problems with sequence points and uninitialized use of 
'j' aren't necessary to reproduce the bug)


Ciao,
Michael.

/* { dg-do compile } */
/* { dg-additional-options "-fno-split-loops" } */

typedef unsigned short uint16_t;

uint16_t a, b;

int *j_global;
uint16_t f(void)
{
  int c, **p;
  short d = 2, e = 4;

  for (;; b++)
{
  int *j = j_global, k = 0;

  for (; *j; j++)
{
  for(; c; c++)
for(; k < 1; k++)
  {
short *f = &d;

if(b)
  return *f;
  }
}

  if(!c)
d *= e;

  a = d;
  if ((a ? b = 0 : (**p ? : 1) != (d != 1 ? 1 : (b = 0))) != ((k ? a : 0)
  < (a * (c = k
**p = 0;
}
}


Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-10 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 10 Sep 2021, Richard Biener via Gcc-patches wrote:

> diagnostic from the Ada frontend.  The warnings are pruned from the
> testsuite output via prune_gcc_output but somehow this doesn't work
> for the tests in gfortran.dg/debug which are now failing with excess
> errors.  That seems to be the case for other fortran .exp as well
> when appending -gstabs, something which works fine for gcc or g++ dgs.

Fortran emits warnings with a capitalized 'W'.  Your regexp only checks 
for lower-case.

> +# Ignore stabs obsoletion warnings
> +regsub -all "(^|\n)\[^\n\]*warning: STABS debugging information is 
> obsolete and not supported anymore\[^\n\]*" $text "" text

This needs to be ".arning" or "\[Ww\]arning".


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

> Like this?

Yep, but I can't approve.


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Michael Matz via Gcc-patches
Hi,

On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

>  }
> +
> +  /* Threading through a non-empty latch would cause code to be added

"through an *empty* latch".  The test in code is correct, though.

And for the before/after loops flag you added: we have a 
cfun->curr_properties field which can be used.  We even already have a 
PROP_loops flag but that is set throughout compilation from CFG 
construction until the RTL loop optimizers, so can't be re-used for what 
is needed here.  But you still could invent another PROP_ value instead of 
adding a new field in struct function.


Ciao,
Michael.


Re: [Patch] gfortran: Fix in-build-tree testing [PR101305, PR101660]

2021-08-10 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 10 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> > +# Place ISO_Fortran_binding.h also under include/ in the build directory 
> > such
> > +# that it can be used for in-built-tree testsuite runs without 
> > interference of
> > +# other files in the build dir - like intrinsic .mod files or other .h 
> > files.
> >  ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding-1-tmpl.h \
> >$(srcdir)/ISO_Fortran_binding-2-tmpl.h \
> >$(srcdir)/ISO_Fortran_binding-3-tmpl.h \
> > @@ -1085,6 +1088,8 @@ ISO_Fortran_binding.h: 
> > $(srcdir)/ISO_Fortran_binding-1-tmpl.h \
> > $(COMPILE) -E -dD $(srcdir)/ISO_Fortran_binding-2-tmpl.h \
> > | grep '^#define CFI_type' >> $@
> > cat $(srcdir)/ISO_Fortran_binding-3-tmpl.h >> $@
> > +   $(MKDIR_P) include
> > +   cp $@ include/ISO_Fortran_binding.h
> 
> I see many Makefile.* in GCC doing rm -f file before cp whatever file,
> but don't know if that is just trying to be extra cautious or if it is
> needed for portability.  coreutils cp (and what POSIX says) is that
> overwriting the destination file should be fine and shouldn't cause
> failures, at least when the destination is writable.

I think this is to deal cautiously with symlinks: if the destination 
filename is a symlink to an existing file, that target file is overwritten 
by cp, not the symlink (which continues to point to the now-changed target 
file).


Ciao,
Michael.


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Richard Sandiford wrote:

> Well, it does feel like this is pressing the reset button on a thread
> whose message count is already in the high double figures.  There's a
> risk that restarting the discussion now is going to repeat a lot of the
> same ground, probably at similar length.  (But I realise that the sheer
> length of the thread is probably also the main reason for not having
> followed it closely. :-))

Yeah, I thought of not sending the mail because of that.  But it itched 
too much ;-)

> (2) continue to treat uses of uninitialised automatic variables as
> (semantically) undefined behaviour
> 
> When this was discussed on the clang list, (2) was a key requirement
> and was instrumental in getting the feature accepted.
> ... 
> since the behaviour is at least deterministic.  But from a debugging QoI
> perspective, this should not happen.  We should generate the same code
> as for:
> 
>int foo = ;
>if (flag)
>  foo = 1;
>x = foo;
> 
> However, because of (2), we should still generate a 
> -Wmaybe-uninitialized warning for the “x = foo” assignment.
> 
> This is not a combination we support at the moment, so something needs 
> to give.  The question of course is what.

Nothing needs to give.  You add the initialization with , 
_without an uninitialized use_, i.e. just:

   foo = .deferred_init(pattern)

or the like.  Then you let the optimizers do their thing.  If 'foo' is 
provably initialized (or rather overwritten in this setting) on all paths 
from above init to each use, that init will be removed.  IOW: if the init 
remains then not all paths are provably initializing.  I.e. the warning 
code can be triggered based on the simple existence of the deferred_init 
initializer.

Basically the .deferred_init acts as a "here the scope starts" marker.  
That's indeed what you want, not the "here the scope ends" clobber.  If a 
use reaches the scope-start marker you have a (possibly) uninitialized 
use and warn.
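In GIMPLE-like pseudocode (hypothetical syntax, loosely following the 
dumps quoted elsewhere in this thread), the idea would be:

```
{
  int foo;
  foo = .DEFERRED_INIT (pattern);   /* scope-start marker, no use of foo */
  if (flag_1)
    foo = 1;
  x = foo;   /* if the marker survives DCE/DSE, some path lacked an init
                -> warn here */
}
```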

(I'm obviously missing something because the above idea seems natural, so 
was probably already discussed and rejected, but why?)

> > Garbage in, garbage out.  It makes not too much sense to argue that
> > the generated PHI node on this loop (generated because of the
> > exposed-upward use of foo) is wrong, or right, or should better be
> > something else.  The input program was broken, so anything makes sense.
> 
> Yeah, without the new option it's GIGO.  But I think it's still possible
> to rank different forms of garbage output in terms of how self-consistent
> they are.
> 
> IMO it makes no sense to say that “foo” is upwards-exposed to somewhere
> where “foo” doesn't exist.  Using “foo_N(D)” doesn't have that problem,
> so I think it's possible to say that it's “better” garbage output. :-)
> 
> And the point is that with the new option, this is no longer garbage
> input as far as SSA is concerned: carrying the old value of “foo”
> across iterations would not be implementing the option correctly.
> How SSA handles this becomes a correctness issue.

I disagree.  The notion of "foo doesn't exist" simply doesn't exist (ugh!) 
in SSA; SSA is not the issue here.  Ranges of where foo does or doesn't 
exist are scopes, and that's orthogonal; so you want to encode that info 
somehow in a way compatible with SSA (!).  Doing that 
with explicit instructions is sensible (like marking scope endings with 
the clobber insn was sensible).  But those instructions simply need to not 
work against other invariants established and required by SSA.  Don't 
introduce new undefined uses and you're fine.

> > I think it's a chicken egg problem: you can't add undefined uses, for 
> > which you need to know if there was one, but the facility is supposed to 
> > help detecting if there is one to start with.
> 
> The idea is that .DEFERRED_INIT would naturally get optimised away by
> DCE/DSE if the variable is provably initialised before it's used.

Well, that's super.  So, why would you want or need the uninitialized use 
in the argument of .DEFERRED_INIT?


Ciao,
Michael.


Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Martin Liška wrote:

> On 7/1/21 3:33 PM, Eli Zaretskii wrote:
> > > Cc: jos...@codesourcery.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org
> > > From: Martin Liška 
> > > Date: Thu, 1 Jul 2021 14:44:10 +0200
> > > 
> > > > It helps some, but not all of the issues disappear.  For example,
> > > > stuff like this is still hard to read:
> > > > 
> > > > To select this standard in GCC, use one of the options -ansi
> > > >-
> > > > -std.‘=c90’ or -std.‘=iso9899:1990’
> > > >    
> > > 
> > > If I understand the notes correct, the '.' should be also hidden by e.g.
> > > Emacs.
> > 
> > No, it doesn't.  The actual text in the Info file is:
> > 
> > *note -std: f.‘=iso9899:1990’
> > 
> > and the period after " f" isn't hidden.  Where does that "f" come from
> > and what is its purpose here? can it be removed (together with the
> > period)?
> 
> It's name of the anchor used for the @ref. The names are automatically
> generated
> by makeinfo. So there's an example:
> 
> This is the warning level of @ref{e,,-Wshift-overflow3} and …
> 
> becomes in info:
> This is the warning level of *note -Wshift-overflow3: e. and …
> 
> I can ask the question at Sphinx, the Emacs script should hide that.

Not everyone reads info within Emacs; even if there's an emacs solution to 
postprocess info pages to make them nicer we can't rely on that.  It must 
look sensible without that.  In this case it seems the generated .texinfo 
input to makeinfo is already bad; where does the 'e' (or 'f') 
come from?  The original texinfo file simply contains:

  @option{-std=iso9899:1990}

so that's what perhaps should be generated, or maybe the anchor in @ref is 
optional and could be left out if it doesn't provide any info (a single 
character as anchor doesn't seem very useful)?

> > > > > We can adjust 'emph' formatting to nil, what do you think?
> > > > 
> > > > Something like that, yes.  But the problem is: how will you format it
> > > > instead?  The known alternatives, _foo_ and *foo* both use punctuation
> > > > characters, which will get in the way similarly to the quotes.  Can
> > > > you format those in caps, like makeinfo does?
> > > 
> > > You are fully right, info is very simple format and it uses wrapping for
> > > the formatting
> > > purpose (by default * and _). So, I don't have any elegant solution.
> > 
> > Well, it sounds from another mail that you did find a solution: to
> > up-case the string in @var.
> 
> I don't know. Some of them can be e.g. keywords and using upper-case 
> does not seem to me feasible.

Then that needs to be different already in the input, so that the 
directive that (in info) capitalizes is only used in contexts where that 
makes sense.  People reading info pages will know that an all-caps word 
often means a syntactic variable/placeholder, so that should be preserved.


Ciao,
Michael.


Re: [PATCH] tree-optimization/101280 - revise interchange fix for PR101173

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Richard Biener wrote:

> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index 43045c5455e..43ef112a2d0 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -1043,8 +1043,11 @@ tree_loop_interchange::valid_data_dependences 
> (unsigned i_idx, unsigned o_idx,
>   continue;
>  
> /* Be conservative, skip case if either direction at i_idx/o_idx
> -  levels is not '=' (for the inner loop) or '<'.  */
> -   if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0)
> +  levels is not '=' or '<'.  */
> +   if (dist_vect[i_idx] < 0
> +   || (DDR_REVERSED_P (ddr) && dist_vect[i_idx] > 0)
> +   || dist_vect[o_idx] < 0
> +   || (DDR_REVERSED_P (ddr) && dist_vect[o_idx] > 0))

Hmm, if DDR_REVERSED_P matters here, then it should matter for all arms.  
IOW: < 0 should be tested only when !DDR_REVERSED_P, not always:

   if ((!DDR_REVERSED_P (ddr) && dist_vect[i_idx] < 0)
   || (DDR_REVERSED_P (ddr) && dist_vect[i_idx] > 0)
   || (!DDR_REVERSED_P (ddr) && dist_vect[o_idx] < 0)
   || (DDR_REVERSED_P (ddr) && dist_vect[o_idx] > 0))

(what you have effectively written is a condition that allows only 0 when 
DDR_REVERSED_P)


Ciao,
Michael.


Re: HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-07-01 Thread Michael Matz
Hello,

I haven't followed this thread too closely, in particular I haven't seen 
why the current form of the .DEFERRED_INIT call was chosen or suggested, 
but it triggered my "well, that's obviously wrong" gut feeling; so sorry 
for stating something which might be obvious to thread participants here.  
Anyway:

On Thu, 1 Jul 2021, Richard Sandiford via Gcc-patches wrote:

> >> It's not a bug in SSA, it's at most a missed optimization there.
> >
> > I still think that SSA cannot handle block-scoped variable correctly 
> > in this case, it should not move the variable out side of the block 
> > scope. -:)
> >
> >>  But with -ftrivial-auto-var-init it becomes a correctness issue.
> 
> I might have lost track of what “it” is here.  Do you mean the 
> propagation, or the fact that we have a PHI in the first place?
> 
> For:
> 
> unsigned int
> f (int n)
> {
>   unsigned int res = 0;
>   for (int i = 0; i < n; i += 1)
> {
>   unsigned int foo;
>   foo += 1;

Note that here foo is used uninitialized.  That is the thing from which 
everything else follows.  Garbage in, garbage out.  It makes not too much 
sense to argue that the generated PHI node on this loop (generated because 
of the exposed-upward use of foo) is wrong, or right, or should better be 
something else.  The input program was broken, so anything makes sense.

That is the same with Qings shorter testcase:

  6   for (i = 0; i < a; i++) {
  7 if (__extension__({int size2;
  8 size2 = ART_INIT (size2, 2);

Uninitialized use of size2 right there.  And it's the same for the use of 
.DEFERRED_INIT as the patch does:

{
  int size2;
  size2 = .DEFERRED_INIT (size2, 2);
  size2 = 4;
  _1 = size2 > 5;
  D.2240 = (int) _1;
}

Argument of the pseudo-function-call to .DEFERRED_INIT uninitialized -> 
boom.

You can't solve any of this by fiddling with how SSA rewriting behaves at 
a large scale.  You need to change this and only this specific 
interpretation of a use.  Or preferably not generate it to start with.

> IMO the foo_3 PHI makes no sense.  foo doesn't live beyond its block,
> so there should be no loop-carried dependency here.
> 
> So yeah, if “it” meant that the fact that variables live too long,
> then I agree that becomes a correctness issue with this feature,
> rather than just an odd quirk.  Could we solve it by inserting
> a clobber at the end of the variable's scope, or something like that?

It would possibly make GCC not generate a PHI on this broken input, yes.  
But why would that be an improvement?

> > Agreed, I have made such change yesterday, and this work around the 
> > current issue.
> >
> > temp = .DEFERRED_INIT (temp/WITH_SIZE_EXPR(temp), init_type)
> >
> > To:
> >
> > temp = .DEFERRED_INIT (SIZE_of_temp, init_type)
> 
> I think we're going round in circles here though.  The point of having
> the undefined input was so that we would continue to warn about undefined
> inputs.  The above issue doesn't seem like a good justification for
> dropping that.

If you want to rely on the undefined use for error reporting then you must 
only generate an undefined use when there was one before, you can't just 
insert new undefined uses.  I don't see how it could be otherwise, as soon 
as you introduce new undefined uses you can and will run into GCC making 
use of the undefinedness, not just this particular issue with lifetime and 
PHI nodes which might be "solved" by clobbers.

I think it's a chicken egg problem: you can't add undefined uses, for 
which you need to know if there was one, but the facility is supposed to 
help detecting if there is one to start with.


Ciao,
Michael.


Re: Aligning stack offsets for spills

2021-06-08 Thread Michael Matz
Hello,

On Tue, 8 Jun 2021, Jeff Law wrote:

> On 6/8/2021 9:06 AM, H.J. Lu wrote:
> > On Tue, Jun 8, 2021 at 7:56 AM Jakub Jelinek via Gcc-patches
> >  wrote:
> >> On Tue, Jun 08, 2021 at 08:47:26AM -0600, Jeff Law wrote:
>  Why is the machinery involving STACK_SLOT_ALIGNMENT and
>  spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for
>  backing stack slots) not working for you?  If everything is setup
>  correctly the input alignment to try_fit_stack_local ought to be correct
>  already.
> >>> We don't need the MEM as a whole aligned, just the offset in the address
> >>> calculation due to how we encode those instructions.  If I've read that
> >>> code
> >>> correctly, it would arrange for a dynamic realignment of the stack  so
> >>> that
> >>> it could then align the slot. None of that is necessary for us and we'd
> >>> like
> >>> to avoid forcing the dynamic stack realignment.  Or did I misread the
> >>> code?
> >> I think dynamic stack realignment is done only on x86, no other backend has
> > I believe that all pieces of infrastructure to realign the stack are
> > in place.  You
> > just need to properly align the stack in the backend.
> 
> As I've stated, we don't need the stack aligned to these higher boundaries. 
> Nor do we need the slot as a whole aligned.   That's ultimately just wasteful
> since we don't need them.  We just want to get an aligned offset.

Well, but isn't that creating a difference when there is none?  You need 
an aligned offset; when given an aligned stack pointer that then is 
equivalent to an aligned stack address.  You are saying that you don't 
need the aligned stack pointer, sure, but would it be a problem for you?

Apart from that: dynamic stack realignment can be disabled (or probably 
isn't enabled for your target to start with), then the stack offset 
alignment machinery should still work in isolation.  (Well it might 
generate alignment claims in MEM RTL which then aren't in fact true; it 
depends on the architecture whether that's a problem for you).

Either way, I think whatever you need should probably be somehow 
integrated with the existing stack slot alignment knobs.  Or rather: the 
two orthogonal pieces (stack pointer alignment and stack offset alignment) 
should be separated, with you then using only the latter.

(Btw: are you also trying to improve non-stack addresses?  Because 
ultimately your constraints aren't about stack at all but about all 
address forms.  In a way this all is more like a job for addressing 
mode selection and massaging, but of course our support for such in GCC is 
limited beyond avoiding invalid modes by reloading into registers, which 
is exactly what you don't want :) )


Ciao,
Michael.


Re: Aligning stack offsets for spills

2021-06-08 Thread Michael Matz
Hello,

On Mon, 7 Jun 2021, Jeff Law wrote:

> 
> So, as many of you know I left Red Hat a while ago and joined Tachyum.  We're
> building a new processor and we've come across an issue where I think we need
> upstream discussion.
> 
> I can't divulge many of the details right now, but one of the quirks of our
> architecture is that reg+d addressing modes for our vector loads/stores
> require the displacement to be aligned.  This is an artifact of how these
> instructions are encoded.
> 
> Obviously we can emit a load of the address into a register when the
> displacement isn't aligned.  From a correctness point that works perfectly. 
> Unfortunately, it's a significant performance hit on some standard benchmarks
> (spec) where we have a great number of spills of vector objects into the stack
> at unaligned offsets in the hot parts of the code.
> 
> 
> We've considered 3 possible approaches to solve this problem.
> 
> 1. When the displacement isn't properly aligned, allocate more space in
> assign_stack_local so that we can make the offset aligned.  The downside is
> this potentially burns a lot of stack space, but in practice the cost was
> minimal (16 bytes in a 9k frame)  From a performance standpoint this works
> perfectly.
> 
> 2. Abuse the register elimination code to create a second pointer into the
> stack.  Spills would start as  + offset, then either get eliminated
> to sp+offset' when the offset is aligned or gpr+offset'' when the offset
> wasn't properly aligned. We started a bit down this path, but with #1 working
> so well, we didn't get this approach to proof-of-concept.
> 
> 3. Hack up the post-reload optimizers to fix things up as best as we can. 
> This may still be advantageous, but again with #1 working so well, we didn't
> explore this in any significant way.  We may still look at this at some point
> in other contexts.
> 
> Here's what we're playing with.  Obviously we'd need a target hook to 
> drive this behavior.  I was thinking that we'd pass in any slot offset 
> alignment requirements (from the target hook) to assign_stack_local and 
> that would bubble down to this point in try_fit_stack_local:

Why is the machinery involving STACK_SLOT_ALIGNMENT and 
spill_slot_alignment() (for spilling) or get_stack_local_alignment() (for 
backing stack slots) not working for you?  If everything is setup 
correctly the input alignment to try_fit_stack_local ought to be correct 
already.


Ciao,
Michael.


Re: GCC documentation: porting to Sphinx

2021-06-01 Thread Michael Matz
Hello,

On Tue, 1 Jun 2021, Martin Liška wrote:

> On 5/31/21 5:49 PM, Michael Matz wrote:
> > Hello Martin,
> > 
> > On Mon, 31 May 2021, Martin Liška wrote:
> > 
> >> I've made quite some progress with the porting of the documentation and
> >> I would like to present it to the community now:
> >> https://splichal.eu/scripts/sphinx/
> >>   Note the documentation is automatically ([1]) generated from texinfo with
> >> a
> >> GitHub workflow ([2]).
> > 
> > One other thing I was recently thinking about, in the Sphinx vs. texinfo
> > discussion: locally available documentation browsable/searchable in
> > terminal with info(1) (or equivalents).
> 
> Yes, that's handy.
> 
> > I think the above (i.e. generating .rst from the texinfo file) would 
> > immediately nullify all my worries.  So, just to be extra sure: your 
> > proposal now is to generate the .rst files, and that .texinfo remains 
> > the maintained sources, right?
> 
> No, .texinfo files will be gone. However, Sphinx can output to info 
> format: 
> https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-M

I see, that's good to hear.

> And I've just added the generated Info pages here:
> https://splichal.eu/scripts/sphinx/

Okay, but there's something amiss, just compare a local gcc.info with 
that.  The sphinx generated one seems to only contain command line 
options, but none of the other topics, in particular it seems to contain 
the "Invoking GCC" chapter (and only that) as top-level, and all other 
ones are missing (like "C implementation", "C++ implementation", "C 
extension", and so on).

Looking at gccint.info I also see quite some confusion; it's unclear to 
me if content is missing or not.  But e.g. the top-level structure has a 
different order (a less logical one; this order is btw. shared with the 
HTML-generated docs, so it's probably specific to the sphinx setup or 
such).

Ignoring that missing content what is there right now does seem somewhat 
acceptable for local use, though.


Ciao,
Michael.


Re: GCC documentation: porting to Sphinx

2021-05-31 Thread Michael Matz
Hello Martin,

On Mon, 31 May 2021, Martin Liška wrote:

> I've made quite some progress with the porting of the documentation and
> I would like to present it to the community now:
> https://splichal.eu/scripts/sphinx/
>  
> Note the documentation is automatically ([1]) generated from texinfo with a
> GitHub workflow ([2]).

One other thing I was recently thinking about, in the Sphinx vs. texinfo 
discussion: locally available documentation browsable/searchable in 
terminal with info(1) (or equivalents).  I think the above (i.e. 
generating .rst from the texinfo file) would immediately nullify all my 
worries.  So, just to be extra sure: your proposal now is to generate the 
.rst files, and that .texinfo remains the maintained sources, right?


Ciao,
Michael.


Re: [PATCH] Generate gimple-match.c and generic-match.c earlier

2021-05-28 Thread Michael Matz
Hello,

On Fri, 28 May 2021, Bernd Edlinger wrote:

> >> I was wondering, why gimple-match.c and generic-match.c
> >> are not built early but always last, which slows down parallel
> >> makes significantly.
> >>
> >> The reason seems to be that generated_files does not
> >> mention gimple-match.c and generic-match.c.
> >>
> >> This comment in Makefile.in says it all:
> >>
> >> $(ALL_HOST_OBJS) : | $(generated_files)
> >>
> >> So this patch adds gimple-match.c generic-match.c to generated_files.
> >>
> >>
> >> Tested on x86_64-pc-linux-gnu.
> >> Is it OK for trunk?
> > 
> > This should help for what I was complaining about in
> > https://gcc.gnu.org/pipermail/gcc/2021-May/235963.html . I build with
> > -j24 and it was stalling on compiling gimple-match.c for me.
> > Looks like insn-attrtab.c is missed too; I saw genattrtab was running last 
> > too.
> > 
> 
> Yeah, probably insn-automata.c as well, sometimes it is picked up early 
> sometimes not. maybe $(simple_generated_c) should be added to 
> generated_files, but insn-attrtab.c is yet another exception.

You can't put files in there that are sometimes slow to generate (which 
insn-{attrtab,automata}.c are on some targets), as _everything_ then waits 
for them to be created first.

Ideally there would be a way for gnumake to mark some targets as 
"ugh-slow" and back-propagate this to all dependencies so that those are 
put in front of the work queue in a parallel make.  Alas, something like 
that never came into existence :-/  (When order-only deps were introduced 
I got excited, but then came to realize that that wasn't what was really 
needed for this case; a "weak" version of it would be required at least, 
or better yet a specific facility to associate a cost with a target)


Ciao,
Michael.

> 
> 
> Bernd.
> 
> > Thanks,
> > Andrew
> > 
> >>
> >>
> >> Thanks
> >> Bernd.
> >>
> >>
> >> 2021-05-28  Bernd Edlinger  
> >>
> >> * Makefile.in (generated_files): Add gimple-match.c and
> >> generic-match.c
> 


Re: [PATCH] Avoid DSE/DCE of pure call that throws

2021-05-03 Thread Michael Matz
Hello,

On Mon, 3 May 2021, Jan Hubicka wrote:

> > (it should not abort).  The documentation of 'pure' must be read
> > as that 'pure' is not valid for 'foo' since throwing an exception
> > is obviously an observable effect on the state of the program
> > (in particular for the testcase at hand).  But for example
> > IPA pure const does not cause us to infer nothrow on pure
> > declared functions and the C++ program
> > 
> > ...
> > 
> > So - what is it?  Are pure functions allowed to throw or not?
> 
> I was one adding pure functions and it was really intended to be same
> thing as const+reading memory.  So does const function throw or not?

I really want to say that const/pure and throw are mutually exclusive.  
But in reality they are orthogonal, so a const function may throw.  (It 
certainly can in C++).

This is because while we implement exceptions with memory state, that 
state is not accessible to the application.  Exceptions could for instance 
also be implemented via return type extension.  And then, as long as other 
guarantees of const or pure functions are adhered to (same input -> same 
output), throwing an exception from a const function would be completely 
natural.  E.g. if a const function, given a specific set of arguments, 
always throws the same value, that would still allow CSEing two calls to 
it in a row, and it would still allow assuming that no reachable memory 
was changed by that call.

So, I think const and pure functions may validly throw (but then must 
always do so given the same inputs).


Ciao,
Michael.


> Internally we definitely want to make this separate - with symbol
> summaries it is now reasonably easy to do.  I was thinking of doing a
> similar summary as modref handles to pass down separate info tracked by
> the pure-const patch rather than trying to re-synthesize it into the rather
> random attributes we have now...
> 
> Honza
> > 
> > 2021-05-03  Richard Biener  
> > 
> > * calls.c (expand_call): Preserve possibly throwing calls.
> > * cfgexpand.c (expand_call_stmt): When a call can throw signal
> > RTL expansion there are side-effects.
> > * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Simplify.
> > * tree-ssa-dse.c (pass_dse::execute): Preserve exceptions unless
> > -fdelete-dead-exceptions.
> > 
> > * g++.dg/tree-ssa/pr100382.C: New testcase.
> > ---
> >  gcc/calls.c  |  1 +
> >  gcc/cfgexpand.c  |  5 -
> >  gcc/testsuite/g++.dg/tree-ssa/pr100382.C | 25 
> >  gcc/tree-ssa-dce.c   | 21 +++-
> >  gcc/tree-ssa-dse.c   |  3 ++-
> >  5 files changed, 35 insertions(+), 20 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr100382.C
> > 
> > diff --git a/gcc/calls.c b/gcc/calls.c
> > index 883d08ba5f2..f3da1839dc5 100644
> > --- a/gcc/calls.c
> > +++ b/gcc/calls.c
> > @@ -3808,6 +3808,7 @@ expand_call (tree exp, rtx target, int ignore)
> >   side-effects.  */
> >if ((flags & (ECF_CONST | ECF_PURE))
> >&& (!(flags & ECF_LOOPING_CONST_OR_PURE))
> > +  && (flags & ECF_NOTHROW)
> >&& (ignore || target == const0_rtx
> >   || TYPE_MODE (rettype) == VOIDmode))
> >  {
> > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> > index a6b48d3e48f..556a0b22ed6 100644
> > --- a/gcc/cfgexpand.c
> > +++ b/gcc/cfgexpand.c
> > @@ -2795,7 +2795,10 @@ expand_call_stmt (gcall *stmt)
> >CALL_EXPR_ARG (exp, i) = arg;
> >  }
> >  
> > -  if (gimple_has_side_effects (stmt))
> > +  if (gimple_has_side_effects (stmt)
> > +  /* ???  Downstream in expand_expr_real_1 we assume that expressions
> > +w/o side-effects do not throw so work around this here.  */
> > +  || stmt_could_throw_p (cfun, stmt))
> >  TREE_SIDE_EFFECTS (exp) = 1;
> >  
> >if (gimple_call_nothrow_p (stmt))
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr100382.C 
> > b/gcc/testsuite/g++.dg/tree-ssa/pr100382.C
> > new file mode 100644
> > index 000..b9948d3034a
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr100382.C
> > @@ -0,0 +1,25 @@
> > +// { dg-do run }
> > +// { dg-options "-O2" }
> > +
> > +int x, y;
> > +int __attribute__((pure,noinline)) foo () { if (x) throw; return y; }
> > +
> > +int __attribute__((noinline)) bar()
> > +{
> > +  int a[2];
> > +  x = 1;
> > +  try {
> > +int res = foo ();
> > +a[0] = res;
> > +  } catch (...) {
> > +  return 0;
> > +  }
> > +  return 1;
> > +}
> > +
> > +int main()
> > +{
> > +  if (bar ())
> > +__builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
> > index 096cfc8721d..ff0389af36f 100644
> > --- a/gcc/tree-ssa-dce.c
> > +++ b/gcc/tree-ssa-dce.c
> > @@ -250,14 +250,6 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool 
> > aggressive)
> > && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee))
> >   return;
> >  
> > -   /* Most, but not all function 

Re: RFC: Sphinx for GCC documentation

2021-04-07 Thread Michael Matz
Hello,

On Wed, 7 Apr 2021, Martin Liška wrote:

> > I like the output quite a bit, especially that things like option
> > references are automagically linked to their documentation.  I spent
> > just a bit of time looking through the HTML and PDF above and found
> > a few minor issues.  I suspect they're due to the conversion script
> > not being finished yet but just to confirm please see below.
> 
> Exactly, the script is supposed to do 99% of the transition automatically,
> the rest needs a manual intervention.
> 
> > 
> > Is the source we'd work with the same as what shows up when I click
> > the 'View page source' link?  (Or is there some preprocessing involved?)
> 
> Yes, it's the source as is.

I think doing that might be a mistake.  I do like the end result quite 
a bit (ignoring things missing, like @itemx), no doubt, but if the .rst 
files are the sources then we lose something: they contain markup to 
describe presentation ("make this bold/italic/a list").  The .texi files 
(want to) contain markup to describe meaning ("this is a 
chapter/section/option/attribute/code").  The former can always be 
generated from the latter; the other direction is not possible.  For 
maintaining and writing documentation it's usually better (because more 
future-proof) to describe meaning.  (I'm aware that our .texi files don't 
completely adhere to that ideal and also contain quite some presentation 
markup.)

Random snippet for what I mean: the .texi contains:

"The @code{access} attribute specifies that a function to whose 
by-reference arguments the attribute applies accesses the referenced 
object according to @var{access-mode}.  The @var{access-mode} argument is 
required and must be"

the .rst has:

"The ``access`` attribute specifies that a function to whose by-reference 
arguments the attribute applies accesses the referenced object according 
to :samp:`{access-mode}`.  The :samp:`{access-mode}` argument is required 
and must be"

So, @code{}/@var{} vs. `` `` / :samp:``.  Keeping in mind that 
documentation also needs to be written, it's inconceivable why there should 
be such a difference in expressing code vs. var markup in .rst.  
Now, you could say, just use `` in the .rst file, it's rendered the same 
anyway; but then the distinction gets lost.

(Again, I'm aware that the .texi files aren't perfect here, and @code/@var 
also seems a bit forced :) )

(I'm not proposing we go the full extreme of semantic markup that docbook 
tries, but the other end of the spectrum, i.e. markdown/rst with only 
presentation markup, also doesn't seem very well fitting)

One other practical concern: it seems there's a one-to-one correspondence 
between .rst files and (web)pages.  Do we really want to maintain hundreds (or 
however many?) .rst files, instead of 60 .texi files?


Ciao,
Michael.


Re: [00/23] Make fwprop use an on-the-side RTL SSA representation

2020-11-30 Thread Michael Matz
Hello,

On Mon, 30 Nov 2020, Jeff Law wrote:

> >> So, then let's start with one of 
> >> the prime examples of SSA deconstruction problems, the lost swap, and how 
> >> it comes to be: we start with a swap:
> >>
> >>   x = ..., y = ...
> >>   if (cond)
> >> tmp=x, x=y, y=tmp
> >>
> >> (1) into SSA:
> >>
> >>   x0 = ..., y0 = ...
> >>   if (cond)
> >> tmp = x0, x1=y0, y1=tmp;
> >>   x2 = PHI(x0,x1),  y2 = PHI(y0,y1)
> >>
> >> (2) copy-prop:
> >>
> >>   x0 = ..., y0 = ...
> >>   if (cond)
> >> ;
> >>   x2 = PHI(x0,y0),  y2 = PHI(y0,x0)
> > So the point is that this isn't what the RTL would look like even
> > when using RTL SSA.  Putting y0 in x2 PHI and x0 in the y2 PHI is
> > representationally invalid.
> >
> > Like I say, this isn't a “native” SSA form: it's just using SSA
> > constructs to represent dataflow in normal RTL.
> It appears that the PHI arguments have to be different instances of the
> result.  So the case above can't happen, which helps, but I'm not sure
> it's necessarily sufficient to avoid all the problems in this space.
> IIRC you can get into a similar scenario by transformations that result
> in overlapping lifetimes for different instances of the same object. 
> They didn't necessarily overlap when the SSA form was created, but may
> after things like CSE or copy propagation.

I think the reasoning why this can't (or should not) happen is the 
following: if different instances of the same objects (say, one before, 
one after a modification) exist, they must necessarily be stored in 
different pseudos (otherwise the RTL transformation itself was already 
invalid), and that causes them to be invalid operands of the same PHI 
node.  Ala:

input:

   regA = ...  /1
   use1(regA)  /2
   regA += ... /3
   use2(regA)  /4

let's try creating different instances of regA (from point 2 and 4) that 
overlap, e.g. by swapping insns 2 and 3.  We _have_ to rename regA from 
insn 3 into a new pseudo, otherwise the uses of 2 and 4 can't be 
differentiated anymore, so:

   regA  = ...  /1
   regA' = regA
   regA' += ... /3'
   use1(regA)   /2
   use2(regA')  /4'

So if Richard's model constrains the pseudo PHI nodes such that regA and 
regA' can't be operands of one, that might solve the issue, as both the 
lost copy and the swap problem need overlaps of different values to occur.

> The fact that passes don't directly manipulate the PHIs definitely helps
> as well.  But I've still got some reading to do in this space to refresh
> my memory of the issues.

AFAIU Richard's approach is more comparable to factored def-use chains than 
to real SSA, which might indeed have no issues, though I then think the 
problem moves into keeping _those_ consistent with the real instruction 
stream as it changes.


Ciao,
Michael.


Re: [00/23] Make fwprop use an on-the-side RTL SSA representation

2020-11-27 Thread Michael Matz
Hello,

On Thu, 26 Nov 2020, Richard Sandiford via Gcc-patches wrote:

> >> The aim is only to provide a different view of existing RTL instructions.
> >> Unlike gimple, and unlike (IIRC) the old RTL SSA project from way back,
> >> the new framework isn't a “native” SSA representation.  This means that
> >> all inputs to a phi node for a register R are also definitions of
> >> register R; no move operation is “hidden” in the phi node.
> > Hmm, I'm trying to parse what the last phrase means.  Does it mean that
> > the "hidden copy" problem for out-of-ssa is avoided?  And if so, how is
> > that maintained over time.  Things like copy-prop will tend to introduce
> > those issues even if they didn't originally exist.
> 
> Yeah, the phi nodes simply say which definition of register R provides
> the value of R on a particular incoming edge.  That definition will
> itself be a phi node for R, an artificial definition of R created by DF
> (e.g. for incoming function arguments or for EH data registers), or an
> actual instruction that sets R.
> 
> In other words, the SSA form is a purely on-the-side thing and the
> underlying RTL instructions are maintained in the same way as normal.
> The SSA form can be deleted at any time without performing a separate
> out-of-ssa step.  In that respect it's different from cfglayout,
> for example.

Hmm, I don't see how that answers Jeffs question, if I got it correctly.  
If I didn't get it correctly let me ask my own version of the question :)

(I haven't studied your implementation in detail, if I had maybe answers 
to the below would become obvious, sorry if so :) )
 
So, you're saying that in your implementation the operands of PHIs can be 
PHIs and real defs.  Further you say nothing about any restriction in RTL 
instruction moving and/or propagation.  So, then let's start with one of 
the prime examples of SSA deconstruction problems, the lost swap, and how 
it comes to be: we start with a swap:

  x = ..., y = ...
  if (cond)
tmp=x, x=y, y=tmp

(1) into SSA:

  x0 = ..., y0 = ...
  if (cond)
tmp = x0, x1=y0, y1=tmp;
  x2 = PHI(x0,x1),  y2 = PHI(y0,y1)

(2) copy-prop:

  x0 = ..., y0 = ...
  if (cond)
;
  x2 = PHI(x0,y0),  y2 = PHI(y0,x0)

Now you're also saying that the SSA form can simply be deleted without any 
consideration of the parallel copy nature, i.e. no real out-of-ssa phase.  
In the above example that would lead to wrong code, so that can't be it.  
So what in your representation avoids either (1) or (2)?  Do these 
restrictions also work if the above crucial code is within a loop (and 
hence the inputs to PHIs are the PHIs themself, which is the actual 
canonical variant of the lost-copy and swap problems).


Ciao,
Michael.


Re: [PATCH] Canonicalize real X - CST to X + -CST

2020-09-17 Thread Michael Matz
Hello,

On Wed, 16 Sep 2020, Richard Biener wrote:

> This removes the canonicalization of X + -CST to X - CST to facilitate
> SLP vectorization analysis where we currently treat the added testcase
> as two-operation plus blend vectorization opportunity.  While there
> may be the possibility to avoid this in the vectorizer itself, the
> canonicalization is somewhat premature - its motivation is that
> a FP constant of both signs is common and thus canonicalizing to
> positive values optimizes constant pool.  But if you mix, say,
> x * -7.1 and x - 7.1 this backfires - this instead looks like
> a local optimization to be applied in RTL CSE or even LRA
> load inheritance.  I filed PR97071 for this.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, any objections?

You need to adjust the comments, this will be very confusing in a couple 
years:

   /* We want to canonicalize to positive real constants.  Pretend
  that only negative ones can be easily negated.  */
   return true;


Ciao,
Michael.


> 
> Thanks,
> Richard.
> 
> 2020-09-16  Richard Biener  
> 
>   * fold-const.c (fold_binary_loc): Remove condition
>   on REAL_CST for A - B -> A + (-B) transform.
>   (negate_expr_p): Remove condition on REAL_CST.
>   * match.pd (negate_expr_p): Likewise.
>   (X + -C -> X - C): Remove.
> 
>   * gcc.dg/vect/slp-49.c: New testcase.
>   * gcc.dg/tree-ssa/pr90356-1.c: Adjust.
>   * gcc.dg/tree-ssa/pr90356-3.c: Likewise.
>   * gcc.dg/tree-ssa/pr90356-4.c: Likewise.
>   * gcc.target/i386/pr54855-3.c: Likewise.
>   * gfortran.dg/reassoc_1.f90: Likewise.
>   * gfortran.dg/reassoc_2.f90: Likewise.
> ---
>  gcc/fold-const.c  |  7 ++-
>  gcc/match.pd  | 11 +--
>  gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c |  4 ++--
>  gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr90356-4.c |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-49.c| 15 +++
>  gcc/testsuite/gcc.target/i386/pr54855-3.c |  2 +-
>  gcc/testsuite/gfortran.dg/reassoc_1.f90   |  2 +-
>  gcc/testsuite/gfortran.dg/reassoc_2.f90   |  2 +-
>  9 files changed, 25 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-49.c
> 
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0cc80adf632..ec369169a4c 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -401,7 +401,7 @@ negate_expr_p (tree t)
>  case REAL_CST:
>/* We want to canonicalize to positive real constants.  Pretend
>   that only negative ones can be easily negated.  */
> -  return REAL_VALUE_NEGATIVE (TREE_REAL_CST (t));
> +  return true;
>  
>  case COMPLEX_CST:
>return negate_expr_p (TREE_REALPART (t))
> @@ -10962,10 +10962,7 @@ fold_binary_loc (location_t loc, enum tree_code 
> code, tree type,
>/* A - B -> A + (-B) if B is easily negatable.  */
>if (negate_expr_p (op1)
> && ! TYPE_OVERFLOW_SANITIZED (type)
> -   && ((FLOAT_TYPE_P (type)
> -   /* Avoid this transformation if B is a positive REAL_CST.  */
> -&& (TREE_CODE (op1) != REAL_CST
> -|| REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1
> +   && (FLOAT_TYPE_P (type)
> || INTEGRAL_TYPE_P (type)))
>   return fold_build2_loc (loc, PLUS_EXPR, type,
>   fold_convert_loc (loc, type, arg0),
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7d63bb973cb..9f14d50ae2d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1307,8 +1307,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (negate @0)
>   (if (!TYPE_OVERFLOW_SANITIZED (type
>  (match negate_expr_p
> - REAL_CST
> - (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (t)
> + REAL_CST)
>  /* VECTOR_CST handling of non-wrapping types would recurse in unsupported
> ways.  */
>  (match negate_expr_p
> @@ -3326,14 +3325,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  
>  /* Canonicalization of binary operations.  */
>  
> -/* Convert X + -C into X - C.  */
> -(simplify
> - (plus @0 REAL_CST@1)
> - (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -  (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
> -   (if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
> -(minus @0 { tem; })
> -
>  /* Convert x+x into x*2.  */
>  (simplify
>   (plus @0 @0)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c
> index c3a15ea21af..ac95ec56810 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c
> @@ -2,8 +2,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fno-rounding-math -fsignaling-nans -fsigned-zeros 
> -fdump-tree-optimized" } */
>  /* { dg-final { scan-tree-dump-times "x_\[0-9]*.D. \\+ 0.0;" 12 "optimized" 
> } } */
> -/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. - 0.0;" 4 "optimized" } } 
> */
> -/* { dg-final { 

Re: [PATCH] tree-optimization/96043 - BB vectorization costing improvement

2020-09-09 Thread Michael Matz
Hello,

On Tue, 8 Sep 2020, Richard Biener wrote:

> CCing some people to double-check my graph partitioning algorithm.

I don't seem to know the pre-existing data structures well enough to say 
anything about this, but I noticed small things which might or might not 
indicate thinkos or incompleteness:

> +static void
> +vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
> +slp_instance instance, slp_tree node,
> +hash_map<stmt_vec_info, slp_instance> &stmt_to_instance,
> +hash_map<slp_instance, slp_instance> &instance_leader)
> +{
> +  stmt_vec_info stmt_info;
> +  unsigned i;
> +  bool all = true;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> +{
> +  bool existed_p;
> +  slp_instance &stmt_instance
> + = stmt_to_instance.get_or_insert (stmt_info, &existed_p);
> +  if (!existed_p)
> + {
> +   all = false;
> + }
> +  else if (stmt_instance != instance)
> + {
> +   /* If we're running into a previously marked stmt make us the
> +  leader of the current ultimate leader.  This keeps the
> +  leader chain acyclic and works even when the current instance
> +  connects two previously independent graph parts.  */
> +   stmt_instance = get_ultimate_leader (stmt_instance, instance_leader);
> +   if (stmt_instance != instance)
> + instance_leader.put (stmt_instance, instance);
> + }
> +  stmt_instance = instance;

This last assignment is useless.

> +/* Partition the SLP graph into pieces that can be costed independently.  */
> +
> +static void
> +vect_bb_partition_graph (bb_vec_info bb_vinfo)
> +{
...
> +  /* Then collect entries to each independent subgraphs.  */
> +  for (unsigned i = 0; bb_vinfo->slp_instances.iterate (i, &instance); ++i)
> +{
> +  slp_instance leader = get_ultimate_leader (instance, instance_leader);
> +  if (leader == instance)
> + leader->subgraph_entries.safe_push (leader);
> +  else
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "instance %p is leader of %p\n",
> +  leader, instance);
> +   leader->subgraph_entries.safe_push (instance);
> + }

So the 'leader->subgraph_entries.safe_push (instance)' is actually done 
unconditionally (the leader is leader of itself), only the debug dump is 
conditional.


Ciao,
Michael.


Re: New mklog script

2020-05-19 Thread Michael Matz
Hello,

On Tue, 19 May 2020, Jakub Jelinek wrote:

> On Tue, May 19, 2020 at 05:21:16PM +0100, Richard Earnshaw wrote:
> > This is really a wart in the GNU coding style.  And one reason why I
> > tend to indent such labels by a single space.  It particularly affects
> > things like class definitions where public, private, etc statements
> > often appear in column 0.
> > 
> > IMO, it would be nice to get an official change in the coding style for
> > this, it's really irritating.
> 
> It doesn't have to be just label,
> void
> foo ()
> {
>   ...
> #define X ...
>   ...
> #undef X
>   ...
> }
> does the similar thing for mklog.

That particular one would be a mere bug in mklog then.  diff -p regards 
only members of [[:alpha:]$_] as acceptable start characters of function 
names (i.e. indeed things that can start a C identifier (ignoring details 
like non-base characters) with the '$' extension), of which '#' is none.


Ciao,
Michael.


Re: New mklog script

2020-05-19 Thread Michael Matz
Hello,

On Tue, 19 May 2020, Martin Liška wrote:

> > The common problems I remember is that e.g. when changing a function comment
> > above some function, it is attributed to the previous function rather than
> > following, labels in function confusing it:
> >   void
> >   foo ()
> >   {
> > ...
> >   label:
> > ...
> > -  ...
> > +  ...
> >   }
> 
> I've just tested that and it will take function for patch context
> (sem_variable::equals):
> @@ -1875,6 +1875,7 @@ sem_variable::equals (tree t1, tree t2)
>  default:
>return return_false_with_msg ("Unknown TREE code reached");
>  }
> +
>  }

No, the problem happens when the label is at column 0, like function names 
are.  Basically diff -p uses a regexp morally equivalent to 
'^[[:alpha:]$_]' to detect function headers, and git diff -p and friends 
followed suit.  But it should use something like
'^[[:alpha:]$_].*[^:]$' to rule out things ending with ':'.  See also diff 
-F for GNU diff.


Ciao,
Michael.


Re: [PATCH] Implement no_stack_protect attribute.

2020-05-18 Thread Michael Matz
Hello,

On Mon, 18 May 2020, Florian Weimer wrote:

> * Michael Matz:
> 
> >> So does -fcf-protection. -fno-omit-frame-pointer does not work for me at 
> >> all for some reason, the frame pointer is always missing?
> >
> > Not for me:
> 
> I see.  I didn't know that -fno-omit-frame-pointer only applies to
> non-leaf functions.

Yeah, I actually consider this a bug as well (unrelated though).  The user 
of that flag quite surely has good reasons to want to have a frame 
pointer, and those good reasons usually extend to all functions, not just 
to leaf ones.  (E.g. ensuring that backtraces from async signal handlers 
are meaningful, for instance for poor man's profiling.)


Ciao,
Michael.


Re: [PATCH] Implement no_stack_protect attribute.

2020-05-18 Thread Michael Matz
Hello,

On Mon, 18 May 2020, Florian Weimer wrote:

> >> In glibc, we already have this:
> >> 
> >> /* Used to disable stack protection in sensitive places, like ifunc
> >>resolvers and early static TLS init.  */
> >> #ifdef HAVE_CC_NO_STACK_PROTECTOR
> >> # define inhibit_stack_protector \
> >> __attribute__ ((__optimize__ ("-fno-stack-protector")))
> >> #else
> >> # define inhibit_stack_protector
> >> #endif
> >> 
> >> Is it broken?
> >
> > Depends on what your expectations are.  It completely overrides all 
> > options given on the command line (including things like 
> > fno-omit-frame-pointer and -O2!).  At least I was very surprised by that 
> > even though the current docu can be read that way; if you're similarly 
> > surprised, then yes, the above is broken, it does not only disable stack 
> > protection (but also e.g. all optimizations, despite the attributes name 
> > :-) ).
> 
> Yes, that would qualify as broken.
> 
> This is not what I observe with gcc-9.3.1-2.fc31.x86_64 from Fedora.
> -O2 still has an effect.

Indeed, I definitely remember an interaction with the attribute and -O{,2} 
(or something that I interpreted as such) but it obviously isn't as simple 
as plain disabling it, and right now I can't recreate the situation :-/
(It might be disabling of some further cmdline flags that I conflated in 
my brain with "effect of -O2")

> So does -fcf-protection. -fno-omit-frame-pointer does not work for me at 
> all for some reason, the frame pointer is always missing?

Not for me:

% cat tryme.c
extern int bla(int *);
int
#ifdef ATTR
__attribute__((__optimize__ ("-fno-stack-protector")))
#endif
foo(int a, int b)
{
  int c = b;
  return a * 42 + bla (&c);
}
% gcc-9 -fno-omit-frame-pointer -O -S -o - tryme.c | grep bp
pushq   %rbp
movq%rsp, %rbp
movl%esi, -20(%rbp)
leaq-20(%rbp), %rdi
popq%rbp
% gcc-9 -fstack-protector-all -fno-omit-frame-pointer -O -S -o - tryme.c | grep bp
pushq   %rbp
movq%rsp, %rbp
movq%rax, -24(%rbp)
movl%esi, -28(%rbp)
leaq-28(%rbp), %rdi
movq-24(%rbp), %rdx
popq%rbp

But using the attr:

% gcc-9 -DATTR -fstack-protector-all -fno-omit-frame-pointer -O -S -o - tryme.c | grep bp
% 

(gcc9 is gcc9-9.2.1+r275327-1.1.x86_64 on opensuse)


Ciao,
Michael.


Re: [PATCH] Implement no_stack_protect attribute.

2020-05-18 Thread Michael Matz
Hello,

On Mon, 18 May 2020, Florian Weimer via Gcc-patches wrote:

> > The patch adds new no_stack_protect attribute. The change is requested
> > from kernel folks and is direct equivalent of Clang's no_stack_protector.
> > Unlike Clang, I chose to name it no_stack_protect because we already
> > have stack_protect attribute (used with -fstack-protector-explicit).
> >
> > First part of the patch contains a small refactoring of an enum, second
> > implements the functionality.
> 
> In glibc, we already have this:
> 
> /* Used to disable stack protection in sensitive places, like ifunc
>resolvers and early static TLS init.  */
> #ifdef HAVE_CC_NO_STACK_PROTECTOR
> # define inhibit_stack_protector \
> __attribute__ ((__optimize__ ("-fno-stack-protector")))
> #else
> # define inhibit_stack_protector
> #endif
> 
> Is it broken?

Depends on what your expectations are.  It completely overrides all 
options given on the command line (including things like 
fno-omit-frame-pointer and -O2!).  At least I was very surprised by that 
even though the current docu can be read that way; if you're similarly 
surprised, then yes, the above is broken, it does not only disable stack 
protection (but also e.g. all optimizations, despite the attributes name 
:-) ).

And given that there's no possibility to express "and please only _add_ to 
the cmdline options" (which implies also being able to override values 
from the cmdline with -fno-xxx or other -fyyy options) I consider our 
current GCC behaviour for the optimize attribute basically unusable.

(One work-around would be to define a macro containing all cmdline options 
in string form in the build system.  Obviously that's a silly solution.)

So, yeah, IMHO the semantics of that attribute should be revised to be 
more useful by default (with the possibility to express the current 
behaviour, for whomever would like to have that (but who?)).


Ciao,
Michael.


Re: [RFC] Fix for pr64706 testcase faulre

2020-03-30 Thread Michael Matz
Hello,

On Thu, 26 Mar 2020, Jan Hubicka wrote:

> > I think we should continue to try to model ELF semantics re weak and 
> > aliases.  If so, then yes, LTO gets it wrong and the above testcase should 
> > not abort.  Weak doesn't enter the picture for creating aliases (the 
> > aliases is created with the declaration, and we require an available 
> > definition, and the alias is henceforth bound to _that_ very definition).  
> > I.e. the 'a.c:b' name should continue to refer to the same code snippet 
> > identified by the a.c:a name, not be redirected to the overriding b.c:a.
> > 
> > I'm wondering about the amount of code necessary to fix this.  I think 
> > that points towards a wrong level of representation somewhere, and perhaps 
> > _that_ should be changed instead of trying to work around it.
> > 
> > For instance, right now aliases are different from non-aliases, whereas in 
> > reality there's no difference: there are simply names referring to code or 
> > data snippets, and it so happens that for some snippets there are multiple 
> > names, and it further just so happens that some names for a code snippet 
> > become overridden/removed by other names for other code snippets, which 
> > doesn't invalidate the fact that there still is another name for the first 
> > snippet.
> > 
> > If it were modelled like that in cgraph/lto all the rest would naturally 
> > follow.  (Of course you would need tracking of when some code/data 
> > snippets can't be referred to by name anymore (and hence are useless), 
> > because all names for them went away, or none of the existing names is 
> > used anymore, but I don't think that materially would change much in our 
> > internal data structures).
> 
> Yep, the trouble is caused by fact that GCC representation is not quite 
> what the linker does. I.e. we bind function bodies with their declarations 
> and declarations with symbols. We are built around an assumption of 1-to-1 
> correspondence between function bodies and their symbols. This is not 
> true because one body may have multiple aliases but also it may define 
> multiple different symbols (alternative entry points & labels).
> 
> Reorganizing this design is quite some work.
> 
> We need to change trees/GIMPLE to use symbols instead of DECLs when
> referring to them.  So parameter of CALL_EXPR would not be decl but
> symbol associated with it. 
> 
> We need to move everything that is a property of the symbol away from the
> decl into symbols (i.e. assembler name, visibility, section etc) and thus
> we would need to change everything from frontends all the way to RTL.

I'm not sure I agree that it's that complicated.  With the above you 
essentially do away with the need of DECLs at all.  calls would refer to 
your fat symbols, symbols would refer to code/data bodies.  Data 
(e.g. vtables) would also refer to symbols.  Nowhere would be a DECL in 
that chain.  At that point you can just as well call your new symbols 
"DECLs".  The crucial part is merely that the DECLs/fat-symbols can change 
their association with code/data blobs.

(Well, I can see that some properties of code blobs, for instance the API, 
and hence the (function) type, are inherent to the code blob and therefore 
should be 1-to-1 associated with it, so there's some argument to be made 
for retaining some info of the DECL-as-now tightly coupled to the code blob 
representation.)

> I am gradually shuffling things in this direction (I plan to move 
> assembler names to symbols and gradually do that for visibilities so we 
> can have cross-module references to labels and finally support static 
> initializers taking addresses of them), but the process is slow and I 
> think it makes sense to first fix correctness issues with what we have 
> rather than waiting for major surgery to be finished.

True.  I was just surprised by the size of this change, many diff pluses, 
few minuses :)


Ciao,
Michael.


Re: [RFC] Fix for pr64706 testcase faulre

2020-03-26 Thread Michael Matz
Hello,

On Thu, 26 Mar 2020, Richard Biener wrote:

> > = a.c:
> > 
> > __attribute__((weak))
> > __attribute__((noinline))
> > 
> > int a()
> > {
> >   return 1;
> > }
> > 
> > __attribute__((noinline))
> > static int b() __attribute__((alias("a")));
> > void
> > test()
> > {
> >   if (b()!=1)
> > __builtin_abort ();
> > }
> > 
> > = b.c:
> > 
> > __attribute__((noinline))
> > int a();
> > 
> > int a()
> > {
> >   return 2;
> > }
> > extern void test ();
> > int
> > main()
> > {
> >   if (a() != 2)
> > __builtin_abort ();
> >   test();
> >   return 0;
> > }
> > 
> > Here LTO linking will replace the weak symbol a() by the definition from
> > b.c and redirect the static alias b in a.c to this definition, which is wrong.
> 
> Is it?  What does a non-weak local alias to a weak function mean?

I think we should continue to try to model ELF semantics re weak and 
aliases.  If so, then yes, LTO gets it wrong and the above testcase should 
not abort.  Weak doesn't enter the picture for creating aliases (the 
alias is created with the declaration, and we require an available 
definition, and the alias is henceforth bound to _that_ very definition).  
I.e. the 'a.c:b' name should continue to refer to the same code snippet 
identified by the a.c:a name, not be redirected to the overriding b.c:a.

I'm wondering about the amount of code necessary to fix this.  I think 
that points towards a wrong level of representation somewhere, and perhaps 
_that_ should be changed instead of trying to work around it.

For instance, right now aliases are different from non-aliases, whereas in 
reality there's no difference: there are simply names referring to code or 
data snippets, and it so happens that for some snippets there are multiple 
names, and it further just so happens that some names for a code snippet 
become overridden/removed by other names for other code snippets, which 
doesn't invalidate the fact that there still is another name for the first 
snippet.

If it were modelled like that in cgraph/lto all the rest would naturally 
follow.  (Of course you would need tracking of when some code/data 
snippets can't be referred to by name anymore (and hence are useless), 
because all names for them went away, or none of the existing names is 
used anymore, but I don't think that materially would change much in our 
internal data structures).


Ciao,
Michael.


Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-19 Thread Michael Matz
Hello,

On Thu, 19 Mar 2020, J.W. Jagersma via Gcc-patches wrote:

> I just realized that changing all outputs to in+out would generate
> worse code for *every* single asm that has any outputs.

Under -fnon-call-exceptions only.  And I'm not sure what you mean: the only 
effect of the additional 'in+' part that wasn't there before should be 
that some instructions setting those operands to values before the asm 
aren't deleted.  If there are none, the code should come out the same.

> Whereas changing outputs to a temporary will not generate any extra code 
> if there is no local try/catch block.  I think that is something that 
> should be considered.

But it will also disallow values to be given out of the asm in the 
exception case (see my other mail).  I think that's more worthwhile than 
some superfluous moves (which, if they turn out to be really useless, could 
be removed after reload).


Ciao,
Michael.


Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-19 Thread Michael Matz
Hello,

On Wed, 18 Mar 2020, J.W. Jagersma via Gcc-patches wrote:

> > Well, it's both: on the exception path the compiler has to assume that 
> > the value wasn't changed (so that former defines are regarded as dead) or 
> > that it already has changed (so that the effects the throwing 
> > "instruction" had on the result (if any) aren't lost).  The easiest for 
> > this is to regard the result place as also being an input.
> 
> The way I see it, there are two options: either you use the outputs
> when an exception is thrown, or you don't.

Assuming by "use the outputs" you mean "transform them implicitly to 
in-out operands", not "the source code uses the output operands after the 
asm on except and no-except paths".

> The first option is more or less what my first patch did, but it was
> incomplete.  Changing each output to in+out would make that work
> correctly.

Right.

> The second is what I have implemented now, each output is assigned via
> a temporary which is then assigned back to the variable bound to this
> output.  On exception, this temporary is discarded.  However this is
> not possible for asms that write directly to memory, so those cases are
> treated like option #1.

Right again, somewhat.  Except that the determination of which outputs are 
going into memory is a fuzzy notion until reload/LRA (which is very late 
in the compile pipeline).  You can infer some things from the constraint 
letters, but gimple might still put things into memory (e.g. when the 
output place is address taken), even though the constraint only allows a 
register (in which case reloads will be generated), or place something 
into a register, even though the constraint only allows memory (again, 
reloads will be generated).

Some of these reloads will be done early in the gimple pipeline to help 
optimizations (they basically look like the insns that you generate for 
copy-out), some of them will be generated only very late.

> I think the second option is easier on optimization since any previous
> assignments can be removed from the normal code path, and registers
> don't need to be loaded with known-valid values before the asm.

True (easier for optimizations) but immaterial (see below).

> The first option is more consistent since all outputs are treated the 
> same, but more dangerous, as the asm may write incomplete values before 
> throwing.

You have to assume that the asm and its surrounding code are written by a 
knowledgeable person, so if the asm possibly writes partially 
to outputs and then throws, then the output must not be accessed on the 
exception path.  If the asm does not write partially, then the author can 
access it.  So, what can or cannot be accessed on the exception path 
really is an inherent property of the contents of the asm.

Now, your second case tries to create additional guarantees: it makes it 
so that for some operands the user can depend on certain behaviour, namely 
that the old value before the asm was entered is available on the except 
path.  As you can't ensure this for all operands (those in memory are the 
problem) you want to tighten the definition to only include the operands 
where you can guarantee it, but this is fairly brittle, as above, because 
some decisions are taken very late.

There's another case to consider: assume I'm writing an asm that writes 
meaningful values to an output before and after a potential exception is 
thrown, like this:

asm ("mov %2, %0\n\t"
     "xyz %2, %1\n\t"
     "mov $0, %0" : "=r" (status), "+r" (a) : "r" (b));

Assume 'xyz' can fault depending on input.  The intention above would be 
that on the exception path 'status' would contain a meaningful value (here 
the value of input b), and on the non-except path it would contain zero.

Your proposal of copyin/out for register values would make the above 
impossible.  (Basically you wouldn't be able to output any reliable 
information out of the asm in the except path).

Given that, and the complication of determining what the safe-for-copy-out 
operands really are, and the fact that being easier on optimizations in 
connection with asms and -fnon-call-exceptions isn't a high priority, it 
seems best to opt for the option that is guaranteed to work correctly in 
most cases, and in fact allows more freedom in using the asm.


Ciao,
Michael.


Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-19 Thread Michael Matz
Hello,

On Wed, 18 Mar 2020, Segher Boessenkool wrote:

> > > Similarly for non-call exceptions on other statements.  It sounds like 
> > > what you're describing requires the corresponding definition to happen 
> > > for memory outputs regardless of whether the asm throws or not, so that 
> > > the memory appears to change on both execution paths.  Otherwise, the 
> > > compiler would be able to assume that the memory operand still has its 
> > > original value in the exception handler.
> > 
> > Well, it's both: on the exception path the compiler has to assume that 
> > the value wasn't changed (so that former defines are regarded as dead) or 
> > that it already has changed (so that the effects the throwing 
> > "instruction" had on the result (if any) aren't lost).  The easiest for 
> > this is to regard the result place as also being an input.
> > 
> > (If broadened to all instructions under -fnon-call-exceptions, and not 
> > just to asms, this will have quite a bad effect on optimization capabilities, 
> > but I believe with enough force it's already possible now to construct 
> > miscompiling testcases with the right mixtures of return types and ABIs)
> 
> It's a tradeoff: do we want this to work for almost no one and get PRs
> that we cannot solve, or do we generate slightly worse assembler code
> for -fnon-call-exceptions?  I don't think this is a difficult decision
> to make, considering that you already get pretty bad performance with
> that flag (if indeed it works correctly at all).

Oh, I wasn't advocating doing anything other than what you suggested (i.e. 
make all operands in-out); I merely pointed out that the inherent problem 
here is not really specific to asms.


Ciao,
Michael.


Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-17 Thread Michael Matz
Hello,

On Mon, 16 Mar 2020, Richard Sandiford wrote:

> Segher Boessenkool  writes:
> > On Mon, Mar 16, 2020 at 05:47:03PM +, Richard Sandiford wrote:
> >> Segher Boessenkool  writes:
> >> >> we do delete "x = 1" for f1.   I think that's the expected behaviour.
> >> >> We don't yet delete the initialisation in f2, but I think in principle
> >> >> we could.
> >> >
> >> > Right.  And this is incorrect if the asm may throw.
> >> 
> >> Well...
> >> 
> >> >> So the kind of contract I was advocating was:
> >> >> 
> >> >> - the compiler can't make any assumptions about what the asm does
> >> >>   or doesn't do to output operands when an exception is raised
> >> >> 
> >> >> - the source code can't make any assumption about the values bound
> >> >>   to output operands when an exception is raised
> >> 
> >> ...with this interpretation, the deletions above would be correct even
> >> if the asm throws.
> >
> > The write to "x" *before the asm* is deleted.  I cannot think of any
> > interpretation where that is correct (this does not involve inline asm
> > at all: it is deleting an observable side effect before the exception).
> 
> It's correct under the contract above :-)
> 
> >> > And the easiest (and only feasible?) way to do this is for the compiler
> >> > to automatically make an input for every output as well, imo.
> >> 
> >> Modifying the asm like that feels a bit dangerous,
> >
> > Yes, obviously.  The other option is to accept that almost all existing
> > inline asm will have UB, with -fnon-call-exceptions.  I think that is
> > an even less desirable option.
> >
> >> And the other problem
> >> still exists: the compiler might assume that the output isn't modified
> >> unless the asm completes normally.
> >
> > I don't understand what this means?  As far as the compiler is concerned
> > any asm is just one instruction?  And it all executes completely always.
> > You need to do things with the constraints to tell the compiler it does
> > not know some of the values around.  If you have both an input and an
> > output for a variable, the compiler does not know what value is written
> > to it, and it might just be the one that was the input already (which is
> > the same effect as not writing it at all).
> 
> Normally, for SSA names in something like:
> 
>   _1 = foo ()
> 
> the definition of _1 does not take place when foo throws.

Mostly, but maybe we need to lift this at some point.  E.g. when we support 
SSA form for non-registers; the actual return might then be via invisible 
reference, and hence the result might be changed even if foo throws.  That 
also could happen right now for some return types depending on the 
architecture (think large float types).  Our workaround for some of these 
cases (where it's obvious that the result will lie in memory) is to put 
the real copy-out into an extra gimple insn and make the LHS be a 
temporary; but of course we don't want that with too large types.
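Roughly, the copy-out workaround mentioned above looks like this at the GIMPLE level (a hand-written sketch, not actual dump output):

```
  /* Instead of:  x = foo ();   -- x could alias the return slot, so a
     throw inside foo might leave x partially written.  */
  _tmp = foo ();    /* call fills _tmp via the invisible reference */
  x = _tmp;         /* separate copy-out: x changes only if foo returned */
```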

> Similarly for non-call exceptions on other statements.  It sounds like 
> what you're describing requires the corresponding definition to happen 
> for memory outputs regardless of whether the asm throws or not, so that 
> the memory appears to change on both execution paths.  Otherwise, the 
> compiler would be able to assume that the memory operand still has its 
> original value in the exception handler.

Well, it's both: on the exception path the compiler has to assume that 
the value wasn't changed (so that former defines are regarded as dead) or 
that it already has changed (so that the effects the throwing 
"instruction" had on the result (if any) aren't lost).  The easiest for 
this is to regard the result place as also being an input.

(If broadened to all instructions under -fnon-call-exceptions, and not 
just to asms, this will have quite a bad effect on optimization capabilities, 
but I believe with enough force it's already possible now to construct 
miscompiling testcases with the right mixtures of return types and ABIs)


Ciao,
Michael.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Michael Matz
Hello,

On Tue, 10 Mar 2020, Martin Liška wrote:

> >>> We need to support different variables, like TLS, data and bss variables.
> >>
> >> Why do we need TLS? Right now, it's not supported by nm.
> > 
> > Of course it does.  It's the 'T' (or 't') character.
> 
> Thank you for the reply!
> 
> Are you sure about it?

Err, I got mixed up between writing emails; you are right, nm(1) in BSD 
mode doesn't distinguish TLS symbols.  (And of course T/t are for 
text, aka code, symbols).

Doesn't invalidate the rest of my email, though.


Ciao,
Michael.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-09 Thread Michael Matz
Hello,

On Mon, 9 Mar 2020, Martin Liška wrote:

> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We need to support different variables, like TLS, data and bss variables.
> 
> Why do we need TLS? Right now, it's not supported by nm.

Of course it does.  It's the 'T' (or 't') character.  When you introduce 
symbol categories into the plugin system it would be advisable to include 
all we usually care about, and as the ELF categories are (roughly) a 
superset of everything we support, I'd say that should be the list to look 
at.  I.e. a mixture of visibility, locality (aka binding) and type:

    {object, function, common, tls}
  x {local, global, weak, unique}
  x {default, internal, hidden, protected}

That doesn't include the symbol types section, file, ifunc, or OS- or 
arch-specific types, visibilities or bindings.  But it would probably not be 
the worst idea to simply encode what we need with ELF constants and names.  
While not all the world is ELF, all concepts we have can be mapped onto 
ELF.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Well, I'd review a patch differently depending on whether or not it was 
> already committed, a patch requiring review or an RFC looking for more 
> general comments, so I *do* think such an email prefix is useful.

As I said: a very good argument must be made; it might be that rfc falls 
into the useful-tag category.

> >> 'git am' would strip leading [...] automatically unless
> >> you've configured, or asked git to do otherwise.  So that leading part
> >> is not counted for the length calculation.
> > 
> > There's still e-mail netiquette which also should be obeyed, or at least
> > not contradicted by git netiquette.
> 
> The 50 char limit seems to come from wanting git log --oneline to not wrap in
> an 80 column terminal.  Whilst laudable, I'm not sure that such a limit
> doesn't become too restrictive and then lead to hard-to-understand summaries.

In my experience hard-to-understand summaries are more related to the people 
writing them than to length; IOW, I fear a larger limit like 72 characters 
won't help that.  And, as Segher put it, we aren't really talking about 
limits, only about suggestions, if you _really_ have to mention 
that 40-character function name in which you fixed something in your 
subject, then yeah, you'll go over the 50 chars.  But as recommendation 
the 50 chars make more sense than the 72 chars, IMHO.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Jakub Jelinek wrote:

> > > The idea is that the [...] part is NOT part of the commit, only part of 
> > > the email.
> > 
> > I understand that, but the subject line of this thread says "e-mail 
> > subject lines", so I thought we were talking about, well, exactly that; 
> > and I see no value of these tags in e-mails either.
> 
> In email, they do carry information that is useful there, the distinction
> whether a patch has been committed already and doesn't need review from
> others, or whether it is a patch intended for patch review, or just a RFC
> patch that is not yet ready for review, but submitter is looking for some
> feedback.

For tags like [cmt] or [rfc] I don't have much gripe, though I do think 
that info could be given in the body, and that e.g. in e-mail archives 
(where the tags are not removed automatically) they carry the same value 
as in git log, namely zero.

But suggesting that using the subject line for tagging is recommended can 
lead to subjects like

 [PATCH][GCC][Foo][component] Fix foo component bootstrap failure

in an e-mail directed to gcc-patches@gcc.gnu.org (from somewhen last year, 
where Foo/foo was an architecture; I'm really not trying to single out the 
author).  That is, _none_ of the four tags carried any informational 
content.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hi,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> The idea is that the [...] part is NOT part of the commit, only part of 
> the email.

I understand that, but the subject line of this thread says "e-mail 
subject lines", so I thought we were talking about, well, exactly that; 
and I see no value of these tags in e-mails either.

(They might have a low but non-zero value for projects that use 
a single mailing list for patches and generic discussion, but we are not 
such a project)

Basically: if they are deemed to clutter the git log for whatever reason, 
then there must be a very good argument for why they do not also clutter 
e-mail subject lines, but instead are essential to have there, 
but not in the log.

> 'git am' would strip leading [...] automatically unless 
> you've configured, or asked git to do otherwise.  So that leading part 
> is not counted for the length calculation.

There's still e-mail netiquette which also should be obeyed, or at least 
not contradicted by git netiquette.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Where does your '50 chars' limit come from?  It's not in the glibc text, 
> and it's not in the linux kernel text either.  AFAICT this is your 
> invention and you seem to be the only person proposing it.

Nope, it's fairly common, so much so that it's included in the "commonly 
accepted rules" that googling for "git subject lines" gives you (as a 
snippet glimpse from some website), and that vim changes color when 
spilling over 50 chars.  I actually thought it was universal and obvious 
until this thread (which is why I admittedly only did the above google 
right before writing this mail).  For the rationale: 'git log --oneline' 
with hash and author or date should fit the usual 72 char limit.  (An 
11-character hash plus space alone would come out as 60 chars for the 
subject)

That's also the reason why some people (certainly me) are nervous about or 
dislike all the "tags" in the subject line.  E.g. what essential 
information (and subjects are for essential info, right?) is "[committed]" 
(or, even worse "[patch]") supposed to transport?  If the rest of the 
subject doesn't interest me, I don't care if something was committed or 
not; if it _does_ interest me, then I'm going to look at the mail/patch 
either way, if committed or not; at which point the info if the author 
required review or has already committed it could be given in the body as 
well.  Similar for some other metainfo tags.  (The "subsystem:" is good, 
though).

And if we must have these tags, then why not at least short ones?  Why 
isn't "[cmt]" or something enough?  There will be very few tags, so they 
become mnemonic pretty much immediately.  What becomes clearer when 
writing "[patch v2 1/13]" in comparison to "[v2 1/13]"?


Ciao,
Michael.



Re: [PATCH] doc: clarify the situation with pointer arithmetic

2020-01-30 Thread Michael Matz
Hi,

On Thu, 30 Jan 2020, Michael Matz wrote:

> > and the pointers have the same address, then it would evaluate to true 
> > at run-time. If I understand correctly, you somehow want to make this 
> > case be UB, but I haven't quite understood how (if it is not the 
> > comparison of such pointers that invokes UB).
> 
> By saying something like "if two pointers compare equal they must have 
> the same provenance, otherwise the behaviour is undefined".
> 
> (I don't know if this definition would or would not help with the 
> problems PVNI poses to compilers).

Or, actually I know of at least one case.  The problem with allowing 
value-equivalent pointers to have non-overlapping provenance is the 
following: many compiler optimizations are based on as-if rules.  Now, if 
it's very easy for users to detect certain situations, the as-if rules 
can't be invoked as often.  In this specific instance, suppose the user 
writes a program where the compiler would optimize memory accesses based 
on non-overlapping provenance (e.g. a stored value is propagated downwards 
over a store of different provenance), and somewhere else also compares 
these non-overlapping pointers for equality, and then, if they are equal, 
prints out "hah! invalid optimization detected".  If the outcome of the 
comparison of non-overlapping pointers weren't left unspecified, then the 
compiler would have to globally disable the first optimization (at least 
when it can't prove that there aren't any such comparisons).  Ideally we 
don't want that :)


Ciao,
Michael.

