Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Michael Matz via Gcc-patches
Hey,

On Thu, 10 Aug 2023, Martin Uecker wrote:

> > offsetof(struct foo_flex, t[0]) + N * sizeof(foo->t);
> > 
> > With GCC, offsetof(struct foo_flex, t[0]) == 6, which is also correct. 
> 
> This formula might be considered incorrect / dangerous because
> it might allocate less storage than sizeof(struct foo_flex). 

Oh indeed.  I hadn't even considered that.  That could be "fixed" with 
another max(theabove, sizeof(struct foo_flex)), but that starts to become 
silly when the obvious choice works fine.
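
For concreteness, a minimal sketch of that max()-based allocation (struct
layout as in the examples elsewhere in this thread):

  #include <stddef.h>
  #include <stdlib.h>

  struct foo_flex { int a; short b; char t[]; };

  /* Never ask for less than sizeof(struct foo_flex), even for tiny N.  */
  static struct foo_flex *alloc_foo_flex (size_t n)
  {
    size_t want = offsetof (struct foo_flex, t) + n * sizeof (char);
    size_t size = want > sizeof (struct foo_flex)
                  ? want : sizeof (struct foo_flex);
    return malloc (size);
  }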


Ciao,
Michael.


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-10 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> > So, should the equivalent FAM struct also have this sizeof()?  If no: 
> > there should be a good argument why it shouldn't be similar to the non-FAM 
> > one.
> 
> The sizeof() of a structure with FAM is defined as: (after I searched online,
>  I think that the one in Wikipedia is the most reasonable one):
> https://en.wikipedia.org/wiki/Flexible_array_member

Well, wikipedia has its uses.  Here we are in language lawyering land 
together with a discussion of what makes most sense in many circumstances.  
FWIW, in this case I think the cited text from wikipedia is correct in the 
sense of "not wrong" but not helpful in the sense of "good advice".

> By definition, the sizeof() of a struct with FAM might not be the same 
> as the non-FAM one. i.e, for the following two structures, one with FAM, 
> the other with fixed array:
> 
> struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> struct foo_fix {int a; short b; char t[3]; } 
> 
> With current GCC:
> sizeof(foo_flex) == 8
> sizeof(foo_fix) == 12
> 
> I think that the current behavior of sizeof for structure with FAM in 
> GCC is correct.

It is, yes.

> The major issue is what was pointed out by Martin in the previous email:
> 
> Whether using the following formula is correct to compute the 
> allocation?
> 
> sizeof(struct foo_flex) + N * sizeof(foo->t);
> 
> As pointed out  in the wikipedia, the value computed by this formula might
> be bigger than the actual size since “sizeof(struct foo_flex)” might include 
> paddings that are used as part of the array.

That doesn't make the formula incorrect, but rather conservatively 
correct.  If you don't want to be conservative, then yes, you can use a 
different formula if you happen to know the layout rules your compiler at 
hand uses (or the ones prescribed by an ABI, if it does that).  I think 
it would be bad advice to the general population to advertise this scheme 
as better.

> So the more accurate formula should be
> 
> offsetof(struct foo_flex, t[0]) + N * sizeof(foo->t);

"* sizeof(foo->t[0])", but yes.

> For the question, whether the compiler needs to allocate paddings after 
> the FAM field, I don’t know the answer, and it’s not specified in the 
> standard either. Does it matter?

It matters for two things:

1) Abstract reasons: is there a reason to deviate 
from the normal rules?  If not: it shouldn't deviate.  Future 
extensibility: while it right now is not possible to form an array 
of FAM-structs (in C!), who's to say that it may not be eventually added?  
It seems a natural enough extension of an extension, and while it has 
certain implementation problems (the "real" size of the elements needs to 
be computable, and hence be part of the array type) those could be 
overcome.  At that point you _really_ want to have the elements aligned 
naturally, be compatible with sizeof, and be the same as an individual 
object.

2) Practical reasons: code generation works better if the accessible sizes 
of objects are a multiple of their alignment, as often you have 
instructions that can move around alignment-sized blobs (say, words).  If 
you don't pad out objects you have to be careful to use only byte accesses 
when you get to the end of an object.

Let me ask the question in the opposite way: what would you _gain_ by not 
allocating padding?  And is that enough to deviate from the most obvious 
choices?  (Do note that e.g. global or local FAM-typed objects don't exist 
in isolation, and their surrounding objects have their own alignment 
rules, which often will lead to tail padding anyway, even if you would use 
the non-conservative size calculation; the same applies for malloc'ed 
objects).

> > Note that if one chooses to allocate less space than sizeof implies that 
> > this will have quite some consequences for code generation, in that 
> > sometimes the instruction sequences (e.g. for copying) need to be careful 
> > to never access tail padding that should be there in array context, but 
> > isn't there in single-object context.  I think this alone should make it 
> > clear that it's advisable that sizeof() and allocated size agree.
> 
> Sizeof by definition returns the size of the TYPE, not the size of the 
> allocated object.

Sure.  Outside special cases like FAM it's the same, though.  And there 
sizeof does include tail padding.

> > And then the next question is what __builtin_object_size should do with 
> > these: should it return the size with or without padding at end (i.e. 
> > could/should it return 9 even if sizeof is 12).  I can see arguments for 
> > both.
> 
> Currently, GCC’s __builtin_object_size use the following formula to 
> compute the object size for The structure with FAM:
> 
> offsetof(struct foo_flex, t[0]) + N * sizeof(foo->t);
> 
> I think it’s correct.

See above.  It's non-conservatively correct.  And that may be the right 
choice for this builtin, considering its intended use-cases (strict 

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> Although this is an old FAM related issue that does not relate to my current 
> patch 
> (and might need to be resolved in a separate patch).  I think that it’s 
> necessary to have
> more discussion on this old issue and resolve it. 
> 
> The first thing that I’d like to confirm is:
> 
> What the exact memory layout for the following structure x?
> 
> struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> 
> And the key that is confusing me is, where should the field “t” start? 
> 
> A.  Starting at offset 8 as the following:
> 
> a 4-bytes
> b 2-bytes
> padding   2-bytes
> t 3-bytes

Why should there be padding before 't'?  It's a char array (FAM or not), 
so it can be (and should be) placed directly after 'b'.  So ...

> B. Starting at offset 6 as the following:
> 
> a 4-bytes
> b 2-bytes
> t 3-bytes

... this is the correct layout, when seen in isolation.  The discussion 
revolves around what should come after 't': if it's a non-FAM struct (with 
t[3]), then it's clear that there needs to be padding after it, so to pad 
out the whole struct to be 12 bytes long (for sizeof() purpose), as 
required by its alignment (due to the int field 'a').
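
A small sketch of the layout just described (x86-64 assumed):

  #include <stddef.h>

  struct foo_flex { int a; short b; char t[]; };
  struct foo_fix  { int a; short b; char t[3]; };

  /* 't' starts directly after 'b'; the fixed struct gets tail padding
     up to a multiple of the alignment of 'a'.  */
  _Static_assert (offsetof (struct foo_flex, t) == 6, "t directly after b");
  _Static_assert (offsetof (struct foo_fix,  t) == 6, "same for the fixed one");
  _Static_assert (sizeof (struct foo_fix) == 12, "padded to 12");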

So, should the equivalent FAM struct also have this sizeof()?  If no: 
there should be a good argument why it shouldn't be similar to the non-FAM 
one.

Then there is an argument that the compiler would be fine, when allocating 
a single object of such type (not as part of an array!), to only reserve 9 
bytes of space for the FAM-struct.  Then the question is: should it also 
do that for a non-FAM struct (based on the argument that the padding 
behind 't' isn't accessible, and hence doesn't need to be alloced).  I 
think it would be quite unexpected for the compiler to actually reserve 
less space than sizeof() implies, so I personally don't buy that argument.  
For FAM or non-FAM structs.

Note that if one chooses to allocate less space than sizeof implies that 
this will have quite some consequences for code generation, in that 
sometimes the instruction sequences (e.g. for copying) need to be careful 
to never access tail padding that should be there in array context, but 
isn't there in single-object context.  I think this alone should make it 
clear that it's advisable that sizeof() and allocated size agree.

As in: I think sizeof for both structs should return 12, and 12 bytes 
should be reserved for objects of such types.

And then the next question is what __builtin_object_size should do with 
these: should it return the size with or without padding at end (i.e. 
could/should it return 9 even if sizeof is 12).  I can see arguments for 
both.


Ciao,
Michael.


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:

> > The question is whether you want to mandate the 16-bit floating point
> > extensions.  You might get better adoption if you stay compatible with 
> > shipping
> > CPUs.  Furthermore, the 256-bit tuning apparently benefits current Intel 
> > CPUs,
> > even though they can do 512-bit vectors.
> > 
> > (The thread subject is a bit misleading for this sub-topic, by the way.)
> > 
> > Thanks,
> > Florian
> 
> Since 256bit and 512bit are diverged from AVX10.1 and will continue in 
> the future AVX10 versions, I think it's hard to keep a single version 
> number to cover both and increase monotonically. Hence I'd like to 
> suggest x86-64-v5 for 512bit and x86-64-v5-256 for 256bit, and so on.

The raison d'etre for the x86-64-vX scheme is to make life sensible as 
distributor.  That goal can only be achieved if this scheme contains only 
a few components that have a simple relationship.  That basically means: 
one dimension only.  If you now add a second dimension (with and without 
-512) we have to add another one if Intel (or whoever else) next does a 
marketing stunt for feature "foobar" and end up with x86-64-v6, 
x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar, 
x86-64-v6-1024-foobar.

In short: no.

It isn't the right time anyway to assign meaning to x86-64-v5, as it 
wasn't the right time for assigning x86-64-v4 (as we now see).  These are 
supposed to reflect generally useful feature sets actually shipped in 
generally available CPUs in the market, and be vendor independent.  As 
such it's much too early to define v5 based purely on text documents.


Ciao,
Michael.


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Aug 2023, Martin Uecker via Gcc-patches wrote:

> There are at least three different size expressions which could
> make sense. Consider
> 
> struct foo { int a; short b; char t[]; }; 
> 
> Most people seem to use
> 
> sizeof(struct foo) + N * sizeof(foo->t);
> 
> which for N == 3 yields 11 bytes on x86-64 because the formula
> adds the padding of the original struct. There is an example
> in the  C standard that uses this formula.
> 
> 
> But the minimum size of an object which stores N elements is
> 
> max(sizeof (struct s), offsetof(struct s, t[n]))
> 
> which is 9 bytes. 

But should it really?  struct sizes should usually be a multiple of a 
struct's alignment, and if int is 4-aligned only size 12 would be correct. 
I don't see why one should deviate from that general rule for sizes for 
FAM structs.  (I realize that the above is not about sizeof(), but rather 
bdos/bos, but I think it's fairly useful that both agree when possible).

Say, if you were to allocate an array of such struct foos, each having 3 
elements in foo.t[].  You need to add 12 bytes to go to the next array 
element, not 9.  (Of course arrays of FAM-structs are somewhat meh, but 
still).  It's true that you would be allowed to rely on only 9 bytes of 
those 12 bytes (the rest being padding), so perhaps it's really the right 
answer for __bos, but, hmm, 12 seems to be "more correct" to my guts :-)
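
Roughly (a sketch, x86-64 layout assumed):

  #include <stddef.h>

  struct foo { int a; short b; char t[]; };

  /* Minimum object size for 3 elements is 9, but stepping to the next
     element of a (hypothetical) array must keep 'a' aligned, so the
     stride is 12.  */
  enum {
    MIN_SIZE = offsetof (struct foo, t) + 3,                 /* 9  */
    STRIDE   = (MIN_SIZE + _Alignof (struct foo) - 1)
               & ~(_Alignof (struct foo) - 1)                /* 12 */
  };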


Ciao,
Michael.


Re: [RFC] GCC Security policy

2023-08-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Aug 2023, Jakub Jelinek via Gcc-patches wrote:

> What I'd really like to avoid is having all compiler bugs (primarily ICEs)
> considered to be security bugs (e.g. DoS category), it would be terrible to
> release every week a new compiler because of the "security" issues.
> Running compiler on untrusted sources can trigger ICEs (which we want to fix
> but there will always be some), or run into some compile time and/or compile
> memory issue (we have various quadratic or worse spots), compiler stack
> limits (deeply nested stuff e.g. during parsing but other areas as well).
> So, people running fuzzers and reporting issues is great, but if they'd get
> a CVE assigned for each ice-on-invalid-code, ice-on-valid-code,
> each compile-time-hog and each memory-hog, that wouldn't be useful.

This!  Double-this!

FWIW, the binutils security policy, and by extension the proposed GCC 
policy David posted, handles this.  (To me this is the most important 
aspect of such policy, having been on the receiving end of such nonsense 
on the binutils side).

> Runtime libraries or security issues in the code we generate for valid
> sources are of course a different thing.

Generate or otherwise provide for consumption.  E.g. a bug with security 
consequences in the runtime libs (either in source form (templates) or as 
executable code, but with the problem being in e.g. libgcc sources 
(unwinder!)) needs proper handling, similar to how glibc is handled.


Ciao,
Michael.


Re: _BitInt vs. _Atomic

2023-08-02 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 1 Aug 2023, Joseph Myers wrote:

> > Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
> > defined in terms of the == operator (obviously applied recursively 
> > member-wise for structs) and simple-assignment that wouldn't be a problem.  
> 
> It also wouldn't work for floating point, where I think clearly the atomic 
> operations should consider positive and negative zero as different, and 
> should consider different DFP quantum exponents for the same real number 
> as different - but should also consider the same NaN (same payload, same 
> choice of quiet / signaling) as being the same.

That is all true.  But the current wording can't work either.  It happily 
requires copying around memory between types of different representations 
and sizes, it makes padding observable behaviour, and due to that makes 
basic algebraic guarantees not be observed (after two values are checked 
equal with the predicates of the algebra, they are not then in fact equal 
with predicates of the same algebra).


Ciao,
Michael.


Re: _BitInt vs. _Atomic

2023-08-01 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 31 Jul 2023, Martin Uecker wrote:

> >  Say you have a loop like so:
> > 
> > _Atomic T obj;
> > ...
> > T expected1, expected2, newval;
> > newval = ...;
> > expected1 = ...;
> > do {
> >   expected2 = expected1;
> >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> >     break;
> >   expected1 = expected2;
> > } while (1);
> > 
> > As written this looks of course stupid, and you may say "don't do that", 
> > but internally the copies might result from temporaries (compiler 
> > generated or wrapper function arguments, or suchlike). 
> >  Now, while 
> > expected2 will contain the copied padding bits after the cmpxchg the 
> > copies to and from expected1 will possibly destroy them.  Either way I 
> > don't see why the above loop should be out-of-spec, so I can write it and 
> > expect it to proceed eventually (certainly when the _strong variant is 
> > used).  Any argument that would declare the above loop out-of-spec I would 
> > consider a defect in the spec.
> 
> It is "out-of-spec" for C in the sense that it can not be
> expected to work with the semantics as specified in the C standard.

(I call that a defect.  See below)

> In practice, what the semantics specified using memcpy/memcmp
> allow one to do is to also apply atomic operations on non-atomic 
> types.  This is not guaranteed to work by the C standard, but
> in practice  people often have to do this.  For example, nobody
> is going to copy a 256 GB numerical array with non-atomic types
> into another data structure with atomic versions of the same
> type just so that you can apply atomic operations on it.
> So one simply does an unsafe cast and hopes the compiler does
> not break this.
> 
> If the non-atomic struct now has non-zero values in the padding, 
> and the compiler would clear those automatically for "expected", 
> you would create the problem of an infinite loop (this time 
> for real).

Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
defined in terms of the == operator (obviously applied recursively 
member-wise for structs) and simple-assignment that wouldn't be a problem.  
In addition that would get rid of all discussion of what happens or 
doesn't happen with padding.  Introducing reliance on padding bits (which 
IMHO goes against some fundamental ideas of the C standard) has 
far-reaching consequences, see below.  The current definition of the 
atomic_cmpxchg is also inconsistent with the rest of the standard:

We have:

  ... (C is non-atomic variant of A) ...
  _Bool atomic_compare_exchange_strong(volatile A *object,
                                       C *expected, C desired);
  ... (is equivalent to atomic variant of:)
  if (memcmp(object, expected, sizeof (*object)) == 0)
    { memcpy(object, &desired, sizeof (*object)); return true; }
  else
    { memcpy(expected, object, sizeof (*object)); return false; }

But we also have:

  The size, representation, and alignment of an atomic type need not be 
  the same as those of the corresponding unqualified type.

  (with later text only suggesting that at least for atomic integer 
  types these please be the same.  But here we aren't talking about
  integer types even.)

So, already the 'memcmp(object, expected, sizeof (*object)' may be 
undefined.  sizeof(*object) need not be the same as sizeof(*expected).
In particular the memcpy in the else branch might clobber memory outside 
*expected.

That alone should be sufficient to show that defining this all in terms of 
memcpy/memcmp is a bad idea.  But it also has other 
consequences: you can't copy (simple-assign) or compare (== operator) 
atomic values anymore reliably and expect the atomic_cmpxchg to work.  My 
example from earlier shows that you can't copy them, a similar one can be 
constructed for breaking ==.

But it goes further: you can also construct an example that shows an 
internal inconsistency just with using atomic_cmpxchg (of course, assume 
all this to run without concurrent accesses to the respective objects):

  _Atomic T obj;
  ...
  T expected, newval;
  expected = ...;
  newval = expected + 1; // just to make it different
  atomic_store (&obj, expected);
  if (atomic_cmpxchg_strong (&obj, &expected, newval)) {
    /* Now we have: obj == newval.
       Do we also have memcmp(&obj,&newval)==0? */
    if (!atomic_cmpxchg_strong (&obj, &newval, expected)) {
      /* No, we can't rely on that!  */
      error("what's going on?");
    }
  } else {
    /* May happen, padding of expected may not be the same
       as in obj, even after atomic_store.  */
    error("WTH? a compare after a store doesn't even work?");
  }

So, even though cmpxchg is defined in terms of memcpy/memcmp, we still 
can't rely on anything after it succeeded (or failed).  Simply because the 
by-value passing of the 'desired' argument will have unknown padding 
(within the implementation of cmpxchg) that isn't necessarily the same as 
the newval object.

Now, about your suggestion of clearing or ignoring the padding bits at 
specific 

Re: _BitInt vs. _Atomic

2023-07-31 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 28 Jul 2023, Martin Uecker wrote:

> > > Sorry, somehow I must be missing something here.
> > > 
> > > If you add something you would create a new value and this may (in
> > > an object) have random new padding.  But the "expected" value should
> > > be updated by a failed atomic_compare_exchange cycle and then have
> > > same padding as the value stored in the atomic. So the next cycle
> > > should succeed.  The user would not change the representation of
> > > the "expected" value but create a new value for another object
> > > by adding something.
> > 
> > You're right that it would pass the expected value not something after an
> > operation on it usually.  But still, expected type will be something like
> > _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> > atomic_compare_exchange copies back on failure is guaranteed to have the
> > padding bits preserved.
> 
> For atomic_load in C a value is returned. A value does not care about
> padding and when stored into a new object can produce new and different
> padding.  
> 
> But for atomic_compare_exchange the memory content is copied into 
> an object passed by pointer, so here the C standard requires to
> that the padding is preserved. It explicitely states that the effect
> is like:
> 
> if (memcmp(object, expected, sizeof(*object)) == 0)
>   memcpy(object, &desired, sizeof(*object));
> else
>   memcpy(expected, object, sizeof(*object));
> 
> > It is true that if it is larger than 16 bytes the libatomic 
> > atomic_compare_exchange will memcpy the value back which copies the 
> > padding bits, but is there a guarantee that the user code doesn't 
> > actually copy that value further into some other variable?  
> 
> I do not think it would be surprising for C user when
> the next atomic_compare_exchange fails in this case.

But that is a problem (the same one the cited C++ paper tries to resolve, 
IIUC).  Say you have a loop like so:

_Atomic T obj;
...
T expected1, expected2, newval;
newval = ...;
expected1 = ...;
do {
  expected2 = expected1;
  if (atomic_compare_exchange_weak(&obj, &expected2, newval))
    break;
  expected1 = expected2;
} while (1);

As written this looks of course stupid, and you may say "don't do that", 
but internally the copies might result from temporaries (compiler 
generated or wrapper function arguments, or suchlike).  Now, while 
expected2 will contain the copied padding bits after the cmpxchg the 
copies to and from expected1 will possibly destroy them.  Either way I 
don't see why the above loop should be out-of-spec, so I can write it and 
expect it to proceed eventually (certainly when the _strong variant is 
used).  Any argument that would declare the above loop out-of-spec I would 
consider a defect in the spec.

It's never a good idea to introduce reliance on padding bits.  Exactly 
because you can trivially destroy them with simple value copies.
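
A tiny sketch of that point:

  #include <string.h>

  struct S { char c; /* typically 3 bytes of padding here */ int i; };

  int padding_may_differ (void)
  {
    struct S src = { 1, 2 };
    struct S dst = src;  /* value copy: padding takes unspecified values */
    /* May be nonzero even though src and dst are member-wise equal.  */
    return memcmp (&src, &dst, sizeof src);
  }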

> > Anyway, for smaller or equal to 16 (or 8) bytes if 
> > atomic_compare_exchange is emitted inline I don't see what would 
> > preserve the bits.
> 
> This then seems to be incorrect for C.

Or the spec is.


Ciao,
Michael.


Re: [WIP RFC] Add support for keyword-based attributes

2023-07-17 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 17 Jul 2023, Richard Sandiford via Gcc-patches wrote:

> >> There are some existing attributes that similarly affect semantics
> >> in ways that cannot be ignored.  vector_size is one obvious example.
> >> But that doesn't make it a good thing. :)
> >...
> > If you had added __arm(bar(args)) instead of __arm_bar(args) you would only
> > need one additional keyword - we could set aside a similar one for each
> > target then.  I realize that double-nesting of arguments might prove a bit
> > challenging but still.
> 
> Yeah, that would work.

So, essentially you want unignorable attributes, right?  Then implement 
exactly that: add one new keyword "__known_attribute__" (invent a better 
name, maybe :) ), semantics exactly as with __attribute__ (including using 
the same underlying lists in our data structures), with only one single 
deviation: instead of the warning you give an error for unhandled 
attributes.  Done.

(Old compilers will barf on the unknown new keyword, new compilers will 
error on unknown values used within such attribute list)
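
Purely as illustration (the keyword name is of course just the placeholder
from above, not something any compiler implements):

  /* An unknown entry under __attribute__ only warns; the same entry under
     the hypothetical __known_attribute__ would be a hard error.  */
  void f (void) __attribute__ ((made_up_ext));        /* warning today */
  void g (void) __known_attribute__ ((made_up_ext));  /* error: unhandled */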

> > In any case I also think that attributes are what you want and their 
> > ugliness/issues are not worse than the ugliness/issues of the keyword 
> > approach IMHO.
> 
> I guess the ugliness of keywords is the double underscore? What are the 
> issues with the keyword approach though?

There are _always_ problems with new keywords, the more new keywords the 
more problems :-)  Is the keyword context-sensitive or not?  What about 
existing system headers that use it right now?  Is it recognized in 
free-standing or not?  Is it only recognized for some targets?  Is it 
recognized only for certain configurations of the target?

So, let's define one new mechanism, for all targets, all configs, and all 
language standards.  Let's use __attribute__ with a twist :)


Ciao,
Michael.


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hey,

On Tue, 11 Jul 2023, Alexander Monakov via Gcc-patches wrote:

> > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> > > 
> > > This is the weak/active form; I'd suggest "preserve_high_sse".
> > 
> > But it preserves only the low parts :-)  You swapped the two in your 
> > mind when writing the reply?
> 
> Ahhh. By "high SSE" I mean the high-numbered SSE regs, i.e. xmm8-15, not
> the higher halves of (unspecified subset of) SSE regs.

Ah, gotcha :-)  It just shows that all these names are confusing.  Maybe 
I'll just go with "attribute1" and "attribute2" and rely on docu.  (SCNR)


Ciao,
Michael.


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 10 Jul 2023, Alexander Monakov wrote:

> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":

I want to provide both.  One of them allows more arbitrary function 
definitions, the other allows more register (parts) to be preserved.  I 
feel both have their place.

> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?

Yeah, mask registers.  I'm still pondering this.  We would need to split 
the 8 maskregs into two parts.  Hmm.

> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.

Yeah, needs to be kept in mind indeed.  Anything coming in between the 
caller and a so-marked callee needs to preserve things.

> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
> 
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?

I hoped early on that the generic code that prohibits sibcalls between 
call sites of too "different" ABIs would deal with this, and then forgot 
to check.  Turns out you had a good hunch here, it actually does a 
sibcall, destroying the guarantees.  Thanks! :)

> > Carefully note that this is only possible for the SSE2 registers, as 
> > other parts of them would need instructions that are only optional.
> 
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?

Hmm.  I feel the best answer here is "that should error out".  I'll add a 
test and adjust patch if necessary.

> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
> 
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?

Makes a good testcase as well.  My take: it's acceptable with the 
only-sse2-preserved attribute (xmm15 will in this case be saved/restored), 
and should be an error with the everything-preserved attribute (maybe we 
can make an exception as here we only specify an XMM reg, instead of 
larger parts).

> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> 
> This is the weak/active form; I'd suggest "preserve_high_sse".

But it preserves only the low parts :-)  You swapped the two in your 
mind when writing the reply?

> > I would welcome any comments, about the names, the approach, the attempt
> > at documenting the intricacies of these attributes and anything.
> 
> I hope the new attributes are supposed to be usable with function 
> pointers? From the code it looks that way, but the documentation doesn't 
> promise that.

Yes, like all ABI influencing attributes they _have_ to be part of the 
functions type (and hence transfer to function pointers), with appropriate 
incompatible-conversion errors and warnings at the appropriate places.  (I 
know that this isn't always the way we're dealing with ABI-infuencing 
attributes, and often refer to a decl only.  All those are actual bugs.)

And yes, I will adjust the docu to be explicit about this.
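
Something along these lines is what I mean (a sketch; attribute name as in
this RFC):

  __attribute__ ((nosseclobber)) void callee (void);

  typedef void plain_fn (void);
  typedef void nosse_fn (void) __attribute__ ((nosseclobber));

  nosse_fn *ok  = callee;  /* fine: types (and hence ABIs) match         */
  plain_fn *bad = callee;  /* must be diagnosed: drops the ABI attribute */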


Ciao,
Michael.


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > > > When a function doesn't contain calls to
> > > > unknown functions we can be a bit more lenient: we can make it so that
> > > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > > > necessary.
> 
> One may also take into account that first 8 registers are cheaper to
> encode than the later 8, so perhaps we may want to choose range that
> contains both.

There is actually none in the low range that's usable.  xmm0/1 are used 
for return values and xmm2-7 are used for argument passing.  Arguments are 
by default callee clobbered, and we do not want to change this (or limit 
the number of register arguments for the alternate ABI).


Ciao,
Michael.


[x86-64] RFC: Add nosse abi attribute

2023-07-10 Thread Michael Matz via Gcc-patches
Hello,

the ELF psABI for x86-64 doesn't have any callee-saved SSE
registers (there were actual reasons for that, but those don't
matter anymore).  This starts to hurt some uses, as it means that
as soon as you have a call (say to memmove/memcpy, even if
implicit as libcall) in a loop that manipulates floating point
or vector data you get saves/restores around those calls.

But in reality many functions can be written such that they only need
to clobber a subset of the 16 XMM registers (or do the save/restore
themselves in the codepaths that need them, hello memcpy again).
So we want to introduce a way to specify this, via an ABI attribute
that basically says "doesn't clobber the high XMM regs".

I've opted to do only the obvious: do something special only for
xmm8 to xmm15, without a way to specify the clobber set in more detail.
I think such half/half split is reasonable, and as I don't want to
change the argument passing anyway (whose regs are always clobbered)
there isn't that much wiggle room anyway.

I chose to make it possible to write function definitions with that
attribute with GCC adding the necessary callee save/restore code in
the xlogue itself.  Carefully note that this is only possible for
the SSE2 registers, as other parts of them would need instructions
that are only optional.  When a function doesn't contain calls to
unknown functions we can be a bit more lenient: we can make it so that
GCC simply doesn't touch xmm8-15 at all, then no save/restore is
necessary.  If a function contains calls then GCC can't know which
parts of the XMM regset is clobbered by that, it may be parts
which don't even exist yet (say until avx2048 comes out), so we must
restrict ourself to only save/restore the SSE2 parts and then of course
can only claim to not clobber those parts.

To that end I introduce actually two related attributes (for naming
see below):
* nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
* noanysseclobber: claims (and ensures) that nothing of any of the
  registers overlapping xmm8-15 is clobbered (not even future, as of
  yet unknown, parts)

Ensuring the first is simple: potentially add saves/restore in xlogue
(e.g. when xmm8 is either used explicitly or implicitly by a call).
Ensuring the second comes with more: we must also ensure that no
functions are called that don't guarantee the same thing (in addition
to just removing all xmm8-15 parts altogether from the available
registers).

See also the added testcases for what I intended to support.
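
For instance (a sketch of intended use; names as above and still up for
discussion):

  /* Callers may keep data live in the SSE2 parts of xmm8-15 across this
     call; GCC adds any needed save/restore to the callee's xlogue.  */
  __attribute__ ((nosseclobber))
  void partial_preserve (double *dst, const double *src, unsigned long n);

  /* Stronger claim: nothing overlapping xmm8-15 is clobbered at all, so
     such a callee must not call anything without the same guarantee.  */
  __attribute__ ((noanysseclobber))
  void full_preserve (double *dst, const double *src, unsigned long n);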

I chose to use the new target independend function-abi facility for
this.  I need some adjustments in generic code:
* the "default_abi" is actually more like a "current" abi: it happily
  changes its contents according to conditional_register_usage,
  and other code assumes that such changes do propagate.
  But if that conditional_reg_usage is actually done because the current
  function is of a different ABI, then we must not change default_abi.
* in insn_callee_abi we do look at a potential fndecl for a call
  insn (only set when -fipa-ra), but that doesn't work for calls through
  pointers and (as said) is optional.  So, also always look at the
  called function's type (it's always recorded in the MEM_EXPR for
  non-libcalls), before asking the target.
  (The function-abi accessors working on trees were already doing that,
  its just the RTL accessor that missed this)

Accordingly I also implement some more target hooks for function-abi.
With that it's possible to also move the other ABI-influencing code
of i386 to function-abi (ms_abi and friends).  I have not done so for
this patch.

Regarding the names of the attributes: gah!  I've left them at
my mediocre attempts of names in order to hopefully get input on better
names :-)

I would welcome any comments, about the names, the approach, the attempt
at documenting the intricacies of these attributes and anything.

FWIW, this particular patch was regstrapped on x86-64-linux
with trunk from a week ago (and sniff-tested on current trunk).


Ciao,
Michael.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 37cb5a0dcc4..92358f4ac41 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3244,6 +3244,16 @@ ix86_set_indirect_branch_type (tree fndecl)
 }
 }
 
+unsigned
+ix86_fntype_to_abi_id (const_tree fntype)
+{
+  if (lookup_attribute ("nosseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_LESS_SSE;
+  if (lookup_attribute ("noanysseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_NO_SSE;
+  return ABI_DEFAULT;
+}
+
 /* Establish appropriate back-end context for processing the function
FNDECL.  The argument might be NULL to indicate processing at top
level, outside of any function scope.  */
@@ -3311,6 +3321,12 @@ ix86_set_current_function (tree fndecl)
   else
TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
 }
+
+  unsigned prev_abi_id = 0;
+  if 

Re: [PATCH] Add targetm.libm_function_max_error

2023-04-27 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 27 Apr 2023, Jakub Jelinek wrote:

> The first really large error I see is for sinl with
> x/2gx 
> 0x748160ed90d9425b0xefd8b811d6293294
> i.e.
> 1.5926552660973502228303666578452949e+253
> with most significant double being
> 1.5926552660973502e+253
> and low double
> -5.9963639272208416e+230

Such large numbers will always be a problem with the range reduction step 
in sin/cos.  With double-double the possible mantissa length can be much 
larger than 106, and the range reduction needs to be precise to at 
least that many bits to give anything sensible.

Realistically I think you can only expect reasonably exact results for 
double-double on operations that require range-reductions for
(a) "smallish" values.  Where the low double is (say) <= 2^16 * PI, or
(b) where the low double is consecutive to the high double, i.e. the
overall mantissa size (including the implicit zeros in the middle) is 
less than 107 (or whatever the current precision for the 
range-reduction step on large values is)

> given is
> -0.4025472157704263326278375983156912
> and expected (mpfr computed)
> -0.46994008859023245970759964236618727
> But if I try on x86_64:
> #define _GNU_SOURCE
> #include 
> 
> int
> main ()
> {
>   _Float128 f, f2, f3, f4;
>   double d, d2;
>   f = 1.5926552660973502228303666578452949e+253f128;
>   d = 1.5926552660973502e+253;
>   f2 = d;
>   f2 += -5.9963639272208416e+230;
>   f3 = sinf128 (f);
>   f4 = sinf128 (f2);
>   d2 = sin (d);
>   return 0;
> }
> where I think f2 is what matches most closely the 106 bit precision value,
> (gdb) p f
> $7 = 1.5926552660973502228303666578452949e+253
> (gdb) p f2
> $8 = 1.59265526609735022283036665784527174e+253
> (gdb) p f3
> $9 = -0.277062522218693980443596385112227247
> (gdb) p f4
> $10 = -0.402547215770426332627837598315693221
> and f4 is much closer to the given than to expected.

Sure, but that's because f2 is only "close" to the double-double exact 
value of (1.5926552660973502e+253 + -5.9963639272208416e+230) relatively, 
not absolutely.  As you already wrote the mantissa to represent the exact 
value (which double-double and mpfr can!) is larger than 106 bits.  The 
necessary error of rounding to represent it in f128 is small in ULPs, but 
very very large in magnitude.  Large magnitude changes of input value to 
sin/cos essentially put the value into completely different quadrants and 
positions within those quadrants, and hence the result under such rounding 
in input can be _wildly_ off.

E.g. imagine a double-double representing (2^107 * PI + PI/2) exactly 
(assume PI is the 53-bit representation of pi, that's why I say 
"exactly").  The correct result of sin() on that is 1.  The result on the 
nearest f128 input value (2^107 * PI) will be 0.  So you really can't 
compare f128 arithmetic with double-double one when the mantissas are too 
far away.

So, maybe you want to only let your tester test "good" double-double 
values, i.e. those that can be interpreted as a about-106-bit number where 
(high-exp - low-exp) <= about 53.

(Or just not care about the similarities of cos() on double-double to a 
random number generator :) )
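
A sketch of such a filter (assuming comparing ilogb() of the two halves is
an acceptable way to measure the gap):

  #include <math.h>
  #include <stdbool.h>

  /* Accept a double-double (hi, lo) only if the combined mantissa spans
     roughly 106 bits, i.e. the exponent gap is at most about 53.  */
  static bool good_double_double (double hi, double lo)
  {
    if (lo == 0.0)
      return true;
    return ilogb (hi) - ilogb (lo) <= 53;
  }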


Ciao,
Michael.


Re: [PATCH] Add targetm.libm_function_max_error

2023-04-26 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 26 Apr 2023, Jakub Jelinek via Gcc-patches wrote:

> For glibc I've gathered data from:
> 4) using attached ulp-tester.c (how to invoke in file comment; tested
>both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
>modes, plus on x86_64 float/double sin/cos using libmvec - see
>attached libmvec-wrapper.c as well)

That ulp-tester.c file as attached here is not testing what you think it 
does.  (1) It doesn't compile as it doesn't #define the TESTS macro in the 
!LIBMVEC_TEST case, and (2) it almost never progresses 'm', the status 
variable used before the random numbers start, to beyond 1: you start with 
nextafter(0.0, 1.0), which is the smallest subnormal number (with a ERANGE 
error, but that's ignored), and you test for equality with THIS_MIN, the 
smallest normal (!) number, until you start incrementing 'm'.

From subnormal smallest to normal smallest takes 1<

Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 18 Jan 2023, Jakub Jelinek wrote:

> > > > > Partly OT, what is riscv not defaulting that on as well?  Does it have
> > > > > usable unwind info even without that option, something else?
> > > > 
> > > > The RISC-V ABI does not address this, AFAICS.
> > > 
> > > And neither do many other ABIs, still we default there to
> > > -fasynchronous-unwind-tables because we've decided it is a good idea.
> > 
> > That might or might not be, but in the context of this thread that's 
> > immaterial.  Doing the same as the other archs will then simply hide the 
> > problem on risc-v as well, instead of fixing it.
> 
> Yeah, that is why I've mentioned "Partly OT".  We want this bug to be fixed
> (but the fix is not what has been posted but rather decide what we want to
> ask there; if it is at the end of compilation, whether it is at least one
> function with that flag has been compiled, or all functions have been with
> that flag, something else),

The answer to this should be guided by normal use cases.  The normal use 
case is that a whole input file is compiled with or without 
-funwind-tables, and not that individual functions change this.  So any 
solution in which that usecase doesn't work is not a solution.

The purest solution is to emit unwind tables for all functions that 
request it into .eh_frame and for those that don't request it put 
into .debug_frame (if also -g is on).  If that requires enabling 
unwind-tables globally first (i.e. if even just one input function 
requests it) then that is what needs to be done.  (this seems to be the 
problem currently, that the unwind-table activation on a per-function 
basis comes too late).

The easier solution might be to make funwind-tables also be a global 
special-cased option for LTO (so that the usual use-case above works), 
that would trade one bug for another, but I'd say the current bug is more 
serious than the one that would be introduced.

> and IMHO riscv should switch to
> -fasynchronous-unwind-tables by default.

I don't disagree, as long as that doesn't lead to the bug being ignored :)


Ciao,
Michael.


Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 18 Jan 2023, Jakub Jelinek wrote:

> On Wed, Jan 18, 2023 at 04:09:08PM +0100, Andreas Schwab wrote:
> > On Jan 18 2023, Jakub Jelinek wrote:
> > 
> > > Partly OT, what is riscv not defaulting that on as well?  Does it have
> > > usable unwind info even without that option, something else?
> > 
> > The RISC-V ABI does not address this, AFAICS.
> 
> And neither do many other ABIs, still we default there to
> -fasynchronous-unwind-tables because we've decided it is a good idea.

That might or might not be, but in the context of this thread that's 
immaterial.  Doing the same as the other archs will then simply hide the 
problem on risc-v as well, instead of fixing it.


Ciao,
Michael.


Re: [PATCH] lto: pass through -funwind-tables and -fasynchronous-unwind-tables

2023-01-18 Thread Michael Matz via Gcc-patches
On Wed, 18 Jan 2023, Andreas Schwab via Gcc-patches wrote:

> No unwind tables are generated, as if -funwind-tables is ignored.  If
> LTO is disabled, everything works as expected.

On RISC-V btw.  (which, unlike i386, aarch64, s390, rs6000 does not 
effectively enable funwind-tables always via either backend or 
common/config/$arch/ code, which is the reason why the problem can't be 
seen there).  It's an interaction with -g :

The problem (with cross compiler here, but shouldn't matter):

% riscv64-gcc -g -flto -funwind-tables -fPIC -c hello.c
% riscv64-gcc -shared hello.o
% readelf -wF a.out
... empty .eh_frame ...
Contents of the .debug_frame section:
... content ...

Using a compiler for any of the above archs makes this work (working means 
that the unwind info is placed into .eh_frame, _not_ into .debug_frame).  
Not using -g makes it work.  Adding -funwind-tables to the link command 
makes it work.  Adding -fexceptions to the compile command makes it work.  
Not using LTO makes it work.

So, it's quite clear that the option merging algorithm related to all this 
is somewhat broken, the global (or per function, or whatever) 
-funwind-tables option from hello.o doesn't make it correctly into the 
output (when -g is there).  Adding -fexception makes it work because then 
the functions will have personalities and on LTO-read-in _that_ will 
implicitely enable funwind-tables again (which should have been enabled 
already by the option-read-in).

As written initially the other archs are working because they all have 
various ways of essentially setting flag_unwind_tables always:

i386 via common/config/i386/i386-common.cc
   opts->x_flag_asynchronous_unwind_tables = 2;
  config/i386/i386-options.cc
 if (opts->x_flag_asynchronous_unwind_tables == 2)
   opts->x_flag_unwind_tables
 = opts->x_flag_asynchronous_unwind_tables = 1;

rs6000 via common/config/rs6000/rs6000-common.cc
   #ifdef OBJECT_FORMAT_ELF
 opts->x_flag_asynchronous_unwind_tables = 1;
   #endif
  (which ultimately also enabled flag_unwind_tables)

s390 via common/config/s390/s390-common.cc
opts->x_flag_asynchronous_unwind_tables = 1;

aarch64 via common/config/aarch64/aarch64-common.cc
  #if (TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1)
{ OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
{ OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
  #endif

  (the #define here is set for aarch64*-*-linux* )

So the problem cannot be readily demonstrated on these architectures.  
risc-v has no such code (and doesn't need to).


Ciao,
Michael.


Re: [PATCH 1/3] STABS: remove -gstabs and -gxcoff functionality

2022-11-10 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 10 Nov 2022, Martin Liška wrote:

> > These changes are part of
> > commit r13-2361-g7e0db0cdf01e9c885a29cb37415f5bc00d90c029
> > "STABS: remove -gstabs and -gxcoff functionality".  What this does is
> > remove these identifiers from "poisoning":
> > 
> >  /* As the last action in this file, we poison the identifiers that
> > shouldn't be used.
> >  [...]
> >  /* Other obsolete target macros, or macros that used to be in target
> > headers and were not used, and may be obsolete or may never have
> > been used.  */
> >   #pragma GCC poison [...]
> > 
> > Shouldn't these identifiers actually stay (so that any accidental future
> > use gets flagged, as I understand this machinery), and instead more
> > identifiers be added potentially: those where their definition/use got
> > removed with "STABS: remove -gstabs and -gxcoff functionality"?  (I've
> > not checked.)
> 
> Well, the identifiers are not used any longer, so I don't think we should
> poison them. Or do I miss something?

It's the very nature of poisoned identifiers that they aren't used (every 
use would get flagged as error).  The point of poisoning them is to avoid 
future new uses to creep in (e.g. via misguided back- or forward-ports, 
which can for instance happen easily with backend macros when an 
out-of-tree port is eventually tried to be integrated).  Hence, generally 
the list of those identifiers is only extended, never reduced.  (There may 
be exceptions of course)
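
For illustration (with a made-up macro name):

  /* Once poisoned, any later appearance of the identifier is an error,
     which is what catches accidental reintroduction via ports.  */
  #pragma GCC poison SOME_REMOVED_TARGET_MACRO

  #define SOME_REMOVED_TARGET_MACRO 1  /* error: poisoned identifier */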


Ciao,
Michael.


Re: [PATCH] sphinx: support Sphinx in lib*/Makefile.am.

2022-11-10 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 10 Nov 2022, Martin Liška wrote:

> This is a patch which adds support for Sphinx in lib*/Makefile.am where
> I wrongly modified Makefile.in that are generated.
> 
> One thing that's missing is that the generated Makefile.in does not
> contain 'install-info-am' target and thus the created info files
> are not installed with 'make install'. Does anybody know?

The whole generation/processing of '*info*' targets (and dvi,pdf,ps,html 
targets) is triggered by the presence of a 'TEXINFO' primary 
(here in the 'info_TEXINFO' variable), which you removed.  As the sphinx 
result is not appropriate for either TEXINFO or MANS primaries (the only 
ones in automake related specifically to documentation), you probably want 
to include them in the DATA primary.  For backward compatibility you might 
want to add your own {un,}install-info-am targets depending on 
{un,}install-data-am then, though I'm not sure why one would need one.

I currently don't quite see how you make the Sphinx results be installed 
at all, AFAICS there's no mention of them in any of the automake 
variables.  You have to list something somewhere (as said, probably in 
DATA) to enable automake to generate the usual set of Makefile targets.

(beware: I'm not an automake expert, so the above might turn out to be 
misleading advice :-) )


Ciao,
Michael.


> 
> Thanks,
> Martin
> 
> ---
>  libgomp/Makefile.am   |  27 ++-
>  libgomp/Makefile.in   | 275 +++---
>  libgomp/testsuite/Makefile.in |   3 +
>  libitm/Makefile.am|  26 ++-
>  libitm/Makefile.in| 278 ++
>  libitm/testsuite/Makefile.in  |   3 +
>  libquadmath/Makefile.am   |  37 ++--
>  libquadmath/Makefile.in   | 307 +++---
>  8 files changed, 208 insertions(+), 748 deletions(-)
> 
> diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
> index 428f7a9dab5..ab5e86b0f98 100644
> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am
> @@ -11,6 +11,8 @@ config_path = @config_path@
>  search_path = $(addprefix $(top_srcdir)/config/, $(config_path))
> $(top_srcdir) \
> $(top_srcdir)/../include
>  +abs_doc_builddir = @abs_top_builddir@/doc
> +
>  fincludedir =
> $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
>  libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
>  @@ -100,18 +102,6 @@ fortran.o: libgomp_f.h
>  env.lo: libgomp_f.h
>  env.o: libgomp_f.h
>  -
> -# Automake Documentation:
> -# If your package has Texinfo files in many directories, you can use the
> -# variable TEXINFO_TEX to tell Automake where to find the canonical
> -# `texinfo.tex' for your package. The value of this variable should be
> -# the relative path from the current `Makefile.am' to `texinfo.tex'.
> -TEXINFO_TEX   = ../gcc/doc/include/texinfo.tex
> -
> -# Defines info, dvi, pdf and html targets
> -MAKEINFOFLAGS = -I $(srcdir)/../gcc/doc/include
> -info_TEXINFOS = libgomp.texi
> -
>  # AM_CONDITIONAL on configure option --generated-files-in-srcdir
>  if GENINSRC
>  STAMP_GENINSRC = stamp-geninsrc
> @@ -127,7 +117,7 @@ STAMP_BUILD_INFO =
>  endif
>   -all-local: $(STAMP_GENINSRC)
> +all-local: $(STAMP_GENINSRC) $(STAMP_BUILD_INFO)
>   stamp-geninsrc: libgomp.info
>   cp -p $(top_builddir)/libgomp.info $(srcdir)/libgomp.info
> @@ -135,8 +125,15 @@ stamp-geninsrc: libgomp.info
>   libgomp.info: $(STAMP_BUILD_INFO)
>  -stamp-build-info: libgomp.texi
> - $(MAKEINFO) $(AM_MAKEINFOFLAGS) $(MAKEINFOFLAGS) -I $(srcdir) -o
> libgomp.info $(srcdir)/libgomp.texi
> +RST_FILES:=$(shell find $(srcdir) -name *.rst)
> +SPHINX_CONFIG_FILES:=$(srcdir)/doc/conf.py $(srcdir)/../doc/baseconf.py
> +SPHINX_FILES:=$(RST_FILES) $(SPHINX_CONFIG_FILES)
> +
> +stamp-build-info: $(SPHINX_FILES)
> + + if [ x$(HAS_SPHINX_BUILD) = xhas-sphinx-build ]; then \
> +   make -C $(srcdir)/../doc info SOURCEDIR=$(abs_srcdir)/doc
> BUILDDIR=$(abs_doc_builddir)/info SPHINXBUILD=$(SPHINX_BUILD); \
> +   cp ./doc/info/texinfo/libgomp.info libgomp.info; \
> + else true; fi
>   @touch $@
>   diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
> index 814ccd13dc0..4d0f2184e95 100644
> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -177,7 +177,7 @@ am__uninstall_files_from_dir = { \
>  || { echo " ( cd '$$dir' && rm -f" $$files ")"; \
>   $(am__cd) "$$dir" && rm -f $$files; }; \
>}
> -am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
> +am__installdirs = "$(DESTDIR)$(toolexeclibdir)" \
>   "$(DESTDIR)$(fincludedir)" "$(DESTDIR)$(libsubincludedir)" \
>   "$(DESTDIR)$(toolexeclibdir)"
>  LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
> @@ -269,16 +269,9 @@ am__v_FCLD_0 = @echo "  FCLD" $@;
>  am__v_FCLD_1 =
>  SOURCES = $(libgomp_plugin_gcn_la_SOURCES) \
>   $(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
> -AM_V_DVIPS = $(am__v_DVIPS_@AM_V@)
> -am__v_DVIPS_ = 

Re: [RFC] docs: remove documentation for unsupported releases

2022-11-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Nov 2022, Martin Liška wrote:

> I think we should remove documentation for unsupported GCC releases
> as it's indexed by Google engine. Second reason is that the page is long
> one one can't easily jump to Current development documentation.
> 
> Thoughts?

I think everything that's on the web server (even the 2.95 docu) has to be 
reachable via a (reasonable) set of clicks from the main index.html.  It 
doesn't need to be _directly_ from the main index.html, though.

Also, you simply might want to move the "Current Development" section 
to the front if it's currently too cumbersome to reach.

(E.g. one could move the index of the obsolete versions to a different web 
page, make that one un-indexable by google, but leave a trace to that one 
from the main index.html).


Ciao,
Michael.


> Martin
> 
> ---
>  htdocs/onlinedocs/index.html | 1294 --
>  1 file changed, 1294 deletions(-)
> 
> diff --git a/htdocs/onlinedocs/index.html b/htdocs/onlinedocs/index.html
> index cfa8bf5a..138dde95 100644
> --- a/htdocs/onlinedocs/index.html
> +++ b/htdocs/onlinedocs/index.html
> @@ -255,1300 +255,6 @@
>   href="https://gcc.gnu.org/onlinedocs/gcc-10.4.0/docs-sources.tar.gz;>Texinfo
>   sources of all the GCC 10.4 manuals
>
> -
> -  GCC 9.5 manuals:
> -  
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc/;>GCC
> - 9.5 Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc.ps.gz;>PostScript or  - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gcc-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran/;>GCC
> - 9.5 GNU Fortran Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran.ps.gz;>PostScript 
> or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gfortran-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp/;>GCC
> - 9.5 CPP Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp.ps.gz;>PostScript or  - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/cpp-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm/;>GCC
> - 9.5 GNAT Reference Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm.ps.gz;>PostScript 
> or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_rm-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn/;>GCC
> - 9.5 GNAT User's Guide ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn.pdf;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn.ps.gz;>PostScript 
> or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gnat_ugn-html.tar.gz;>an
> - HTML tarball)
> - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++/manual/;>GCC
> - 9.5 Standard C++ Library Manual  ( - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-manual.pdf.gz;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-manual.xml.gz;>XML
>  or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-manual-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++/api/;>GCC
> - 9.5 Standard C++ Library Reference Manual  ( - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api.pdf.gz;>also
> - in PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api.xml.gz;>XML 
> GPL or
> -  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api-gfdl.xml.gz;>XML 
> GFDL or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libstdc++-api-html.tar.gz;>an
> - HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo/;>GCCGO 9.5 
> Manual ( -   href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo.pdf;>also in
> -   PDF or  -   
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo.ps.gz;>PostScript or 
>  -   
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/gccgo-html.tar.gz;>an
> -   HTML tarball)
> -https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libgomp/;>GCC 9.5
> - GNU Offloading and Multi Processing Runtime Library Manual ( - href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libgomp.pdf;>also in
> - PDF or  - 
> href="https://gcc.gnu.org/onlinedocs/gcc-9.5.0/libgomp.ps.gz;>PostScript 
> or  -

Re: Remove support for Intel MIC offloading (was: [PATCH] Remove dead code.)

2022-10-20 Thread Michael Matz via Gcc-patches
Hey,

On Thu, 20 Oct 2022, Thomas Schwinge wrote:

> This had been done in
> wwwdocs commit 5c7ecfb5627e412a3d142d8dc212f4cd39b3b73f
> "Document deprecation of OpenMP MIC offloading in GCC 12".
> 
> I'm sad about this, because -- in theory -- such a plugin is very useful
> for offloading simulation/debugging (separate host/device memory spaces,
> allow sanitizers to run on offloaded code

Yeah, I think that's a _very_ useful feature, but indeed ...

> (like LLVM a while ago
> implemented), and so on), but all that doesn't help -- in practice -- if
> nobody is maintaining that code.

... it should then be somewhat properly maintained.  Maybe the 
MIC-specifics could be removed from the code, and it could be transformed 
into a "null"-offload target, as an example and testing vehicle (implying 
that such a new liboffloadmic^H^H^Hnull would have its upstream in the GCC 
repo).  Alas, if no one is going to do that work, removing it is the right 
choice.


Ciao,
Michael.


Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Oct 2022, Jørgen Kvalsvik wrote:

> I apologise for the confusion. The diff there is not a part of the 
> change itself (note the indentation) but rather a way to reproduce,

Ah!  Thanks, that explains it, sorry for adding confusion on top :-)


Ciao,
Michael.


Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Oct 2022, Jørgen Kvalsvik via Gcc-patches wrote:

> The coverage support will under some conditions decide to split edges to
> accurately report coverage. By running the test suite with/without this
> edge splitting a small diff shows up, addressed by this patch, which
> should catch future regressions.
> 
> Removing the edge splitting:
> 
> $ diff --git a/gcc/profile.cc b/gcc/profile.cc
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
> Don't do that when the locuses match, so
> if (blah) goto something;
> is not computed twice.  */
> - if (last
> - && gimple_has_location (last)
> - && !RESERVED_LOCATION_P (e->goto_locus)
> - && !single_succ_p (bb)
> - && (LOCATION_FILE (e->goto_locus)
> - != LOCATION_FILE (gimple_location (last))
> - || (LOCATION_LINE (e->goto_locus)
> - != LOCATION_LINE (gimple_location (last)
> -   {
> - basic_block new_bb = split_edge (e);
> - edge ne = single_succ_edge (new_bb);
> - ne->goto_locus = e->goto_locus;
> -   }
> +

Assuming this is correct (I really can't say) then the comment needs 
adjustments.  It specifically talks about this very code you remove.


Ciao,
Michael.


Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-20 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 20 Sep 2022, Aldy Hernandez wrote:

> > FWIW, in IEEE, 'abs' (like 'copy', 'copysign' and 'negate') are not
> > arithmetic, they are quiet-computational.  Hence they don't raise
> > anything, not even for sNaNs; they copy the input bits and appropriately
> > modify the bit pattern according to the specification (i.e. fiddle the
> > sign bit).
> >
> > That also means that a predicate like negative_p(x) that would be
> > implemented à la
> >
> >   copysign(1.0, x) < 0.0
> 
> I suppose this means -0.0 is not considered negative,

It would be considered negative if the predicate is implemented like 
above:
   copysign(1.0, -0.0) == -1.0

But really, that depends on what _our_ definition of negative_p is 
supposed to be.  I think the most reasonable definition is indeed similar 
to above, which in turn is equivalent to simply looking at the sign bit 
(which is what copysign() does), i.e. ...

> though it has
> the signbit set?  FWIW, on real_value's real_isneg() returns true for
> -0.0 because it only looks at the sign.

... this seems the sensible thing.  I just wanted to argue that 
set_negative (or the like), which "sets" the sign bit, does not make the 
NaN-ness go away.  They are orthogonal.
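
(A tiny C illustration of that orthogonality, using the usual <math.h>
facilities:)

  #include <math.h>
  #include <stdio.h>

  int main (void)
  {
    /* "Setting" the sign of a NaN leaves it a NaN; the sign bit and
       NaN-ness are independent pieces of state.  */
    double x = copysign (NAN, -1.0);
    printf ("%d %d\n", isnan (x) != 0, signbit (x) != 0);   /* 1 1 */
    return 0;
  }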

> > deals with NaNs just fine and is required to correctly capture the sign of
> > 'x'.  If frange::set_nonnegative is supposed to be used in such contexts
> > (and I think it's a good idea if that were the case), then set_nonnegative
> > does _not_ imply no-NaN.
> >
> > In particular I would assume that, given a VARYING frange FR,
> > FR.set_nonnegative() would result in an frange {[+0.0,+inf],+nan}.
> 
> That was my understanding as well, and what my original patch did.
> But again, I'm just the messenger.

Ah, I obviously haven't followed the thread carefully then.  If that's 
what it was doing then IMO it was the right thing.


Ciao,
Michael.


Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 19 Sep 2022, Richard Biener via Gcc-patches wrote:

> > but I guess it's good we do the right thing for correctness sake, and
> > if it ever gets used by someone else.
> >
> > >
> > > That said, 'set_nonnegative' could be interpreted to be without
> > > NaNs?
> >
> > Sounds good to me.  How's this?
> 
> Hmm, I merely had lots of questions above so I think to answer them
> we at least should document what 'set_nonnegative' implies in the
> abstract vrange class.
> 
> It's probably safer to keep NaN in.  For example unfolded copysign (x, 1.)
> will return true for x == -NaN but the result will be a NaN.

FWIW, in IEEE, 'abs' (like 'copy', 'copysign' and 'negate') are not 
arithmetic, they are quiet-computational.  Hence they don't raise 
anything, not even for sNaNs; they copy the input bits and appropriately 
modify the bit pattern according to the specification (i.e. fiddle the 
sign bit).

That also means that a predicate like negative_p(x) that would be 
implemented à la 

  copysign(1.0, x) < 0.0

deals with NaNs just fine and is required to correctly capture the sign of 
'x'.  If frange::set_nonnegative is supposed to be used in such contexts 
(and I think it's a good idea if that were the case), then set_nonnegative 
does _not_ imply no-NaN.

In particular I would assume that, given a VARYING frange FR, 
FR.set_nonnegative() would result in an frange {[+0.0,+inf],+nan}.
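
(A minimal C sketch of such a sign predicate; negative_p here just
illustrates the point and is not GCC's frange API:)

  #include <math.h>
  #include <stdio.h>

  /* Sign test via copysign: quiet-computational, so it raises nothing and
     keeps the sign of -0.0 and of NaNs visible.  */
  static int negative_p (double x) { return copysign (1.0, x) < 0.0; }

  int main (void)
  {
    printf ("%d %d %d\n",
            negative_p (-0.0),   /* 1 */
            negative_p (-NAN),   /* 1 on the usual IEEE setups */
            negative_p (2.0));   /* 0 */
    return 0;
  }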


Ciao,
Michael.


Re: [PATCH 2/3] rename DBX_REGISTER_NUMBER to DEBUGGER_REGISTER_NUMBER

2022-09-01 Thread Michael Matz via Gcc-patches
Hello,

okay, I'll bite :)  DBG_REGISTER_NUMBER?  DEBUGGER_REGNO?


Ciao,
Michael.


Re: [PATCH] match.pd: Add bitwise and pattern [PR106243]

2022-08-04 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 3 Aug 2022, Jeff Law via Gcc-patches wrote:

> > .optimized dump shows:
> >_1 = ABS_EXPR ;
> >_3 = _1 & 1;
> >return _3;
> > 
> > altho combine simplifies it to x & 1 on RTL, resulting in code-gen:
> > f1:
> >  and w0, w0, 1
> >  ret
> Doesn't the abs(x) & mask simplify to x & mask for any mask where the sign bit
> of x is off

No.  Only the lowest bit remains the same between x and -x; all others 
might or might not be inverted (first counterexample: x=-3, mask=3).
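
(For illustration, a tiny C check of that counterexample:)

  #include <stdio.h>
  #include <stdlib.h>

  int main (void)
  {
    int x = -3, mask = 3;
    /* Two's complement: -3 is ...11101, so (-3) & 3 == 1,
       while abs(-3) & 3 == 3 -- the fold would change the result.  */
    printf ("%d %d\n", abs (x) & mask, x & mask);   /* prints: 3 1 */
    return 0;
  }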


Ciao,
Michael.


Re: [PATCH] Add GIMPLE switch support to loop unswitching

2022-05-25 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 25 May 2022, Richard Biener via Gcc-patches wrote:

> I guess we might want to (warn about labels in the "toplevel"
> scope in switch statements).  So warn about
> 
> switch (x)
> {
> case 1:
> bar:
> };

That style is actually used quite a few times in GCC itself, sometimes with 
statements between 'case 1:' and 'bar:'.

> Maybe we just want to warn when the label is otherwise unused?

We do with -Wunused-labels (part of -Wall).  The testcases simply aren't 
compiled with that.


Ciao,
Michael.


Re: [x86 PATCH] Avoid andn and generate shorter not;and with -Oz.

2022-04-13 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 13 Apr 2022, Roger Sayle wrote:

> The x86 instruction encoding for SImode andn is longer than the
> equivalent notl/andl sequence when the source for the not operand
> is the same register as the destination.

_And_ when no REX prefixes are necessary for the notl/andl, which they are 
if the respective registers are %r8 or beyond.  As you seem to be fine 
with saving just a byte, you ought to test that as well so as not to waste 
one again :-)


Ciao,
Michael.


Re: Consider 'TDF_UID', 'TDF_NOUID' in 'print_node_brief', 'print_node'

2022-02-10 Thread Michael Matz via Gcc-patches
Hi,

On Thu, 10 Feb 2022, Richard Biener via Gcc-patches wrote:

> On Wed, Feb 9, 2022 at 2:21 PM Thomas Schwinge  
> wrote:
> >
> > Hi!
> >
> > OK to push (now, or in next development stage 1?) the attached
> > "Consider 'TDF_UID', 'TDF_NOUID' in 'print_node_brief', 'print_node'",
> > or should that be done differently -- or, per the current state (why?)
> > not at all?
> >
> > This does work for my current debugging task, but I've not yet run
> > 'make check' in case anything needs to be adjusted there.
> 
> Hmm, I wonder if we shouldn't simply dump DECL_UID as
> 
>  'uid NNN'

Yes, much better and in line with the normal dump_tree output.


Ciao,
Michael.

> 
> somewhere.  For example after or before DECL_NAME?
> 
> >
> > Grüße
> >  Thomas
> >
> >
> 


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Feb 2022, Richard Biener wrote:

> > That breaks down when a birth is there (because it was otherwise 
> > reachable) but not on the taken path:
> > 
> >   if (nevertrue)
> > goto start;
> >   goto forw;
> >   start:
> >   {
> > int i;
> > printf("not really reachable, but unknowingly so\n");
> >   forw:
> > i = 1;
> >   }
> 
> I think to cause breakage you need a use of 'i' on the side-entry
> path that is not reachable from the path with the birth.  I guess sth like
> 
>if (nevertrue)
>  goto start;
>goto forw;
>start:
>{
>  int i = 0;
>  printf("not really reachable, but unknowingly so\n");
>  goto common;
>forw:
>  i = 1;
>common:
>  foo ();
>}
> 
> if we'd have a variable that's live only on the side-entry path
> then it could share the stack slot with 'i' this way, breaking
> things (now we don't move CLOBBERs so it isn't easy to construct
> such case).  The present dataflow would, for the above, indeed
> compute 'i' not live in the forw: block.

Yes, now that we have established (in the subthread with Joseph) that the 
value becomes indeterminate at decls we only need to care for not sharing 
storage invalidly, so yeah, some changes in the conflict computation still 
are needed.

> > Except for switches, side entries are really quite rare, so we might 
> > usefully get away with that latter solution.  And for switches it might be 
> > okay to put the births at the block containing the switch (if it itself 
> > doesn't have side entries, and the switch block doesn't have side 
> > entries except the case labels).
> > 
> > If the birth is moved to outward blocks it might be best if also the 
> > dealloc/death clobbers are moved to it, otherwise there might be paths 
> > containing a birth but no death.
> > 
> > The less precise you get with those births the more non-sharing you'll 
> > get, but that's the price.
> 
> Yes, sure.  I don't see a good way to place births during gimplification
> then.

Well, for each BIND you can compute whether there are side entries at all; 
then, when lowering a BIND, you put the births into the innermost 
containing BIND that doesn't have side entries, instead of into the current 
BIND.

> The end clobbers rely on our EH lowering machinery.  For the entries we 
> might be able to compute GIMPLE BIND transitions during BIND lowering if 
> we associate labels with BINDs.  There should be a single fallthru into 
> the BIND at this point.  We could put a flag on the goto destination 
> labels whether they are reached from an outer BIND.
> 
>  goto inner;
>  {
>{
> int i;
>  {
>int j;
> inner:
>goto middle;
>  }
> middle:
>}
>  }
> 
> Since an entry CLOBBER is also a clobber we have to insert birth
> clobbers for all decls of all the binds inbetwee the goto source
> and the destination.  So for the above case on the edge to
> inner: have births for i and j and at the edge to middle we'd
> have none.

Correct, that's basically the most precise scheme; it's what I called 
try-finally topside-down ("always-before"? :) ).  (You have to take care of 
computed gotos, i.e. BINDs containing address-taken labels, which make 
things even uglier.)  I think the easier way to deal with the above is to 
notice that the inner BIND has a side entry and conceptually move the 
decls outwards to BINDs that have no side entries (or bind-crossing gotos), 
i.e. act as if it were:

  int i;  // moved
  int j;  // moved
  goto inner;
  {
{
  {
  inner:
goto middle;
  }
  middle:
}
  }

> Requires some kind of back-mapping from label to goto sources
> that are possibly problematic.  One issue is that GIMPLE
> lowering re-builds the BLOCK tree (for whatever reason...),
> so I'm not sure if we need to do it before that (for correctness).
> 
> Does that make sense?

I honestly can't say with 100% certainty :-)  It sounds like it makes sense, yes.


Ciao,
Michael.


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-09 Thread Michael Matz via Gcc-patches
Hey,

On Tue, 8 Feb 2022, Joseph Myers wrote:

> On Tue, 8 Feb 2022, Richard Biener via Gcc-patches wrote:
> 
> > which I think is OK?  That is, when the abstract machine
> > arrives at 'int i;' then the previous content in 'i' goes
> > away?  Or would
> 
> Yes, that's correct.  "If an initialization is specified for the object, 
> it is performed each time the declaration or compound literal is reached 
> in the execution of the block; otherwise, the value becomes indeterminate 
> each time the declaration is reached.".

Okay, that makes things easier then.  We can put the birth 
clobbers at their point of declaration; it's just that the storage 
associated with a decl needs to survive for the whole block.  We still need 
to make sure that side entries skipping declarations correctly "allocate" 
such storage (by introducing proper conflicts with other objects), but at 
least values don't need to survive decls.


Ciao,
Michael.


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Feb 2022, Richard Biener wrote:

> > int state = 2, *p, camefrom1 = 0;
> > for (;;) switch (state) {
> >   case 1: 
> >   case 2: ;
> > int i;
> > if (state != 1) { p = &i; i = 0; }
> > if (state == 1) { (*p)++; return *p; }
> > state = 1;
> > continue;
> > }
> > 
> > Note how i is initialized during state 2, and needs to be kept initialized 
> > during state 1, so there must not be a CLOBBER (birth or other) at the 
> > point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
> > for 'i' is placed with the statement associated with 'case 2' label, 
> > putting a CLOBBER there would be the wrong thing.  If the decl had an 
> > initializer it might be harmless, as it would be overwritten at that 
> > place, but even so, in this case there is no initializer.  Hmm.
> 
> You get after gimplification:
> 
>   state = 2;
>   camefrom1 = 0;
>   :
>   switch (state) , case 1: , case 2: >
>   {
> int i;
> 
> try
>   {
> i = {CLOBBER(birth)};  /// ignore, should go away
> :
> :
> i = {CLOBBER(birth)};

I think this clobber here would be the problem, because ...

> which I think is OK?  That is, when the abstract machine
> arrives at 'int i;' then the previous content in 'i' goes
> away?  Or would
> 
> int foo()
> {
>goto ick;
> use:
>int i, *p;
>return *p;
> ick:
>i = 1;
>p = &i;
>goto use;
> 
> }
> 
> require us to return 1?

... I think this is exactly the logical consequence of lifetime of 'i' 
being the whole block.  We need to return 1. (Joseph: correct me on that! 
:) )  That's what I was trying to get at with my switch example as well.

> With the current patch 'i' is clobbered before the return.
> 
> > Another complication arises from simple forward jumps:
> > 
> >   goto forw;
> >   {
> > int i;
> > printf("not reachable\n");
> >   forw:
> > i = 1;
> >   }
> > 
> > Despite the jump skipping the unreachable head of the BLOCK, 'i' needs to 
> > be considered birthed at the label.  (In a way the placement of births 
> > exactly mirrors the placement of deaths (of course): every path from 
> > outside a BLOCK to inside needs to birth-clobber all variables (in C), 
> > just as every path leaving needs to kill them.  It's just that we have a 
> > convenient construct for the latter (try-finally), but not for the former.)
> 
> The situation with an unreachable birth is handled conservatively
> since we consider a variable without a (visible at RTL expansion time)
> birth to conflict with all other variables.

That breaks down when a birth is there (because it was otherwise 
reachable) but not on the taken path:

  if (nevertrue)
goto start;
  goto forw;
  start:
  {
int i;
printf("not really reachable, but unknowingly so\n");
  forw:
i = 1;
  }

> I don't see a good way to have a birth magically appear at 'forw' 
> without trying to argue that the following stmt is the first mentioning 
> the variable.

That's what my 'Hmm' alluded to :)  The only correct and precise way I see 
is to implement something similar to try-finally topside-down.  An 
easier but less precise way is to place the births at the (start of the) 
innermost block containing the decl _and all jumps into the block_.  Even 
less precise, but perhaps even easier, is to place the births for decls in 
blocks with side entries into the function scope (and for blocks without 
side entries at their start).

Except for switches, side entries are really quite rare, so we might 
usefully get away with that latter solution.  And for switches it might be 
okay to put the births at the block containing the switch (if it itself 
doesn't have side entries, and the switch block doesn't have side 
entries except the case labels).

If the birth is moved to outward blocks it might be best if also the 
dealloc/death clobbers are moved to it, otherwise there might be paths 
containing a birth but no death.

The less precise you get with those births the more non-sharing you'll 
get, but that's the price.


Ciao,
Michael.


Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 8 Feb 2022, Richard Biener wrote:

> int foo(int always_true_at_runtime)
> {
>   {
> int *p;
> if (always_true_at_runtime)
>   goto after;
> lab:
> return *p;
> after:
> int i = 0;
> p = &i;
> if (always_true_at_runtime)
>   goto lab;
>   }
>   return 0;
> }
> 
> For the implementation I considered starting lifetime at a DECL_EXPR
> if it exists and otherwise at the start of the BIND_EXPR.  Note the
> complication is that control flow has to reach the lifetime-start
> clobber before the first access.  But that might also save us here,
> since for the example above 'i' would be live via the backedge
> from goto lab.
> 
> That said, the existing complication is that when the gimplifier
> visits the BIND_EXPR it has no way to know whether there will be
> a following DECL_EXPR or not (and the gimplifier notes there are
> cases a DECL_EXPR variable is not in a BIND_EXPR).  My plan is to
> compute this during one of the body walks the gimplifier performs
> before gimplifying.
> 
> Another complication is that at least the C frontend + gimplifier
> combination for
> 
>   switch (i)
>   {
>   case 1:
> int i;
> break;
>   }
> 
> will end up having the BIND_EXPR mentioning 'i' containing the
> case label, so just placing the birth at the BIND will make it
> unreachable as
> 
>   i = BIRTH;
> case_label_for_1:
>   ...

I think anything that needs to happen (conceptually) during the jump from 
switch to case-label needs to happen right before the jump; that might 
mean creating a new fake BLOCK that contains just the jump and the 
associated births.

> conveniently at least the C frontend has a DECL_EXPR for 'i'
> which avoids this and I did not find a way (yet) in the gimplifier
> to rectify this when gimplifying switches.

In C a 'case' label is nothing other than a normal label; it doesn't start 
its own block or the like.  So for that one, too, the births would need to 
be at the start of their containing blocks.

> So there's still work to do but I think that starting the lifetime at a 
> DECL_EXPR if it exists is the way to go?

That depends on where exactly the DECL_EXPR is placed, but probably it wouldn't 
be okay.  You can't place the birth on the fall-through path, because this 
needs to work (basically your jump example above rewritten as a switch):

int state = 2, *p, camefrom1 = 0;
for (;;) switch (state) {
  case 1: 
  case 2: ;
int i;
if (state != 1) { p = &i; i = 0; }
if (state == 1) { (*p)++; return *p; }
state = 1;
continue;
}

Note how i is initialized during state 2, and needs to be kept initialized 
during state 1, so there must not be a CLOBBER (birth or other) at the 
point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
for 'i' is placed with the statement associated with the 'case 2' label; 
putting a CLOBBER there would be the wrong thing.  If the decl had an 
initializer it might be harmless, as it would be overwritten at that 
place, but even so, in this case there is no initializer.  Hmm.

Another complication arises from simple forward jumps:

  goto forw;
  {
int i;
printf("not reachable\n");
  forw:
i = 1;
  }

Despite the jump skipping the unreachable head of the BLOCK, 'i' needs to 
be considered birthed at the label.  (In a way the placement of births 
exactly mirrors the placement of deaths (of course): every path from 
outside a BLOCK to inside needs to birth-clobber all variables (in C), 
just as every path leaving needs to kill them.  It's just that we have a 
convenient construct for the latter (try-finally), but not for the former.)

Ciao,
Michael.


Re: [PATCH] libgcc: Actually force divide by zero

2022-02-03 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 3 Feb 2022, Richard Biener via Gcc-patches wrote:

> /* Preserve explicit divisions by 0: the C++ front-end wants to detect
>undefined behavior in constexpr evaluation, and assuming that the 
> division
>traps enables better optimizations than these anyway.  */
> (for div (trunc_div ceil_div floor_div round_div exact_div)
>  /* 0 / X is always zero.  */
>  (simplify
>   (div integer_zerop@0 @1)
>   /* But not for 0 / 0 so that we can get the proper warnings and errors.  
> */
>   (if (!integer_zerop (@1))
>@0))
> 
> it suggests we want to preserve all X / 0 when the 0 is literal but
> I think we can go a bit further and require 0/0 to not unnecessarily
> restrict other special cases.

Just remember that 0/0 is completely different from X/0 (with X != 0): the 
latter is a sensible limit, the former is just nonsense.  There's a 
reason why one is a NaN and the other Inf in floating point.  So it does 
make sense to distinguish between the two on the integer side as well.
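
(For the floating-point side of that remark, a plain C illustration:)

  #include <stdio.h>

  int main (void)
  {
    volatile double x = 1.0, zero = 0.0;
    /* A nonzero dividend gives the limit; 0/0 gives a NaN.  */
    printf ("%g %g\n", x / zero, zero / zero);   /* prints inf and nan (or -nan) */
    return 0;
  }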

(I'm of two minds about the usefulness of "1/x -> 0" vs. its effect on 
trapping behaviour.)


Ciao,
Michael.

> 
> Comments on the libgcc case below
> 
> > 2022-02-03  Jakub Jelinek  
> > 
> > * libgcc2.c (__udivmoddi4): Add optimization barriers to actually
> > ensure division by zero.
> > 
> > --- libgcc/libgcc2.c.jj 2022-01-11 23:11:23.810270199 +0100
> > +++ libgcc/libgcc2.c2022-02-03 09:24:14.513682731 +0100
> > @@ -1022,8 +1022,13 @@ __udivmoddi4 (UDWtype n, UDWtype d, UDWt
> > {
> >   /* qq = NN / 0d */
> >  
> > - if (d0 == 0)
> > -   d0 = 1 / d0;/* Divide intentionally by zero.  */
> > + if (__builtin_expect (d0 == 0, 0))
> > +   {
> > + UWtype one = 1;
> > + __asm ("" : "+g" (one));
> > + __asm ("" : "+g" (d0));
> > + d0 = one / d0;/* Divide intentionally by zero.  */
> > +   }
> 
> I'm not sure why we even bother - division or modulo by zero is
> undefined behavior and we are not emulating CPU behavior because
> the wide instructions emulated here do not actually exist.  That
> gives us the freedom of choosing the implementation defined
> behavior.
> 
> That said, _if_ we choose to "care" I'd rather choose to
> define the implementation to use the trap mechanism the
> target provides and thus use __builtin_trap ().  That then
> at least traps reliably, compared to the division by zero
> which doesn't do that on all targets.
> 
> So I'm not sure there's anything to fix besides eventually
> just deleting the d0 == 0 special case?
> 
> Richard.
> 


Re: [PATCH] Add CLOBBER_MARKS_EOL to mark storage end-of-life clobbers

2022-02-03 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 3 Feb 2022, Richard Biener via Gcc-patches wrote:

> > > It indeed did occur to me as well, so as we're two now I tried to 
> > see what it looks like.  It does look like the following (didn't bother to 
> > > replace all build_clobber calls but defaulted the parameter - there 
> > > are too many).
> > 
> > Except for this part I like it (I wouldn't call ca. 25 calls too 
> > many).  I often find optional arguments to be a long term maintenance 
> > burden when reading code.
> 
> But I think in this case it makes clear that the remaining callers are 
> un-audited

My pedantic answer to that would be that to make something clear you want 
to be explicit, and being explicit means writing something out, not 
leaving something out; so you'd add another "CLOBBER_DONTKNOW" and use 
that for the unaudited calls.  Pedantic, as I said :)

But, of course, you built the shed, so paint it green, all right :)


Ciao,
Michael.


Re: [PATCH] Add CLOBBER_MARKS_EOL to mark storage end-of-life clobbers

2022-02-03 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 3 Feb 2022, Richard Biener wrote:

> > > This adds a flag to CONSTRUCTOR nodes indicating that for clobbers this 
> > > marks the end-of-life of storage as opposed to just ending the lifetime 
> > > of the object that occupied it. The dangling pointer diagnostics uses 
> > > CLOBBERs but is confused by those emitted by the C++ frontend for 
> > > example which emits them for the second purpose at the start of CTORs.  
> > > The issue is also appearant for aarch64 in PR104092.
> > > 
> > > Distinguishing the two cases is also necessary for the PR90348 fix.
> > 
> > (Just FWIW: I agree with the plan to have different types of CLOBBERs, in 
> > particular those for marking births)
> > 
> > A style nit:
> > 
> > > tree clobber = build_clobber (TREE_TYPE (t));
> > > +   CLOBBER_MARKS_EOL (clobber) = 1;
> > 
> > I think it would really be better if build_clobber() gained an additional 
> > argument (non-optional!) that sets its type.
> 
> It indeed did occur to me as well, so as we're two now I tried to see
> what it looks like.  It does look like the following (didn't bother to
> replace all build_clobber calls but defaulted the parameter - there
> are too many).

Except for this part I like it (I wouldn't call ca. 25 calls too 
many).  I often find optional arguments to be a long term maintenance 
burden when reading code.

(And yeah, enum good; put the enum into tree_base.u if it works, 
otherwise use your bit shuffling.)


Ciao,
Michael.


Re: [PATCH] Add CLOBBER_MARKS_EOL to mark storage end-of-life clobbers

2022-02-02 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 2 Feb 2022, Richard Biener via Gcc-patches wrote:

> This adds a flag to CONSTRUCTOR nodes indicating that for clobbers this 
> marks the end-of-life of storage as opposed to just ending the lifetime 
> of the object that occupied it. The dangling pointer diagnostics uses 
> CLOBBERs but is confused by those emitted by the C++ frontend for 
> example which emits them for the second purpose at the start of CTORs.  
> The issue is also appearant for aarch64 in PR104092.
> 
> Distinguishing the two cases is also necessary for the PR90348 fix.

(Just FWIW: I agree with the plan to have different types of CLOBBERs, in 
particular those for marking births)

A style nit:

> tree clobber = build_clobber (TREE_TYPE (t));
> +   CLOBBER_MARKS_EOL (clobber) = 1;

I think it would really be better if build_clobber() gained an additional 
argument (non-optional!) that sets its type.


Ciao,
Michael.


Re: [PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2022-01-10 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 20 Dec 2021, Uros Bizjak wrote:

> > Thanks.
> > I see nobody commented on Micha's post there.
> >
> > Here is a patch that implements it in GCC, i.e. C++ doesn't change ABI (at 
> > least
> > not from the past few releases) and C does for GCC:
> >
> > 2021-12-15  Jakub Jelinek  
> >
> > PR target/102024
> > * config/i386/i386.c (classify_argument): Add zero_width_bitfields
> > argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
> > always ignore them, when seeing other zero sized bitfields, either
> > set zero_width_bitfields to 1 and ignore it or if equal to 2 process
> > it.  Pass it to recursive calls.  Add wrapper
> > with old arguments and diagnose ABI differences for C structures
> > with zero width bitfields.  Formatting fixes.
> >
> > * gcc.target/i386/pr102024.c: New test.
> > * g++.target/i386/pr102024.C: New test.
> 
> Please get a signoff on the ABI change (perhaps HJ can help here),
> I'll approve the implementation after that.

Christmas got in the way, but I just merged the proposed change 
(zero-width bit-fields -> NO_CLASS) into the psABI.
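
(For reference, the kind of struct affected -- a sketch, not the actual
testcase from the PR.  With zero-width bit-fields classified as NO_CLASS,
the ':0' member no longer influences the x86-64 argument classification:)

  struct S1 { float x; int : 0; float y; };  /* zero-width bit-field ignored */
  struct S2 { float x; float y; };           /* passed the same way as S1    */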


Ciao,
Michael.

> 
> Uros.
> 
> >
> > --- gcc/config/i386/i386.c.jj   2021-12-10 17:00:06.024369219 +0100
> > +++ gcc/config/i386/i386.c  2021-12-15 15:04:49.245148023 +0100
> > @@ -2065,7 +2065,8 @@ merge_classes (enum x86_64_reg_class cla
> >
> >  static int
> >  classify_argument (machine_mode mode, const_tree type,
> > -  enum x86_64_reg_class classes[MAX_CLASSES], int 
> > bit_offset)
> > +  enum x86_64_reg_class classes[MAX_CLASSES], int 
> > bit_offset,
> > +  int &zero_width_bitfields)
> >  {
> >HOST_WIDE_INT bytes
> >  = mode == BLKmode ? int_size_in_bytes (type) : (int) GET_MODE_SIZE 
> > (mode);
> > @@ -2123,6 +2124,16 @@ classify_argument (machine_mode mode, co
> >  misaligned integers.  */
> >   if (DECL_BIT_FIELD (field))
> > {
> > + if (integer_zerop (DECL_SIZE (field)))
> > +   {
> > + if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
> > +   continue;
> > + if (zero_width_bitfields != 2)
> > +   {
> > + zero_width_bitfields = 1;
> > + continue;
> > +   }
> > +   }
> >   for (i = (int_bit_position (field)
> > + (bit_offset % 64)) / 8 / 8;
> >i < ((int_bit_position (field) + (bit_offset % 
> > 64))
> > @@ -2160,7 +2171,8 @@ classify_argument (machine_mode mode, co
> >   num = classify_argument (TYPE_MODE (type), type,
> >subclasses,
> >(int_bit_position (field)
> > -   + bit_offset) % 512);
> > +   + bit_offset) % 512,
> > +  zero_width_bitfields);
> >   if (!num)
> > return 0;
> >   pos = (int_bit_position (field)
> > @@ -2178,7 +2190,8 @@ classify_argument (machine_mode mode, co
> >   {
> > int num;
> > num = classify_argument (TYPE_MODE (TREE_TYPE (type)),
> > -TREE_TYPE (type), subclasses, 
> > bit_offset);
> > +TREE_TYPE (type), subclasses, 
> > bit_offset,
> > +zero_width_bitfields);
> > if (!num)
> >   return 0;
> >
> > @@ -2211,7 +2224,7 @@ classify_argument (machine_mode mode, co
> >
> >   num = classify_argument (TYPE_MODE (TREE_TYPE (field)),
> >TREE_TYPE (field), subclasses,
> > -  bit_offset);
> > +  bit_offset, 
> > zero_width_bitfields);
> >   if (!num)
> > return 0;
> >   for (i = 0; i < num && i < words; i++)
> > @@ -2231,7 +2244,7 @@ classify_argument (machine_mode mode, co
> >  X86_64_SSEUP_CLASS, everything should be passed in
> >  memory.  */
> >   if (classes[0] != X86_64_SSE_CLASS)
> > - return 0;
> > +   return 0;
> >
> >   for (i = 1; i < words; i++)
> > if (classes[i] != X86_64_SSEUP_CLASS)
> > @@ -2257,8 +2270,8 @@ classify_argument (machine_mode mode, co
> >   classes[i] = X86_64_SSE_CLASS;
> > }
> >
> > - /*  If X86_64_X87UP_CLASS isn't preceded by X86_64_X87_CLASS,
> > -  everything should be passed in 

Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions

2021-11-25 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 25 Nov 2021, Richard Biener via Gcc-patches wrote:

> It seems to be a style to place gcc_unreachable () after a
> switch that handles all cases with every case returning.
> Those are unreachable (well, yes!), so they will be elided
> at CFG construction time and the middle-end will place
> another __builtin_unreachable "after" them to note the
> path doesn't lead to a return when the function is not declared
> void.
> 
> So IMHO those explicit gcc_unreachable () serve no purpose,
> if they could be replaced by a comment.

Never document in comments what you can document in code (IMO).  I think 
the code as-is clearly documents the invariants and expectations and 
removing the gcc_unreachable() leads to worse sources.

Can't you simply exempt warning on an unreachable __builtin_unreachable()?
It seems an obvious thing that the warning should _not_ warn about; after 
all, quite clearly, the author is aware that it is unreachable; it says 
so, right there.


Ciao,
Michael.


Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code

2021-11-25 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 25 Nov 2021, Richard Biener wrote:

> > Yes, that's definitely the case - I was too lazy to re-use the old
> > option name here.  But I don't have a good name at hand, maybe clang
> > has an option covering the cases I'm thinking about.

As you asked: I already have difficulties to describe the exact semantics 
of the warning in sentences, so I don't find a good name either :-)

> > Btw, the diagnostic spotted qsort_chk doing
> > 
> > if (CMP (i1, i2))
> >   break;
> > else if (CMP (i2, i1))
> >   return ERR2 (i1, i2);
> > 
> > where ERR2 expands to a call to a noreturn void "returning"
> > qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
> > a bug but somewhat difficult to avoid the diagnostic for.  I suppose
> > the pointless 'return' is to make it more visible that the loop
> > terminates here (albeit we don't return normally).

Tough one.  You could also disable the warning when the fallthrough 
doesn't exist because of a non-returning call.  If it's supposed to find 
obvious programming mistakes, it might make sense to treat all function 
calls alike, i.e. as calls that can return, which is how they look.  
Or it might make sense not to do that for programmers who happen to know 
about non-returning functions.  :-/

> It also finds this strange code in label_rtx_for_bb:

So the warning is definitely useful!

> indeed the loop looks pointless.  Unless the DECL_NONLOCAL case was 
> meant to continue;

It's been like that since it was introduced in 2007.  It's an invariant that 
DECL_NONLOCAL labels come first in a BB and are not followed by normal 
labels, so a 'continue' wouldn't change anything; the loop is useless.


Ciao,
Michael.


Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code

2021-11-24 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 24 Nov 2021, Richard Biener wrote:

> >> +/* Unreachable code in if (0) block.  */
> >> +void baz(int *p)
> >> +{
> >> +   if (0)
> >> + {
> >> +return;  /* { dg-bogus "not reachable" } */
> >
> >Hmm?  Why are you explicitly saying that warning here would be bogus? 
> 
> Because I don't think we want to warn here. Such code is common from 
> template instantiation or macro expansion.

Like all code with a (const-propagated) explicit 'if (0)', which is of 
course the reason why -Wunreachable-code is a challenge.  IOW: I could 
accept your argument but then wonder why you want to warn about the second 
statement of the guarded block.  The situation was:

  if (0) {
return;  // (1) don't warn here?
whatever++;  // (2) but warn here?
  }

That seems even more confusing.  So you don't want to warn about 
unreachable code (the 'return') but you do want to warn about unreachable 
code within unreachable code (point (2) is unreachable because of the 
if(0) and because of the return).  If your worry is if(0)'s resulting from 
macro/template expansion, then I don't see why you would only disable 
warnings for some of the statements therein.

It seems we are actually interested in code unreachable via fallthrough or 
labels, not in all unreachable code, so maybe the warning is misnamed.

Btw. what does the code now do about this situation:

  if (0) {
something++;  // 1
return;   // 2
somethingelse++;  // 3
  }

does it warn at (1) or not?  (I assume it unconditionally warns at (3))


Ciao,
Michael.


Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code

2021-11-24 Thread Michael Matz via Gcc-patches
Hello,

> +/* Unreachable code in if (0) block.  */
> +void baz(int *p)
> +{
> +   if (0)
> + {
> +return;  /* { dg-bogus "not reachable" } */

Hmm?  Why are you explicitly saying that warning here would be bogus?  It 
quite clearly _is_ unreachable, so warning there makes sense.  Maybe you 
want an XFAILed dg-warning if your current implementation fails to warn, 
and a further XFAILed dg-bogus on the next line?

(Or at the very least a comment in the test case that this is actually not 
what we really want, but rather what current GCCs produce)


Ciao,
Michael.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-18 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 18 Oct 2021, Richard Sandiford wrote:

> > (It's a really cute hack that works as a micro-optimization; the question 
> > is, do we really need to go there already?  Don't all the other, less 
> > hacky approaches bring similar improvements?  The cuter the hack, the 
> > less often it pays off in the long run of production software :) )
> 
> FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
> for block_symbol, I like the approach of concatenating/combining structures
> based on flags.

The problem is that if you unset the flag you can't free the (now useless) 
storage.  What's worse is that you can't even reuse it anymore, because 
you lost the knowledge that it exists (except if you want to use another 
flag to note that).  It's of course obvious, but it helps to spell that 
out if we want to argue about ...

> The main tree and rtl types have too much baggage and

... baggage.  What you actually gain by associating different info pieces 
by address (e.g. concatenating allocations) is that you don't need to refer 
to one from the other; that's the space you save, not anything inherent 
in the structures (which still have the members they would have 
anyway).  So, you basically trade one pointer (or index), which would 
possibly be optional, for address association and inflexibility (with the 
impossibility of managing both pieces individually: you can't free the 
second piece, and you can't add the second piece post-allocation).  It 
might be a good trade-off sometimes, but in the abstract it's not a good 
design.

Regarding trees and space: to make something a tree you need 8 bytes and 
get a number of flags, and an arbitrary 4-byte blob in return.  I don't 
see that as much baggage.  We could reduce it further by splitting the 
arbitrary union and the tree_code+flags parts.  Especially for things 
referred to from tree_exp it makes sense to try making them trees 
themselves.

> so I think there are some things that are better represented outside
> of them.
> 
> I suppose cselib VALUE rtxes are also similar, although they're more
> of a special case, since cselib data doesn't survive between passes.


Ciao,
Michael.


Re: [PATCH] Allow different vector types for stmt groups

2021-10-14 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 14 Oct 2021, Richard Biener via Gcc-patches wrote:

> > I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
> > 433.milc bechmark when compiled with -Ofast -march=native -flto to this
> > commit.  See also:
> > 
> >   
> > https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0=289.70.0;
> > 
> > I am not sure if a bugzilla bug is in order because I cannot reproduce
> > the regression neither on an AMD zen3 machine nor on Intel CascadeLake,
> 
> It's for sure worth a PR for tracking purposes.  But I've not been
> very successful in identifying regression causes on Zen2 - what perf
> points to is usually exactly the same assembly in both base and peak :/

Well, in this case it's at least a fairly impressive (40%) difference in 
samples for add_force_to_mom* :

> > BEFORE:
> > 18.47%105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
> > add_force_to_mom
> > 
> > AFTER:
> > 24.04%149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
> > add_force_to_mom

(as the change was in the vectorizer, this hopefully isn't due to different 
inlining decisions).


Ciao,
Michael.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-14 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 14 Oct 2021, Richard Biener wrote:

> > So, at _this_ write-through of the email I think I like the above idea 
> > best: make ao_ref be a tree (at least its storage, because it currently 
> > is a one-member-function class), make ao_ref.volatile_p be 
> > tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
> > sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
> > 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
> > TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
> > tcc_reference tree.
> 
> Hmm.  I'm not sure that's really something I like - it's especially
> quite some heavy lifting while at the same time lacking true boldness
> as to changing the representation of memory refs ;)

Well, it would at least enable such changes later in an orderly fashion.

> That said - I've prototyped the TREE_ASM_WRITTEN way now because it's 
> even simpler than the original TREE_AOREFWRAP approach, see below.
> 
> Note that I'm not embedding it into the tree structure, I'm merely
> using the same allocation to store two objects, the outermost ref
> and the ao_ref associated with it.  Quote:
> 
> +  size_t length = tree_code_size (TREE_CODE (lhs));
> +  if (!TREE_ASM_WRITTEN (lhs))
> +{
> +  tree alt_lhs
> +   = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
> +  memcpy (alt_lhs, lhs, length);
> +  TREE_ASM_WRITTEN (alt_lhs) = 1;
> +  *ref = new ((char *)alt_lhs + length) ao_ref;

You need to ensure that alt_lhs+length is properly aligned for ao_ref, but 
yeah, for a hack that works.  If you really want to go that way you need 
good comments about this hack.  It's really somewhat worrisome that the 
size of the allocation depends on a bit in tree_base.
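
(The detail in question, as a generic C sketch of co-allocating a trailing
object at a properly aligned offset; purely illustrative, not the GCC code:)

  #include <stdalign.h>
  #include <stdlib.h>
  #include <string.h>

  struct hdr { int code; unsigned flags; };     /* stand-in for the tree   */
  struct ann { double lo, hi; void *base; };    /* stand-in for the ao_ref */

  /* Copy 'orig' (len bytes) and place an 'ann' behind it, rounding the
     offset up so that the trailing object is suitably aligned.  */
  static struct hdr *
  alloc_with_ann (const struct hdr *orig, size_t len)
  {
    size_t off = (len + alignof (struct ann) - 1) & ~(alignof (struct ann) - 1);
    struct hdr *copy = calloc (1, off + sizeof (struct ann));
    if (!copy)
      return NULL;
    memcpy (copy, orig, len);
    /* The annotation lives at (char *) copy + off.  */
    return copy;
  }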

(It's a really cute hack that works as a micro-optimization; the question 
is, do we really need to go there already?  Don't all the other, less 
hacky approaches bring similar improvements?  The cuter the hack, the 
less often it pays off in the long run of production software :) )


Ciao,
Michael.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-13 Thread Michael Matz via Gcc-patches
Hello,

[this is the fourth attempt to write a comment/review/opinion for this 
ao_ref-in-tcc_reference, please accept some possible incoherence]

On Tue, 12 Oct 2021, Richard Biener via Gcc-patches wrote:

> This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
> which we can use to wrap a reference tree, recording the ao_ref
> associated with it.  That comes in handy when trying to optimize
> the constant factor involved with alias stmt walking (or alias
> queries in general) where there's parts that are liner in the
> reference expression complexity, namely get_ref_base_and_extent,
> which shows up usually high on profiles.

So, generally I like storing things in the IL that are impossible to 
(re-)discover.  Remembering things that are merely slowish to rediscover 
is less clear-cut: the increase in IL memory use, and potentially the 
necessary pointer chasing, might just trade one clearly measurable slow 
point (the rediscovery computation) for many slow points all over the 
place (the pointer chasing/cache effects).  Here ...

> The following patch is minimal as to make tree-ssa.exp=ssa-fre-*
> not ICE and make the testcases from PR28071 and PR39326 compile
> successfully at -O1 (both testcases show a moderately high
> load on alias stmt walking around 25%, resp. 34%).  With the
> patch which makes use of the cache only from stmt_may_clobber_ref_p
> for now the compile-time improves by 7%, resp. 19% which means
> overall the idea might be worth pursuing.

... you seem to have a point, though.  Also, I am of the opinion that our 
gimple memrefs could be even fatter (and also encode things like 
multi-dimensional accesses, either right from the source code or 
discovered by index analysis), and your idea goes in that direction.  
So, yay for encoding memref properties into the IL, even though here it's 
only a cache.  You solved the necessary invalidation already; perhaps 
only partly, but that will be seen once it's exposed to the wild.

So, the only question seems to be how to encode it: either by ...

> transparent by instead of wrapping the refs with another tree

... this (wrapping) ...

> to reallocate the outermost handled_component_p (and only those),

... or that (aggregation).  There is a third possibility if you want this 
only in the gimple world (which is the case): encode it not in the trees 
but in the gimple statements.  This sort of works easily for everything 
except calls.  I will not consider this variant, nor the side table 
implementation.

While writing this email I switched between preferring one or the other 
multiple times.  So, I'll write down some basic facts/requirements:

1) You basically want to add stuff to an existing structure:
(a) by wrapping: to work seamlessly the outer tree should have similar 
enough properties to the inner tree (e.g. also be tcc_reference) to be 
used interchangeably in most code, except that which needs to look at 
the added stuff.
(b) by aggregating the stuff into the existing structure itself: if you 
need both structs (with and without stuff) the pure thing to do is to 
actually create two structs, once with, once without stuff.
2) the added stuff is optional
3) we have multiple things (all tcc_reference) to which to add stuff
4) all tcc_reference are tree_exp, which has a variable number of operands,
   which constrains what we can do naturally (e.g. we can't add stuff 
   after tree_exp, except by constraining the number of operands)

Considering this it seems that aggregation is worse: you basically double 
the number of structure types (at least conceptually, if you go with your 
bit-idea).  So, some idea of wrapping seems more natural.

(I think your idea of aggregation but going with a bit flag to indicate if 
this tcc_reference is or isn't annotated, and therefore has things 
allocated after the variable number of operands, is a terrible hack)

There is another possibility, doing something like your bit-flag 
aggregation but with less hackery: if ao_ref were a tree it could be 
a normal operand of a tree_exp (and hence tcc_reference), just that the 
number of operands would then vary depending on whether it's annotated or not.

Making ao_ref into a tree would also enable the use of ANNOTATE_EXPR for a 
generic wrapping tree.  (Currently it's used only in very specific cases, 
so ANNOTATE_EXPR handling would need to be extended all over, and as it's 
not a tcc_reference it would probably rather mean to introduce a new 
ANNOTATE_REF).

Anyway, with this:

struct tree_ref_annotation {
  struct tree_base base;
  struct ao_ref ao_ref;
};

DEFTREECODE(TREE_MEM_ANNO, "mem_anno", tcc_exceptional, 0);

you could then add

DEFTREECODE(MEM_REF_A, "mem_ref_a", tcc_reference, 3);

where TREE_OPERAND(memref, 2) would then be a TREE_MEM_ANNO.  If we were 
to add one operand slot to each tcc_reference we could even do without new 
tree codes: the existence of an ao_ref would simply be indicated by 
TREE_OPERAND(ref, position) being non-NULL.

Re: [PATCH] tree-optimization: [PR102622]: wrong code due to signed one bit integer and "a?-1:0"

2021-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Sat, 9 Oct 2021, apinski--- via Gcc-patches wrote:

> +  (lshift (convert (convert:boolean_type_node @0)) { shift; })))
> +/* a ? -1 : 0 -> -a.  No need to check the TYPE_PRECISION not being 1
> +   here as the powerof2cst case above will handle that case correctly.  
> */

Well, but the QoI will improve quite a bit when you just do the check, 
instead of relying on the order of patterns.  It's not slow or harmful to 
check, and it makes the order irrelevant, which, given the number of 
patterns we already have, is a good thing.  (The check will also be smaller 
than the comment documenting why it isn't needed :-) )


Ciao,
Michael.


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-05 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 5 Oct 2021, Richard Biener wrote:

> > First notice that this doesn't have an empty latch block to start with 
> > (in fact, there is no separate latch block at all), so the loop 
> > optimizers haven't been initialized for simple latches at this point.  
> > Still, I would say that even then we want to be careful with the latch 
> > edge, as putting code onto it will most probably create a problem 
> > downstream once we _do_ want to initialize the loop optimizers for 
> > simple latches.  So, that we are careful here is okay, we are just too 
> > careful in this specific case.
> 
> Not sure if the argument about empty or not empty latches is important...

In this case it's not (as there are no separate latches anyway), but 
generally a latch that is already non-empty (i.e. problematic) only 
becomes more non-empty, so doing the threading doesn't introduce that 
specific problem.

> > > There's a pretty obvious jump thread in there.  3->4->5.
> > >
> > > We used to allow this, but it's currently being rejected and I'm not
> > > sure it should.
> > >
> > > In simplest terms we're threading from inside the loop, through the
> > > latch to a point outside the loop.  At first glance, that seems safe.
> 
> All threadings that start inside the loop and end outside of it are OK
> in principle.  Even if the path crosses the latch or the header.
> 
> We _might_ end up creating a loop with multiple exits though.

And entries (and hence irreducible loops)!  That's why I also added the 
piece about "all of B2..Bn-1 don't contain jumps back into the loop".  
I'm not sure if candidate paths going into our threader can have this 
problem, but if they have then you also don't want to do the block 
copying.

> > (a) while the thread is through the latch block, it's _not_ through the
> > latch edge (which is 4->3).  That would create the problem, so here
> > we're fine.
> > (b) even if it were through the latch edge, and would be followed by
> > to-be-copied instructions (and hence fill the latch edge) the ultimate
> > end of the path is outside the loop, so all the edges and blocks that
> > would now carry instructions would also be outside the loop, and hence
> > be no problem
> 
> Yep.
> 
> > Now, capture this precisely ;-)
> 
> I tried to capture this with
> 
> +  // The path should either start and end in the same loop or exit the
> +  // loop it starts in but never enter a loop.  This also catches
> +  // creating irreducible loops, not only rotation.
> +  if (entry->src->loop_father != exit->dest->loop_father
> +  && !flow_loop_nested_p (exit->src->loop_father,
> + entry->dest->loop_father))
> +{
> +  cancel_thread (, "Path rotates loop");
> +  return true;
> +}
> 
> it's not really necessary to have (simple) latches to argue about this
> I think.

Correct.  I don't think the above captures the re-entry problem (when 
intermediary blocks contain jumps back into the loop), the one I'm not sure 
we even have, but otherwise I think it does capture the (a) and (b) parts.


Ciao,
Michael.


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-04 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 4 Oct 2021, Jeff Law wrote:

> And just in case it got lost.  Here's the analysis on 960218-1 on visium:
> 
> We've got this going into ethread:
> 
> int f (int x)
> {
>   int D.1420;
>   int a;
> 
> ;;   basic block 2, loop depth 0, maybe hot
> ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   ENTRY (FALLTHRU,EXECUTABLE)
>   a_4 = ~x_3(D);
  goto <bb 4>; [INV]
> ;;    succ:   4 (FALLTHRU,EXECUTABLE)
> 
> ;;   basic block 3, loop depth 1, maybe hot
> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   4 (TRUE_VALUE,EXECUTABLE)
>   gl = a_1;
> ;;    succ:   4 (FALLTHRU,DFS_BACK,EXECUTABLE)
> 
> ;;   basic block 4, loop depth 1, maybe hot
> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   2 (FALLTHRU,EXECUTABLE)
> ;;    3 (FALLTHRU,DFS_BACK,EXECUTABLE)
  # a_1 = PHI <a_4(2), a_1(3)>
>   if (a_1 != 0)
    goto <bb 3>; [INV]
>   else
    goto <bb 5>; [INV]
> ;;    succ:   3 (TRUE_VALUE,EXECUTABLE)
> ;;    5 (FALSE_VALUE,EXECUTABLE)
> 
> ;;   basic block 5, loop depth 0, maybe hot
> ;;    prev block 4, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:   4 (FALSE_VALUE,EXECUTABLE)
>   return;
> ;;    succ:   EXIT
> 
> }

First notice that this doesn't have an empty latch block to start with 
(in fact, there is no separate latch block at all), so the loop optimizers 
haven't been initialized for simple latches at this point.  Still, I would 
say that even then we want to be careful with the latch edge, as putting 
code onto it will most probably create a problem downstream once we _do_ 
want to initialize the loop optimizers for simple latches.  So, that we are 
careful here is okay, we are just too careful in this specific case.

> There's a pretty obvious jump thread in there.  3->4->5.
> 
> We used to allow this, but it's currently being rejected and I'm not 
> sure it should.
> 
> In simplest terms we're threading from inside the loop, through the 
> latch to a point outside the loop.  At first glance, that seems safe.

Is at least the unrestricted jump threader (the one after loop optimizers) 
picking this up?

Independent of that, I agree that this specific instance of threading 
through the latch block, even though followed by to-be-copied 
instructions, is okay.  We need to determine precisely when that is okay, 
though.  In this case there are several reasons I could see why this is 
so:
(a) while the thread is through the latch block, it's _not_ through the
latch edge (which is 4->3).  That would create the problem, so here
we're fine.
(b) even if it were through the latch edge, and would be followed by 
to-be-copied instructions (and hence fill the latch edge) the ultimate 
end of the path is outside the loop, so all the edges and blocks that 
would now carry instructions would also be outside the loop, and hence 
be no problem

Now, capture this precisely ;-)

I think something like so: we have a candidate path

  S -> L -> B1 (-> B2 ... Bn)

(Start, Latch, Blocks 1 to n following the latch).  (I think in our 
notation that means that the jump in L is redirected to Bn, and all code 
from B1..Bn is copied, right?  Do we even support multiple blocks after 
the to-be-redirected jump?)

Now if L is a latch block, but L->B1 is no latch/back edge : no 
restrictions apply.

Otherwise, L->B1 is a latch/back edge (that means B1 is a loop header): if 
none of B2..Bn-1 contain jumps back into the loop, and Bn is outside 
the loop, then the thread is okay as well.  (B2..Bn-1 can be inside the 
loop; as long as they don't contain jumps back into the loop, they don't 
create problems after copying by the threader: their copies will be 
placed outside the loop and won't generate side entries back into the 
loop; the copied latch edge will not be a latch edge anymore, but a loop 
exit edge.)

It's quite possible that the full generality above is not necessary.  
It's probably enough to just not deal with the (b) case above (and 
continue to reject it), and simply only accept the candidate path if none 
of the involved edges is a latch/back edge (despite one block being the 
latch block).  Especially so because I'm not convinced that I handled 
the idea of case (b) correctly above ;-)
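
(A sketch of that simpler rule in GCC terms -- purely illustrative, not the
actual threader code; it assumes the usual cfgloop helpers:)

  /* Reject a candidate path as soon as one of its edges is a back edge,
     i.e. jumps to the header of a loop that its source block is inside.  */
  static bool
  path_contains_back_edge_p (const vec<edge> &path)
  {
    for (unsigned i = 0; i < path.length (); i++)
      {
        edge e = path[i];
        class loop *dest_loop = e->dest->loop_father;
        if (dest_loop->header == e->dest
            && flow_bb_inside_loop_p (dest_loop, e->src))
          return true;
      }
    return false;
  }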


Ciao,
Michael.


Re: [PATCH] middle-end/102360 - adjust .DEFERRED_INIT expansion

2021-09-16 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 16 Sep 2021, Richard Biener via Gcc-patches wrote:

> > Typically for the native_interpret/native_encode we punt if 
> > BITS_PER_UNIT != 8 || CHAR_BIT != 8 because nobody had the energy to 
> > deal with the weird platforms (especially if we have currently none, I 
> > believe dsp16xx that had 16-bit bytes had been removed in 4.0 and c4x 
> > that had 32-bit bytes had been removed in 4.3) - dunno if the 
> > DEFERRED_INIT etc. code has those guards or not and it clearly 
> > documents that this code is not ready for other configurations. A byte 
> > is not necessarily 8 bits, that is just the most common size for it, 
> > and TYPE_SIZE_UNIT is number of BITS_PER_UNIT bit units.
> 
> OK, I'll do s/8/BITS_PER_UNIT/ - I also see that we have 
> int_size_in_bytes returning TYPE_SIZE_UNIT and that TYPE_SIZE_UNIT is 
> documented to yield the type size in 'bytes'.

For better or worse GCC's meaning of 'byte' is really 'unit'; I guess most 
introductions of the term 'byte' in comments and function names really 
came from either carelessness or from people who knew this fact and 
thought it obvious that 'byte' of course is the same as 'unit', but not 
the same as octet.

FWIW: (for GCC) both mean the smallest naturally addressable piece of 
memory (i.e. what you get when you increase an address by 'one'), and that 
is not necessarily 8 bits (but anything else is bit-rotten, of course).

As modern use of 'byte' tends to actually mean octet, but old use of byte 
(and use in GCC) means unit, we probably should avoid the term byte 
altogether in GCC.  But that ship has sailed :-/

> I do believe that we should officially declare hosts with CHAR_BIT != 8 
> as unsupported and as you say support for targets with BITS_PER_UNIT != 
> 8 is likely bit-rotten.

(And characters are still something else entirely, except on those couple 
platforms where chars, units and octets happen to be the same :) )  
But yes.
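
To illustrate the punt mentioned at the top concretely (only a sketch of 
the kind of guard meant there, assuming the usual BITS_PER_UNIT and 
CHAR_BIT macros and some tree 'type' at hand; not code from any actual 
patch):

  /* Sketch: punt early on configurations where a unit is not an octet
     or the host char is not 8 bits; code following this would
     otherwise silently assume 8-bit units.  */
  if (BITS_PER_UNIT != 8 || CHAR_BIT != 8)
    return NULL_TREE;

  /* Size of TYPE in bits, going through units rather than "bytes".  */
  unsigned HOST_WIDE_INT bits
    = tree_to_uhwi (TYPE_SIZE_UNIT (type)) * BITS_PER_UNIT;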


Ciao,
Michael.


Re: Regression with recent change

2021-09-14 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 13 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

The testcase still tests what it's supposed to test with ...

> > typedef unsigned short uint16_t;
> >
> > uint16_t a, b;
> >
> > int *j_global;
> > uint16_t f(void)
> > {
> >int c, **p;
> >short d = 2, e = 4;
> >

... "c = a;" added here (i.e. it still hangs before the pr55107 change).

> >for (;; b++)
> >  {
> >int *j = j_global, k = 0;
> >
> >for (; *j; j++)
> >   {
> > for(; c; c++)
> >   for(; k < 1; k++)
> > {
> >   short *f = 
> >
> >   if(b)
> > return *f;
> > }
> >   }
> >
> >if(!c)
> >   d *= e;
> >
> >a = d;
> >if ((a ? b = 0 : (**p ? : 1) != (d != 1 ? 1 : (b = 0))) != ((k ? a : 0)
> > < (a * (c = k
> >   **p = 0;
> >  }
> > }
> >
> 
> Thanks for getting rid of the noise here.
> 
> I've simplified the above to show what's going on in the warning on
> nds32-elf:
> 
> int george, *global;
> int stuff(), readme();
> 
> int
> f (void)
> {
>int store;
> 
>for (;;)
>  {
>int k = 0;
> 
>while (global)
> {
>   for (; store; ++store)

Yeah, that seems to be a correct warning; your 'store' (the 'c' in the original 
testcase) is really used uninitialized (when 'global' aka '*j_global' is 
non-zero).  Sorry for not noticing earlier, I was only getting rid of 
warnings, not carefully looking at the testcase itself.  I think with the 
above initialization of 'c' it's conforming, and still a correct test for 
the original bug.
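
Spelled out (just restating the one-line change placed as noted above, 
nothing new), the start of f() in the test would then read:

  uint16_t f(void)
  {
    int c, **p;
    short d = 2, e = 4;

    c = a;   /* the added initialization; everything else stays as is */

    for (;; b++)
      ...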

> This looks like a latent bug.  For that matter, the above snippet warns 
> with -fdisable-tree-thread2, even on x86-64 (and before my patch).


Ciao,
Michael.


Re: Regression with recent change

2021-09-13 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 13 Sep 2021, Jeff Law via Gcc-patches wrote:

> > So it looks like there's some undefined behavior going on, even before
> > my patch.  I'd like to get some feedback, because this is usually the
> > type of problems I see in the presence of a smarter threader... things
> > get shuffled around, problematic code gets isolated, and warning
> > passes have an easier time (or sometimes harder time) diagnosing
> > things.
> The original issue was PRE hanging, so I'd lean towards keeping the test as-is
> and instead twiddling any warning flags we can to make the diagnostics go
> away.

Or use this changed test avoiding the issues that I see with -W -Wall on 
this testcase.  I've verified that it still hangs before r194358 and is 
fixed by that revision.

Generally I think our testsuite, even for ICEs or these kinds of hangs, 
should make an effort to write conforming code, if at all possible.  
Here it is possible.

(I don't know if the new threader causes additional warnings, of course, 
but at least the problems with sequence points and uninitialized use of 
'j' aren't necessary to reproduce the bug.)


Ciao,
Michael.

/* { dg-do compile } */
/* { dg-additional-options "-fno-split-loops" } */

typedef unsigned short uint16_t;

uint16_t a, b;

int *j_global;
uint16_t f(void)
{
  int c, **p;
  short d = 2, e = 4;

  for (;; b++)
{
  int *j = j_global, k = 0;

  for (; *j; j++)
{
  for(; c; c++)
for(; k < 1; k++)
  {
short *f = 

if(b)
  return *f;
  }
}

  if(!c)
d *= e;

  a = d;
  if ((a ? b = 0 : (**p ? : 1) != (d != 1 ? 1 : (b = 0))) != ((k ? a : 0)
  < (a * (c = k
**p = 0;
}
}


Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-09-10 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 10 Sep 2021, Richard Biener via Gcc-patches wrote:

> diagnostic from the Ada frontend.  The warnings are pruned from the
> testsuite output via prune_gcc_output but somehow this doesn't work
> for the tests in gfortran.dg/debug which are now failing with excess
> errors.  That seems to be the case for other fortran .exp as well
> when appending -gstabs, something which works fine for gcc or g++ dgs.

Fortran emits warnings with a capitalized 'W'.  Your regexp only checks 
for lower-case.

> +# Ignore stabs obsoletion warnings
> +regsub -all "(^|\n)\[^\n\]*warning: STABS debugging information is 
> obsolete and not supported anymore\[^\n\]*" $text "" text

This needs to be ".arning" or "\[Ww\]arning".


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Michael Matz via Gcc-patches
Hello,

On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

> Like this?

Yep, but I can't approve.


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Michael Matz via Gcc-patches
Hi,

On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

>  }
> +
> +  /* Threading through a non-empty latch would cause code to be added

"through an *empty* latch".  The test in code is correct, though.

And for the before/after loops flag you added: we have a 
cfun->curr_properties field which can be used.  We even already have a 
PROP_loops flag but that is set throughout compilation from CFG 
construction until the RTL loop optimizers, so it can't be re-used for what 
is needed here.  But you could still invent another PROP_ value instead of 
adding a new field in struct function.
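
A minimal sketch of what I mean (PROP_loop_opts_done and the bit value 
are made up here; you'd pick a free bit next to the existing PROP_ 
defines in tree-pass.h):

  /* Hypothetical new property bit, set once the loop optimizers ran.  */
  #define PROP_loop_opts_done (1 << 15)

  /* Wherever the loop pipeline finishes: */
  cfun->curr_properties |= PROP_loop_opts_done;

  /* ... and in the threader, instead of a new struct function field: */
  if (cfun->curr_properties & PROP_loop_opts_done)
    /* allow threading through the (empty) latch here */;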


Ciao,
Michael.


Re: [Patch] gfortran: Fix in-build-tree testing [PR101305, PR101660]

2021-08-10 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 10 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> > +# Place ISO_Fortran_binding.h also under include/ in the build directory 
> > such
> > +# that it can be used for in-built-tree testsuite runs without 
> > interference of
> > +# other files in the build dir - like intrinsic .mod files or other .h 
> > files.
> >  ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding-1-tmpl.h \
> >$(srcdir)/ISO_Fortran_binding-2-tmpl.h \
> >$(srcdir)/ISO_Fortran_binding-3-tmpl.h \
> > @@ -1085,6 +1088,8 @@ ISO_Fortran_binding.h: 
> > $(srcdir)/ISO_Fortran_binding-1-tmpl.h \
> > $(COMPILE) -E -dD $(srcdir)/ISO_Fortran_binding-2-tmpl.h \
> > | grep '^#define CFI_type' >> $@
> > cat $(srcdir)/ISO_Fortran_binding-3-tmpl.h >> $@
> > +   $(MKDIR_P) include
> > +   cp $@ include/ISO_Fortran_binding.h
> 
> I see many Makefile.* in GCC doing rm -f file before cp whatever file,
> but don't know if that is just trying to be extra cautious or if it is
> needed for portability.  coreutils cp (and what POSIX says) is that
> overwriting the destination file should be fine and shouldn't cause
> failures, at least when the destination is writable.

I think this is to deal cautiously with symlinks: if the destination 
filename is a symlink to an existing file, that target file is overwritten 
by cp, not the symlink (which continues to point to the now-changed target 
file).


Ciao,
Michael.