Re: How to avoid some built-in expansions in gcc?

2024-06-05 Thread Michael Matz via Gcc
Hey,

On Wed, 5 Jun 2024, David Brown wrote:

> The ideal here would be to have some way to tell gcc that a given 
> function has the semantics of a different function.  For example, a 
> programmer might have several implementations of "memcpy" that are 
> optimised for different purposes based on the size or alignment of the 
> arguments.  Maybe some of these are written with inline assembly or work 
> in a completely different way (I've used DMA on a microcontroller for 
> the purpose).  If you could tell the compiler that the semantic 
> behaviour and results were the same as standard memcpy(), that could 
> lead to optimisations.
> 
> Then you could declare your "isinf" function with 
> __attribute__((semantics_of(__builtin_isinf))).
> 
> And the feature could be used in any situation where you can write a 
> function in a simple, easy-to-analyse version and a more efficient but 
> opaque version.

Hmm, that actually sounds like a useful feature.  There are some details 
to take care of, like what to do with arguments: e.g. do they need to have 
the same types as the referred-to builtin, only compatible ones, or even 
just convertible ones, and suchlike; but yeah, that sounds nice.


Ciao,
Michael.


Re: How to avoid some built-in expansions in gcc?

2024-06-05 Thread Michael Matz via Gcc
Hello,

On Tue, 4 Jun 2024, Jakub Jelinek wrote:

> On Tue, Jun 04, 2024 at 07:43:40PM +0200, Michael Matz via Gcc wrote:
> > (Well, and without reverse-recognition of isfinite-like idioms in the 
> > sources.  That's orthogonal as well.)
> 
> Why?  If isfinite is better done by a libcall, why isn't isfinite-like
> idiom also better done as a libcall?

It is.  I was just trying to avoid derailing the discussion about finding an 
immediately good solution by searching for the perfect solution.  Idiom 
recognition simply is completely independent from the posed problem that 
Georg-Johann has, which remains unsolved AFAICS, as using 
-fno-builtin-foobar has its own (perhaps merely theoretical for AVR) 
problems.


Ciao,
Michael.


Re: How to avoid some built-in expansions in gcc?

2024-06-04 Thread Michael Matz via Gcc
Hello,

On Sat, 1 Jun 2024, Richard Biener via Gcc wrote:

> >>> You have a pointer how to define a target optab? I looked into optabs 
> >>> code but found no appropriate hook.  For isinf it seems it is 
> >>> enough to provide a failing expander, but other functions like isnan 
> >>> don't have an optab entry, so there is a hook mechanism to extend optabs?
> >> Just add corresponding optabs for the missing cases (some additions are 
> >> pending, like isnormal).  There’s no hook to prevent folding to FP 
> >> compares nor is that guarded by other means (like availability of native 
> >> FP ops).  Changing the guards would be another reasonable option.
> >> Richard
> > 
> > There are many other such folds, e.g. for isdigit().  The AVR libraries 
> > have all this in hand-optimized assembly, and all these built-in expansions 
> are bypassing that.  Open-coded C will never beat that assembly code, at 
> > least not with the current GCC capabilities.
> 
> The idea is that isdigit() || isalpha() or similar optimize without 
> special casing the builtins.
> 
> > How would I go about disabling / bypassing non-const folds from ctype.h and 
> > the many others?
> 
> I think this mostly shows we lack late recognition of open-coded isdigit 
> and friends, at least for the targets where inlining them is not 
> profitable.
> 
> A pragmatic solution might be a new target hook, indicating a specified 
> builtin is not to be folded into an open-coded form.

Well, that's what the mechanism behind -fno-builtin-foobar is supposed to 
be IMHO.  Hopefully the newly added additional mechanism using optabs and 
ifns (instead of builtins) heeds it.

> A good solution would base this on (size) costs, the perfect solution 
> would re-discover the builtins late and undo inlining that didn’t turn 
> out to enable further simplification.
> 
> How is inlined isdigit bad on AVR?  Is a call really that cheap 
> considering possible register spilling around it?

On AVR, which needs to use 8-bit registers to do everything?  I'm pretty 
sure the call is cheaper, yeah :)


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-04 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Jonathon Anderson wrote:

> Of course, this doesn't make the build system any less complex, but 
> projects using newer build systems seem easier to secure and audit than 
> those using overly flexible build systems like Autotools and maybe even 
> CMake. IMHO using a late-model build system is a relatively low 
> technical hurdle to overcome for the benefits noted above; switching 
> should be considered, and in a positive light.

Note that we're talking not (only) about the build system itself, i.e. how 
to declare dependencies within the sources, and how to declare how to 
build them.  make is just fine for that (as are many others).  (In a way 
I think by now we wouldn't really need automake and autogen, but 
rewriting all that in pure GNU make would be a major undertaking.)

But Martin also specifically asked about alternatives for feature tests, 
i.e. autoconf's purpose.  I simply don't see how any alternative to it 
could be majorly "easier" or "less complex" at its core.  Going with the 
examples given upthread there is usually only one major solution: to check 
if a given system supports FOOBAR you need to bite the bullet and compile 
(and potentially run!) a small program using FOOBAR.  A configuration 
system that can do that (and I don't see any real alternative to that), no 
matter in which language it's written and how traditional or modern it is, 
also gives you enough rope to hang yourself, if you so choose.

If you get away without many configuration tests in your project then this 
is because what (e.g.) the compiler gives you, in the form of libstdc++ 
for example, abstracts away many of the peculiarities of a system.  But 
in order to be able to do that something (namely the config system of 
libstdc++) needs to determine what is or isn't supported by the system in 
order to correctly implement these abstractions.  I.e. things you depend 
on did the heavy lifting of hiding system divergence.

(Well, that, or you are very limited in the number of systems you support, 
which can be the right thing as well!)


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Martin Uecker wrote:

> The backdoor was hidden in a complicated autoconf script...

Which itself had multiple layers and could just as well have been a 
complicated cmake function.

> > (And, FWIW, testing for features isn't "complex".  And have you looked at 
> > other build systems?  I have, and none of them are less complex, just 
> > opaque in different ways from make+autotools).
> 
> I asked a very specific question: To what extent is testing 
> for features instead of semantic versions and/or supported
> standards still necessary?

I can't answer this with absolute certainty, but points to consider: the 
semantic versions need to be maintained just as well, in some magic way.  
Because ultimately software depends on features of its dependencies, not on 
arbitrary numbers given to them.  The numbers encode these features, in 
the best case, when there are no errors.  So, no, version numbers are not 
a replacement for feature tests, they are a proxy.  One that is manually 
maintained, and hence prone to errors.

Now, supported standards: which one? ;-)  Or more in earnest: while on 
this mailing list we could choose a certain set: POSIX, some 
languages, Windows, MacOS (versions so-and-so).  What about other 
software relying on other 3rd-party feature providers (libraries or system 
services)?  Without standards?

So, without absolute certainty, but with a little bit of it: yes, feature 
tests are required in general.  That doesn't mean that we could not 
do away with quite a few of them for (e.g.) GCC, those that hold true on 
any platform we support.  But we can't get rid of the infrastructure for 
that, and can't get rid of certain classes of tests.

> This seems like a problematic approach that may have been necessary 
> decades ago, but it seems it may be time to move on.

I don't see that.  Many aspects of systems remain non-standardized.


Ciao,
Michael.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-03 Thread Michael Matz via Gcc
Hello,

On Wed, 3 Apr 2024, Martin Uecker via Gcc wrote:

> > > Seems reasonable, but note that it wouldn't make any difference to
> > > this attack.  The liblzma library was modified to corrupt the sshd
> > > binary, when sshd was linked against liblzma.  The actual attack
> > > occurred via a connection to a corrupt sshd.  If sshd was running as
> > > root, as is normal, the attacker had root access to the machine.  None
> > > of the attacking steps had anything to do with having root access
> > > while building or installing the program.
> 
> There does not seem a single good solution against something like this.
> 
> My takeaway is that software needs to become less complex. Do 
> we really still need complex build systems such as autoconf?

Do we really need complex languages like C++ to write our software in?  
SCNR :)  Complexity is in the eye of the beholder, but to be honest, in 
the software that we're dealing with here, the build system or autoconf 
does _not_ come to mind first when thinking about complexity.

(And, FWIW, testing for features isn't "complex".  And have you looked at 
other build systems?  I have, and none of them are less complex, just 
opaque in different ways from make+autotools).


Ciao,
Michael.


Re: Improvement of CLOBBER descriptions

2024-02-21 Thread Michael Matz via Gcc
Hello,

On Wed, 21 Feb 2024, Daniil Frolov wrote:

> >> Following the recent introduction of more detailed CLOBBER types in GCC, a
> >> minor
> >> inconsistency has been identified in the description of
> >> CLOBBER_OBJECT_BEGIN:
> >> 
> >>   /* Beginning of object lifetime, e.g. C++ constructor.  */
> >>   CLOBBER_OBJECT_BEGIN

The "e.g." comment mixes concepts of the C++ language with a 
middle-end/GIMPLE concept, and hence is a bit misleading.  What the 
clobbers are intended to convey to the middle-end is the low-level notion 
of "storage associated with this and that object is now accessible 
(BEGIN)/not accessible anymore (END), for purposes related to that very 
object".  The underlying motivation, _why_ that knowledge is interesting 
to the middle-end, is to be able to share storage between different 
objects.

"purposes related to that object" are any operations on the object: 
construction, destruction, access, frobnification.  It's not tied to a 
particular frontend language (although it's the language semantics that 
dictate when emitting the begin/end clobbers is appropriate).  For the 
middle-end the C++ concept of construction/deconstruction are simply 
modifications (conceptual or real) of the storage associated with an 
object ("object" not in the C++ sense, but from a low-level perspective; 
i.e. an "object" doesn't only exist after c++-construction, it comes into 
existence before that, even if perhaps in an indeterminate/invalid state 
from the frontend's perspective).

Initially these clobbers were only emitted when decls went out of 
scope, and so did have some relation to frontend language semantics 
(although a fairly universal one, namely "scope").  The 
C++ frontend then found further uses (e.g. after a dtor for an 
object _ends_, its storage can also be reused), and eventually we also 
needed the BEGIN clobbers to convey the meaning of "now storage use for 
object potentially starts, don't share it with any other object".

If certain frontends find use for more fine-grained definitions of 
life-times, then further clobber kinds need to be invented for those 
frontends' use.  They likely won't have influence on the middle-end though 
(perhaps for some sanitizers such new kinds might be useful?).  But the 
current BEGIN/END clobbers need to continue to mark the outermost borders 
of storage validity for an object.


Ciao,
Michael.


Re: Calling convention for Intel APX extension

2023-07-31 Thread Michael Matz via Gcc
Hello,

On Sun, 30 Jul 2023, Thomas Koenig wrote:

> > I've recently submitted a patch that adds some attributes that basically
> > say "these-and-those regs aren't clobbered by this function" (I did them
> > for not clobbered xmm8-15).  Something similar could be used for the new
> > GPRs as well.  Then it would be a matter of ensuring that the interesting
> functions are marked with those attributes (and then of course do the
> > necessary call-save/restore).
> 
> Interesting.
> 
> Taking this a bit further: The compiler knows which registers it used
> (and which ones might get clobbered by called functions) and could
> generate such information automatically and embed it in the assembly
> file, and the assembler could, in turn, put it into the object file.
> 
> A linker (or LTO) could then check this and elide save/restore pairs
> where they are not needed.

LTO with interprocedural register allocation (-fipa-ra) already does this.  

Doing it without LTO is possible to implement in the way you suggest, but 
it is very hard to make effective: the problem is that saving/restoring of 
registers might be scheduled in non-trivial ways, and getting rid of 
instruction bytes within function bodies at link time is fairly 
non-trivial: it needs extensive meta-information to be effective (e.g. all 
jumps that potentially cross the removed bytes must get relocations).

So you either limit the ways that prologues and epilogues are emitted, to 
help the linker (thereby limiting the effectiveness of unchanged xlogues), 
or you emit more meta-info than the instruction bytes themselves, bloating 
object files for dubious outcomes.

> It would probably be impossible for calls into shared libraries, since
> the saved registers might change from version to version.

The above scheme could be extended to also allow introducing stubs 
(wrappers) for shared lib functions, handled by the dynamic loader.  But 
then you would get hard problems to solve related to function addresses 
and their uniqueness.

> Still, potential gains could be substantial, and it could have an
> effect which could come close to inlining, while actually saving space
> instead of using extra.
> 
> Comments?

I think it would be an interesting experiment to implement such scheme 
fully just to see how effective it would be in practice.  But it's very 
non-trivial to do, and my guess is that it won't be super effective.  So, 
could be a typical research paper topic :-)

At least outside of extreme cases like the SSE regs, where none are 
callee-saved, and which can be handled in a different way like the 
explicit attributes.


Ciao,
Michael.


Re: Calling convention for Intel APX extension

2023-07-27 Thread Michael Matz via Gcc
Hey,

On Thu, 27 Jul 2023, Thomas Koenig via Gcc wrote:

> Intel recommends to have the new registers as caller-saved for
> compatibility with current calling conventions.  If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

That's not the full truth.  It's not (only) exception handling but also 
context switching via setjmp/longjmp and make/get/setcontext.

The data structures for that are part of the ABI unfortunately, and can't 
be assumed to be extensible (as Florian says, for glibc there may be 
hacks (or maybe not) on x86-64.  Some other archs implemented 
extensibility from the outset).  So all registers (and register parts!) 
added after the initial psABI is defined usually _have_ to be 
call-clobbered.

> Since Fortran tends to use a lot of registers for its array descriptors,
> and also tends to call nothrow functions (all Fortran functions, and
> all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
> making some of the new registers callee-saved, to save some spills
> at function calls.

I've recently submitted a patch that adds some attributes that basically 
say "these-and-those regs aren't clobbered by this function" (I did them 
for not clobbered xmm8-15).  Something similar could be used for the new 
GPRs as well.  Then it would be a matter of ensuring that the interesting 
functions are marked with those attributes (and then of course do the 
necessary call-save/restore).


Ciao,
Michael.


Re: [PATCH] Basic asm blocks should always be volatile

2023-06-29 Thread Michael Matz via Gcc
Hello,

On Thu, 29 Jun 2023, Julian Waters via Gcc wrote:

> int main() {
> asm ("nop" "\n"
>  "\t" "nop" "\n"
>  "\t" "nop");
> 
> asm volatile ("nop" "\n"
>   "\t" "nop" "\n"
>   "\t" "nop");
> }
> 
> objdump --disassemble-all -M intel -M intel-mnemonic a.exe > disassembly.txt
> 
> 00000001400028c0 <main>:
>    1400028c0: 48 83 ec 28           sub    rsp,0x28
>    1400028c4: e8 37 ec ff ff        call   140001500 <__main>
>    1400028c9: 90                    nop
>    1400028ca: 90                    nop
>    1400028cb: 90                    nop
>    1400028cc: 31 c0                 xor    eax,eax
>    1400028cd: 48 83 c4 28           add    rsp,0x28
>    1400028ce: c3                    ret
> 
> Note how there are only 3 nops above when there should be 6, as the first 3
> have been deleted by the compiler. With the patch, the correct 6 nops
> instead of 3 are compiled into the final code.
> 
> Of course, the above was compiled with the most extreme optimizations
> available to stress test the possible bug, -O3, -ffunction-sections
> -fdata-sections -Wl,--gc-sections -flto=auto. Compiled as C++17 and intel
> assembly syntax

Works just fine here:

% cat xx.c
int main() {
asm ("nop" "\n"
 "\t" "nop" "\n"
 "\t" "nop");

asm volatile ("nop" "\n"
  "\t" "nop" "\n"
  "\t" "nop");
}

% g++ -v
...
gcc version 12.2.1 20230124 [revision 
193f7e62815b4089dfaed4c2bd34fd4f10209e27] (SUSE Linux)

% g++ -std=c++17 -flto=auto -O3 -ffunction-sections -fdata-sections xx.c
% objdump --disassemble-all -M intel -M intel-mnemonic a.out
...
0000000000401020 <main>:
  401020:   90  nop
  401021:   90  nop
  401022:   90  nop
  401023:   90  nop
  401024:   90  nop
  401025:   90  nop
  401026:   31 c0   xor    eax,eax
  401028:   c3  ret
  401029:   0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]
...

Testing with recent trunk works as well with no differences in output.
This is for x86_64-linux.

So, as suspected, something else is broken for you.  Which compiler 
version are you using?  (And we need to check if it's something in the 
mingw target)


Ciao,
Michael.


Re: types in GIMPLE IR

2023-06-29 Thread Michael Matz via Gcc
Hello,

On Thu, 29 Jun 2023, Krister Walfridsson wrote:

> > The thing with signed bools is that the two relevant values are -1 (true)
> > and 0 (false), those are used for vector bool components where we also
> > need them to be of wider type (32bits in this case).
> 
> My main confusion comes from seeing IR doing arithmetic such as
> 
>   <signed-boolean:32> _127;
>   <signed-boolean:32> _169;
>   ...
>   _169 = _127 + -1;
> 
> or
> 
>   <signed-boolean:32> _127;
>   <signed-boolean:32> _169;
>   ...
>   _169 = -_127;
> 
> and it was unclear to me what kind of arithmetic is allowed.
> 
> I have now verified that all cases seem to be just one operation of this form
> (where _127 has the value 0 or 1), so it cannot construct values such as 42.
> But the wide signed Boolean can have the three different values 1, 0, and -1,
> which I still think is at least one too many. :)

It definitely is.  For signed bool it should be -1 and 0, for unsigned 
bool 1 and 0.  And of course, arithmetic on bools is always dubious, that  
should all be logical operations.  Modulo-arithmetic (mod 2) could be 
made to work, but then we would have to give up the idea of signed bools 
and always use conversions to signed int to get a bitmask of all-ones.  
And as mod-2-arithmetic is equivalent to logical ops it seems a bit futile 
to go that way.

Of course, enforcing this all might lead to a surprising heap of errors, 
but one has to start somewhere, so ...

> I'll update my tool to complain if the value is outside the range [-1, 
> 1].

... maybe not do that, or at least make it optional, so that someday 
someone can look into fixing all that up? :-)  -fdubious-bools?


Ciao,
Michael.


Re: types in GIMPLE IR

2023-06-28 Thread Michael Matz via Gcc
Hello,

On Wed, 28 Jun 2023, Krister Walfridsson via Gcc wrote:

> Type safety
> ---
> Some transformations treat 1-bit types as a synonym of _Bool and mix the types
> in expressions, such as:
> 
>_2;
>   _Bool _3;
>   _Bool _4;
>   ...
>   _4 = _2 ^ _3;
> 
> and similarly mixing _Bool and enum
> 
>   enum E:bool { E0, E1 };
> 
> in one operation.
> 
> I had expected this to be invalid... What are the type safety rules in the
> GIMPLE IR?

Type safety in gimple is defined in terms of type compatibility, with 
_many_ exceptions for specific types of statements.  Generally stuff is 
verified in verify_gimple_seq; in this case of a binary assign statement, 
in verify_gimple_assign_binary.  As you can see there, the normal rule for 
bread-and-butter binary assigns is simply that the LHS, RHS1 and RHS2 must 
all be type-compatible.

T1 and T2 are compatible if conversions from T1 to T2 are useless and 
conversions from T2 to T1 are also useless.  (types_compatible_p)  The meat 
for that is all in gimple-expr.cc:useless_type_conversion_p.  For this 
specific case again we have:

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
 of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
   != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
  && TYPE_PRECISION (outer_type) != 1)
return false;

So, yes, booleans and 1-bit types can be compatible (under certain other 
conditions, see the function).

> Somewhat related, gcc.c-torture/compile/pr96796.c performs a VIEW_CONVERT_EXPR
> from
> 
>   struct S1 {
> long f3;
> char f4;
>   } g_3_4;
> 
> to an int
> 
>   p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);
> 
> That must be wrong?

VIEW_CONVERT_EXPR is _very_ generous.  See 
verify_types_in_gimple_reference: 

  if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
{
  /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check
 that their operand is not a register or an invariant when
 requiring an lvalue (this usually means there is a SRA or IPA-SRA
 bug).  Otherwise there is nothing to verify, gross mismatches at
 most invoke undefined behavior.  */
  if (require_lvalue
  && (is_gimple_reg (op) || is_gimple_min_invariant (op)))
{
  error ("conversion of %qs on the left hand side of %qs",
 get_tree_code_name (TREE_CODE (op)), code_name);
  debug_generic_stmt (expr);
  return true;
}
  else if (is_gimple_reg (op)
   && TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE 
(op)))
{
  error ("conversion of register to a different size in %qs",
 code_name);
  debug_generic_stmt (expr);
  return true;
}
}

Here the operand is not a register (but a global memory object), so 
everything goes.

It should be said that over the years gimple's type system became stricter 
and stricter, but it started as mostly everything-goes, so making it 
stricter is a bumpy road that isn't fully travelled yet, because checking 
types often results in testcase regressions :-)

> Semantics of <signed-boolean:32>
> 
> "Wide" Booleans, such as <signed-boolean:32>, seem to allow more values than
> 0 and 1. For example, I've seen some IR operations like:
> 
>   _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;
> 
> But I guess there must be some semantic difference between 
> <signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type 
> would not be needed... So what are the differences?

See above, normally conversions to booleans that are wider than 1 bit are 
_not_ useless (because they require booleanization to true/false).  In the 
above case the not-useless cast is within a COND_EXPR, so it's quite 
possible that the gimplifier didn't look hard enough to split this out 
into a proper conversion statement.  (The verifier doesn't look inside 
the expressions of the COND_EXPR, so also doesn't catch this one)

If that turns out to be true and the above still happens when -1 is 
replaced by (say) 42, then it might be possible to construct a 
wrong-code testcase based on the fact that _66 as boolean should contain 
only two observable values (true/false), but could then contain 42.  OTOH, 
it might also not be possible to create such testcase, namely when GCC is 
internally too conservative in handling wide bools :-)  In that case we 
probably have a missed optimization somewhere, which when implemented 
would enable construction of such wrong-code testcase ;)

(I'm saying that -1 should be replaced by something else for a wrong-code 
testcase, because -1 is special and could justifieably be special-cased in 
GCC: -1 is the proper arithmetic value for a signed boolean that is 
"true").


Ciao,
Michael.


Re: [PATCH] Basic asm blocks should always be volatile

2023-06-28 Thread Michael Matz via Gcc
Hello,

On Wed, 28 Jun 2023, Julian Waters via Gcc wrote:

> On the contrary, code compiled with gcc with or without the applied patch
> operates very differently, with only gcc with the applied patch producing a
> fully correctly operating program as expected. Even if the above inline
> assembly blocks never worked due to other optimizations however, the
> failure mode of the program would be very different from how it fails now:
> It should fail noisily with an access violation exception due to
> the malformed assembly, but instead all the assembly which installs an
> exception handler never makes it into the final program because with
> anything higher than -O1 gcc deletes all of it (I have verified this with
> objdump too),

Can you please provide a _full_ compilable testcase (preprocessed).  What 
Andrew says is (supposed to be) correct: ignoring the other 
problems you're going to see with your asms (even if you make them 
volatile), GCC should not remove any of those asm statements.

If something changes when you add 'volatile' by hand then we have another 
problem lurking somewhere, and adding the parser patch might not fully 
solve it (even if it changes behaviour for you).


Ciao,
Michael.


Re: [gimple-ssa] Get the gimple corresponding to the definition of a VAR_DECL

2023-06-27 Thread Michael Matz via Gcc
Hello,

On Tue, 27 Jun 2023, Pierrick Philippe wrote:

> My main problem is with definitions that are uninitialized but still not
> considered undefined behavior.
> For example, the following code:
> 
>int x;
>int *y = &x;
>*y = 6;
> 
> What I'm doing is basically looking at each gimple statement to see if the
> lhs has a given attribute, for the purpose of the analysis I'm trying to
> perform.  To be precise, I'm plugged into the analyzer, so an out-of-tree
> plugin.
> 
> But in the example above, the definition of x is not really within the
> gimple_seq of the function as it is never directly assigned.

Then you need to be a bit more precise in what exactly you want.  There 
are multiple ways to interpret "definition of a variable".

a) assigning a value to it: that's what Richard alluded to, you need to 
   iterate all gimple statements to see if any assign to variables you're 
   interested in (in SSA form there will only be one, outside SSA there 
   may be many).  As you notice there also may be none at all that 
   directly assign a value.  You need to solve the associated data-flow 
   problem in order to (conservatively) know the answer to this question.
   In particular you need points-to sets (above for instance, that 'y' 
   points to 'x' so that when you modify '*y' that you can note down that 
   "whatever y points to (i.e. at least x) is modified").

   There is no ready-made list of statements that potentially modify a 
   local variable in question.  You need to do that yourself, but GCC 
   contains many helper routines for parts of this problem (as it needs to 
   answer these questions itself as well, for optimization purposes).

b) allocating storage for the variable in question (and possibly giving it 
   an initial value): there are _no_ gimple instruction that express this 
   idea.  The very fact that variables are mentioned in local_decls (and 
   are used in a meaningful way in the instruction stream) leads to them
   being allocated on the stack during function expansion (see 
   expand_used_vars).

Non-local variables are handled similarly; they are placed in various 
lists that lead to appropriate assembler statements allocating static 
storage for them (in the data or bss, or whatever other appropriate, 
segment).  They aren't defined (in the allocate-it sense) by gimple 
statement either.


Ciao,
Michael.


Re: Different ASM for ReLU function between GCC11 and GCC12

2023-06-20 Thread Michael Matz via Gcc
Hello,

On Tue, 20 Jun 2023, Jakub Jelinek via Gcc wrote:

> ce1 pass results in emit_conditional_move with
> (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (reg:SF 84)
> operands in the GCC 11 case and so is successfully matched by
> ix86_expand_fp_movcc as ix86_expand_sse_fp_minmax.
> But, in GCC 12+, emit_conditional_move is called with
> (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (const_double:SF 
> 0.0 [0x0.0p+0])
> instead (reg:SF 84 in both cases contains the (const_double:SF 0.0 [0x0.0p+0])
> value, in the GCC 11 case loaded from memory, in the GCC 12+ case set
> directly in a move.  The reason for the difference is exactly that
> because cheap SSE constant can be moved directly into a reg, it is done so
> instead of reusing a pseudo that contains that value already.

But reg84 is _also_ used as operand of emit_conditional_move, so there's 
no reason to not also use it as third operand.  It seems more canonical to 
call a function with

  X-containing-B, A, B

than with

  X-containing-B, A, something-equal-to-B-but-not-B

so either the (const_double) RTL should be used in both, or reg84 should, 
but not a mix.  Exactly so to ...

> actually a minmax.  Even if it allowed the cheap SSE constants,
> it wouldn't know that r84 is also zero (unless the expander checks that
> it is a pseudo with a single setter and verifies it or something similar).

... not have this problem.


Ciao,
Michael.


Re: gcc tricore porting

2023-06-19 Thread Michael Matz via Gcc
Hello,

note that I know next to nothing about Tricore in particular, so take 
everything with grains of salt.  Anyway:

On Mon, 19 Jun 2023, Claudio Eterno wrote:

> in your reply you mentioned "DSP". Do you want to use the DSP instructions
> for final assembly?

It's not a matter of me wanting or not wanting, I have no stake in 
tricore.  From a 20-second look at the Infineon architecture overview I've 
linked it looked like that some DSP instructions could be used for 
implementing normal floating point support, which of course would be 
desirable in a compiler supporting all of C (otherwise you'd have to 
resort to softfloat emulation).  But I have no idea if the CPU and the DSP 
parts are interconnected enough (hardware wise) to make that feasible (or 
even required, maybe the CPU supports floating point itself already?).

> Michael, based on your experience, how much time is necessary to release
> this porting?

Depending on experience in compilers in general and GCC in particular: 
anything between a couple weeks (fulltime) and a year.

> And.. have you any idea about where to start?

If you don't have an assembler and linker already, then start with that.  An 
assembler/linker is not part of GCC, but it relies on one.  So look at 
binutils for this.

Once binutils are taken care of, Richi's suggestion is a good one: start 
with an existing port of a target with similar features as you intend to 
implement, and modify it according to your needs.  After that works (say, 
you can compile a hello-world successfully): throw it away and restart a 
completely new target from scratch with everything you learned until then.  
(This is so that you don't start with all the cruft that the target you 
used as baseline comes with).

It helps if you already have a toolchain that you can work against, but 
it's not required.

You need to be familiar with some GCC internals, and the documentation 
coming with GCC is a good starting point: 
  https://gcc.gnu.org/onlinedocs/gccint/
(the "Machine Description" chapter will be the important one, but for that 
you need to read a couple other chapters as well)

There are a couple online resources about writing new targets for GCC.  
Stackoverflow refers to some.  E.g. 
  
https://stackoverflow.com/questions/44904644/gcc-how-to-add-support-to-a-new-architecture
refers to https://kristerw.blogspot.com/2017/08/writing-gcc-backend_4.html 
which is something not too old.  For concrete questions this mailing list 
is a good place to ask.


Good luck,
Michael.

> 
> Ciao
> Claudio
> 
> Il giorno lun 19 giu 2023 alle ore 16:16 Michael Matz  ha
> scritto:
> 
> > Hello,
> >
> > On Mon, 19 Jun 2023, Richard Biener via Gcc wrote:
> >
> > > On Sun, Jun 18, 2023 at 12:00 PM Claudio Eterno via Gcc 
> > wrote:
> > > >
> > > > Hi, this is my first time with open source development. I worked in
> > > > automotive for 22 years and we (generally) were using tricore series
> > for
> > > > these products. GCC doesn't compile on that platform. I left my work
> > some
> > > > days ago and so I'll have some spare time in the next few months. I
> > would
> > > > like to know how difficult it is to port the tricore platform on gcc
> > and if
> > > > during this process somebody can support me as tutor and... also if
> > the gcc
> > > > team is interested in this item...
> > >
> > > We welcome ports to new architectures.  Quick googling doesn't find me
> > > something like an ISA specification though so it's difficult to assess
> > the
> > > complexity of porting to that architecture.
> >
> > https://en.wikipedia.org/wiki/Infineon_TriCore
> >
> > https://www.infineon.com/dgdl/TC1_3_ArchOverview_1.pdf?fileId=db3a304312bae05f0112be86204c0111
> >
> > CPU part looks like fairly regular 32bit RISC.  DSP part seems quite
> > normal as well.  There even was once a GCC port to Tricore, version 3.3
> > from HighTec (now part of Infineon itself), but not even the wayback
> > machine has the files for that anymore:
> >
> >
> > https://web.archive.org/web/20150205040416/http://www.hightec-rt.com:80/en/downloads/sources.html
> >
> > Given the age of that port it's probably better to start from scratch
> > anyway :)  (the current stuff from them/Infineon doesn't seem to be
> > GCC-based anymore?)
> >
> >
> > Ciao,
> > Michael.
> 
> 
> 
> 


Re: gcc tricore porting

2023-06-19 Thread Michael Matz via Gcc
Hello,

On Mon, 19 Jun 2023, Richard Biener via Gcc wrote:

> On Sun, Jun 18, 2023 at 12:00 PM Claudio Eterno via Gcc  
> wrote:
> >
> > Hi, this is my first time with open source development. I worked in
> > automotive for 22 years and we (generally) were using tricore series for
> > these products. GCC doesn't compile on that platform. I left my work some
> > days ago and so I'll have some spare time in the next few months. I would
> > like to know how difficult it is to port the tricore platform on gcc and if
> > during this process somebody can support me as tutor and... also if the gcc
> > team is interested in this item...
> 
> We welcome ports to new architectures.  Quick googling doesn't find me
> something like an ISA specification though so it's difficult to assess the
> complexity of porting to that architecture.

https://en.wikipedia.org/wiki/Infineon_TriCore
https://www.infineon.com/dgdl/TC1_3_ArchOverview_1.pdf?fileId=db3a304312bae05f0112be86204c0111

CPU part looks like fairly regular 32bit RISC.  DSP part seems quite 
normal as well.  There even was once a GCC port to Tricore, version 3.3 
from HighTec (now part of Infineon itself), but not even the wayback 
machine has the files for that anymore:

https://web.archive.org/web/20150205040416/http://www.hightec-rt.com:80/en/downloads/sources.html

Given the age of that port it's probably better to start from scratch 
anyway :)  (the current stuff from them/Infineon doesn't seem to be 
GCC-based anymore?)


Ciao,
Michael.


Re: Passing the complex args in the GPR's

2023-06-07 Thread Michael Matz via Gcc
Hey,

On Tue, 6 Jun 2023, Umesh Kalappa via Gcc wrote:

> Question is : Why does GCC choose to use GPR's here and have any
> reference to support this decision  ?

You explicitly used -m32 ppc, so 
https://www.polyomino.org.uk/publications/2011/Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf
 
applies.  It explicitly states in "B.1 ATR-Linux Inclusion and 
Conformance" that it is "ATR-PASS-COMPLEX-IN-GPRS", and other sections 
detail what that means (namely passing complex args in r3 .. r10, whatever 
fits).  GCC adheres to that, and has to.

The history how that came to be was explained in the thread.


Ciao,
Michael.

 > 
> Thank you
> ~Umesh
> 
> 
> 
> On Tue, Jun 6, 2023 at 10:16 PM Segher Boessenkool
>  wrote:
> >
> > Hi!
> >
> > On Tue, Jun 06, 2023 at 08:35:22PM +0530, Umesh Kalappa wrote:
> > > Hi Adnrew,
> > > Thank you for the quick response and for PPC64 too ,we do have
> > > mismatches in ABI b/w complex operations like
> > > https://godbolt.org/z/bjsYovx4c .
> > >
> > > Any reason why GCC chose to use GPR 's here ?
> >
> > What did you expect, what happened instead?  Why did you expect that,
> > and why then is it an error what did happen?
> >
> > You used -O0.  As long as the code works, all is fine.  But unoptimised
> > code frequently is hard to read, please use -O2 instead?
> >
> > As Andrew says, why did you use -m32 for GCC but -m64 for LLVM?  It is
> > hard to compare those at all!  32-bit PowerPC Linux ABI (based on 32-bit
> > PowerPC ELF ABI from 1995, BE version) vs. 64-bit ELFv2 ABI from 2015
> > (LE version).
> >
> >
> > Segher
> 


Re: More C type errors by default for GCC 14

2023-05-15 Thread Michael Matz via Gcc
Hello,

On Fri, 12 May 2023, Florian Weimer wrote:

> * Alexander Monakov:
> 
> > This is not valid (constraint violation):
> >
> >   unsigned x;
> >
> >   int *p = &x;
> >
> > In GCC this is diagnosed under -Wpointer-sign:
> >
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25892
> 
> Thanks for the reference.  I filed:
> 
>   -Wpointer-sign must be enabled by default
>   

Can you please not extend the scope of your proposals in this thread but 
create a new one?

(FWIW: no, this should not be an error, a warning is fine, and I actually 
think the current state of it not being in Wall is the right thing as 
well)


Ciao,
Michael.


Re: More C type errors by default for GCC 14

2023-05-15 Thread Michael Matz via Gcc
Hello,

On Fri, 12 May 2023, Jakub Jelinek via Gcc wrote:

> On Fri, May 12, 2023 at 11:33:01AM +0200, Martin Jambor wrote:
> > > One fairly big GCC-internal task is to clear up the C test suite so that
> > > it passes with the new compiler defaults.  I already have an offer of
> > > help for that, so I think we can complete this work in a reasonable time
> > > frame.
> 
> I'd prefer to keep at least significant portion of those tests as is with
> -fpermissive added (plus of course we need new tests that verify the errors
> are emitted), so that we have some testsuite coverage for those.

Yes, this!  Try to (!) never change committed testcase sources, however 
invalid they may be (changing how they are compiled, including by 
introducing new dg-directives and using them in comments, is of course 
okay).

(And FWIW: I'm fine with Florian's proposal.  I personally think the 
arguments for upgrading the warnings to errors are _not_ strong enough, 
but I don't think so very strongly :) )


Ciao,
Michael.


Re: MIN/MAX and trapping math and NANs

2023-04-11 Thread Michael Matz via Gcc
Hello,

On Tue, 11 Apr 2023, Richard Biener via Gcc wrote:

> In the case we ever implement conforming FP exception support
> either targets would need to be fixed to mask unexpected exceptions
> or we have to refrain from moving instructions where the target
> implementation may rise exceptions across operations that might
> raise exceptions as originally written in source (and across
> points of FP exception state inspection).
> 
> That said, the effect to the FP exception state according to IEEE
> is still unanswered.

The IEEE 754-2008 predicate here is minNum/maxNum, and those are 
general-computational with floating point result.  That means any sNaN 
input raises-invalid (and delivers-qNaN if masked).  For qNaN input 
there's a special case: the result is the non-qNaN input (normal handling 
would usually require the qNaN to be returned).

Note that this makes minNum/maxNum (and friends) not associative.  Also, 
different languages and different hardware implement fmin/fmax differently 
and sometimes in conflict with 754-2008 (e.g. on SSE2 maxsd isn't 
commutative but maxNum is!).  This can be considered a defect in 754-2008.  
As a result these operations were demoted in 754-2019 and new functions 
minimumNumber (and friends) recommended (those propagate a qNaN).

Of course IEEE standards aren't publicly available and I don't have 
754-2019 (but -2008), so I can't be sure about the _exact_ wording 
regarding minimumNumber, but for background of the min/max woes: 

  https://754r.ucbtest.org/background/minNum_maxNum_Removal_Demotion_v3.pdf

In short: it's not so easy :-)  (it may not be advisable to slavishly 
follow 754-2008 for min/max)

> The NaN handling then possibly allows
> implementation with unordered compare + mask ops.


Ciao,
Michael.


Re: [RFC PATCH] driver: unfilter default library path [PR 104707]

2023-04-06 Thread Michael Matz via Gcc
Hello,

On Thu, 6 Apr 2023, Shiqi Zhang wrote:

> Currently, gcc deliberately filters out default library paths "/lib/" and
> "/usr/lib/", causing some linkers like mold to fail to find libraries.

If linkers claim to be a compatible replacement for other linkers then 
they certainly should behave in a similar way.  In this case: look into 
/lib and /usr/lib when something isn't found till then.

> This behavior was introduced at least 31 years ago in the initial
> revision of the git repo, personally I think it's obsolete because:
>  1. The less than 20 bytes of saving is negligible compared to the command
> line argument space of most hosts today.

That's not the issue that is solved by ignoring these paths in the driver 
for %D/%I directives.  The issue is (traditionally) that even if the 
startfiles sit in /usr/lib (say), you don't want to add -L/usr/lib to the 
linker command line because the user might have added -L/usr/local/lib 
explicitely into her link command and depending on order of spec file 
entries the -L/usr/lib would be added in front interacting with the 
expectations of where libraries are found.

Hence: never add something in (essentially) random places that is default 
fallback anyway.  (Obviously the above problem could be solved in a 
different, more complicated, way.  But this is the way it has been solved 
since about forever).

If mold doesn't look into {,/usr}/lib{,64} (as appropriate) by default 
then that's the problem of mold.


Ciao,
Michael.


Re: [Tree-SSA] Question from observation, bogus SSA form?

2023-03-17 Thread Michael Matz via Gcc
Hello,

On Fri, 17 Mar 2023, Pierrick Philippe wrote:

> > This means that global variables, volatile variables, aggregates,
> > variables which are not considered aggregates but are nevertheless
> > partially modified (think insertion into a vector) or variables which
> > need to live in memory (most probably because their address was taken)
> > are not put into an SSA form.  It may not be easily possible.
> 
> Alright, I understand, but I honestly find it confusing.

You can write something only into SSA form if you see _all_ assignments to 
the entity in question.  That's not necessarily the case for stuff sitting 
in memory.  Code you may not readily see (or might not be able to 
statically know the behaviour of) might be able to get ahold of it and 
hence change it behind your back or in unknown ways.  Not in your simple 
example (and if you look at it during some later passes in the compiler 
you will see that 'x' will indeed be written into SSA form), but in some 
that are only a little more complex:

int foo (int i) {
  int x, *y=&x;
  x = i;  // #1
  bar(y); // #2
  return x;
}

or

int foo (int i) {
  int x, z, *y = i ? &x : &z;
  x = z = 1;  // #1
  *y = 42;// #2
  return x;
}

here point #1 is very obviously a definition of x (and z) in both 
examples.  And point #2?  Is it a definition or not?  And if it is, then 
what entity is assigned to?  Think about that for a while and what that 
means for SSA form.

> I mean, aren't they any passes relying on the pure SSA form properties 
> to analyze code? For example to DSE or DCE.

Of course.  They all have to deal with memory in a special way (many by 
not doing things on memory).  Because of the above problems they would 
need to special-case memory no matter what.  (E.g. in GCC memory is dealt 
with via the virtual operands, the '.MEM_x = VDEF<.MEM_y>' and VUSE 
constructs you saw in the dumps, to make dealing with memory in an 
SSA-based compiler at least somewhat natural).
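(For the curious, the first foo() above roughly gimplifies to the following 
— a hand-written sketch of a GCC dump with made-up SSA version numbers, not 
actual compiler output:)

```
foo (int i)
{
  int x;

  # .MEM_3 = VDEF <.MEM_1(D)>   ;; x lives in memory, so the store is a VDEF
  x = i_2(D);
  # .MEM_4 = VDEF <.MEM_3>      ;; bar may read and write x via the pointer
  bar (&x);
  # VUSE <.MEM_4>               ;; the load has to observe bar's effects
  _5 = x;
  return _5;
}
```

The scalar 'i' and the temporaries are in real SSA form; x's memory state is 
threaded through the single .MEM chain instead.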


Ciao,
Michael.


Re: access to include path in front end

2022-12-05 Thread Michael Matz via Gcc
Hey,

On Fri, 2 Dec 2022, James K. Lowden wrote:

> > > 3.  Correct the entries in the default_compilers array.  Currently I
> > > have in cobol/lang-specs.h:
> > > 
> > > {".cob", "@cobol", 0, 0, 0},
> > > {".COB", "@cobol", 0, 0, 0},
> > > {".cbl", "@cobol", 0, 0, 0},
> > > {".CBL", "@cobol", 0, 0, 0},
> > > {"@cobol", 
> > >   "cobol1 %i %(cc1_options) %{!fsyntax-only:%(invoke_as)}", 
> > >   0, 0, 0}, 
> > 
> > It misses %(cpp_unique_options) which was the reason why your -I
> > arguments weren't passed to cobol1.  
> 
> If I understood you correctly, I don't need to modify gcc.cc.  I only
> need to modify cobol/lang-specs.h, which I've done.  But that's
> evidently not all I need to do, because it doesn't seem to work.  
> 
> The last element in the fragment in cobol/lang-specs.h is now: 
> 
> {"@cobol",
>   "cobol1 %i %(cc1_options) %{!fsyntax-only:%(invoke_as)} "
>   "%(cpp_unique_options) ",

%(invoke_as) needs to be last.  What it does is effectively add this to 
the command line (under certain conditions): "-somemoreoptions | as".  
Note the pipe symbol.  Like in normal shell commands also the gcc driver 
interprets this as "and now start the following command as well connection 
stdout of the first to stdin of the second".  So all in all the generated 
cmdline will be somewhat like:

  cobol1 input.cbl -stuff-from-cc1-options | as - -stuff-from-cpp-options

Your cpp_unique_options addition will effectively be options to that 'as' 
command, but you wanted it to be options for cobol1.  So, just switch 
order of elements.
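
(I.e., concretely, something like this for the @cobol entry — a sketch, with 
%(invoke_as) kept last so the cpp options end up on the cobol1 command line 
rather than on the assembler's:)

```c
/* cobol/lang-specs.h, reordered: cpp_unique_options now precedes
   invoke_as, so -I and friends reach cobol1 instead of 'as'.  */
{"@cobol",
  "cobol1 %i %(cc1_options) %(cpp_unique_options) "
  "%{!fsyntax-only:%(invoke_as)}",
  0, 0, 0},
```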

> I see the -B and -I options, and others, with their arguments, contained
> in COLLECT_GCC_OPTIONS on lines 9 and 11.  I guess that represents an
> environment string?

Yes.  It's our roundabout way of passing the gcc options as the user gave 
them downwards in case collect2 (a wrapper for the linking step for, gah, 
don't ask) needs to call gcc itself recursively.  But in the -### (or -v) 
output, if the assembler is invoked in your example (i.e. cobol1 doesn't 
fail for some reason) you should see your -I options being passed to that 
one (wrongly so, of course :) ).


Ciao,
Michael.


Re: access to include path in front end

2022-12-01 Thread Michael Matz via Gcc
Hey,

On Thu, 1 Dec 2022, James K. Lowden wrote:

> > E.g. look in gcc.cc for '@c' (matching the file extension) how that
> > entry uses '%(cpp_unique_options)', and how cpp_unique_options is
> > defined for the specs language:
> > 
> >   INIT_STATIC_SPEC ("cpp_unique_options",   &cpp_unique_options),
> > 
> > and
> > 
> > static const char *cpp_unique_options =
> >   "%{!Q:-quiet} %{nostdinc*} %{C} %{CC} %{v} %@{I*&F*} %{P} %I\  
> 
> Please tell me if this looks right and complete to you:
> 
> 1.  Add an element to the static_specs array: 
> 
> INIT_STATIC_SPEC ("cobol_options", &cobol_options),

That, or expand its contents where you'd use '%(cobol_options)' in the 
strings.

> 
> 2.  Define the referenced structure: 
> 
>   static const char *cobol_options =  "%{v} %@{I*&F*}"
>   or just
>   static const char *cobol_options =  "%{v} %@{I*}"
> 
>   because I don't know what -F does, or if I need it.

I.e. as long as it's that short, expanding it inline would work nicely.

> I'm using "cobol_options" instead of "cobol_unique_options" because the
> options aren't unique to Cobol, and because only cpp seems to have
> unique options.  
> 
> I'm including %{v} against the future, when the cobol1 compiler
> supports a -v option. 

Makes sense.

> 3.  Correct the entries in the default_compilers array.  Currently I
> have in cobol/lang-specs.h:
> 
> {".cob", "@cobol", 0, 0, 0},
> {".COB", "@cobol", 0, 0, 0},
> {".cbl", "@cobol", 0, 0, 0},
> {".CBL", "@cobol", 0, 0, 0},
> {"@cobol", 
>   "cobol1 %i %(cc1_options) %{!fsyntax-only:%(invoke_as)}", 
>   0, 0, 0}, 
> 
> That last one is a doozy.  Is it even slightly right?

It misses %(cpp_unique_options) which was the reason why your -I arguments 
weren't passed to cobol1.  You would just use your new %(cobol_options), or 
simply '%{v} %{I*}' directly, in addition to cc1_options.

> IIUC, I should at
> least remove 
> 
>   %{!fsyntax-only:%(invoke_as)}
> 
> because I don't need the options from the invoke_as string in gcc.cc. 

I think you do, as cobol1 will write out assembler code (it does, right?), 
so to get an object file the driver needs to invoke 'as' as well.  
Basically invoke_as tacks another command onto the end of the already 
built command line (the one that above would start with 'cobol1 
inputfile.cob ... all the options ...').  It will basically tack the 
equivalent of '| as tempfile.s -o realoutput.o' onto the end (which 
eventually will make the driver execute that command as well).

> That would still leave me with too much, because cobol1 ignores most of
> the options cc1 accepts.  What would you do?

I would go through all cc1_options and see if they _really_ shouldn't be 
interpreted by cobol1.  I guess for instance '-Ddefine' really doesn't 
make sense, but e.g. '-m...' and '-f...' do, and maybe -quiet as well, and 
suchlike.  In that case I'd just use cc1_options (in addition to your new 
%{I*} snippet).

If you then really determine, that no, most options do not make sense you 
need to extract a subset of cc1_options that do, and write them into the 
@cobol entry.  Look e.g. what the fortran frontend does (in 
fortran/lang-specs.h) it simply attaches more things to cc1_options.
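
(The Fortran entry is, modulo details, of this shape — paraphrased from 
memory, not the verbatim upstream text — reusing cc1_options wholesale and 
appending its own additions rather than re-listing everything:)

```c
/* Paraphrased shape of a fortran/lang-specs.h compiler entry: */
{"@f95",
  "%{!E:f951 %i %(cc1_options) %{J*} "
  "%{!nostdinc:-fintrinsic-modules-path finclude%s} "
  "%{!fsyntax-only:%(invoke_as)}}",
  0, 0, 0},
```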

> I don't understand the relationship between default_compliers and
> static_specs.  

static_specs lists the names of 'variables' you can use within the specs 
strings, and to what they should expand.  E.g. when I would want to use 
'%(foobar)' in any of my specs strings that needs to be registered in 
static_spec[]:

  INIT_STATIC_SPEC ("foobar", &a_variable_containing_a_string)

The specs parser would then replace '%(foobar)' in specs strings with 
whatever that variable contains.

Using such variables mostly makes sense if you want to enable users (who 
can provide their own specs file) to refer to well-known snippets 
maintained by GCC itself.  For most such strings it's not necessary, and 
you'd be fine with the approach of fortran:

 #define MY_FOOBAR_STRING "%{v} ... this and that ..."

...

  {"@mylang", ... "lang1 %i " MY_FOOBAR_STRING "" ... }

> I have made no special provision for "compiler can deal with
> multiple source files", except that cobol1 accepts multiple source
> files on the command line, and iterates over them.  If that's enough,
> then I'll set compiler::combinable to 1.

No good advice here for combinable.  Try it :)

> As I mentioned, for a year I've been able to avoid the Specs Language,
> apparently because some things happen by default.  The options defined
> in cobol/lang.opt are passed from gcobol to cobol1.  The value of the
> -findicator-column option is available (but only if the option starts
> with "f"; -indicator-column doesn't work).  cobol1 sees the value of
> -fmax-errors. 

That's because "%{f*}" is contained in %(cc1_options): 'pass on all 
options starting with "f"', and because you listed cc1_options in your 
cobol1 command line.


Ciao,
Michael.


Re: access to include path in front end

2022-11-30 Thread Michael Matz via Gcc
Hello,

On Tue, 29 Nov 2022, James K. Lowden wrote:

> I don't understand how to access in a front end the arguments to the -I
> option on the command line.  
> 
> Cobol has a feature similar to the C preprecessor, known as the
> Compiler Directing Facility (CDF).  The CDF has a COPY statement that
> resembles an #include directive in C, and shares the property that COPY
> names a file that is normally found in a "copybook" which, for our
> purposes, is a directory of such files.  The name of that directory is
> defined outside the Cobol program.  
> 
> I would like to use the -I option to pass the names of copybook
> directories to the cobol front end.  A bit of exploration yesterday left
> me with the sense that the -I argument, in C at least, is not passed to
> the compiler, but to the preprocessor. Access to -fmax-errors I think
> I've figured out, but -I is a mystery. 
> 
> I'm a little puzzled by the status quo as I understand it.  Unless I
> missed it, it's not discussed in gccint.  ISTM ideally there would be
> some kind of getopt(3) processing, and the whole set of command-line
> options captured in an array of structures accessible to any front
> end.

There is, it's just much more complicated than getopt :)

If you're looking at the C frontends for inspiration, then:

c-family/c.opt defines which options are recognized and several specifics 
about them, e.g. for -I it has:


I
C ObjC C++ ObjC++ Joined Separate MissingArgError(missing path after %qs)
-I Add  to the end of the main include path.


(look at some other examples therein, also in common.opt to get a feel).

Then code in c-family/c-opts.c:c_common_handle_option actually handles the 
option:

case OPT_I:
  if (strcmp (arg, "-"))
add_path (xstrdup (arg), INC_BRACKET, 0, true);
  else
  .,.

That function is made a langhook for option processing so that it's 
actually called via c/c-objc-common.h:

  #define LANG_HOOKS_HANDLE_OPTION c_common_handle_option

If you're also using the model of a compiler driver (i.e. the gcc program, 
source in gcc.cc) that actually calls compiler (cc1), assembler and 
linker, then you also need to arrange for that program to pass all -I 
options to the compiler proper.  That is done with the spec language, by 
somewhere having '{I*}' in the specs for invoking the cobol compiler.  
E.g. look in gcc.cc for '@c' (matching the file extension) how that entry 
uses '%(cpp_unique_options)', and how cpp_unique_options is defined for 
the specs language:

  INIT_STATIC_SPEC ("cpp_unique_options",   &cpp_unique_options),

and

static const char *cpp_unique_options =
  "%{!Q:-quiet} %{nostdinc*} %{C} %{CC} %{v} %@{I*&F*} %{P} %I\  

(the specs language used here is documented in a lengthy comment early in 
gcc.cc, "The Specs Language")

The "%@{I*&F*}" is the one that makes gcc pass -Iwhatever to cc1 (and 
ensures relative order with -F options is retained and puts all these into 
an @file if one is given on the cmdline, otherwise leaves it on cmdline).  
If you use the compiler driver then using '-v' when invoking it will 
quickly tell you if that options passing worked, as it will show the 
concrete command it exec's for the compiler proper.

Hope this helps.


Ciao,
Michael.


Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Michael Matz via Gcc
Hey,

On Tue, 29 Nov 2022, Uecker, Martin wrote:

> It does not require any changes on how arrays are represented.
> 
> As part of VM-types the size becomes part of the type and this
> can be used for static or dynamic analysis, e.g. you can 
> - today - get a run-time bounds violation with the sanitizer:
> 
> void foo(int n, char (*buf)[n])
> {
>   (*buf)[n] = 1;
> }

This can already be statically analyzed as being wrong, no need for dynamic 
checking.  What I mean is the checking of the claimed contract.  Above you 
assure for the function body that buf has n elements.  This is also a 
pre-condition for calling this function and _that_ can't be checked in all 
cases because:

  void foo (int n, char (*buf)[n]) { (*buf)[n-1] = 1; }
  void callfoo(char * buf) { foo(10, buf); }

buf doesn't have a known size.  And a pre-condition that can't be checked 
is no pre-condition at all, as only a checked one can become a guarantee 
for the body.

The compiler has no choice than to trust the user that the pre-condition 
for calling foo is fulfilled.  I can see how being able to just check half 
of the contract might be useful, but if it doesn't give full checking then 
any proposal for syntax should be even more obviously orthogonal than the 
current one.

> For
> 
> void foo(int n, char buf[n]);
> 
> it semantically has no meaning according to the C standard,
> but a compiler could still warn. 

Hmm?  Warn about what in this decl?

> It could also warn for
> 
> void foo(int n, char buf[n]);
> 
> int main()
> {
> char buf[9];
> foo(buf);
> }

You mean if you write 'foo(10,buf)' (the above, as is, is simply a syntax 
error for non-matching number of args).  Or was it a mispaste and you mean 
the one from the godbolt link, i.e.:

void foo(char buf[10]){ buf[9] = 1; }
int main()
{
char buf[9];
foo(buf);
}

?  If so, yeah, we warn already.  I don't think this is an argument for 
(or against) introducing new syntax.

...

> But in general: This feature is useful not only for documentation
> but also for analysis.

Which feature we're talking about now?  The ones you used all work today, 
as you demonstrated.  I thought we would be talking about that ".whatever" 
syntax to refer to arbitrary parameters, even following ones?  I think a 
disrupting syntax change like that should have a higher bar than "in some 
cases, depending on circumstance, we might even be able to warn".


Ciao,
Michael.


Re: [PATCH] Various pages: SYNOPSIS: Use VLA syntax in function parameters

2022-11-29 Thread Michael Matz via Gcc
Hey,

On Tue, 29 Nov 2022, Alex Colomar via Gcc wrote:

> How about the compiler parsing the parameter list twice?

This _is_ unbounded look-ahead.  You could avoid this by not using "." for 
your new syntax: use something unambiguous that can't be confused with 
other syntactic elements, e.g. a different punctuator like '@' or the 
like.  But I'm generally doubtful of this whole feature within C itself.  
It serves a purpose in documentation, so in man-pages it seems fine enough 
(but then it still could use a different punctuator so as not to be 
confusable with C syntax).

But within C it still can only serve a documentation purpose as no 
checking could be performed without also changes in how e.g. arrays are 
represented (they always would need to come with a size).  It seems 
doubtful to introduce completely new and ambiguous syntax with all the 
problems Joseph lists just in order to be able to write documentation when 
there's a perfectly fine method to do so: comments.


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-17 Thread Michael Matz via Gcc
Hello,

On Wed, 16 Nov 2022, Paul Eggert wrote:

> On 2022-11-16 06:26, Michael Matz wrote:
> > char foobar(void);
> > int main(void) {
> >return &foobar != 0;
> > }
> 
> That still has undefined behavior according to draft C23,

This is correct (and also holds for the actually working variant later, 
with a volatile variable).  If your argument is then that as both 
solutions for the link-test problem are relying on undefined behaviour 
they are equivalent and hence no change is needed, you have a point, but I 
disagree.  In practice one (with the call) will cause more problems than 
the other (with address taking).

> If Clang's threatened pickiness were of some real use elsewhere, it 
> might be justifiable for default Clang to break Autoconf. But so far we 
> haven't seen real-world uses that would justify this pickiness for 
> Autoconf's use of 'char memset_explicit(void);'.

Note that both, GCC and clang, already warn (not error out!) about the 
mismatching decl, even without any headers.  So we are in the pickiness 
era already.

I.e. a C file containing just a single line "char printf(void);" will be 
warned about, by default.  There is about nothing that autoconf could do 
to rectify this, except containing a long list of prototypes for 
well-known functions, with the associated maintenance hassle.  But 
autoconf _can_ do something about how the decls are used in the 
link-tests.


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hello,

On Wed, 16 Nov 2022, Jonathan Wakely wrote:

> > > Unrelated but I was a bit tempted to ask for throwing in
> > > -Wbuiltin-declaration-mismatch to default -Werror while Clang 16 was at
> > > it, but I suppose we don't want the world to burn too much,
> >
> > :-)  It's IMHO a bug in the standard that it misses "if any of its
> > associated headers are included" in the item for reservation of external
> > linkage identifiers; it has that for all other items about reserved
> > identifiers in the Library clause.  If that restriction were added you
> > couldn't justify erroring on the example at hand (because it doesn't
> > include e.g. <stdio.h> and then printf wouldn't be reserved). A warning
> > is of course always okay and reasonable.  As is, you could justify
> > erroring out, but I too think that would be overzealous.
> 
> 
> I think that's very intentional and not a defect in the standard.
> 
> If one TU was allowed to define:
> 
> void printf() { }
> 
> and have that compiled into the program, then that would cause
> unexpected behaviour for every other TU which includes <stdio.h> and
> calls printf. They would get the non-standard rogue printf.

True.  But suppose the restriction would be added.  I could argue that 
then your problem program (in some other TU) _does_ include the header, 
hence the identifier would have been reserved and so the above definition 
would have been wrong.  I.e. I think adding the restriction wouldn't allow 
the problematic situation either.

I'm aware that the argument would then invoke all the usual problems of 
what constitutes a full program, and if that includes the library even 
when not including headers and so on.  And in any case currently the 
standard does say they're reserved so it's idle speculation anyway :)


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hello,

On Wed, 16 Nov 2022, Sam James wrote:

> Unrelated but I was a bit tempted to ask for throwing in 
> -Wbuiltin-declaration-mismatch to default -Werror while Clang 16 was at 
> it, but I suppose we don't want the world to burn too much,

:-)  It's IMHO a bug in the standard that it misses "if any of its 
associated headers are included" in the item for reservation of external 
linkage identifiers; it has that for all other items about reserved 
identifiers in the Library clause.  If that restriction were added you 
couldn't justify erroring on the example at hand (because it doesn't 
include e.g. <stdio.h> and then printf wouldn't be reserved).  A warning 
is of course always okay and reasonable.  As is, you could justify 
erroring out, but I too think that would be overzealous.


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hey,

On Wed, 16 Nov 2022, Alexander Monakov wrote:

> > The idea is so obvious that I'm probably missing something, why autoconf 
> > can't use that idiom instead.  But perhaps the (historic?) reasons why it 
> > couldn't be used are gone now?
> 
> Ironically, modern GCC and LLVM optimize '&foobar != 0' to '1' even at -O0,
> and thus no symbol reference remains in the resulting assembly.

Err, right, *head-->table*.
Playing with volatile should help:

char foobar(void);
char (* volatile ptr)(void);
int main(void) {
ptr = foobar;
return ptr != 0;
}


Ciao,
Michael.


Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-16 Thread Michael Matz via Gcc
Hi,

On Tue, 15 Nov 2022, Paul Eggert wrote:

> On 2022-11-15 11:27, Jonathan Wakely wrote:
> > Another perspective is that autoconf shouldn't get in the way of
> > making the C and C++ toolchain more secure by default.
> 
> Can you cite any examples of a real-world security flaw that would be 
> found by Clang erroring out because 'char foo(void);' is the wrong 
> prototype? Is it plausible that any such security flaw exists?
> 
> On the contrary, it's more likely that Clang's erroring out here would 
> *introduce* a security flaw, because it would cause 'configure' to 
> incorrectly infer that an important security-relevant function is 
> missing and that a flawed substitute needs to be used.
> 
> Let's focus on real problems rather than worrying about imaginary ones.

I sympathize, and I would think a compiler emitting an error (not a 
warning) in the situation at hand (in absence of -Werror) is overly 
pedantic.  But, could autoconf perhaps avoid the problem?  AFAICS the 
ac_fn_c_check_func really does only a link test to check for symbol 
existence, and the perceived problem is that the call statement in main() 
invokes UB.  So, let's avoid the call then while retaining the access to 
the symbol?  Like:

-
char foobar(void);
int main(void) {
  return &foobar != 0;
}
-

No call involved: no reason for compiler to complain.  The prototype decl 
itself will still be "wrong", but compilers complaining about that (in 
absence of a pre-existing different prototype, which is avoided by 
autoconf) seem unlikely.

Obviously this program will also say "foobar exists" if it's a data 
symbol, but that's the same with the variant using the call on most 
platforms (after all it's not run).

The idea is so obvious that I'm probably missing something, why autoconf 
can't use that idiom instead.  But perhaps the (historic?) reasons why it 
couldn't be used are gone now?


Ciao,
Michael.


Re: Local type inference with auto is in C2X

2022-11-03 Thread Michael Matz via Gcc
Hello,

On Thu, 3 Nov 2022, Florian Weimer via Gcc wrote:

> will not have propagated widely once GCC 13 releases, so rejecting
> implicit ints in GCC 13 might be too early.  GCC 14 might want to switch
> to C23/C24 mode by default, activating auto support, if the standard
> comes out in 2023 (which apparently is the plan).
> 
> Then we would go from
> warning to changed semantics in a single release.
> 
> Comments?

I would argue that changing the default C mode to c23 in the year the 
standard comes out (or even a year later) is too aggressive and too early.  Existing 
sources are often compiled with defaults, and hence would change 
semantics, which seems unattractive.  New code can instead easily use 
-std=c23 for a time.

E.g. c99/gnu99 (a largish deviation from gnu90) was never default and 
gnu11 was made default only in 2014.


Ciao,
Michael.


Re: Counting static __cxa_atexit calls

2022-08-24 Thread Michael Matz via Gcc
Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > On Wed, 24 Aug 2022, Florian Weimer wrote:
> >
> >> > Isn't this merely moving the failure point from exception-at-ctor to 
> >> > dlopen-fails?
> >> 
> >> Yes, and that is a soft error that can be handled (likewise for
> >> pthread_create).
> >
> > Makes sense.  Though that actually hints at a design problem with ELF 
> > static ctors/dtors: they should be able to soft-fail (leading to dlopen or 
> > pthread_create error returns).  So, maybe the _best_ way to deal with this 
> > is to extend the definition of the various object-initialization means 
> > in ELF to allow propagating failure.
> 
> We could enable unwinding through the dynamic linker perhaps.  But as I
> said, those Itanium ABI functions tend to be noexcept, so there's work
> on that front as well.

Yeah, my idea would have been slightly less ambitious: redefine the ABI of 
.init_array functions to be able to return an int.  The loader would abort 
loading if any of them return non-zero.  Now change GCC code emission of 
those helper functions placed in .init_array to catch all exceptions and 
(in case an exception happened) return non-zero.  Or, even easier, don't 
deal with exceptions, but rather just check if __cxa_atexit worked, and if 
not return non-zero right away.  That way all the exception propagation 
(or cxa_atexit error handling) stays purely within the GCC generated code 
and the dynamic loader only needs to deal with return values, not 
exceptions and unwinding.

For backward compat we can't just change the ABI of .init_array, but we 
can devise an alternative: .init_array_mayfail and the associated DT tags.

> For thread-local storage, it's even more difficult because any first
> access can throw even if the constructor is noexcept.

That's extending the scope somewhat, pre-counting cxa_atexit wouldn't 
solve this problem either, right?

> >> I think we need some level of link editor support to avoid drastically
> >> over-counting multiple static calls that get merged into one
> >> implementation as the result of vague linkage.  Not sure how to express
> >> that at the ELF level?
> >
> > Hmm.  The __cxa_atexit calls are coming from the per-file local static 
> > initialization_and_destruction routine which doesn't have vague linkage, 
> > so its contribution to the overall number of cxa_atexit calls doesn't 
> > change from .o to final-exe.  Can you show an example of what you're 
> > worried about?
> 
> Sorry if I didn't use the correct terminology.
> 
> I was thinking about this:
> 
> #include <vector>
> 
> template <int i>
> struct S {
>   static std::vector<int> vec;
> };
> 
> template <int i> std::vector<int> S<i>::vec(i);
> 
> std::vector<int> &
> f()
> {
>   return S<1009>::vec;
> }
> 
> The initialization is deduplicated with the help of a guard variable,
> and that also bounds to number of __cxa_atexit invocations to at most
> one per type.

Ah, right, thanks.  The guard variable for class-local statics, I was 
thinking file-scope globals.  Double-hmm.  I don't readily see a nice way 
to correctly precalculate the number of cxa_atexit calls here.  A simple 
problem is the following: assume a couple files each defining such class 
templates, that ultimately define and initialize static members A<1>::a 
and B<1>::b (assume vague linkage).  Assume we have four files:

a:  defines A::a
b:  defines B::b
ab: defines A::a and B::b
ba: defines B::b and A::a

Now link order influences which file gets to actually initialize the 
members and which ones skip it due to guard variables.  But the object 
files themselves don't know enough context to tell which will be which.  Not even 
the link editor knows that, because the non-taken cxa_atexit calls aren't in 
linkonce/group sections; they are all there in 
object.o:.text:_Z41__static_initialization_and_destruction_0ii .

So, what would need to be emitted is for instance a list of cxa_atexit 
calls plus guard variable; the link editor could then count all unguarded 
cxa_atexit calls plus all guarded ones, but the latter only once per 
guard.  The key would be the identity of the guard variable.

That seems like an awful lot of complexity at the wrong level for a very 
specific usecase when we could also make .init_array failable, which then 
even might have more usecases.

> > A completely different way would be to not use cxa_atexit at all: 
> > allocate memory statically for the object and dtor addresses in 
> > .rodata (instead of in .text right now), and iterate over those at 
> > static_destruction time.  (For the thread-local ones it would need to 
> > store arguments to __tls_get_addr).
> 
> That only works if the compiler and linker can figure out the
> construction order.  In general, that is not possible, and that case
> seems even quite common with C++.  If the construction order is not
> known ahead of time, it is necessary to record it somewhere, so that
> destruction can happen in reverse.  So I think storing things in .rodata
> is out.

Hmm, right.  The ba

Re: Counting static __cxa_atexit calls

2022-08-24 Thread Michael Matz via Gcc
Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > Isn't this merely moving the failure point from exception-at-ctor to 
> > dlopen-fails?
> 
> Yes, and that is a soft error that can be handled (likewise for
> pthread_create).

Makes sense.  Though that actually hints at a design problem with ELF 
static ctors/dtors: they should be able to soft-fail (leading to dlopen or 
pthread_create error returns).  So, maybe the _best_ way to deal with this 
is to extend the definition of the various object-initialization means 
in ELF to allow propagating failure.

> > Probably a note section, which the link editor could either transform into 
> > a dynamic tag or leave as note(s) in the PT_NOTE segment.  The latter 
> > wouldn't require any specific tooling support in the link editor.  But the 
> > consumer would have to iterate through all the notes to add the 
> > individual counts together.  Might be acceptable, though.
> 
> I think we need some level of link editor support to avoid drastically
> over-counting multiple static calls that get merged into one
> implementation as the result of vague linkage.  Not sure how to express
> that at the ELF level?

Hmm.  The __cxa_atexit calls are coming from the per-file local static 
initialization_and_destruction routine which doesn't have vague linkage, 
so its contribution to the overall number of cxa_atexit calls doesn't 
change from .o to final-exe.  Can you show an example of what you're 
worried about?

A completely different way would be to not use cxa_atexit at all: allocate 
memory statically for the object and dtor addresses in .rodata (instead of 
in .text right now), and iterate over those at static_destruction time.  
(For the thread-local ones it would need to store arguments to 
__tls_get_addr).

Doing that or defining failure modes for ELF init/fini seems a better 
design than hacking around the current limitation via counting static 
cxa_atexit calls.


Ciao,
Michael.


Re: Counting static __cxa_atexit calls

2022-08-23 Thread Michael Matz via Gcc
Hello,

On Tue, 23 Aug 2022, Florian Weimer via Gcc wrote:

> We currently have a latent bug in glibc where C++ constructor calls can
> fail if they have static or thread storage duration and a non-trivial
> destructor.  The reason is that __cxa_atexit (and
> __cxa_thread_atexit_impl) may have to allocate memory.  We can avoid
> that if we know how many such static calls exist in an object (for C++,
> the compiler will never emit these calls repeatedly in a loop).  Then we
> can allocate the resources beforehand, either during process and thread
> start, or when dlopen is called and new objects are loaded.

Isn't this merely moving the failure point from exception-at-ctor to 
dlopen-fails?  If an individual __cxa_atexit can't allocate memory anymore 
for its list structure, why should pre-allocation (which is still dynamic, 
based on the number of actual atexit calls) have any more luck?

> What would be the most ELF-flavored way to implement this?  After the
> final link, I expect that the count (or counts, we need a separate
> counter for thread-local storage) would show up under a new dynamic tag
> in the dynamic segment.  This is actually a very good fit because older
> loaders will just ignore it.  But the question remains what GCC should
> emit into assembler & object files, so that the link editor can compute
> the total count from that.

Probably a note section, which the link editor could either transform into 
a dynamic tag or leave as note(s) in the PT_NOTE segment.  The latter 
wouldn't require any specific tooling support in the link editor.  But the 
consumer would have to iterate through all the notes to add the 
individual counts together.  Might be acceptable, though.


Ciao,
Michael.


Re: DWARF question about size of a variable

2022-06-09 Thread Michael Matz via Gcc
Hello,

On Wed, 8 Jun 2022, Carl Love via Gcc wrote:

> Is there dwarf information that gives the size of a variable?

Yes, it's in the type description.  For array types, its children 
give the index types and ranges.  If that range is 
computed at runtime DWARF will (try to) express it as an expression in 
terms of other available values (like registers, constants, or memory), 
and as such can also change depending on where (at which PC) you evaluate 
that expression (and the expression itself can also change per PC).  For 
instance, in your example, on x86 with -O3 we have these relevant DWARF 
snippets (readelf -wi):

 <2>: Abbrev Number: 12 (DW_TAG_variable)
   DW_AT_name: a
   DW_AT_type: <0xa29>

So, 'a' is a variable of type 0xa29, which is:

 <1>: Abbrev Number: 13 (DW_TAG_array_type)
   DW_AT_type: <0xa4a>
   DW_AT_sibling : <0xa43>
 <2>: Abbrev Number: 14 (DW_TAG_subrange_type)
   DW_AT_type: <0xa43>
   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31 1c   
(DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl; DW_OP_const1u: 32; 
DW_OP_shra; DW_OP_lit1; DW_OP_minus)
 <2>: Abbrev Number: 0

So, type 0xa29 is an array type, whose element type is 0xa4a (which will 
turn out to be a signed char), and whose (single) dimension type is 0xa43 
(unsigned long) with an upper bound that is runtime computed, see below.
The referenced types from that are:

 <1>: Abbrev Number: 1 (DW_TAG_base_type)
   DW_AT_byte_size   : 8
   DW_AT_encoding: 7   (unsigned)
   DW_AT_name: (indirect string, offset: 0x13b): long unsigned 
int

 <1>: Abbrev Number: 1 (DW_TAG_base_type)
   DW_AT_byte_size   : 1
   DW_AT_encoding: 6   (signed char)
   DW_AT_name: (indirect string, offset: 0x1ce): char

With that gdb has all information to compute the size of this array 
variable in its scope ((upper-bound + 1 minus lower-bound (default 0)) 
times sizeof(basetype)).  Compare the above for instance with the 
debuginfo generated at -O0, only the upper-range expression changes:

 <2>: Abbrev Number: 10 (DW_TAG_subrange_type)
   DW_AT_type: <0xa29>
   DW_AT_upper_bound : 3 byte block: 91 68 6   (DW_OP_fbreg: -24; 
DW_OP_deref)

Keep in mind that DWARF expressions are based on a simple stack machine.
So, for instance, the computation for the upper bound in the O3 case is:
 ((register %rdi + 1) << 32 >> 32) - 1
(i.e. basically the 32-to-64 signextension of %rdi).

On ppc I assume that either the upper_bound attribute isn't there or 
contains an uninformative expression (or one that isn't valid at the 
program-counter gdb stops at), in which case you would want to look at 
dwarf2out.cc:subrange_type_die or add_subscript_info (look for 
TYPE_MAX_VALUE of the subscripts domain type).  Hope this helps.


Ciao,
Michael.


Re: reordering of trapping operations and volatile

2022-01-17 Thread Michael Matz via Gcc
Hello,

On Sat, 15 Jan 2022, Martin Uecker wrote:

> > Because it interferes with existing optimisations. An explicit 
> > checkpoint has a clear meaning. Using every volatile access that way 
> > will hurt performance of code that doesn't require that behaviour for 
> > correctness.
> 
> This is why I would like to understand better what real use cases of 
> performance sensitive code actually make use of volatile and are 
> negatively affected. Then one could discuss the tradeoffs.

But you seem to ignore whatever we say in this thread.  There are now 
multiple examples that demonstrate problems with your proposal as imagined 
(for lack of a _concrete_ proposal with wording from you), problems that 
don't involve volatile at all.  They all stem from the fact that you order 
UB with respect to all side effects (because you haven't said how you want 
to avoid such total ordering with all side effects).

As I said upthread: you need to define a concept of time at whose 
granularity you want to limit the effects of UB, and the borders of each 
time step can't simply be (all) the existing side effects.  Then you need 
to have wording of what it means for UB to occur within such time step, in 
particular if multiple UB happens within one (for best results it should 
simply be UB, not individual instances of different UBs).

If you look at the C++ proposal (thanks Jonathan) I think you will find 
that if you replace 'std::observable' with 'sequence point containing a 
volatile access' that you basically end up with what you wanted.  The 
crucial point being that the time steps (epochs in that proposal) aren't 
defined by all side effects but by a specific and explicit thing only (new 
function in the proposal, volatile accesses in an alternative).

FWIW: I think for a new language feature reusing volatile accesses as the 
clock ticks are the worse choice: if you intend that feature to be used 
for writing safer programs (a reasonable thing) I think being explicit and 
at the same time null-overhead is better (i.e. a new internal 
function/keyword/builtin, specified to have no effects except moving the 
clock forward).  volatile accesses obviously already exist and hence are 
easier to integrate into the standard, but in a given new/safe program, 
whenever you see a volatile access you would always need to ask 'is this 
for clock ticks, or is it a "real" volatile access for memmap IO'.


Ciao,
Michael.


Re: git hooks: too strict check

2022-01-14 Thread Michael Matz via Gcc
Hello,

On Fri, 14 Jan 2022, Martin Liška wrote:

> Hello.
> 
> I'm working on a testsuite clean-up where some of the files are wrongly named.
> More precisely, some files have .cc extension and should use .C. However there's
> existing C test-case and it leads to:
> 
> marxin@marxinbox:~/Programming/gcc/gcc/testsuite> find . -name test-asm.*
> ./jit.dg/test-asm.C
> ./jit.dg/test-asm.c

You can't have that, the check is correct.  There are filesystems (NTFS 
for instance) that are case-preserving but case-insensitive, on those you 
really can't have two files that differ only in casing.  You need to find 
a different solution, either consistently use .cc instead of .C, live with 
the inconsistency or rename the base name of these files.


Ciao,
Michael.

> 
> test-kunlun me/rename-testsuite-files
> Enumerating objects: 804, done.
> Counting objects: 100% (804/804), done.
> Delta compression using up to 16 threads
> Compressing objects: 100% (242/242), done.
> Writing objects: 100% (564/564), 142.13 KiB | 7.48 MiB/s, done.
> Total 564 (delta 424), reused 417 (delta 295), pack-reused 0
> remote: Resolving deltas: 100% (424/424), completed with 222 local objects.
> remote: *** The following filename collisions have been detected.
> remote: *** These collisions happen when the name of two or more files
> remote: *** differ in casing only (Eg: "hello.txt" and "Hello.txt").
> remote: *** Please re-do your commit, chosing names that do not collide.
> remote: ***
> remote: *** Commit: 7297e1de9bed96821d2bcfd034bad604ce035afb
> remote: *** Subject: Rename tests in jit sub-folder.
> remote: ***
> remote: *** The matching files are:
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-quadratic.C
> remote: *** gcc/testsuite/jit.dg/test-quadratic.c
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-switch.C
> remote: *** gcc/testsuite/jit.dg/test-switch.c
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-asm.C
> remote: *** gcc/testsuite/jit.dg/test-asm.c
> remote: ***
> remote: *** gcc/testsuite/jit.dg/test-alignment.C
> remote: *** gcc/testsuite/jit.dg/test-alignment.c
> 
> Can we please do something about it?
> 
> Thanks,
> Martin
> 


Re: reordering of trapping operations and volatile

2022-01-14 Thread Michael Matz via Gcc
Hello,

On Thu, 13 Jan 2022, Martin Uecker wrote:

> > > >  Handling all volatile accesses in the very same way would be 
> > > > possible but quite some work I don't see much value in.
> > > 
> > > I see some value. 
> > > 
> > > But an alternative could be to remove volatile
> > > from the observable behavior in the standard
> > > or make it implementation-defined whether it
> > > is observable or not.
> > 
> > But you are actually arguing for making UB be observable
> 
> No, I am arguing for UB not to have the power
> to go back in time and change previous defined
> observable behavior.

But right now that's equivalent to making it observable,
because we don't have any other terms than observable or
undefined.  As alluded to later, you would have to
introduce a new concept, something pseudo-observable,
which you then started doing.  So, see below.
 
> > That's 
> > much different from making volatile not be
> > observable anymore (which  obviously would
> > be a bad idea), and is also much harder to
> 
> I tend to agree that volatile should be
> considered observable. But volatile is
> a bit implementation-defined anyway, so this
> would be a compromise so that implementations
> do not have to make all the implied changes
> if we revise the meaning of UB.

Using volatile accesses for memory mapped IO is a much stronger use-case 
than your wish of using volatile accesses to block moving of UB as a 
debugging aid, and the former absolutely needs some guarantees, so I don't 
think it would be a compromise at all.  Making volatile not be observable 
would break the C language.

> > Well, what you _actually_ want is an implied
> > dependency between some UB and volatile accesses
> > (and _only_ those, not e.g. with other UB), and the 
> > difficulty now is to define "some" and to create
> > the dependency without making that specific UB
> > to be properly observable. 
> 
> Yes, this is what I actually want.
> 
> >  I think to define this 
> > all rigorously seems futile (you need a new
> > category between observable  and UB), so it comes
> > down to compiler QoI on a case by case basis.
> 
> We would simply change UB to mean "arbitrary
> behavior at the point of time the erroneous 
> construct is encountered at run-time"  and 
> not "the complete program is invalid all
> together". I see no problem in specifying this
> (even in a formally precise way)

First you need to define "point in time", a concept which doesn't exist 
yet in C.  The obvious choice is of course observable behaviour in the 
execution environment and its specified ordering from the abstract 
machine, as clarified via sequence points.  With that your "at the point 
in time" becomes something like "after all side effects of previous 
sequence point, but strictly before all side effects of next sequence 
point".

But doing that would have very far reaching consequences, as already
stated in this thread.  The above would basically make undefined behaviour 
be reliably countable, and all implementations would need to produce the 
same counts of UB.  That in turn disables many code movement and 
commonization transformations, e.g. this:

int a = ..., b = ...;
int x = a + b;
int y = a + b;

can't be transformed into "y = x = a + b" anymore, because the addition 
_might_ overflow, and if it does you have two UBs originally but would 
have one UB after.  I know that you don't want to inhibit this or similar 
transformations, but that would be the result of making UB countable, 
which is the result of forcing UB to happen at specific points in time.  
So, I continue to see problems in precisely specifying what you want, _but 
not more_.

I think all models in which you order the happening of UB with respect to 
existing side effects (per abstract machine, so it includes modification 
of objects!) have this same problem, it always becomes a side effect 
itself (one where you don't specify what actually happens, but a side 
effect nontheless) and hence becomes observable.


Ciao,
Michael.


Re: reordering of trapping operations and volatile

2022-01-13 Thread Michael Matz via Gcc
Hello,

On Tue, 11 Jan 2022, Martin Uecker via Gcc wrote:

> >  Handling all volatile accesses in the
> > very same way would be possible but quite some work I don't
> > see much value in.
> 
> I see some value. 
> 
> But an alternative could be to remove volatile
> from the observable behavior in the standard
> or make it implementation-defined whether it
> is observable or not.

But you are actually arguing for making UB be observable (which then 
directly implies an ordering with respect to volatile accesses).  That's 
much different from making volatile not be observable anymore (which 
obviously would be a bad idea), and is also much harder to do, it's 
the nature of undefined behaviour to be hard to define :)

Well, what you _actually_ want is an implied dependency between some UB 
and volatile accesses (and _only_ those, not e.g. with other UB), and the 
difficulty now is to define "some" and to create the dependency without 
making that specific UB to be properly observable.  I think to define this 
all rigorously seems futile (you need a new category between observable 
and UB), so it comes down to compiler QoI on a case by case basis.


Ciao,
Michael.


Re: environment register / ABI

2021-10-14 Thread Michael Matz via Gcc
Hello,

On Wed, 13 Oct 2021, Martin Uecker wrote:

> > [... static chain ...]
> > If you mean that, then it's indeed psABI specific, and possibly not
> > all ABIs specify it (in which case GCC will probably have set a de-
> > facto standard at least for unixy systems).  The x86-64 psABI for
> > instance does specify a  register for this, which is separate from
> > the normal argument passing registers.  Other psABIs could say that
> > it's passed like a hidden  argument prepended to the formal list of
> > args.
> > 
> 
> Yes, most architectures seem to define a register. I am wondering
> if there is a table or summary somewhere.

Not that I know of, and I doubt it exists.  The most comprehensive is 
probably the result of (from within gcc sources):

% grep 'define.*STATIC_CHAIN_REG' config/*/*.[ch]

(that gets you all archs of GCC that support a static chain in registers, 
and it's often very obvious from above result which one it is), plus the 
result of

% grep TARGET_STATIC_CHAIN config/*/*.[ch]

(that gets you the few targets that don't necessarily use a reg for the 
static chain, but e.g. a stack slot or a different register depending on 
circumstances.  These are only i386, moxie and xtensa currently, but you 
need to inspect the target hook function to determine when which place is 
used, i.e. not as obvious as above).

> > Or do you mean something else entirely?  It might also help to know 
> > the purpose of your question :)
> 
> There is currently no standard way to set or query
> the static chain from C although this is used by
> many other languages. Also function pointers in C
> usually can not store the static chain. I am going
> to propose to WG14 to add some kind of wide function
> pointer to address this.  I am doing back ground
> research to understand whether this exists everywhere.

I see.  Is that sensible without C itself having the possibility to write 
nested functions?  There are other, more obvious (for C!) reasons to have 
wide function pointers: shared libs often are implemented such that the 
static data of a library is reachable by a pointer (often called GOT 
pointer, or PIC register or similar terms), so calling an indirect 
function needs to setup that GOT pointer plus contain the function address 
itself.  This is often implemented either by setup code in the function 
prologue or by using function descriptors, or by an extra entry point 
containing that setup.  Fat function pointers (which effectively are 
then function descriptors) could contain that as well. (it will be very 
target dependend how that would be filled, but that is the case with 
static chains as well).

There's another case for fat function pointers: C++ virtual methods: 
unbound pointers to them will just be type-id plus vtable index (in the 
usual implementations of C++), bound pointers will be a function address 
plus this pointer.

There may be more items that can be imagined to be stuffed into a fat 
function pointer.

So, I'm wondering what you are pondering about, to which extent you want 
to go with fat function pointers, what the usecases will be, i.e. which 
problem you want to solve :)


Ciao,
Michael.


Re: environment register / ABI

2021-10-13 Thread Michael Matz via Gcc
Hello,

On Wed, 13 Oct 2021, Martin Uecker wrote:

> does anybody know if all architectures support passing
> an environment pointer in their function call ABI? 

Define "environment pointer".  I can imagine two things you could mean: 
the custom to pass envp as third argument to main() in hosted C programs:

  int main (int argc, char *argv[], char *envp[]);

but then this is specific to main and more a question of process 
initialization than function call ABI.  If you mean this the answer will 
most often be: if envp is passed to main (a question of the operating 
system or runtime environment, e.g. if there's something like an 
environment in the getenv() sense to start with), then it is passed like 
any other third argument of pointer type on the psABI in question, and 
that definition would be independent of the psABI.

Or you could mean what normally would be called 'static chain', i.e. a 
pointer to the stack frame of outer functions for languages supporting 
nested functions.  I could imagine this also be called environment.  If 
you mean that, then it's indeed psABI specific, and possibly not all ABIs 
specify it (in which case GCC will probably have set a de-facto standard 
at least for unixy systems).  The x86-64 psABI for instance does specify a 
register for this, which is separate from the normal argument passing 
registers.  Other psABIs could say that it's passed like a hidden 
argument prepended to the formal list of args.

Or do you mean something else entirely?  It might also help to know the 
purpose of your question :)


Ciao,
Michael.


Re: S390 should change the meaning of -m31

2021-09-30 Thread Michael Matz via Gcc
Hello,

On Wed, 29 Sep 2021, Jesus Antonio via Gcc wrote:

> m31 is semantically the same as the m32 option.
> 
> 
> The m31 option allows for 32 bit addressing and that is confusing since 
> the m31 option in S390 would mean 2 GiB space addressing

Indeed that's exactly what it means, and what it's supposed to mean.  On 
s390, in AMODE(31) the top bit of a (32-bit) address is either 
ignored or an indicator to switch back to 24-bit addresses from the S/360
times.  Either way that leaves 31 bits to generate the virtual address.  
On s390 you indeed have a 2GB address space, not more.

> Code used:
> 
>     volatile uint64_t *gib_test = (volatile uint64_t *)0x7FFF;
>     memset(gib_test, 1, 4096);
> 
> 
> Hercules dump:
> 
> r 0x7FFF-0x81FF
> R:7FFF:K:06=01 .

I'm not sure what you believe to have demonstrated here.  The (virtual or 
physical) address 0x7FFF is either (in AMODE(24)) equivalent to 
0x00ff or to 0x (in AMODE(31)), either way, the top byte of 
the addressable range ...

> R:800F:K:06=01 01010101 01010101 01010101 010101 

... while address 0x8001 is equivalent to address 0x1 (in AMODE(24) 
and AMODE(31)).  Again, the top bit (or bits, in AMODE(24)) is ignored.  
So, you've built a memset that wraps around the line (AMODE(24)) or the 
bar (AMODE(31)).  Perhaps unusual and not what you expected, but as 
designed by IBM.

> The option i used was m31 of course, however this option is misleading 
> since it allows 32 bit mode, and there is no m32 so you have to use m31 
> - just lots of misleading options.

The -mXX options are supposed to reflect the address space's size, not the 
size of the general-purpose registers.  An option that reflects AMODE(24) 
would also be called -m24, despite the registers still being 32bit in 
size.


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-09 Thread Michael Matz via Gcc
Hello,

On Thu, 9 Sep 2021, Aldy Hernandez wrote:

> > Here there's no simple latch block to start with (the backedge comes
> > directly out of the loop exit block).  So my suggested improvement
> > (testing if the latch was empty and only then reject the thread), would
> > solve this.
> 
> Well, there's the thing.  Loop discovery is marking BB5 as the latch, so 
> it's not getting threaded:

Yes, it's a latch, but not an empty one.  So the thread would make it just 
even more non-empty, which might not be a problem anymore.  So amending my 
patch somewhere with a strategic

  && empty_block_p (latch)

and only then rejecting the thread should make this testcase work again.

(There's still a catch, though: if this non-empty latch, which is also the 
exit test block, is threaded through and is followed by actual code, then 
that code will be inserted onto the back edge, not into the latch block 
before the exit test, and so also create a (new) non-empty latch.  That 
might or might not create problems downstream, but not as many as 
transforming an empty into a non-empty latch would do; but it could 
create for instance a second back edge (not in this testcase) and 
suchlike)

> BTW, I'm not sure your check for the non-last position makes a difference:
> 
> > diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
> > index 449232c7715..528a753b886 100644
> > --- a/gcc/tree-ssa-threadbackward.c
> > +++ b/gcc/tree-ssa-threadbackward.c
> > -   threaded_through_latch = true;
> > +   {
> > + threaded_through_latch = true;
> > + if (j != 0)
> > +   latch_within_path = true;
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file, " (latch)");
> > +   }
> >  }
> 
> If the last position being considered is a simple latch, it only has a simple
> outgoing jump.  There's no need to thread that.  You need a block with >= 2
> succ edges to thread anything.

So, you are saying that any candidate thread path wouldn't have the latch 
in the last position if it were just an empty forwarder?  I simply wasn't 
sure about that, so was conservative and only wanted to reject things I 
knew were positively bad (namely code in the path following the latch 
that is in danger of being moved into the latch).


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-09 Thread Michael Matz via Gcc
Hello,

On Thu, 9 Sep 2021, Aldy Hernandez wrote:

> The ldist-22 regression is interesting though:
> 
> void foo ()
> {
>   int i;
> 
>:
>   goto ; [INV]
> 
>:
>   a[i_1] = 0;
>   if (i_1 > 100)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>:
>   b[i_1] = i_1;
> 
>:
>   i_8 = i_1 + 1;
> 
>:
>   # i_1 = PHI <0(2), i_8(5)>
>   if (i_1 <= 1023)
> goto ; [INV]
>   else
> goto ; [INV]

Here there's no simple latch block to start with (the backedge comes 
directly out of the loop exit block).  So my suggested improvement 
(testing if the latch was empty and only then reject the thread), would 
solve this.

> Would it be crazy to suggest that we disable threading through latches 
> altogether,

I think it wouldn't be crazy, but we can do a bit better as suggested 
above (only reject empty latches, and reject it only for the threaders 
coming before the loop optims).


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-09 Thread Michael Matz via Gcc
Hello,

On Thu, 9 Sep 2021, Richard Biener wrote:

> > I believe something like below would be appropriate, it disables 
> > threading if the path contains a latch at the non-last position (due 
> > to being backwards on the non-first position in the array).  I.e. it 
> > disables rotating the loop if there's danger of polluting the back 
> > edge.  It might be improved if the blocks following (preceding!) the 
> > latch are themself empty because then no code is duplicated.  It might 
> > also be improved if the latch is already non-empty.  That code should 
> > probably only be active before the loop optimizers, but currently the 
> > backward threader isn't differentiating between before/after 
> > loop-optims.
> >
> > I haven't tested this patch at all, except that it fixes the testcase 
> > :)
> 
> Lame comment at the current end of the thread - it's not threading 
> through the latch but threading through the loop header that's 
> problematic,

I beg to differ, but that's probably because of the ambiguity of the word 
"through" (does it or does it not include the ends of the path :) ).  If 
you thread through the loop header from the entry block (i.e. duplicate 
code from header to entry) all would be fine (or not, in case you created 
multiple entries from outside).  If you thread through the latch, then 
through an empty header and then through another non-empty basic block 
within the loop, you still wouldn't be fine: you've just created code on 
the back edge (and hence into the new latch block).  If you thread through 
the latch and through an empty header (and stop there), you also are fine.

Also note that in this situation you do _not_ create new entries into the 
loop, not even intermediately.  The old back edge is the one that goes 
away due to the threading, the old entry edge is moved completely out of 
the loop, the edge header->thread-dest becomes the new entry edge, and the 
edge new-latch->thread-dest becomes the back edge.  No additional entries.

So, no, it's not the threading through the loop header that is problematic 
but creating a situation that fills the (new) latch with code, and that 
can only happen if the candidate thread contains the latch block.

(Of course it's somewhat related to the header block as well, because that 
is simply the only destination the latch has, and hence is usually 
included in any thread that also include the latch; but it's not the 
header that indicates the situation).

> See tree-ssa-threadupdate.c:thread_block_1
> 
>   e2 = path->last ()->e;
>   if (!e2 || noloop_only)
> {
>   /* If NOLOOP_ONLY is true, we only allow threading through the
>  header of a loop to exit edges.  */
> 
>   /* One case occurs when there was loop header buried in a jump
>  threading path that crosses loop boundaries.  We do not try
>  and thread this elsewhere, so just cancel the jump threading
>  request by clearing the AUX field now.  */
>   if (bb->loop_father != e2->src->loop_father
>   && (!loop_exit_edge_p (e2->src->loop_father, e2)
>   || flow_loop_nested_p (bb->loop_father,
>  e2->dest->loop_father)))
> {
>   /* Since this case is not handled by our special code
>  to thread through a loop header, we must explicitly
>  cancel the threading request here.  */
>   delete_jump_thread_path (path);
>   e->aux = NULL;
>   continue;
> }

Yeah, sure, but I'm not sure if the above check is _really_ testing what 
it wants to test or if the effects it achieves are side effects.  Like in my 
proposed patch: I could also test for existence of loop header in the 
thread and reject that; it would work as well, except that it works 
because any useful thread including a latch (which is the problematic one) 
also includes the header.  I'm not sure if the above check is in the same 
line, or tests for some still another situation.


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-08 Thread Michael Matz via Gcc
Hello,

[lame answer to self]

On Wed, 8 Sep 2021, Michael Matz wrote:

> > > The forward threader guards against this by simply disallowing 
> > > threadings that involve different loops.  As I see
> > 
> > The thread in question (5->9->3) is all within the same outer loop, 
> > though. BTW, the backward threader also disallows threading across 
> > different loops (see path_crosses_loops variable).
...
> Maybe it's possible to not disable threading over latches altogether in 
> the backward threader (like it's tried now), but I haven't looked at the 
> specific situation here in depth, so take my view only as opinion from a 
> large distance :-)

I've now looked at the concrete situation.  So yeah, the whole path is in 
the same loop, crosses the latch, _and there's code following the latch 
on that path_.  (I.e. the latch isn't the last block in the path).  In 
particular, after loop_optimizer_init() (before any threading) we have:

   [local count: 118111600]:
  # j_19 = PHI 
  sum_11 = c[j_19];
  if (n_10(D) > 0)
goto ; [89.00%]
  else
goto ; [11.00%]

  [local count: 105119324]:
...

   [local count: 118111600]:
  # sum_21 = PHI 
  c[j_19] = sum_21;
  j_13 = j_19 + 1;
  if (n_10(D) > j_13)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  goto ; [100.00%]

With bb9 the outer (empty) latch, bb3 the outer header, and bb8 the 
pre-header of inner loop, but more importantly something that's not at the 
start of the outer loop.

Now, any thread that includes the backedge 9->3 _including_ its 
destination (i.e. where the backedge isn't the last to-be-redirected edge) 
necessarily duplicates all code from that destination onto the back edge.  
Here it's the load from c[j] into sum_11.

The important part is the code is emitted onto the back edge, 
conceptually; in reality it's simply included into the (new) latch block 
(the duplicate of bb9, which is bb12 intermediately, then named bb7 after 
cfg_cleanup).

That's what we can't have for some of our structural loop optimizers: 
there must be no code executed after the exit test (e.g. in the latch 
block).  (This requirement makes reasoning about which code is or isn't 
executed completely for an iteration trivial; simply everything in the 
body is always executed; e.g. loop interchange uses this to check that 
there are no memory references after the exit test, because those would 
then be only conditional and hence make loop interchange very awkward).
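The two loop shapes can be sketched in plain C (a made-up body, not the 
testcase's IL) — both compute the same value, but in the second a 
computation has moved onto the back edge, i.e. it executes after the exit 
test:

```c
/* Canonical loop: the exit test is the last action of each iteration;
   the latch (back edge) is empty, so "everything in the body is always
   executed" holds trivially. */
int sum_canonical(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i * i;             /* whole body before the exit test */
    return s;
}

/* Rotated form: the first computation of the next iteration has moved
   past the exit test, onto the back edge -- exactly the non-empty-latch
   shape that the structural loop optimizers must reject. */
int sum_rotated(int n)
{
    int s = 0, i = 0;
    if (i < n) {
        int v = i * i;          /* peeled first computation */
        for (;;) {
            s += v;
            i++;
            if (!(i < n))
                break;
            v = i * i;          /* executes after the exit test */
        }
    }
    return s;
}
```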

Note that this situation can't be later rectified anymore: the duplicated 
instructions (because they are memory refs) must remain after the exit 
test.  Only by rerolling/unrotating the loop (i.e. noticing that the 
memory refs on the loop-entry path and on the back edge are equivalent) 
would that be possible, but that's something we aren't capable of.  Even 
if we were, that would simply revert the whole work that the threader 
did, so it's better to not even do that to start with.

I believe something like below would be appropriate, it disables threading 
if the path contains a latch at the non-last position (due to being 
backwards on the non-first position in the array).  I.e. it disables 
rotating the loop if there's danger of polluting the back edge.  It might 
be improved if the blocks following (preceding!) the latch are themself 
empty because then no code is duplicated.  It might also be improved if 
the latch is already non-empty.  That code should probably only be active 
before the loop optimizers, but currently the backward threader isn't 
differentiating between before/after loop-optims.

I haven't tested this patch at all, except that it fixes the testcase :)


Ciao,
Michael.

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..528a753b886 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -600,6 +600,7 @@ back_threader_profitability::profitable_path_p (const 
vec &m_path,
   loop_p loop = m_path[0]->loop_father;
   bool path_crosses_loops = false;
   bool threaded_through_latch = false;
+  bool latch_within_path = false;
   bool multiway_branch_in_path = false;
   bool threaded_multiway_branch = false;
   bool contains_hot_bb = false;
@@ -725,7 +726,13 @@ back_threader_profitability::profitable_path_p (const 
vec &m_path,
 the last entry in the array when determining if we thread
 through the loop latch.  */
   if (loop->latch == bb)
-   threaded_through_latch = true;
+   {
+ threaded_through_latch = true;
+ if (j != 0)
+   latch_within_path = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, " (latch)");
+   }
 }
 
   gimple *stmt = get_gimple_control_stmt (m_path[0])

Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-08 Thread Michael Matz via Gcc
Hello,

On Wed, 8 Sep 2021, Aldy Hernandez wrote:

> > The forward threader guards against this by simply disallowing 
> > threadings that involve different loops.  As I see
> 
> The thread in question (5->9->3) is all within the same outer loop, 
> though. BTW, the backward threader also disallows threading across 
> different loops (see path_crosses_loops variable).
> 
> > the threading done here should be 7->3 (outer loop entry) to bb 8 
> > rather than one involving the backedge.  Also note the condition is 
> > invariant in the loop and in fact subsumed by the condition outside of 
> > the loop and it should have been simplified by VRP after pass_ch but I 
> > see there's a jump threading pass inbetween pass_ch and the next VRP 
> > which is likely the problem.
> 
> A 7->3->8 thread would cross loops though, because 7 is outside the 
> outer loop.

...
 
> However, even if there are alternate ways of threading this IL, 
> something like 5->9->3 could still happen.  We need a way to disallow 
> this.  I'm having a hard time determining the hammer for this.  I would 
> vote for disabling threading through latches, but it seems the backward 
> threader is aware of this scenario and allows it anyhow (see 
> threaded_through_latch variable).  Ughh.

The backward threader seems to want to be careful with latches, but still 
allow it in some situations, in particular when doing so doesn't create a 
loop with non-simple latches (which is basically a single and empty latch 
block).  If this improvement under discussion leads to creating a 
non-empty latch then those checks aren't restrictive enough (anymore).

I think threading through a latch is always dubious regarding the loop 
structure, it's basically either loop rotation or iteration peeling, even 
if it doesn't cause non-simple latches.  Those transformations should 
probably be left to a loop optimizer, or be only done when destroying loop 
structure is fine (i.e. late).

Maybe it's possible to not disable threading over latches altogether in 
the backward threader (like it's tried now), but I haven't looked at the 
specific situation here in depth, so take my view only as opinion from a 
large distance :-)

Does anything break if you brutally disable any backward threading when 
any of the involved blocks is a latch when current_loops is set?  (I guess 
for that latter test to be effective you want to disable the 
loop_optimizer_init() for the "late" jump thread passes)


Ciao,
Michael.


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-07 Thread Michael Matz via Gcc
Hello,

On Tue, 7 Sep 2021, Aldy Hernandez via Gcc wrote:

> The regression comes from the simple_reduc_1() function in
> tree-ssa/loop-interchange-9.c, and it involves the following path:
> 
> === BB 5 
> Imports: n_10(D)  j_19
> Exports: n_10(D)  j_13  j_19
>  j_13 : j_19(I)
> n_10(D)   int [1, 257]
> j_19  int [0, 256]
> Relational : (j_13 > j_19)
>  [local count: 118111600]:
> # sum_21 = PHI 
> c[j_19] = sum_21;
> j_13 = j_19 + 1;
> if (n_10(D) > j_13)
>   goto ; [89.00%]
> else
>   goto ; [11.00%]

So, this is the outer loop's exit block ...

> === BB 9 
> n_10(D)   int [2, 257]
> j_13  int [1, 256]
> Relational : (n_10(D) > j_19)
> Relational : (n_10(D) > j_13)
>  [local count: 105119324]:
> goto ; [100.00%]

... this is the outer loop's latch block ...

> === BB 3 
> Imports: n_10(D)
> Exports: n_10(D)
> n_10(D)   int [1, +INF]
>  [local count: 118111600]:
> # j_19 = PHI 
> sum_11 = c[j_19];
> if (n_10(D) > 0)
>   goto ; [89.00%]
> else
>   goto ; [11.00%]

... and this is the outer loop's header, as well as the inner loop's pre-header.
The attempted thread hence involves a back-edge (of the outer loop) and a 
loop-entry edge (bb3->bb8) of the inner loop.  Doing that threading would 
destroy some properties that our loop optimizers rely on, e.g. that the 
loop header of a natural loop is only reached by two edges: entry edge and 
back edge, and that the latch blocks are empty and that there's only a 
single latch.  (What exactly we require depends on several flags in 
loops_state_satisfies_p).

> With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that 
> the loop cannot legally iterate outside the size of c[256].  So, j_13 
> lies within [1, 257] and n_10 is [2, +INF] at the end of the path.  
> This allows us to thread BB3 to BB8.

So, IIUC doing this threading would create a new entry to BB8: it would 
then be entered by its natural entry (bb3), by its natural back edge 
(whatever bb that is now) and the new entry from the threaded outer back 
edge (bb9 or bb5?).

The inner loop wouldn't then be recognized as natural anymore and the 
whole nest not as perfect, and hence loop interchange can't easily happen 
anymore.  Most other structural loop optimizers of us would have problems 
with that situation as well.

> All the blocks lie within the same loop, and the path passes all the 
> tests in path_profitable_p().
> 
> Is there something about the shape of this path that should make it 
> invalid in the backward threader, or should the test be marked with 
> -fdisable-tree-threadN (or something else entirely)?

This is a functionality test checking that the very necessary interchange 
in this test does happen with default plus -floop-interchange (with the 
intention of it being enabled by O3 or with profile info).  So no 
additional flags can be added without changing the meaning of this test.

> Note that the 
> backward threader is allowed to peel through loop headers.

Something needs to give way in the path threaders before loop 
optimizations: either threading through back edges, through loop latches 
or through loop headers needs to be disabled.  I think traditionally at 
least threading through latches should be disabled, because doing so 
usually destroys simple loop structure.  I see that profitable_path_p() of 
the backward threader wants to be careful in some situations involving 
loops and latches; possibly it's not careful enough yet for the additional 
power brought by ranger.

See also the additional tests tree-cfgcleanup.c:tree_forwarder_block_p is 
doing when loops are active.

After the structural loop optimizers the threader can go wild and thread 
everything it finds.


Ciao,
Michael.


Re: post-commit hook failure

2021-08-25 Thread Michael Matz via Gcc
Hello,

On Wed, 25 Aug 2021, Martin Liška wrote:

> > remote:   File "hooks/post_receive.py", line 47, in post_receive_one
> > remote: update.send_email_notifications()
> > remote:   File
> > "/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py",
...
> > remote: UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in
> > position 14638: invalid start byte
...
> I believe ChangeLog will be updated correctly as we don't read content 
> of the changes:

But the email notifications (and bugzilla updating) isn't done if that 
place throws, so that should eventually be made more robust in the future.


Ciao,
Michael.


Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1

2021-08-06 Thread Michael Matz via Gcc
Hello,

On Fri, 6 Aug 2021, Stefan Kanthak wrote:

> >> For -ffast-math, where the sign of -0.0 is not handled and the 
> >> spurious invalid floating-point exception for |argument| >= 2**63 is 
> >> acceptable,
> > 
> > This claim would need to be proven in the wild.
> 
> I should have left the "when" after the "and" which I originally had 
> written...
> 
> > |argument| > 2**52 are already integer, and shouldn't generate a 
> > spurious exception from the various to-int conversions, not even in 
> > fast-math mode for some relevant set of applications (at least 
> > SPECcpu).
> > 
> > Btw, have you made speed measurements with your improvements?
> 
> No.
> 
> > The size improvements are obvious, but speed changes can be fairly 
> > unintuitive, e.g. there were old K8 CPUs where the memory loads for 
> > constants are actually faster than the equivalent sequence of shifting 
> > and masking for the >= compares.  That's an irrelevant CPU now, but it 
> > shows that intuition about speed consequences can be wrong.
> 
> I know. I also know of CPUs that couldn't load a 16-byte wide XMM 
> register in one go, but had to split the load into 2 8-byte loads.
> 
> If the constant happens to be present in L1 cache, it MAY load as fast
> as an immediate.
> BUT: on current CPUs, the code GCC generates
> 
> movsd  .LC1(%rip), %xmm2
> movsd  .LC0(%rip), %xmm4
> movapd %xmm0, %xmm3
> movapd %xmm0, %xmm1
> andpd  %xmm2, %xmm3
> ucomisd %xmm3, %xmm4
> jbe    38 <_trunc+0x38>
>  
> needs
> - 4 cycles if the movsd are executed in parallel and the movapd are
>   handled by the register renamer,
> - 5 cycles if the movsd and the movapd are executed in parallel,
> - 7 cycles else,
> plus an unknown number of cycles if the constants are not in L1.

You also need to consider the case that the to-int converters are called 
in a loop (which ultimately are the only interesting cases for 
performance), where it's possible to load the constants before the loop 
and keep them in registers (at the expense of two registers' worth of 
pressure, of course), effectively removing the loads from cost 
considerations.  It's all 
tough choices, which is why stuff needs to be measured in some contexts 
:-)
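As a concrete, hand-written illustration of that loop scenario (not GCC's 
actual code generation): the 2**52 threshold is loop-invariant, so in the 
loop version the compiler can keep it in a register and the per-iteration 
cost is just compare plus convert.  Like the sequences under discussion, 
this fast-math-style variant does not preserve the sign of -0.0:

```c
/* Sketch only: truncation via the int64 round-trip, exact for
   |x| < 2^52; values at or above 2^52 are already integral.
   Note: trunc_fast(-0.5) yields +0.0, not -0.0 (fast-math caveat). */
double trunc_fast(double x)
{
    double ax = x < 0 ? -x : x;
    if (ax < 4503599627370496.0)    /* 2^52 */
        return (double)(long long)x;
    return x;                       /* already integral, or inf/NaN */
}

void trunc_array(double *a, int n)
{
    /* The 2^52 constant is hoisted out of the loop by the compiler;
       no constant load occurs inside the loop body. */
    for (int i = 0; i < n; i++)
        a[i] = trunc_fast(a[i]);
}
```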

(I do like your sequences btw, it's just not 100% clearcut that they are 
always a speed improvement).


Ciao,
Michael.

> The proposed
> 
> movq   rax, xmm0
> addrax, rax
> shrrax, 53
> cmpeax, 53+1023
> jaereturn
> 
> needs 5 cycles (moves from XMM to GPR are AFAIK not handled by the
> register renamer).
> 
> Stefan
> 


Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1

2021-08-06 Thread Michael Matz via Gcc
Hello,

On Fri, 6 Aug 2021, Stefan Kanthak wrote:

> For -ffast-math, where the sign of -0.0 is not handled and the spurious
> invalid floating-point exception for |argument| >= 2**63 is acceptable,

This claim would need to be proven in the wild.  |argument| > 2**52 are 
already integer, and shouldn't generate a spurious exception from the 
various to-int conversions, not even in fast-math mode for some relevant 
set of applications (at least SPECcpu).
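That integrality claim is easy to check mechanically; a minimal sketch, 
valid for |x| < 2**63 where the int64 round-trip is exact:

```c
#include <stdint.h>

/* A double has 52 fraction bits, so every finite value with magnitude
   >= 2^52 has no fractional part; converting to int64 and back is then
   the identity (for |x| < 2^63). */
int is_integral(double x)
{
    return (double)(int64_t)x == x;
}
```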

Btw, have you made speed measurements with your improvements?  The size 
improvements are obvious, but speed changes can be fairly unintuitive, 
e.g. there were old K8 CPUs where the memory loads for constants are 
actually faster than the equivalent sequence of shifting and masking for 
the >= compares.  That's an irrelevant CPU now, but it shows that 
intuition about speed consequences can be wrong.


Ciao,
Michael.


Re: Optional machine prefix for programs in for -B dirs, matching Clang

2021-08-05 Thread Michael Matz via Gcc
Hello,

On Wed, 4 Aug 2021, John Ericson wrote:

> On Wed, Aug 4, 2021, at 10:48 AM, Michael Matz wrote:
> > ... the 'as' and 'ld' executables should be simply found within the 
> > version and target specific GCC libexecsubdir, possibly by being symlinks 
to whatever you want.  That's at least how my crosses are configured and 
> > installed, without any --with-{as,ld} options.
> 
> Yes that does work, and that's probably the best option today. I'm just 
> a little wary of unprefixing things programmatically.

The libexecsubdir _is_ the prefix in above case :)

> For some context, this is NixOS where we assemble a ton of cross 
> compilers automatically and each package gets its own isolated many FHS. 
> For that reason I would like to eventually avoid the target-specific 
> subdirs entirely, as I have the separate package trees to disambiguate 
> things. Now, I know that exact same argument could also be used to say 
> target prefixing is also superfluous, but eventually things on the PATH 
> need to be disambiguated.

Sure, which is why (e.g.) cross binutils do install with an arch prefix 
into ${bindir}.  But as GCC has the capability to look into libexecsubdir 
for binaries as well (which quite surely should never be in $PATH on any 
system), I don't see the conflict.

> There is no requirement that the libexec things be named like the bin 
> things, but I sort of feel it's one less thing to remember and makes 
> debugging easier.

Well, the naming scheme of binaries in libexecsubdir reflects the scheme 
that the compilers are using: cc1, cc1plus etc.  Not 
aarch64-unknown-linux-cc1.

> I am sympathetic to the issue that if GCC accepts everything Clang does 
> and vice-versa, we'll Postel's-law ourselves ourselves over time into 
> madness as mistakes are accumulated rather than weeded out.

Right.  I supposed it wouldn't hurt to also look for "${targettriple}-as" 
in $PATH before looking for 'as' (in $PATH).  But I don't think we can (or 
should) switch off looking for 'as' in libexecsubdir.  I don't even see 
why that behaviour should depend on an option, it could just be added by 
default.
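A minimal sketch of the libexecsubdir arrangement described above (the 
prefix, triple and version here are made up for illustration; a stand-in 
script takes the place of the real cross assembler):

```shell
#!/bin/sh
# Illustrative layout only: real prefix, triple and version differ.
triple=aarch64-unknown-linux-gnu
version=14.1.0
prefix=demo-prefix
libexecsubdir="$prefix/libexec/gcc/$triple/$version"

mkdir -p "$libexecsubdir" "$prefix/bin"

# Stand-in for a prefixed cross assembler that would live on PATH:
printf '#!/bin/sh\nexit 0\n' > "$prefix/bin/$triple-as"
chmod +x "$prefix/bin/$triple-as"

# The symlink is what lets GCC find 'as' without any --with-as option:
ln -sf "../../../../bin/$triple-as" "$libexecsubdir/as"
```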

> I now have some patches for this change I suppose I could also submit.

Even better :)


Ciao,
Michael.


Re: Re: Optional machine prefix for programs in for -B dirs, matching Clang

2021-08-04 Thread Michael Matz
Hello,

On Wed, 4 Aug 2021, John Ericson wrote:

> > Doesn't GCC automatically look for those commands in the --prefix 
> > directory that you configure GCC with? Or is that only for native 
> > compilers?
> 
> It will search only if --with-*=... was not passed, and it will never 
> prefix the query. So yes in practice for cross compilers people do the 
> --with-* 

Hmm, no?  Because as you said ...

> I think the solution is to stop making cross compilers rely on these 
> --with-flags to do the obvious things. Executables like `collect2` 
> hidden within a libexec subdir (libexec/gcc//) have no 
> need for prefixing, but the assembler and linker are very much 
> public-facing executables in their own right, and usually are prefixed.

... the 'as' and 'ld' executables should be simply found within the 
version and target specific GCC libexecsubdir, possibly by being symlinks 
to whatever you want.  That's at least how my crosses are configured and 
installed, without any --with-{as,ld} options.

> and no searching happens, and for native compilers no one 
> bothers and searching does happen. But to be a pedant strictly speaking 
> the behavior is independent of whether the compiler is host == target or 
> not.


Ciao,
Michael.


Re: Disabling TLS address caching to help QEMU on GNU/Linux

2021-07-22 Thread Michael Matz
Hello,

On Thu, 22 Jul 2021, Richard Biener via Gcc wrote:

> But how does TLS usage transfer between threads?  On the gimple level 
> the TLS pointer is not visible and thus we'd happily CSE its address:

Yes.  All take-address operations then need to be encoded explicitly with 
a non-CSE-able internal function (or so):

  &x --> __ifn_get_tls_addr(&x);

(&x in the argument just so that it's clear that it doesn't access the 
value at x and to get the current effects of address-taken marking of 
ADDR_EXPR).

(Or of course, ADDR_EXPR could be taken as unstable when applied to TLS 
decls).

Quite a big hammer IMHO.
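For illustration of why the address must not be treated as invariant: each 
thread has its own instance of a TLS variable, so a cached address is only 
valid for the thread that computed it — which is what breaks when a 
runtime (the QEMU scenario) migrates work between threads.  A minimal 
sketch, assuming POSIX threads:

```c
#include <pthread.h>
#include <stddef.h>

static __thread int tls_var;    /* one instance per thread */

static void *record_addr(void *out)
{
    *(void **)out = (void *)&tls_var;  /* address of *this* thread's copy */
    return NULL;
}

/* Returns 1 if a child thread sees a different &tls_var than the caller
   -- the reason "&tls_var" must not be CSEd across a possible thread
   switch. */
int tls_addr_is_per_thread(void)
{
    void *child = NULL;
    pthread_t t;
    if (pthread_create(&t, NULL, record_addr, &child) != 0)
        return -1;
    pthread_join(t, NULL);
    return child != (void *)&tls_var;
}
```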


Ciao,
Michael.


Re: [PATCH] Port GCC documentation to Sphinx

2021-07-01 Thread Michael Matz
Hello,

On Thu, 1 Jul 2021, Martin Liška wrote:

> On 7/1/21 3:33 PM, Eli Zaretskii wrote:
> > > Cc: jos...@codesourcery.com, gcc@gcc.gnu.org, gcc-patc...@gcc.gnu.org
> > > From: Martin Liška 
> > > Date: Thu, 1 Jul 2021 14:44:10 +0200
> > > 
> > > > It helps some, but not all of the issues disappear.  For example,
> > > > stuff like this is still hard to read:
> > > > 
> > > > To select this standard in GCC, use one of the options -ansi
> > > >-
> > > > -std.‘=c90’ or -std.‘=iso9899:1990’
> > > >    
> > > 
> > > If I understand the notes correct, the '.' should be also hidden by e.g.
> > > Emacs.
> > 
> > No, it doesn't.  The actual text in the Info file is:
> > 
> > *note -std: f.‘=iso9899:1990’
> > 
> > and the period after " f" isn't hidden.  Where does that "f" come from
> > and what is its purpose here? can it be removed (together with the
> > period)?
> 
> It's the name of the anchor used for the @ref. The names are automatically
> generated
> by makeinfo. So there's an example:
> 
> This is the warning level of @ref{e,,-Wshift-overflow3} and …
> 
> becomes in info:
> This is the warning level of *note -Wshift-overflow3: e. and …
> 
> I can ask the question at Sphinx, the Emacs script should hide that.

Not everyone reads info within Emacs; even if there's an emacs solution to 
postprocess info pages to make them nicer we can't rely on that.  It must 
look sensible without that.  In this case it seems that already the 
generated .texinfo input to makeinfo is bad, where does the 'e' (or 'f') 
come from?  The original texinfo file simply contains:

  @option{-std=iso9899:1990}

so that's what perhaps should be generated, or maybe the anchor in @ref is 
optional and could be left out if it doesn't provide any info (a single 
character as anchor doesn't seem very useful)?

> > > > > We can adjust 'emph' formatting to nil, what do you think?
> > > > 
> > > > Something like that, yes.  But the problem is: how will you format it
> > > > instead?  The known alternatives, _foo_ and *foo* both use punctuation
> > > > characters, which will get in the way similarly to the quotes.  Can
> > > > you format those in caps, like makeinfo does?
> > > 
> > > You are fully right, info is very simple format and it uses wrapping for
> > > the formatting
> > > purpose (by default * and _). So, I don't have any elegant solution.
> > 
> > Well, it sounds from another mail that you did find a solution: to
> > up-case the string in @var.
> 
> I don't know. Some of them can be e.g. keywords and using upper-case 
> does not seem to me feasible.

Then that needs to be different already in the input, so that the 
directive that (in info) capitalizes is only used in contexts where that 
makes sense.  People reading info pages will know that an all-caps word 
often means a syntactic variable/placeholder, so that should be preserved.


Ciao,
Michael.


Re: [llvm-dev] RFC: Add GNU_PROPERTY_UINT32_AND_XXX/GNU_PROPERTY_UINT32_OR_XXX

2021-06-22 Thread Michael Matz
Hello,

On Tue, 22 Jun 2021, H.J. Lu wrote:

> > > The issue is that libfoo.so used in link-time can be different from
> > > libfoo.so at run-time.  The symbol, foobar, in libfoo.so at link-time
> > > has the default visibility.  But foobar in libfoo.so at run-time can be
> > > protected.  ld.so should detect such cases which can lead to run-time
> > > failures.
> >
> > Yes, but I think we can _unconditionally_ give an error in this case, even
> > without a marker.  I view restricting visiblity of a symbol in furture
> 
> Unconditionally issuing an error can be an option, but mandatory.
> Otherwise working binary today will fail to run tomorrow.

Like a binary that's working today will fail tomorrow if the updated 
shared lib simply removes a symbol from it's export without proper 
versioning.  I don't see a material difference.

> > versions of shared libraries to be an ABI change, hence it has to be
> > something that either requires a soname bump or at the very least symbol
> 
> To support existing binaries, we need a soname bump.
> 
> > versioning with the old version staying on default visibility.
> 
> Symbol versioning doesn't work here since both symbols are at
> the same address.

I don't see why the address should matter.  My point was that in the end 
the exported symbol table for such shared lib (where foobar was changed 
from default to protected visibility) should have

 1: 12345 42 FUNC GLOBAL DEFAULT 1 foobar@Old
 2: 12345 42 FUNC GLOBAL PROTECTED 1 foobar@@New

This might require new GAS directives and compiler attributes (or might be 
expressible already).  References from within the shared library would 
need to continue going via the default-visibility symbol, so that it 
supports old binaries containing copy relocs, which again points out 
that it's a bad idea to restrict visibility after the fact.
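
A minimal sketch of what that could look like on the source side, using 
the existing .symver mechanism (all names here are illustrative, and 
whether this combination with protected visibility is actually 
expressible today is exactly the open question raised above):

```c
/* Illustrative fragment, not a tested recipe: two implementations of
   foobar inside the shared library.  Old binaries keep binding to the
   default-visibility foobar@Old; new links get the protected
   foobar@@New.  A matching linker version script defining the "Old"
   and "New" version nodes is also required, e.g.:

       Old { global: foobar; };
       New { global: foobar; } Old;  */

void foobar_old (void) { /* old, default-visibility behaviour */ }

__attribute__ ((visibility ("protected")))
void foobar_new (void) { /* new, protected behaviour */ }

/* Bind each alias to its version node; "@@" marks the default.  */
__asm__ (".symver foobar_old, foobar@Old");
__asm__ (".symver foobar_new, foobar@@New");
```

This is a fragment meant for a shared-library build, not a standalone 
program.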

AFAICS your scheme also doesn't support old binaries with newly protected 
symbols in newer shared libs, so an (unconditional) error in such a 
situation seems appropriate.  IOW: which scenario do you want to not error 
on when you want to make the error conditional?


Ciao,
Michael.


Re: [llvm-dev] RFC: Add GNU_PROPERTY_UINT32_AND_XXX/GNU_PROPERTY_UINT32_OR_XXX

2021-06-21 Thread Michael Matz
Hello,

On Thu, 17 Jun 2021, H.J. Lu via Gcc wrote:

> > > • Disallow copy relocation against definition with the STV_PROTECTED
> > > visibility in the shared library with the marker.
> >
> > If this is for GNU ld x86 only, I'm fine with it:)
> >
> > gold and ld.lld just emit an error unconditionally. I think non-x86
> > GNU ld ports which never support "copy relocations on protected data
> > symbols" may want to make the diagnostic unconditional as well.
> > Well, while (Michael Matz and ) I think compatibility check for "copy
> > relocations on protected data symbols" is over-engineering (and
> > Alan/Cary think it was a mistake), if you still want to add it, it is
> > fine for me...
> > For Clang, I hope we will not emit such a property, because Clang
> > never supports the "copy relocations on protected data symbols"
> > scheme.
> 
> The issue is that libfoo.so used in link-time can be different from 
> libfoo.so at run-time.  The symbol, foobar, in libfoo.so at link-time 
> has the default visibility.  But foobar in libfoo.so at run-time can be 
> protected.  ld.so should detect such cases which can lead to run-time 
> failures.

Yes, but I think we can _unconditionally_ give an error in this case, even 
without a marker.  I view restricting visibility of a symbol in future 
versions of shared libraries to be an ABI change, hence it has to be 
something that either requires a soname bump or at the very least symbol 
versioning with the old version staying on default visibility.

Compare the situation to one where the old libfoo.so provided a symbol 
bar, and the new one doesn't (sort of very restricted visibility).  ld.so 
will unconditionally give an error.  I don't see this situation materially 
different from bar's visibility being changed from default to protected.

> > I think this can be unconditional, because the "pointer equality for 
> > STV_PROTECTED function address in -shared" case hasn't been working 
> > for GNU ld for at least 20 years... Many ports don't even produce a 
> > dynamic relocation.
> 
> True.  But see above.  You may not care about such use cases.  But I 
> believe that ld.so should not knowingly and silently allow such run-time 
> failure to happen.

Agreed, but giving an error message unconditionally wouldn't be silent.


Ciao,
Michael.


Re: git gcc-commit-mklog doesn't extract PR number to ChangeLog

2021-06-17 Thread Michael Matz
Hello,

On Thu, 17 Jun 2021, Martin Sebor via Gcc wrote:

> > The original problem is that the PR wasn't _in the body_ of the commit 
> 
> But I see [PR100085] right there at the end of the _summary line_:

Emphasis mine.


Ciao,
Michael.


Re: GCC documentation: porting to Sphinx

2021-06-01 Thread Michael Matz
Hello,

On Tue, 1 Jun 2021, Martin Liška wrote:

> On 5/31/21 5:49 PM, Michael Matz wrote:
> > Hello Martin,
> > 
> > On Mon, 31 May 2021, Martin Liška wrote:
> > 
> >> I've made quite some progress with the porting of the documentation and
> >> I would like to present it to the community now:
> >> https://splichal.eu/scripts/sphinx/
> >>   Note the documentation is automatically ([1]) generated from texinfo
> >> with a GitHub workflow ([2]).
> > 
> > One other thing I was recently thinking about, in the Sphinx vs. texinfo
> > discussion: locally available documentation browsable/searchable in
> > terminal with info(1) (or equivalents).
> 
> Yes, that's handy.
> 
> > I think the above (i.e. generating .rst from the texinfo file) would 
> > immediately nullify all my worries.  So, just to be extra sure: your 
> > proposal now is to generate the .rst files, and that .texinfo remains 
> > the maintained sources, right?
> 
> No, .texinfo files will be gone. However, Sphinx can output to info 
> format: 
> https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-M

I see, that's good to hear.

> And I've just added the generated Info pages here:
> https://splichal.eu/scripts/sphinx/

Okay, but there's something amiss; just compare a local gcc.info with 
that.  The Sphinx-generated one seems to contain only command-line 
options and none of the other topics; in particular it seems to contain 
the "Invoking GCC" chapter (and only that) as top level, and all other 
ones are missing (like "C implementation", "C++ implementation", "C 
extension", and so on).

Looking at gccint.info I also see quite some confusion; it's unclear to 
me whether content is missing or not.  But e.g. the top-level structure 
has a different order (a less logical one; this order is btw. shared with 
the HTML-generated docs, so it's probably specific to the Sphinx setup or 
such).

Ignoring that missing content, what is there right now does seem somewhat 
acceptable for local use, though.


Ciao,
Michael.


Re: GCC documentation: porting to Sphinx

2021-05-31 Thread Michael Matz
Hello Martin,

On Mon, 31 May 2021, Martin Liška wrote:

> I've made quite some progress with the porting of the documentation and
> I would like to present it to the community now:
> https://splichal.eu/scripts/sphinx/
>  
> Note the documentation is automatically ([1]) generated from texinfo with a
> GitHub workflow ([2]).

One other thing I was recently thinking about, in the Sphinx vs. texinfo 
discussion: locally available documentation browsable/searchable in 
terminal with info(1) (or equivalents).  I think the above (i.e. 
generating .rst from the texinfo file) would immediately nullify all my 
worries.  So, just to be extra sure: your proposal now is to generate the 
.rst files, and that .texinfo remains the maintained sources, right?


Ciao,
Michael.


Re: Successive hoisting and AVAIL_OUT in at least one successor heuristic

2021-05-06 Thread Michael Matz
Hello,

On Thu, 6 May 2021, Prathamesh Kulkarni via Gcc wrote:

> Well, I was thinking of this test-case:
> 
> int f(int cond1, int cond2, int cond3, int x, int y)
> {
>   void f1();
>   void f2(int);
>   void f3();
> 
>   if (cond1)
> f1 ();
>   else
> {
>   if (cond2)
> f2 (x + y);
>   else
> f3 ();
> }
> 
>   return x + y;
> }
> ...
> And during second iteration, it hoists x + y into bb2. So we are 
> effectively hoisting x + y from bb5, bb7 into bb2, which doesn't seem to 
> benefit other two paths (bb2->bb3->bb7 and bb2->bb4->bb6->bb7) since 
> they don't contain x + y.

But bb7 eventually does, so it also doesn't make those paths worse.  
That's the nature of partial redundancy elimination: it doesn't require 
benefits on all paths, only on one of them.  That's in contrast to full 
redundancy elimination.

As with all hoisting it can increase register pressure.  The counter 
measure to this is not to limit the hoisting (because that can have 
ripple-down effects when some hoists don't happen anymore), but rather to 
tackle the register pressure problem later, in the register allocator (or 
some preparatory pass).  But I'm not even sure if this is the reason 
you're wondering about how PRE hoists.


Ciao,
Michael.


Re: 'walk_stmt_load_store_addr_ops' for non-'gimple_assign_single_p (stmt)'

2021-03-17 Thread Michael Matz
Hello,

On Wed, 17 Mar 2021, Richard Biener wrote:

> > The walk_gimple functions are intended to be used on the SSA form of 
> > gimple (i.e. the one that it is in most of the time).
> 
> Actually they are fine to use pre-SSA.

Structurally, sure.

> They just even pre-SSA distinguish between registers and memory.  

And that's of course the thing.

I probably should have used a different term, but used "SSA rewriting" to 
name the point where this distinction really starts to matter.  Before it 
a binary gimple statement could conceivably contain a non-register in the 
LHS (perhaps not right now, but there's nothing that would inherently 
break with that), and then would include a store that 
walk_stmt_load_store_addr_ops would "miss".

So, yeah, using SSA as name for that was sloppy, it's gimple itself that 
has the invariant of only registers in binary statements.


Ciao,
Michael.

> That's what gimplification honors as well, in 'zzz = r + r2' all 
> operands are registers, otherwise GIMPLE requires loads and stores split 
> into separate stmts not doing any computation.
> 
> It's just less obvious in the dumps (compared to SSA name dumping).
> 
> Richard.
> 
> >  And in that it's
> > not the case that 'zzz = 1' and 'zzz = r + r2' are similar.  The former
> > can have memory as the lhs (that includes static variables, or indirection
> > through pointers), the latter can not.  The lhs of a binary statement is
> > always an SSA name.  A write to an SSA name is not a store, which is why
> > it's not walked for walk_stmt_load_store_addr_ops.
> >
> > Maybe it helps to look at simple C examples:
> >
> > % cat x.c
> > int zzz;
> > void foo(void) { zzz = 1; }
> > void bar(int i) { zzz = i + 1; }
> > % gcc -c x.c -fdump-tree-ssa-vops
> > % cat x.c.*ssa
> > foo ()
> > {
> >   <bb 2> :
> >   # .MEM_2 = VDEF <.MEM_1(D)>
> >   zzz = 1;
> >   # VUSE <.MEM_2>
> >   return;
> > }
> >
> > bar (int i)
> > {
> >   int _1;
> >
> >   <bb 2> :
> >   _1 = i_2(D) + 1;
> >   # .MEM_4 = VDEF <.MEM_3(D)>
> >   zzz = _1;
> >   # VUSE <.MEM_4>
> >   return;
> >
> > }
> >
> > See how the instruction writing to zzz (a global, and hence memory) is
> > going through a temporary for the addition in bar?  This will always be
> > the case when the expression is arithmetic.
> >
> > In SSA form gimple only very few instruction types can be stores, namely
> > calls and stores like above (where the RHS is a unary tree).  If you want
> > to capture writes into SSA names as well (which are more appropriately
> > thought of as 'setting the ssa name' or 'associating the ssa name with the
> > rhs value') you need the per-operand callback indeed.  But that depends on
> > what you actually want to do.
> >
> >
> > Ciao,
> > Michael.
> 


Re: 'walk_stmt_load_store_addr_ops' for non-'gimple_assign_single_p (stmt)'

2021-03-16 Thread Michael Matz
Hello,

On Tue, 16 Mar 2021, Thomas Schwinge wrote:

> >>Indeed, given (Fortran) 'zzz = 1', we produce GIMPLE:
> >>
> >>gimple_assign 
> >>
> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see, as
> >>expected, the 'visit_store' callback invoked, with 'rhs' and 'arg':
> >>''.
> >>
> >>However, given (Fortran) 'zzz = r + r2', we produce GIMPLE:
> >>
> >>gimple_assign 

But that's pre-SSA form.  After rewriting into SSA, 'zzz' will be replaced 
by an SSA name, and the actual store into 'zzz' will happen in a separate 
store instruction.

> >>..., and calling 'walk_stmt_load_store_addr_ops' on that, I see,
> >>unexpectedly, no callback at all invoked: neither 'visit_load', nor
> >>'visit_store' (nor 'visit_address', obviously).
> >
> > The variables involved are registers. You only get called on memory 
> > operands.
> 
> How would I have told that from the 'walk_stmt_load_store_addr_ops'
> function description?  (How to improve that one "to reflect reality"?)
> 
> But 'zzz' surely is the same in 'zzz = 1' vs. 'zzz = r + r2' -- for the
> former I *do* see the 'visit_store' callback invoked, for the latter I
> don't?

The walk_gimple functions are intended to be used on the SSA form of 
gimple (i.e. the form it is in most of the time).  And in that form it's 
not the case that 'zzz = 1' and 'zzz = r + r2' are similar.  The former 
can have memory as the lhs (that includes static variables, or indirection 
through pointers), the latter can not.  The lhs of a binary statement is 
always an SSA name.  A write to an SSA name is not a store, which is why 
it's not walked for walk_stmt_load_store_addr_ops.

Maybe it helps to look at simple C examples:

% cat x.c
int zzz;
void foo(void) { zzz = 1; }
void bar(int i) { zzz = i + 1; }
% gcc -c x.c -fdump-tree-ssa-vops
% cat x.c.*ssa
foo ()
{
  <bb 2> :
  # .MEM_2 = VDEF <.MEM_1(D)>
  zzz = 1;
  # VUSE <.MEM_2>
  return;
}

bar (int i)
{
  int _1;

  <bb 2> :
  _1 = i_2(D) + 1;
  # .MEM_4 = VDEF <.MEM_3(D)>
  zzz = _1;
  # VUSE <.MEM_4>
  return;

}

See how the instruction writing to zzz (a global, and hence memory) is 
going through a temporary for the addition in bar?  This will always be 
the case when the expression is arithmetic.

In SSA form gimple only very few instruction types can be stores, namely 
calls and stores like above (where the RHS is a unary tree).  If you want 
to capture writes into SSA names as well (which are more appropriately 
thought of as 'setting the ssa name' or 'associating the ssa name with the 
rhs value') you need the per-operand callback indeed.  But that depends on 
what you actually want to do.


Ciao,
Michael.


RE: [EXTERNAL] Re: DWARF Debug Info Relocations (.debug_str STRP references)

2020-12-03 Thread Michael Matz
Hello,

On Tue, 1 Dec 2020, Bill Messmer via Gcc wrote:

> Thank you very much for the help.  I was so fixated on the fact that the 
> .rela.debug* sections were there that I didn't pay attention to the 
> e_type in the ELF header.  Apparently, neither did the library that I 
> was using to parse the DWARF data.
> 
> Interestingly, I have seen other non-RedHat kernel debug images where 
> the kernel is ET_EXEC

vmlinux is always final-linked.

> and there are still .rela.debug* sections present 
> in the image.

Depending on configuration, vmlinux is linked with --emit-relocs, which 
causes all relocations, no matter whether applied or not, to also be 
emitted in a final link.  That has its uses, but it also confuses most 
tools, as they blindly apply relocations again, even if they aren't from 
loadable segments.

As not much other software uses --emit-relocs, and even in Linux it's 
optional and non-default, you see these confused tools occurring in the 
wild instead of being fixed.


Ciao,
Michael.

> Though the effect of applying those relocs has always 
> been nil (the data in the original .debug* section is already the same 
> as what the .rela.debug* section indicates to alter).
> 
> Sincerely,
> 
> Bill Messmer
> 
> -Original Message-
> From: Mark Wielaard  
> Sent: Monday, November 30, 2020 6:39 PM
> To: Bill Messmer 
> Cc: gcc@gcc.gnu.org
> Subject: Re: [EXTERNAL] Re: DWARF Debug Info Relocations (.debug_str STRP 
> references)
> 
> Hi Bill,
> 
> On Mon, Nov 30, 2020 at 10:22:34PM +, Bill Messmer wrote:
> 
> > I'm still a bit confused here.  And the reason I ask this is because I 
> > open this particular vmlinux image with an OSS ELF/DWARF library...  
> > which gives me the *WRONG* names for various DWARF DIEs.
> > I stepped through the library...  and the reason the names are wrong 
> > is because the library applies all of the relocations in 
> > .rela.debug_info to the sections it opens.  I thought there might be a 
> > bug in the library somewhere, so I started down looking at the DWARF 
> > data with standard Linux tools and in hex dumps...  and it seemed 
> > incorrect to my -- admittedly limited -- understanding...
> >
> > Yes, I am using llvm-dwarfdump to textualize the DWARF data
> > (llvm-dwarfdump-10 --verbose vmlinux) and I would assume(?) this 
> > applies the relocations as necessary.  And I am using readelf to get 
> > the section data (readelf -S vmlinux) and the relocation data (readelf 
> > -r vmlinuix); however, the hex data I show is just a flat hexdump of 
> > the image (hexdump -C vmlinux -n ... -s ...).
> 
> I traced your steps and did the same on a local vmlinux copy and got the same 
> results as you. That didn't make sense to me. Till I realized my original 
> assumption, that the vmlinux image, like kernel modules were partially linked 
> and so ET_REL files that still needed relocation applied, seemed wrong. The 
> vmlinux file isn't actually ET_REL, but it is ET_EXEC (see readelf -h 
> vmlinux). In which case other tools don't apply the relocations. And so your 
> observation is correct. The offset to the .debug_str table is right in the 
> .debug_info section, the relocations are meaningless. That is surprising.
> 
> > Either both that library and my understanding are incorrect, there is 
> > something wrong with that relocation data, or it flat isn't supposed 
> > to be applied...
> 
> It is the last thing, they aren't supposed to be applied because it is an 
> ET_EXEC file (which isn't supposed to have .rela.debug sections, but 
> apparently has).
> 
> > I also tried what you suggested "eu-strip --reloc-debug-sections vmlinux 
> > -f stripped" and looked at the resulting output:
> > 
> > "readelf -S stripped" still shows the reloc sections:
> > 
> >   [65] .debug_info   PROGBITS   00059e50
> >0c458644     0 0 1
> >   [66] .rela.debug_info  RELA   0c4b2498
> >1288ae68  0018   I  7865 8
> > 
> > And that relocation is still there via "readelf -r stripped":
> 
> Which now also makes sense, because as the --help text says "only relevant 
> for ET_REL files".
> 
> So you did find a real mystery, for some reason the way the vmlinux image is 
> created does get relocations correctly applied, but they (or at least some) 
> are still left behind in the ELF image even though they are no longer needed 
> (and if you do try to use/apply them, you get wrong results). We should 
> probably find out if this happened during the upstream build or during distro 
> packaging.
> 
> Cheers,
> 
> Mark
> 


Re: [RFC] Increase libstdc++ line length to 100(?) columns

2020-11-30 Thread Michael Matz
Hello,

On Mon, 30 Nov 2020, Allan Sandfeld Jensen wrote:

> > > On Sonntag, 29. November 2020 18:38:15 CET Florian Weimer wrote:
> > > > * Allan Sandfeld Jensen:
> > > > > If you _do_ change it. I would suggest changing it to 120, which is
> > > > > next
> > > > > common step for a lot of C++ projects.
> > > > 
> > > > 120 can be problematic for a full HD screen in portrait mode.  Nine
> > > > pixels per character is not a lot (it's what VGA used), and you can't
> > > > have any window decoration.  With a good font and screen, it's doable.
> > > > But if the screen isn't quite sharp, then I think you wouldn't be able
> > > > to use portrait mode anymore.
> > > 
> > > Using a standard condensed monospace font of 9px, it has a width of 7px,
> > > 120
> > A char width of 7px implies a cell width of at least 8px (so 960px for 120
> > chars), more often of 9px.  With your cell width of 7px your characters
> > will be max 6px, symmetric characters will be 5px, which is really small.
> > 
> I was talking about the full cell width. I tested it before commenting, 
> measuring the width in pixels of a line of text.

Yes, and I was saying that a cell width of 7px is very narrow because the 
characters themselves will only be using 5px or 6px max (to leave room for 
inter-character spacing in normal words).  You might be fine with such 
narrow characters, but not everyone will be.


Ciao,
Michael.


Re: [RFC] Increase libstdc++ line length to 100(?) columns

2020-11-30 Thread Michael Matz
Hello,

On Sun, 29 Nov 2020, Allan Sandfeld Jensen wrote:

> On Sonntag, 29. November 2020 18:38:15 CET Florian Weimer wrote:
> > * Allan Sandfeld Jensen:
> > > If you _do_ change it. I would suggest changing it to 120, which is next
> > > common step for a lot of C++ projects.
> > 
> > 120 can be problematic for a full HD screen in portrait mode.  Nine
> > pixels per character is not a lot (it's what VGA used), and you can't
> > have any window decoration.  With a good font and screen, it's doable.
> > But if the screen isn't quite sharp, then I think you wouldn't be able
> > to use portrait mode anymore.
> 
> Using a standard condensed monospace font of 9px, it has a width of 7px, 120 

A char width of 7px implies a cell width of at least 8px (so 960px for 120 
chars), more often of 9px.  With your cell width of 7px your characters 
will be max 6px, symmetric characters will be 5px, which is really small.

> char would take up 940px fitting two windows in horizontal mode and one in 
> vertical. 9px isn't fuzzy, and 8px variants are even narrower.

Well, and if you're fine with a 5px cell-width font then you can even fit 
216 chars on a line in HD portrait mode.  But Florian posed the width of 
9px, and I agree with him that it's not a lot (if my monitor weren't as 
big as it is I would need to use an even wider font for comfortable 
reading; as it is, 9px width is exactly right for me, though I'm not 
using portrait).  So it's the question whether the line lengths should or 
should not cater for this situation.

> Sure using square monospace fonts might not fit, but that is an unusual 
> configuration and easily worked around by living with a non-square monospace 
> font, or accepting occational line overflow. Remember nobody is suggesting 
> every line should be that long, just allowing it to allow better structural 
> indentation.

The occasional line overflow will automatically become the usual case with 
time; space allowed to be filled will eventually be filled.


Ciao,
Michael.


Re: [libgcc2.c] Implementation of __bswapsi2()

2020-11-12 Thread Michael Matz
Hello,

On Thu, 12 Nov 2020, Stefan Kanthak wrote:

> Does GCC generate (unoptimised) code there, similar to the following i386
> assembly, using 4 loads, 4 shifts, 2 ands plus 3 ors?

Try for yourself.  '-m32 -O2 -march=i386' is your friend.


Ciao,
Michael.

Spoiler: it's generating:

movl4(%esp), %eax
rolw$8, %ax
roll$16, %eax
rolw$8, %ax
ret



Re: Problems with changing the type of an ssa name

2020-07-27 Thread Michael Matz
Hello,

On Sat, 25 Jul 2020, Gary Oblock via Gcc wrote:

>   if ( TYPE_P ( type) )
> {
>TREE_TYPE ( ssa_name) = TYPE_MAIN_VARIANT ( type);
>if ( ssa_defined_default_def_p ( ssa_name) )
>   {
>  // I guessing which I know is a terrible thing to do...
>  SET_SSA_NAME_VAR_OR_IDENTIFIER ( ssa_name, TYPE_MAIN_VARIANT ( 
> type));

As the macro name indicates this takes a VAR_DECL, or an IDENTIFIER_NODE.  
You put in a type, that won't work.

You also simply override the type of the SSA name, without also caring for 
the type of the underlying variable that is (potentially!) associated with 
the SSA name; if those two disagree then issues will arise.  You have to 
replace either the variable's type (not advisable!), or the associated 
variable, with either nothing, a new variable (of the appropriate type), 
or a mere identifier.  Generally you can't modify SSA names in place like 
this; i.e., as Richi says, create new SSA names and replace all 
occurrences of one with the other.


Ciao,
Michael.


RE: New x86-64 micro-architecture levels

2020-07-23 Thread Michael Matz
Hello,

On Wed, 22 Jul 2020, Mallappa, Premachandra wrote:

> > That's deliberate, so that we can use the same x86-* names for 32-bit 
> > library selection (once we define matching micro-architecture levels there).
> 
> Understood.
> 
> > If numbers are out, what should we use instead?
> > x86-sse4, x86-avx2, x86-avx512?  Would that work?
> 
> Yes please, I think we have to choose somewhere, above would be more 
> descriptive

And IMHO that's exactly the problem.  These names should _not_ be 
descriptive, because any description invokes a wrong feeling of precision.  
E.g. what Florian already mentioned: sse4 - does it imply 4.1 and 4.2, or 
avx512: what of F, CD, ER, PF, VL, DQ, BW, IFMA, VBMI, 4VNNIW, 4FMAPS, 
VPOPCNTDQ, VNNI, VBMI2, BITALG, VP2INTERSECT, GFNI, VPCLMULQDQ, VAES does 
that one imply (rhetorical question, list shown just to make the 
silliness explicit).

Regarding precision: I think we should rule out any mathematically correct 
scheme, e.g. one in which every ISA subset gets an index and the directory 
name contains a hex number constructed from bits, with the bit at the 
corresponding index being one or zero depending on whether the ISA subset 
is required: I think we're currently at about 40 ISA subsets, and hence 
would end up with names like x86-32001afff and x86-22001afef (the latter 
missing two subsets compared to the former).

No, IMHO the non-vendor names should be non-descript, and either be 
numbers or characters, of which I would vote for characters, i.e. A, B, C.  
Obviously, as already mentioned here, the mapping of level to feature set 
needs to be described in documentation somewhere, and should be maintained 
by either glibc, glibc/gcc/llvm or psABI people.

I don't have many suggestions about vendor names, be they ISA-subset 
market names, core names, or company names.  I will just note that using 
such names has led to an explosion of the number of names without very 
good separation between them.  As long as we're only talking about -march= 
cmdline flags that may have been okay, if silly, but under this proposal 
every such name is potentially a subdirectory containing many shared 
libraries, and one that potentially needs to be searched at every library 
lookup in the dynamic linker; so it's prudent to limit the size of this 
name set as well.

As for which subsets should or shouldn't be required in which level: I 
think the current suggestions all sound good, ultimately it's always going 
to be some compromise.


Ciao,
Michael.


Re: New mklog script

2020-05-19 Thread Michael Matz
Hello,

On Tue, 19 May 2020, Jakub Jelinek wrote:

> On Tue, May 19, 2020 at 05:21:16PM +0100, Richard Earnshaw wrote:
> > This is really a wart in the GNU coding style.  And one reason why I
> > tend to indent such labels by a single space.  It particularly affects
> > things like class definitions where public, private, etc statements
> > often appear in column 0.
> > 
> > IMO, it would be nice to get an official change in the coding style for
> > this, it's really irritating.
> 
> It doesn't have to be just label,
> void
> foo ()
> {
>   ...
> #define X ...
>   ...
> #undef X
>   ...
> }
> does the similar thing for mklog.

That particular one would be a mere bug in mklog then.  diff -p regards 
only members of [[:alpha:]$_] as acceptable start characters of function 
names (i.e. indeed things that can start a C identifier (ignoring details 
like non-base characters) with the '$' extension), of which '#' is none.


Ciao,
Michael.


Re: New mklog script

2020-05-19 Thread Michael Matz
Hello,

On Tue, 19 May 2020, Martin Liška wrote:

> > The common problems I remember is that e.g. when changing a function comment
> > above some function, it is attributed to the previous function rather than
> > following, labels in function confusing it:
> >   void
> >   foo ()
> >   {
> > ...
> >   label:
> > ...
> > -  ...
> > +  ...
> >   }
> 
> I've just tested that and it will take function for patch context
> (sem_variable::equals):
> @@ -1875,6 +1875,7 @@ sem_variable::equals (tree t1, tree t2)
>  default:
>return return_false_with_msg ("Unknown TREE code reached");
>  }
> +
>  }

No, the problem happens when the label is at column 0, like function names 
are.  Basically diff -p uses a regexp morally equivalent to 
'^[[:alpha:]$_]' to detect function headers, and git diff -p and friends 
followed suit.  But it should use something like
'^[[:alpha:]$_].*[^:]$' to rule out things ending with ':'.  See also diff 
-F for GNU diff.


Ciao,
Michael.


Re: Should ARMv8-A generic tuning default to -moutline-atomics

2020-04-30 Thread Michael Matz
Hello,

On Wed, 29 Apr 2020, Florian Weimer via Gcc wrote:

> Distributions are receiving requests to build things with
> -moutline-atomics:
> 
>   
> 
> Should this be reflected in the GCC upstream defaults for ARMv8-A
> generic tuning?  It does not make much sense to me if every distribution
> has to overide these flags, either in their build system or by patching
> GCC.

Yep, same here.  It would be nicest if upstream would switch to 
outline-atomics by default on armv8-a :-)  (the problem with build system 
overrides is that some compilers don't understand the option, complicating 
the overrides; and patching GCC package would create a deviation from 
upstream also for users)


Ciao,
Michael.


Re: GCC optimizations with O3

2020-04-22 Thread Michael Matz
Hello,

On Wed, 22 Apr 2020, Erick Ochoa wrote:

> in order for me to debug my issue, I'm going to have to refactor passes 
> which directly reference optimize.

For debugging you can also work backwards: use -O3 and add -fno-xy 
options.  At least you then know (after disabling all O3 passes) that it's 
one of those places that explicitly key off the opt level.

> I am planning on refactoring them by creating a "$pass_opt_level". This 
> variable can be set via command line or somewhere in opts.c. I can then 
> substitute the references to optimize with $pass_opt_level.

I think for local decisions in passes that currently use 'optimize' the 
better strategy is to determine the underlying cause for having that test, 
and add a flag for that (or reuse an existing one, e.g. if the reason was 
"don't disturb the debugging experience" then create or use a flag 
specific to that role).

For the global decisions mentioned by Jakub: those are by nature not 
specific to a pass, hence a per-pass opt level wouldn't help.


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-13 Thread Michael Matz
Hello,

On Mon, 13 Apr 2020, Christopher Faylor wrote:

> On Wed, Apr 08, 2020 at 04:15:27PM -0500, Segher Boessenkool wrote:
> >On Wed, Apr 08, 2020 at 01:50:51PM +, Michael Matz wrote:
> >>On Wed, 8 Apr 2020, Mark Wielaard wrote:
> >>>Earlier versions of Mainman2 had some issues which might accidentally
> >>>change some headers.  But the latest fixes make this possible.  It is
> >>>how the FSF handles DMARC for various GNU mailinglists (by NOT
> >>>modifying the headers and body and passing through the DKIM
> >>>signatures):
> >>>https://lists.gnu.org/archive/html/savannah-hackers-public/2019-06/msg00018.html
> >>
> >>Oh, that would be nice to have at sourceware.org.  Please?  :-)
> >
> >Yes, please please please, can we have this?
> 
> In case it isn't obvious, we are already running the latest available
> version of mailman 2.

I think that means that dmarc_moderation_action: "Munge From" can simply 
be switched off then (at least I don't see which other headers e.g. gcc@ 
is rewriting that would cause DMARC to scream; and if there are any, then 
it would be better to disable those as well, same with any potential 
body rewriting that might still happen).

I would offer help testing that this doesn't cause delivery issues, e.g. 
on some test email list, but it seems none of my domains is DMARC-infected 
:-/


Ciao,
Michael.

P.S: I wonder btw. why the From munging is enabled also for p=none domains 
like redhat.com.  The RFC says this is to be used for gathering DMARC 
feedback, not requiring any specific action for the mail text itself on 
the sender or receiver.  But an answer to this would be moot with the 
above non-munging of From.


Re: Not usable email content encoding

2020-04-08 Thread Michael Matz
Hello,

On Wed, 8 Apr 2020, Mark Wielaard wrote:

> On Tue, 2020-04-07 at 11:53 +0200, Florian Weimer via Overseers wrote:
> > Gmail can drop mail for any reason.  It's totally opaque, so it's a
> > poor benchmark for any mailing list configuration changes because it's
> > very hard to tell if a particular change is effective or not.
> > 
> > Many mailing lists have *not* made such changes and continue to work
> > just fine in the face of restrictive DMARC sender policies and
> > enforcement at the recipient.
> > 
> > In general, mail drop rates due to DMARC seem to increase in these two
> > cases if the original From: header is preserved:
> > 
> > * The sender (i.e., the domain mentioned in the From: header)
> >   publishes a restrictive DMARC policy and the mailing list strips the
> >   DKIM signature.
> > 
> > * The sender signs parts of the message that the mailing list alters,
> >   and the mailing list does not strip the DKIM signature.
> > 
> > If neither scenario applies, it's safe to pass through the message
> > without munging.  The mailing list software can detect this and
> > restrict the From: header munging to those cases.

> > 
> > I doubt Mailman 2.x can do this, so it is simply a poor choice as
> > mailing list software at this point.
> 
> Earlier versions of Mainman2 had some issues which might accidentally 
> change some headers. But the latest fixes make this possible. It is how 
> the FSF handles DMARC for various GNU mailinglists (by NOT modifying the 
> headers and body and passing through the DKIM signatures): 
> https://lists.gnu.org/archive/html/savannah-hackers-public/2019-06/msg00018.html

Oh, that would be nice to have at sourceware.org.  Please?  :-)


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Frank Ch. Eigler wrote:

> > I find that unconvincing, because even googlegroup email lists don't 
> > mangle From: from sender domains that are now mangled by sourceware.org 
> > :-/
> 
> It turns out receiving mail FROM google-groups mail is itself
> sometimes at risk because it fails to do this From: mangling, and its
> ARC/DKIM re-signature of mail requires even more software to process
> and bless.  (Its current behaviour on some groups-gmail lists I'm on
> are DMARC non-compliant.)

In a way that's amusing and just reinforces my p.o.v. that DMARC is 
bollocks.

> > Can we please switch it off?  It's not like we really had a problem
> > before the switch to mailman.
> 
> We have offered some first-hand evidence that there were problems,
> just they were worked on by people in the background.

Okay, now the question is, are those problems offsetting the current 
problems?  IMHO they don't, but of course I'm heavily biased, not having 
had those old problems  :)


Ciao,
Michael.


Re: Not usable email content encoding

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Jonathan Wakely via Gcc wrote:

> On Mon, 6 Apr 2020 at 23:00, Maciej W. Rozycki via Gcc  
> wrote:
> >  And can certainly score a positive though not a definite rating in spam
> > qualification.  I don't think we ought to encourage bad IT management
> > practices by trying to adapt to them too hard and hurting ourselves (our
> > workflow) in the process.
> 
> What you call "bad IT management practices" includes how Gmail works,
> which a HUGE number of people use.
> 
> A number of lists I'm on switched to our current style of munging a
> year or two ago, because gmail users were not receiving mail, because
> gmail was rejecting the mail.

I find that unconvincing, because even googlegroup email lists don't 
mangle From: from sender domains that are now mangled by sourceware.org 
:-/

Can we please switch it off?  It's not like we really had a problem before 
the switch to mailman.


Ciao,
Michael.


Re: Question about undefined functions' parameters during LTO

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Erick Ochoa wrote:

> > So, when you want to compare types use useless_type_conversion_p (for 
> > equivalence you need useless(a,b) && useless(b,a)).  In particular, 
> > for record types T it's TYPE_CANONICAL(T) that needs to be 
> > pointer-equal. (I.e. you could hard-code that as well, but it's 
> > probably better to use the existing predicates we have).  Note that 
> > useless_type_conversion_p is for the middle-end type system (it's 
> > actually one part of the definition of that type system), i.e. it's 
> > language agnostic.  If you need language specific equality you would 
> > have to use a different approach, but given that you're adding IPA 
> > passes you probably don't worry about that.
> 
> I've been using TYPE_MAIN_VARIANT(T) as opposed to TYPE_CANONICAL(T). 
> This was per the e-mail thread: 
> https://gcc.gnu.org/legacy-ml/gcc/2020-01/msg00077.html .

Well, Honza correctly said that TYPE_MAIN(a) == TYPE_MAIN(b) implies a and 
b to have the same representation.  But that doesn't imply the reverse 
direction, so that hint was somewhat misleading.

> I am not 100% sure what the differences are between these two yet,

Basically TYPE_MAIN_VARIANT "removes" qualifiers, i.e. the main variant is 
always the one without const/volatile.

TYPE_CANONICAL is defined for record types and being the same means they 
have the same representation as well, and are regarded the same from a 
type aliasing p.o.v.  In comparison to MAIN_VARIANT a non-equal CANONICAL 
pointer does imply non-equivalence of the types, so you can infer 
something from comparing CANONICAL.  That is true for the types that do 
have TYPE_CANONICAL set, the others need structural comparison.  See the 
docu of TYPE_CANONICAL and TYPE_STRUCTURAL_EQUALITY_P in tree.h.

> but I think TYPE_CANONICAL(T) was not helpful because of typedefs? I 
> might be wrong here, it has been a while since I did the test to see 
> what worked.
> 
> Using TYPE_MAIN_VARIANT(T) has gotten us far in an optimization we are 
> working on, but I do think that a custom type comparison is needed now.

Well, it really depends on what specific definition of type 
equality/equivalence/compatibility you need, and for what.  Do you want to 
differ between typedefs or not, do you regard structs of same members but 
different tag as equal or not, and so on.

> I do not believe I can use useless_type_conversion_p because I need a 
> partial order in order to place types in a set.

Apart from the fact that useless_type_conversion_p _is_ a partial order, 
do you mean the internal requirement of a set implementation relying on 
some partial order?  Well, yeah, I wouldn't necessarily expect you can use 
predicates defining a semantic order on items to be usable as an internal 
implementation requirement of some random data structure.  I don't know 
what exactly you need the sets for, so I don't know if you could just use 
the usual pointer sets that would then hold possibly multiple "same" 
trees, where the same-ness would only be used later when pulling elements 
out of the set.
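As a toy illustration of that last point (plain C, nothing to do with GCC's actual tree representation; `toy_type` and `same_type` are made-up stand-ins for trees and for `useless(a,b) && useless(b,a)`): a container keyed on pointer identity can happily hold several nodes that a semantic predicate would call equal, and the deduplication only happens when elements are pulled out:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a tree node: two distinct nodes may represent
   the "same" type.  */
struct toy_type { const char *name; };

/* Stand-in for semantic equivalence: useless(a,b) && useless(b,a).  */
static bool same_type(const struct toy_type *a, const struct toy_type *b)
{
    return strcmp(a->name, b->name) == 0;
}

/* Count semantically distinct entries in an array of (possibly
   pointer-distinct but semantically duplicated) nodes.  */
size_t count_distinct(const struct toy_type **v, size_t n)
{
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        bool seen = false;
        for (size_t j = 0; j < i; j++)
            if (same_type(v[i], v[j])) { seen = true; break; }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```
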


Ciao,
Michael.


Re: Question about undefined functions' parameters during LTO

2020-04-07 Thread Michael Matz
Hello,

On Tue, 7 Apr 2020, Erick Ochoa wrote:

> Thanks for this lead! It is almost exactly what I need. I do have one more
> question about this. It seems that the types obtained via
> FOR_EACH_FUNCTION_ARGS and TREE_TYPE are different pointers when compiled with
> -flto.
> 
> What do I mean by this? Consider the following code:
> 
> #include <stdio.h>
> int main(){
>   FILE *f = fopen("hello.txt", "w");
>   fclose(f);
>   return 0;
> }
> 
> The trees corresponding to types FILE* and FILE obtained via the variable f
> are different from the trees obtained from the argument to fclose.

Yes, quite possible.

> However, when we are compiling the simple C program via
> /path/to/gcc -flto a.c -fdump-ipa-hello-world -fipa-hello-world
> /path/to/gcc -flto -flto-partition=none -fipa-hello-world a.c -o a.out
> one can see that the pointers are different:
> 
> pointers 0x79ee1c38 =?= 0x79ee0b28
> records 0x79ee1b90 =?= 0x79ee0a80
> 
> Do you, or anyone else for that matter, know if it would be possible to 
> keep the trees pointing to the same address? Or, in case it can be 
> possible with some modifications, where could I start looking to modify 
> the source code to make these addresses match? The other alternative for 
> me would be to make my own type comparison function, which is something 
> I can do. But I was wondering about this first.

So, generally type equality can't be established by pointer equality in 
GCC, even more so with LTO; there are various reasons why the "same" type 
(same as in language equality) is represented by different trees, and 
those reasons are amplified with LTO.  We try to unify some equal types to 
the same trees when reading in LTO bytecode, but that's only an 
optimization mostly.

So, when you want to compare types use useless_type_conversion_p (for 
equivalence you need useless(a,b) && useless(b,a)).  In particular, for 
record types T it's TYPE_CANONICAL(T) that needs to be pointer-equal.  
(I.e. you could hard-code that as well, but it's probably better to use 
the existing predicates we have).  Note that useless_type_conversion_p is 
for the middle-end type system (it's actually one part of the definition 
of that type system), i.e. it's language agnostic.  If you need language 
specific equality you would have to use a different approach, but given 
that you're adding IPA passes you probably don't worry about that.


Ciao,
Michael.


Re: Not usable email content encoding

2020-03-18 Thread Michael Matz
Hello,

On Wed, 18 Mar 2020, Frank Ch. Eigler via Gcc wrote:

> > > The From: header rewriting for DMARC participants is something sourceware
> > > is doing now.
> > 
> > Out of curiousity, is this rewriting you are talking about the cause for a
> > lot of mails showing up as "From: GCC List" rather than their real senders?
> > This has become very annoying recently.
> 
> Yes, for emails from domains with declared interest in email
> cleanliness, via DMARC records in DNS.  We have observed mail
> -blocked- at third parties, even just days ago, when we failed to
> sufficiently authenticate outgoing reflected emails.

Was this blocking also a problem before mailman (i.e. two weeks ago)?  
Why did nobody scream for not having received mail?  Or why is it blocked 
now, but wasn't before?  Can it be made so again, like it was with ezmlm?

(And DMARC's requirement of having to rewrite From: headers should make it 
clear to everyone that it's stupid).


Ciao,
Michael.


Re: Not usable email content encoding

2020-03-18 Thread Michael Matz
Hi,

On Wed, 18 Mar 2020, Frank Ch. Eigler via Gcc wrote:

> > > The key here is to realize that the raw message is not what you get
> > > back from the mailing list reflector, and also not the raw message
> > > that was sent by the sender.  In this day of mta intermediaries,
> > > proxies, reflectors, it may be time to revisit that suggestion.
> > 
> > But these largely are new problems.  It used to work flawlessly.
> 
> I understand that's frustrating.  But these workflows were counting on
> literally unspecified behaviours not changing, or outright standards
> violations continuing.

Wut?  How is "not mangle the mail body" in any way violating standards?  
You're talking about rewriting or adding headers (where the former is Real 
Bad, no matter what DMARC wants to impose), but the suggestion is based on 
not rewriting the body.  If the body (including attachtments) is rewritten 
any way then that simply is a bug.

> > Patch reencoding problems go back to the redhat.com changes last
> > November (I understand the responsible vendor is working on a fix,
> > but I'm not up-to-date on the current developments).
> 
> This one is a standards-compliant reencoding.  Even if mimecast (?)
> stops doing it, we can't be sure nothing else will.
> 
> > Since the sourceware.org Mailman migration, the From: header is being
> > rewritten, without any compelling reason.  I certainly do not do any
> > DMARC checking here, so the rewriting does not benefit me.
> 
> It benefits you because more and more email services are rejecting or
> interfering with mail that is not clean enough.  If you want to
> receive mail reliably, or send and have confidence that it is
> received, clean mail benefits you.

Depends on your definition of "clean".  If by that you mean rewriting mail 
bodies then I'm not sure what to say.


Ciao,
Michael.


Re: Question about undefined functions' parameters during LTO

2020-03-13 Thread Michael Matz
Hello,

On Fri, 13 Mar 2020, Erick Ochoa wrote:

> +for (tree parm = DECL_ARGUMENTS (undefined_function->decl); parm; parm =
> DECL_CHAIN (parm))
> + {
> +   tree type = TREE_TYPE(parm);
> +   if (dump_file) fprintf(dump_file, "I want the type, do I have it?
> %s\n", type ? "true" : "false");
> + }
> +  }
> +  return 0;
> +}
> 
> I have added the complete patch below, however the function iphw_execute
> encapsulates the logic I am trying at the moment.
> 
> The problem is that while this program runs, DECL_ARGUMENTS returns NULL and
> therefore the loop is never entered. This is true for functions that have
> arguments, such as puts/malloc/... and others in glibc.

As argument types conceptually belong to the function's type (not its 
decl), you should look at the function decl's type, not at DECL_ARGUMENTS.
See the FOREACH_FUNCTION_ARGS iterator and its helpers.  Note that you 
need to pass it TREE_TYPE(funcdecl).

(DECL_ARGUMENTS is the list of formal parameters viewed from the function 
body's perspective, so without a body it isn't filled).
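A pseudocode-style sketch of that suggestion against GCC's internal tree API (fragment only, not standalone-compilable; `debug_tree` is just a placeholder action):

```
/* Iterate the argument types of the function *type*, which is
   available even for declarations without a body.  */
tree fntype = TREE_TYPE (fndecl);       /* the FUNCTION_TYPE */
function_args_iterator iter;
tree argtype;
FOREACH_FUNCTION_ARGS (fntype, argtype, iter)
  {
    if (argtype == void_type_node)      /* ends a non-variadic list */
      break;
    debug_tree (argtype);
  }
```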


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Well, I'd review a patch differently depending on whether or not it was 
> already committed, a patch requiring review or an RFC looking for more 
> general comments, so I *do* think such an email prefix is useful.

As I said: a very good argument must be made; it might be that rfc falls 
into the useful-tag category.

> >> 'git am' would strip leading [...] automatically unless
> >> you've configured, or asked git to do otherwise.  So that leading part
> >> is not counted for the length calculation.
> > 
> > There's still e-mail netiquette which also should be obeyed, or at least
> > not contradicted by git netiquette.
> 
> The 50 char limit seems to come from wanting git log --oneline to not wrap in
> an 80 column terminal.  Whilst laudable, I'm not sure that such a limit
> doesn't become too restrictive and then lead to hard-to-understand summaries.

In my experience hard-to-understand summaries are more related to people 
writing them than to length, IOW, I fear a larger limit like 72 characters 
won't help that.  And, as Segher put it, we aren't really talking about 
limits, only about suggestions, if you _really_ have to mention 
that 40-character function name in which you fixed something in your 
subject, then yeah, you'll go over the 50 chars.  But as recommendation 
the 50 chars make more sense than the 72 chars, IMHO.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Jakub Jelinek wrote:

> > > The idea is that the [...] part is NOT part of the commit, only part of 
> > > the email.
> > 
> > I understand that, but the subject line of this thread says "e-mail 
> > subject lines", so I thought we were talking about, well, exactly that; 
> > and I see no value of these tags in e-mails either.
> 
> In email, they do carry information that is useful there, the distinction
> whether a patch has been committed already and doesn't need review from
> others, or whether it is a patch intended for patch review, or just a RFC
> patch that is not yet ready for review, but submitter is looking for some
> feedback.

For tags like [cmt] or [rfc] I don't have much gripe, though I do think 
that info could be given in the body, and that e.g. in e-mail archives 
(where the tags are not removed automatically) they carry the same value 
as in git log, namely zero.

But suggesting that using the subject line for tagging is recommended can 
lead to subjects like

 [PATCH][GCC][Foo][component] Fix foo component bootstrap failure

in an e-mail directed to gcc-patc...@gcc.gnu.org (from somewhen last year, 
where Foo/foo was an architecture; I'm really not trying to single out the 
author).  That is, _none_ of the four tags carried any informational 
content.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hi,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> The idea is that the [...] part is NOT part of the commit, only part of 
> the email.

I understand that, but the subject line of this thread says "e-mail 
subject lines", so I thought we were talking about, well, exactly that; 
and I see no value of these tags in e-mails either.

(They might have a low but non-zero value for projects that use 
a single mailing list for patches and generic discussion, but we are not 
such a project)

Basically: if they are deemed to clutter the git log for whatever reason, 
then there must be a very good argument for why they not also clutter 
e-mail subject lines, but instead are essential to have there, 
but not in the log.

> 'git am' would strip leading [...] automatically unless 
> you've configured, or asked git to do otherwise.  So that leading part 
> is not counted for the length calculation.

There's still e-mail netiquette which also should be obeyed, or at least 
not contradicted by git netiquette.


Ciao,
Michael.


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-03 Thread Michael Matz
Hello,

On Mon, 3 Feb 2020, Richard Earnshaw (lists) wrote:

> Where does your '50 chars' limit come from?  It's not in the glibc text, 
> and it's not in the linux kernel text either.  AFAICT this is your 
> invention and you seem to be the only person proposing it.

Nope, it's fairly common, so much so that it's included in the "commonly 
accepted rules" that googling for "git subject lines" gives you (as a 
snippet glimpse from some website), and that vim changes color when 
spilling over 50 chars.  I actually thought it was universal and obvious 
until this thread (which is why I admittedly only did the above google 
right before writing this mail).  For the rationale: 'git log --oneline' 
with hash and author or date should fit the usual 72 char limit.  (An 
11-character hash plus space alone would come out as 60 chars for the 
subject)

That's also the reason why some people (certainly me) are nervous about or 
dislike all the "tags" in the subject line.  E.g. what essential 
information (and subjects are for essential info, right?) is "[committed]" 
(or, even worse "[patch]") supposed to transport?  If the rest of the 
subject doesn't interest me, I don't care if something was committed or 
not; if it _does_ interest me, then I'm going to look at the mail/patch 
either way, if committed or not; at which point the info if the author 
required review or has already committed it could be gives in the body as 
well.  Similar for some other metainfo tags.  (The "subsystem:" is good, 
though).

And if we must have these tags, then why not at least short ones?  Why 
isn't "[cmt]" or something enough?  There will be very few tags, so they 
become mnemonic pretty much immediately.  What becomes clearer when 
writing "[patch v2 1/13]" in comparison to "[v2 1/13]"?


Ciao,
Michael.



Re: [RFC] Characters per line: from punch card (80) to line printer (132) (was: [Patch][OpenMP/OpenACC/Fortran] Fix mapping of optional (present|absent) arguments)

2019-12-05 Thread Michael Matz
Hello,

(oh a flame bait :) )

On Thu, 5 Dec 2019, Thomas Schwinge wrote:

> So, I formally propose that we lift this characters per line restriction
> from IBM punch card (80) to mainframe line printer (132).
> 
> Tasks:
> 
>   - Discussion.

I object to cluttering code in excuse for using sensible function names or 
temporaries that otherwise can help clearing up code.  Using 132-char 
lines is cluttering code:
- long lines are harder to read/grasp: vertical eye movement is easier 
  than horizontal, and source code should be optimized for 
  reading, not writing
- long lines make it impossible to have two files next to each other at a 
  comfortable font size
- long lines are incompatible with existing netiquette re emails, for 
  instance

So, at least for me, that my terminals are 80 wide (but not x24) has 
multiple reasons, and the _least_ of it is because that's what punch cards 
had.


Ciao,
Michael.


Re: Use predicates for RTL objects

2019-08-08 Thread Michael Matz
Hi,

On Wed, 7 Aug 2019, Arvind Sankar wrote:

>   => x->is_a (REG)

Oh god, please no.  Currently at least the RTL parts of GCC still have 
mostly a consistent and obvious style, which is a good thing.  I have no 
idea why anyone would think the above is easier to read than REG_P (x).


Ciao,
Michael.
P.S: Consider this: the current style served us quite well for the last 35 
or so years, so before suggesting style changes, shouldn't you first work 
on the sources for some time?


Re: Can LTO minor version be updated in backward compatible way ?

2019-07-17 Thread Michael Matz
Hi,

On Wed, 17 Jul 2019, Romain Geissler wrote:

> However at scale, I think this can become a problem. What will happen
> when in gcc 9.3 we change the version to 8.2 ? Will Tumbleweed recompile
> 100% of the static libraris it ships ?

Every compiler change causes the whole distro to be rebuilt.  So for us 
the LTO byte stream instability is no problem.

> What about all users of Tumbleweed having their own private libs with 
> LTO as well ?

LTO is currently not designed for this use case, you can use fat objects 
to get around the limitation, as you say, but a stable LTO byte stream is 
currently not a focus.  But with time I indeed also hope that some 
backward compatibility can be achieved, with degrading modes like you 
suggested.

> I am totally fine with having the major version mismatch as a 
> showstopper for the link. People will usually not combine a gcc 8 built 
> binary with a gcc 9 one.

That's actually not too far off from what people will want to do in the 
future.  Say some HPC vendor ships their libs as static archives, 
containing LTO byte code compiled by gcc 9.  Then a distro user might get 
gcc 10 at some point later, and it's reasonable to expect that the HPC 
libs still are working.  We aren't there yet, but we eventually want to be 
there.


Ciao,
Michael.


Re: [PATCH] Do not warn with warn_unused_result for alloca(0).

2019-06-13 Thread Michael Matz
Hi,

On Thu, 13 Jun 2019, Jeff Law wrote:

> > (In fact I think our builtin_alloca implementation could benefit when we 
> > added that behaviour as well; it's a natural wish to be able to free 
> > memory that you allocated).
> 
> Also note that simply sprinkling alloca(0) calls won't magically release
> memory.  In a stack implementation releasing happens when the frame is
> removed.  Doing something more complex than that seems unwise.

Yeah, on reconsideration I think I'm pedaling back on this suggestion 
(which really was to do something more complex).  We have the necessary 
support for this in GCC (for VLAs), and then we'd be able to support code 
like this:

  for () {
dostuff (alloca (size));
morestuff (alloca (size2));
alloca(0);   // free all allocas
  }

without running the risk of unlimited stack use.  But of course this would 
promote a programming style that'd only work with our alloca (and not even 
C-alloca), and we want to avoid that.  I thought it a cute idea, but was 
carried away by the cuteness ;-)
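For reference, the portable way to get the effect of the loop sketched above, without depending on any alloca extension, is plain heap allocation (a sketch; `process` and `touch` are made-up names, and the callback stands in for the user's `dostuff`):

```c
#include <stdlib.h>

/* Same shape as the alloca loop, but with an explicit free each
   iteration, so memory use cannot grow and no extension is needed.
   Returns the number of iterations that got their allocation.  */
size_t process(size_t n, size_t size, void (*dostuff)(void *))
{
    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        void *p = malloc(size);
        if (p) {
            dostuff(p);
            done++;
        }
        free(p);        /* what "alloca(0)" did implicitly per loop */
    }
    return done;
}

/* Trivial callback for demonstration.  */
void touch(void *p) { ((char *)p)[0] = 0; }
```
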

I like the suggestion of setting (and carrying through the pipeline) the 
TREE_NO_WARNING flag on a source 'alloca(0)'.


Ciao,
Michael.


Re: [PATCH] Do not warn with warn_unused_result for alloca(0).

2019-06-12 Thread Michael Matz
Hi,

On Wed, 12 Jun 2019, Martin Sebor wrote:

> > Otherwise LGTM as the patch, but I'd like to hear from others whether 
> > it is kosher to add such a special case to the warn_unused_result 
> > attribute warning.  And if the agreement is yes, I think it should be 
> > documented somewhere that alloca (0) will not warn even when the call 
> > has such an attribute (probably in the description of 
> > warn_unused_result attribute).
> 
> I'm not very happy about adding another special case to alloca
> (on top of not diagnosing zero allocation by -Walloc-zero).
> There is no valid use case for the zero argument, whether or not
> the return value is used.

That's the thing, there _is_ a valid use case for supplying a zero 
argument and then the returned value should _not_ be used.  There are 
alloca implementations that do something (freeing memory) when 
called with a zero size, so some (older) programs contain such calls.  
Warning on those calls for the unused results is exactly the wrong thing 
to do, if anything if the result is used we'd have to warn.  (That's of 
course non-standard, but so is alloca itself)  And just removing these 
calls isn't correct either except if it's ensured to not use an alloca 
implementation with that behaviour.

(In fact I think our builtin_alloca implementation could benefit when we 
added that behaviour as well; it's a natural wish to be able to free 
memory that you allocated).


Ciao,
Michael.


Re: alloca (0) in include/libiberty.h

2019-06-11 Thread Michael Matz
Hi,

On Tue, 11 Jun 2019, Martin Liška wrote:

> I see 3 occurrences of the alloca (0) in libiberty/regex.c, but there are 
> properly
> guarded within:
> 
> # ifdef C_ALLOCA
>   alloca (0);
> # endif
> 
> and then I noticed 2 more occurrences in gdb that break build right now:
> 
> gdb/regcache.c:  alloca (0);
> gdb/top.c:  alloca (0);
> 
> Is it the right approach to remove these 2 in gdb?

It's more an indication that the annotation requesting the warning for 
unused results is simply overeager (aka wrong) for alloca.  (sure, the 
uses in gdb probably could be cleaned up as well, but that doesn't affect 
the wrongness of the warning).


Ciao,
Michael.

Re: Fixing inline expansion of overlapping memmove and non-overlapping memcpy

2019-05-15 Thread Michael Matz
Hi,

On Wed, 15 May 2019, Jakub Jelinek wrote:

> Just one thing to note, our "memcpy" expectation is that either there is 
> no overlap, or there is 100% overlap (src == dest), both all the current 
> movmem == future cpymem expanders and all the supported library 
> implementations do support that, though the latter just de-facto, it 
> isn't a written guarantee.

Yes, I should have been more precise, complete overlap is always de-facto 
supported as well.


Ciao,
Michael.


Re: Fixing inline expansion of overlapping memmove and non-overlapping memcpy

2019-05-15 Thread Michael Matz
Hi,

On Wed, 15 May 2019, Aaron Sawdey wrote:

> Yes this would be a nice thing to get to, a single move/copy underlying 
> builtin, to which we communicate what the compiler's analysis tells us 
> about whether the operands overlap and by how much.
> 
> Next question would be how do we move from the existing movmem pattern 
> (which Michael Matz tells us should be renamed cpymem anyway) to this 
> new thing. Are you proposing that we still have both movmem and cpymem 
> optab entries underneath to call the patterns but introduce this new 
> memmove_with_hints() to be used by things called by 
> expand_builtin_memmove() and expand_builtin_memcpy()?

I'd say so.  There are multiple levels at play:
a) exposure to the user: probably a new __builtin_memmove, or a new combined 
   builtin with a hint param to differentiate (but we can't get rid of 
   __builtin_memcpy/mempcpy/strcpy, which all can go through the same 
   route in the middleend)
b) getting it through the gimple pipeline, probably just a new builtin 
   code, trivial
c) expanding the new builtin, with the help of next items
d) RTL block moves: they are defined as non-overlapping and I don't think 
   we should change this (essentially they're the reflection of struct 
   copies in C)
e) how any of the above (builtins and RTL block moves) are implemented: 
   currently non-overlapping only, using movmem pattern when possible; 
   ultimately all sitting in the emit_block_move_hints() routine.

So, I'd add a new method to emit_block_move_hints indicating possible 
overlap, disabling the use of move_by_pieces.  Then in 
emit_block_move_via_movmem (also getting an indication of overlap), do the 
equivalent of:

  finished = 0;
  if (overlap_possible) {
if (optab[movmem])
  finished = emit(movmem)
  } else {
if (optab[cpymem])
  finished = emit(cpymem);
if (!finished && optab[movmem])  // can use movmem also for overlap
  finished = emit(movmem);
  }

The overlap_possible method would only ever be used from the builtin 
expansion, and never from the RTL block move expand.  Additionally a 
target may optionally only define the movmem pattern if it's just as good 
as the cpymem pattern (e.g. because it only handles fixed small sizes and 
uses a load-all then store-all sequence).


Ciao,
Michael.


Re: Fixing inline expansion of overlapping memmove and non-overlapping memcpy

2019-05-15 Thread Michael Matz
Hi,

On Tue, 14 May 2019, Aaron Sawdey wrote:

> memcpy -> expand with movmem pattern
> memmove (no overlap) -> transform to memcpy -> expand with movmem pattern
> memmove (overlap) -> remains memmove -> glibc call
...
> However in builtins.c expand_builtin_memmove() does not actually do the 
> expansion using the memmove pattern.

Because it can't: the movmem pattern is not defined to require handling 
overlaps, and hence can't be used for any possibly overlapping 
memmove.  (So, in a way the pattern is misnamed and should probably have 
been called cpymem from the beginning, alas there we are).
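A minimal example of the distinction (standard C; the function name is made up): with overlapping source and destination, only memmove has defined behavior, which is why an overlap-capable expander pattern is a strictly stronger contract than a copy-only one:

```c
#include <string.h>

/* Shift the first len-1 bytes of buf right by one position.
   Destination and source overlap, so memmove is required here;
   memcpy on the same arguments would be undefined behavior.  */
void shift_right_one(char *buf, size_t len)
{
    memmove(buf + 1, buf, len - 1);
}
```
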

> So here's my proposed set of fixes:
>  * Add new optab entries for nonoverlapping_memcpy and overlapping_memmove
>cases.

Wouldn't it be nicer to rename the current movmem pattern to cpymem 
wholesale for all ports (i.e. roughly a big s/movmem/cpymem/ over the 
whole tree) and then introduce a new optional movmem pattern with 
overlapping semantics?


Ciao,
Michael.


Re: GCC 8 vs. GCC 9 speed and size comparison

2019-04-16 Thread Michael Matz
Hello Martin,

On Tue, 16 Apr 2019, Martin Liška wrote:

> Yes, except kdecore.cc I used in all cases .ii pre-processed files. I'm 
> going to start using kdecore.ii as well.

If the kdecore.cc is the one from me it's also preprocessed and doesn't 
contain any #include directives, I just edited it somewhat to be 
compilable for different architecture.


Ciao,
Michael.

> 
> As Honza pointed out in the email that hasn't reached this mailing list
> due to file size, there's a significant change in inline-unit-growth. The 
> param
> has changed from 20 to 40 for GCC 9. Using --param inline-unit-growth=20 for 
> all
> benchmarks, I see green numbres for GCC 9!
> 
> Martin
> 
> > 
> > 
> > Ciao,
> > Michael.
> > 
> 
> 

Re: GCC 8 vs. GCC 9 speed and size comparison

2019-04-15 Thread Michael Matz
Hi,

On Mon, 15 Apr 2019, Jakub Jelinek wrote:

> > It seems the C++ parser got quite a bit slower with gcc 9 :-( Most 
> > visible in the compile time for tramp-3d (24%) and kdecore.cc (18% 
> > slower with just PGO); it seems that the other .ii files are C-like 
> > enough to not
> 
> Is that with the same libstdc++ headers (i.e. identical *.ii files) or 
> with the corresponding libstdc++ headers?  Those do change a lot every 
> release as well.

The tramp3d and kdecore testcases are preprocessed files from a 
collection of benchmark sources we use, i.e. the same 
input for all compilers.  I think the {gimple,generic}-match.ii are in the 
same league.


Ciao,
Michael.


  1   2   3   4   5   >