[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #11 from David Brown  ---
(In reply to Zhaohaifeng from comment #8)
> (In reply to David Brown from comment #7)
> > (In reply to Xi Ruoyao from comment #6)

> > Anyway, I cannot see any reason while -fno-common should result in the
> > slower run-times the OP saw (though I have only looked at current gcc
> > versions).  I haven't seen any differences in the code generated for
> > -fcommon and -fno-common on the x86-64.  And my experience on other targets
> > is that -fcommon allows optimisations that cannot be done with -fno-common,
> > thus giving faster code.
> > 
> > I have not, however, seen the OP's real code - I've just made small tests.
> 
> The difference generated for -fcommon and -fno-common is just the global
> variable order in memory address.
> 
> -fcommon is as following (some special order):
> stderr@GLIBC_2.2.5
> completed.0
> Begin_Time
...
> -fno-common is as following (reversed order of source code):
> stderr@GLIBC_2.2.5
> completed.0
> Dhrystones_Per_Second
> Microseconds
> User_Time
...

A change in the order is not unexpected.  But it is hard to believe this will
make a significant difference to the speed of the code as much as you describe
- it would have to involve particularly unlucky cache issues.

On the x86-64, defined variables appear to be allocated in the reverse order
from the source code unless there are overriding reasons to change that.  I
don't know why that is the case.  You can avoid this by using the
"-fno-toplevel-reorder" switch.  I don't know how common variables are
allocated - that may depend on ordering in the code, or linker scripts, or
declarations in headers.

I have no idea about your program, but one situation where the details of
memory  layout can have a big effect is if you have multiple threads, and
nominally independent data used by multiple threads happen to share a cache
line.  Access patterns to arrays and structs can also have different effects
depending on the alignment of the data to cache lines.

So you might try "-fno-toplevel-reorder" to have tighter control of the
ordering.  It may also be worth adding cacheline-sized _Alignas specifiers to
some objects, particularly bigger or critical structs or arrays.  (If you are
using a C standard prior to C11, gcc's __attribute__((aligned(XXX))) can be
used.)

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-05 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #7 from David Brown  ---
(In reply to Xi Ruoyao from comment #6)
> (In reply to Zhaohaifeng from comment #5)
> 
> > Does gcc implement -fsection-anchors like function in -fcommon option for
> > x86? In general concept, gcc should has some similar feature for x86 and 
> > ARM.
> 

AFAIK, -fsection-anchors and -fcommon / -fno-common are completely independent.
 But section anchors cannot work with "common" symbols, no matter what
architecture, because at compile time the compiler does not know the order of
allocation of the common symbols.  It /does/ know the order of allocation of
symbols defined in the current translation unit, such as initialised data,
-fno-common zero initialised data, and static data.  This information can be
used with section anchors and also with other optimisations based on the
relative positions of objects.

> AFAIK it's not very useful for CISC architectures supporting variable-length
> fancy memory operands.

That seems strange to me.  But I know very little about how targets such as
x86-64 work for global data that might be complicated with load-time or
run-time linking - my experience and understanding is all with statically
linked binaries.

It seems, from my brief testing, that for the x86-64 target, the compiler does
not do any optimisations based on the relative positions of data defined in a
unit (whether initialised, non-common bss, or static).  For targets such as the
ARM, gcc can optimise as though the individual variables were fields in a
struct where it knows the relative positions.  I don't see any reason why
x86-64 should not benefit from some of these, though I realise that scheduling
and out-of-order execution will mean some apparent optimisations would be
counter-productive.  Maybe there is some kind of address space layout
randomisation that is playing a role here?


Anyway, I cannot see any reason while -fno-common should result in the slower
run-times the OP saw (though I have only looked at current gcc versions).  I
haven't seen any differences in the code generated for -fcommon and -fno-common
on the x86-64.  And my experience on other targets is that -fcommon allows
optimisations that cannot be done with -fno-common, thus giving faster code.

I have not, however, seen the OP's real code - I've just made small tests.

[Bug middle-end/114532] gcc -fno-common option causes performance degradation on certain architectures

2024-06-04 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532

--- Comment #4 from David Brown  ---
I'm not personally particularly interested in performance on x86 systems - my
work is in embedded microcontroller programming.  But I did push for
"-fno-common" to be the default in gcc because "-fcommon" greatly reduces the
risk of some kinds of errors in code.

I've tried fiddling around a bit with different gcc targets and options on
godbolt.org :



It's easy to see the difference between common symbols and non-common symbols
by using "-fcommon" and comparing non-initialised externally linked objects
with initialised ones (since these are never common).  It seems that for some
targets (like x86-64), there is no "-fsection-anchors" support at all.  For
some (like mips), you can choose it explicitly.  And for some (like ARM 32-bit
and 64-bit), it is automatic when optimising.  I assume section anchors can be
a gain for some targets, but not so much for others.

So certainly "-fsection-anchors" will not be a help for x86-64, since that
target does not support section anchors.  (And for targets that /do/ support
them, such as ARM, it's important not to enable -fdata-sections since that
blocks the anchors.)

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #12 from David Brown  ---
(In reply to Segher Boessenkool from comment #11)
> (In reply to David Brown from comment #8)
> > As for using "=X" in the "opt == 3" case, I worry that that could lead to
> > errors as the two assembly lines are independent.  The first says "put X
> > anywhere", and the second - if it had "=X" - says "take X from anywhere". 
> > These don't have to be the same "anywhere" unless the input and output are
> > in the same statement - and if the compiler picked different anywheres, the
> > code would not work.
> 
> This cannot lead to errors.  The compiler knows where "x" is (it put it there
> itself!)

If I write :

int regtest(int a, int x) {
(void) a;// Ignore first parameter
return x;
}

I get the assembly (x86-64) :

movl %esi, %eax
ret

or (arm 32-bit)

mov r0, r1
bx lr


That's all fine, and as expected.  But with this :

int regtest(int a, int x) {
(void) a;// Ignore first parameter
asm("" :: "X" (x);
asm("" : "=X" (x);
return x;
}

the result is just a "ret" or a "bx lr" instruction.  The register move is
missing, because for the first assembly the "anywhere" is picked as esi/r1, and
for the second assembly the "anywhere" is picked as eax/r0.  This is a
perfectly valid choice of registers for the compiler, but it is not what we
want in this use-case.

It is only if I use "+X" (or add a "0" (x) input), rather than "=X", that the
compiler picks the same register for input and output, because it is within the
one assembly instruction.

(The same applies to "=g" / "+g".)

I don't understand why having a "asm("" :: "X" (x))" seems to be necessary
sometimes for "asm("" : "+X" (x))" to work the way I expect, without a compiler
error, but I /do/ understand why "asm("" :: "X" (x))" followed by "asm("" :
"=X" (x))" will sometimes not give the results I am looking for.

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #8 from David Brown  ---
(In reply to Segher Boessenkool from comment #4)
> Nothing has changed here.
> 
> opt == 2 and opt == 3 should use "=X", not "+X", btw.
> 

I realise (since you told me - thanks) that

asm ("" : "+X" (x))

and

asm ("" : "=X" (x) : "0" (x))

are the same.  However, it seems that 

asm ("" : "+X" (x) : "0" (x))

is different.  If I change "opt == 2" to use "=X", then it acts exactly like
the "opt == 1" version (as expected), but having "+X" in "opt == 2" gives
different results.  I don't know /why/ this seems to be treated differently,
since it is just repeating the input in the same target, but it is.

As for using "=X" in the "opt == 3" case, I worry that that could lead to
errors as the two assembly lines are independent.  The first says "put X
anywhere", and the second - if it had "=X" - says "take X from anywhere". 
These don't have to be the same "anywhere" unless the input and output are in
the same statement - and if the compiler picked different anywheres, the code
would not work.

[Bug rtl-optimization/113280] Strange error for empty inline assembly with +X constraint

2024-01-09 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

--- Comment #7 from David Brown  ---
Yes, the goal is an optimisation barrier with the least possible impact on the
code.  For most uses, asm("" : "+g" (x)) has been ideal, as far as I have
tested.  Typically it ensures "x" is evaluated in a register before the
assembly "executes", and uses that register for "x" afterwards.  But "g" also
matches memory, so if "x" happens to be in a stack slot, it gets left there -
there is no extra movement in and out of a register.

But "g" does not match floating point or SIMD registers.  So if "x" is in one
of these registers, extra moves are generated, and I'd like to avoid them.  So
the ideal constraint would be one that matches memory or any register that can
contain data.  That's not quite the same as "match anything" - it should not
match immediate values, nor "special" registers such as PC, SP, or a flags
register, or the extra registers many processors have for floating point
settings, link registers, interrupt state registers, and so on.  But it might
include registers for other accelerators that some chips have.

[Bug c/113280] New: Strange error for empty inline assembly with +X constraint

2024-01-08 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113280

Bug ID: 113280
   Summary: Strange error for empty inline assembly with +X
constraint
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

I am getting strange results with an empty inline assembly line, used as an
optimisation barrier.  This is the test code I've tried in different setups:

#define opt 1

typedef int T;
T test(T a, T b) {
T x = a + b;
#if opt == 1
asm ("" : "+X" (x));   
#endif
#if opt == 2
asm ("" : "+X" (x) : "0" (x));
#endif
#if opt == 3
asm ("" :: "X" (x));
asm ("" : "+X" (x));   
#endif
return x - b;
} 

The idea of the assembly line is to be an optimisation barrier - it forces the
compiler to calculate the value of "x" before the assembly line, and has to use
that value (rather than any pre-calculated information) after the assembly
line.  This makes it something like an "asm("" ::: "memory")" barrier, but
focused on a single variable, which may not need to be moved out of registers. 
I've found this kind of construct "asm("" : "+g" (x))" useful to control
ordering of code around disabled interrupt sections of embedded code, and it
can also force the order of floating point calculations even while -ffast-math
is used.  Usually I've used "g" as the operand constraint, and that has worked
perfectly for everything except floating point types on targets that have
dedicated floating point registers.

I had expected "+X" (option 1 above) to work smoothly for floating point types.
 And it does, sometimes - other times I get an error:

error: inconsistent operand constraints in an 'asm'

I've tested mainly on x86-64 (current trunk, which will be 14.0) and ARM 32-bit
(13.2), with -O2 optimisation.  I get no errors with -O0, but there is no need
of an optimisation barrier if optimisation is not enabled!  The exact compiler
version does not seem to make a difference, but the target does, as does the
type T.

I can't see why there should be a difference in the way these three variations
work, or why some combinations give errors.  When there is no compiler error,
the code (in all my testing) has been optimal as intended.

Some test results:

T = float :
  opt 1   arm-32 fail  x86-64 fail
  opt 2   arm-32 fail  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = int :
  opt 1   arm-32 fail  x86-64 fail
  opt 2   arm-32 fail  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = double :
  opt 1   arm-32 fail  x86-64 fail
  opt 2   arm-32 fail  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = long long int :
  opt 1   arm-32 ok  x86-64 fail
  opt 2   arm-32 ok  x86-64 ok
  opt 3   arm-32 ok  x86-64 ok

T = _Complex double :
  opt 1   arm-32 ok  x86-64 ok
  opt 2   arm-32 fail  x86-64 fail
  opt 3   arm-32 ok  x86-64 ok

T = long double :
  (arm-32 has 64-bit long double, so the results are not relevant)
  opt 1 gives "inconsistent operand constraints in an asm"
  opt 2 and 3 give "output constraint 0 must specify a single register", which
is an understandable error.


It looks like the "+X" constraint only works reliably if preceded by the
assembly line with the "X" input constraint.  For my own use, I can live with
that - it all goes in a macro anyway.  But maybe it points to a deeper issue,
or at least a potential for neatening things up.

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2023-12-19 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #51 from David Brown  ---
(In reply to M Welinder from comment #48)
> It's your (1).  gcc is changing a program that can rely on errno not being
> changed to one where the C library can change it.  (The current C library or
> any future library that the resulting binary may be dynamically linked
> against.)
> 
> Is there any real-world situation that benefits from introducing these
> calls?  It has the feel of optimizing for a benchmark.

There are several real-world benefits from transforming back and forth between
library calls for this kind of small standard library function.  One is that
turning explicit code into library calls can give smaller code - often of
importance in small embedded targets.  Sometimes it can also result in run-time
improvements, especially for larger data sizes - user-written code might just
copy byte by byte, while the library implementation uses more efficient larger
blocks.

Another is that turning library calls into inlined code can speed up code by
using additional knowledge of sizes, alignment, etc., to get faster results. 
This is most obvious for calls to memcpy() or memmove(), which can sometimes be
required to get the semantics correct for type manipulation, but may generate
no actual code at all.

A "C implementation" consists of a compiler and a standard library in tandem. 
The C library can make use of its knowledge of the C compiler, and any special
features, in its implementation.  (This is, in fact, required - some things in
the standard library cannot be implemented in "pure" C.)  The C compiler can
make use of its knowledge of the library implementation in its code generation
or analysis.  For the most part, compilers only make use of their knowledge of
the specifications of standard library functions, but they might also use
implementation details.

This means it is quite legitimate for the GCC team to say that gcc requires a C
library that does not set errno except for functions that explicitly say so in
their specifications.  Users don't get to mix and match random compilers and
random standard libraries and assume they form a conforming C implementation -
the pair must always be checked for compatibility.

The question then is if this would be an onerous requirement for standard
library implementations - do common existing libraries set errno in functions
that don't require it?  I cannot say, but I would be very surprised if they
did.  Modern thought, AFAIUI, considers errno to be a bad idea which should be
avoided whenever possible - it is a hinder to optimisation, analysis, and
parallelisation of code, as well as severely limiting C++ constexpr and other
compile-time calculations.

My thoughts here are that GCC should make this library requirement explicit and
public, after first confirming with some "big name" libraries like glibc,
newlib and muslc.  They could also add a flag "-funknown-stdlib" to disable any
transforms back or forth between standard library calls, and assume nothing
about the calls (not even what is given in the standards specifications).


(As a note - the paragraph 7.5p3 allowing standard library functions to set
errno is still in the current draft of C23.)

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2023-12-18 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #47 from David Brown  ---
(In reply to M Welinder from comment #46)
> Should "-std=c99" imply turning off these optimizations?
> 
> Creating calls to, say, strlen is incompatible with the C99 standard and
> perhaps better limited to "-std=gnu-something" or an opt-in f-flag.

How is it incompatible with C99 to create calls to library functions?  I can
think of a two possibilities:

1. If the function implementation plays with errno (allowed in 7.5p3), in a way
that is visible to the code.

2. If the function is called with parameters that may invoke undefined
behaviour (such as calling "strlen" without being sure that the parameter
points to a null-terminated string), where such undefined behaviour is not
already present.

If the user writes code that acts like a call to strlen (let's assume the
implementation knows strlen does not change errno), then the compiler can
replace it with a library call.  Similarly, if the user writes a call to
strlen, then the compiler can replace it with inline code.

As long as there is no difference in the observable behaviour, the
transformation is allowed.

Or am I missing something here?

[Bug c/96284] Outdated C features should be made errors with newer standards

2023-12-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96284

--- Comment #16 from David Brown  ---
Thank you for making these changes.  There's always a trade-off between
supporting code that "has always compiled fine and works in testing", and
making it harder for people to write new poor quality code with a high risk of
bugs.  I believe, however, that changes like this help all developers - and
more importantly, help all the end-users by reducing software bug rates, even
if it is just by a very small step.

[Bug c/111769] Annotate function definitions and calls to facilitate link-time checking

2023-10-12 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769

--- Comment #4 from David Brown  ---
(In reply to Richard Biener from comment #1)
> If you compile with debug info enabled the info should be already there,
> just nothing looks at this (and mismatches) at link time.

Perhaps I should file this as an enhancement request for binutils?  If there is
enough information already generated by gcc, then only the linker part needs to
be implemented.  (It could also be checked by an external program, but that
would be a lot more inconvenient for the user.)

Will the binutils folk be familiar with the debug information and its layout,
or is there a document I can point them to?  I assume this kind of information
will be stable and documented, because gdb will need it.  (I can read through
the material for the details, but a rough pointer to get me started would
help.)

[Bug c/111769] Annotate function definitions and calls to facilitate link-time checking

2023-10-12 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769

--- Comment #3 from David Brown  ---
(In reply to Andrew Pinski from comment #2)
> IIRC there was a bug about this specific thing which was closed as fixed
> with the use of LTO ...

Certainly if you use LTO, then this is not necessary.  But LTO use is far from
universal, and there can be good reasons not to use it.  (It is rarely seen in
small embedded systems, for various reasons - some good, some less good.)  The
thought here is for a very low cost (for users) enhancement - the check could
be introduced without any change to the users' existing build process, code, or
linker scripts.

[Bug c/111769] New: Annotate function definitions and calls to facilitate link-time checking

2023-10-11 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769

Bug ID: 111769
   Summary: Annotate function definitions and calls to facilitate
link-time checking
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

This is an enhancement idea, rather than a bug, and would require linker
support as well as compiler support.

In C, it is possible to define a function with one set of parameters (number
and type) in one translation unit, and declare and use it with a different set
in a different translation unit.  This is undefined behaviour, and generally a
really bad idea, but it can't be spotted by the toolchain unless you use LTO.

In C++, you don't see the same effect as different name mangling would result
in linker failures.

Would it be possible for the C compiler, when emitting an external linkage
function definition and the use (or possibly declaration) of a function (for
calling or taking its address), to add a directive or debug info section entry
giving the function type in active use?  These could then be checked at link
time, even without LTO, to catch mismatches.

My suggestion would be to use the existing C++ name-mangling scheme to encode
the function types.  Thus the declaration and definition of "int square(int
num)" would be annotated with a note that the function "square" has signature
"_Z6squarei".

[Bug c++/111399] Bogus -Wreturn-type diagnostic

2023-09-18 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111399

--- Comment #2 from David Brown  ---
Would it be possible to have the "-Wreturn-type" warning pass not issue a
warning immediately, but inject a warning into the code that could then be
removed later by optimisation?

What I mean, is to have something roughly like this :

void __attribute__((warning("Reached return in non-void function")))
__ReturnNonVoid(void);

int sign(int x) {
if (x < 0) return -1;
if (x == 0) return 0;
if (x > 0) return 1;
__ReturnNonVoid();
}


Instead of issuing a warning straight away, if the compiler had added a call to
a non-existent "__ReturnNonVoid" function (just like the sanitizer adds a call
to "__ubsan_handle_missing_return"), this would be optimised away in functions
like this sample and the false positive warning would not be issued.  (Ideally,
of course, it would be suitable GIMPLE, or appropriate internal representation
for that pass, rather than a call to a non-existent function.)

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #4 from David Brown  ---
(In reply to Andreas Schwab from comment #3)
> You already have -W[error=]return-type.

Yes, and that is what I normally use - I am a big fan of gcc's static warnings.

Sometimes, however, there are false positives, or perhaps other reasons why the
programmer thinks it is safe to ignore the warning in a particular case.  Then
sanitizers can be a useful run-time fault-finding aid.  There's certainly a lot
of overlap in the kinds of mistakes that can be found with -Wreturn-type and
with -fsanitizer=return-type, but there are still benefits in have both.  (You
have both in C++, just not in C.)

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #2 from David Brown  ---
(In reply to Richard Biener from comment #1)
> Confirmed.  Note C17 disallows a return wotihout an expression for a funcion
> that returns a value, not sure if that means falling off the function
> without a return (value) is still OK, it at least feels inconsistent.

This has all remained unchanged from C99 to C23 (draft), I believe, which makes
things easier!

As far as I can tell, the relevant point in the standards is 6.9.1p12,
"Function definitions", which says "Unless otherwise specified, if the } that
terminates a function is reached, and the value of the function call is used by
the caller, the behaviour is undefined".  

So while a non-void function cannot have a return statement without an
expression (6.8.6.4p1), control flow /can/ run off the terminating }.  I think
this is perhaps a concession to older pre-void C code, when a function that
does not have a return value would still be declared to return "int".

Thus I think gcc's lack of a sanitizer here is technically accurate - but not
helpful, unless you are working with 35 year old code!

[Bug c/111400] New: Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

Bug ID: 111400
   Summary: Missing return sanitization only works in C++
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

With C++ and -fsanitize=return, the code :

int foo(void) { }

generates a call to __ubsan_handle_missing_return.

For C, there is no sanitizer call - just a simple "ret" instruction.

This is, of course, because in C (unlike C++), falling off the end of a
non-void function is legal and defined behaviour, as long as caller code does
not try to use the non-existent return value.  But just like in C++, it is
almost certainly an error in the C code if control flow ever falls off the end
of a non-void function.

Could -fsanitize=return be added to C?  It should not be included by
-fsanitize=undefined in C, since the behaviour is actually allowed, but it
would still be a useful option that could be enabled individually.

[Bug c++/111399] New: Sanitizer code generation smarter than warnings

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111399

Bug ID: 111399
   Summary: Sanitizer code generation smarter than warnings
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Given this code :

int sign(int x) {
if (x < 0) return -1;
if (x == 0) return 0;
if (x > 0) return 1;
}


and compiled with "-O2 -Wall", gcc is unable to see that all possible cases for
"x" are covered, so it generates a "control reaches end of non-void function
[-Wreturn-type]" warning.  It would be nice if gcc could see this is a false
positive, but analysis and warnings can't be perfect.

However, if I add the flag "-fsanitize=undefined", the compiler is smart enough
to see that all cases are covered, and there is no call to
__ubsan_handle_missing_return generated.

If the sanitizer code generation can see that all cases are covered, why can't
the -Wreturn-type warning detection?  I'm guessing it comes down to the
ordering of compiler passes and therefore the level of program analysis
information available at that point.  But perhaps the -Wreturn-type pass could
be done later when the information is available?

[Bug c/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249

--- Comment #4 from David Brown  ---
Yes, __builtin_assume_aligned is the best way to write things in this
particular case (and optimises fine for -O1 and -O2).  It is also clearer in
the source code (IMHO), as it shows the programmer's intention better.

I am just a little concerned that optimisation hints provided by the programmer
via __builtin_unreachable are getting lost at -O2.  When the compiler knows
that something cannot happen - whether by __builtin_unreachable or by other
code analysis - it can often use that information to generate more efficient
code.  If that information can be used for better code at -O1, but is lost at
-O2, then something is wrong in the way that information is collected or passed
on inside the compiler.  Any case of -O1 generating more efficient code than
-O2 is a sign of a problem or limitation.

So I am not particularly bothered about this one piece of code - I am reporting
the issue because it might be an indication of a wider problem.

[Bug c/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249

--- Comment #2 from David Brown  ---
My apologies - it does not optimise the code to a single aligned load at -O1
(at least, not with the combinations I have now tried).  The code was
originally C++, with a reference, which /does/ exhibit this behaviour of having
better code at -O1 than -O2.  I had translated the reference to a pointer to
get more generic code that is valid as C and C++, but I don't see the same
effect with the pointer.

For a reference, the issue is as I first described:

#include 
#include 

uint64_t read64r(const uint64_t ) {
if ((uint64_t)  % 8 ) {
__builtin_unreachable();
}
 uint64_t value;
 memcpy( , , sizeof(uint64_t) );
 return value; 
}

I tested with 64-bit RISC-V and 32-bit ARM using trunk (gcc 13.1, I think) on
godbolt.org.  The only relevant flag was the optimisation level.



[Bug c/110249] New: __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249

Bug ID: 110249
   Summary: __builtin_unreachable helps optimisation at -O1 but
not at -O2
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Sometimes it can be useful to use __builtin_unreachable() to give the compiler
hints that can improve optimisation.  For example, it can be used here to tell
the compiler that the parameter is always 8-byte aligned:

#include 
#include 

uint64_t read64(const uint64_t * p) {
if ((uint64_t) p % 8 ) {
__builtin_unreachable();
}
 uint64_t value;
 memcpy( , p, sizeof(uint64_t) );
 return value; 
}

For some targets, such as 32-bit ARM and especially RISC-V, this can make a
difference to the generated code.  In testing, when given -O1 the compiler
takes advantage of the explicit undefined behaviour to see that the pointer is
aligned, and generates a single 64-bit load.  With -O2, however, it seems that
information is lost - perhaps due to earlier optimisation passes - and now slow
unaligned load code is generated.

Ideally, such optimisation information from undefined behaviour (explicit via a
builtin, or implicit via other code) should be kept - -O2 should have at least
as much information as -O1.  An alternative would be the addition of a more
directed "__builtin_assume" function that could be used.

(I know that in this particular case, __builtin_assume_aligned is available and
works exactly as intended at -O1 and -O2, but I think this is a more general
issue than just alignments.)

[Bug libstdc++/109631] New: Simple std::optional types returned on stack, not registers

2023-04-26 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109631

Bug ID: 109631
   Summary: Simple std::optional types returned on stack, not
registers
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

A std::optional is fundamentally like a std::pair, but with nice
access features, type safety, etc.  But for simple functions it is
significantly less efficient, because in gcc it is returned on the stack even
when T fits in a single register.



On targets such as x86-64 and AARCH64, simple structs and pairs that fit in two
registers, are returned in two registers - there is no need for the caller to
reserve stack space and pass a hidden pointer to the callee function.  On clang
with its standard library, this also applies to std::optional for suitable
types T such as "int", but on gcc and its standard C++ library, the stack is
used for returning the std::optional.  Since both clang and gcc follow the
same calling conventions, this must (I believe) be a result of different
implementations of std::optional<> in the C++ libraries.

(I note that code for std::pair is more efficient here than using
std::pair.  But perhaps the details of std::optional<> require that
the contained element comes first.)



#include 
#include 

using A = std::optional;
using B = std::pair;
using C = std::pair;

A foo_A(int x) {
return x;
}

B foo_b(int x) {
B y { x, true };
return y;
}

C foo_c(int x) {
C y { true, x };
return y;
}

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #25 from David Brown  ---
(In reply to Andrew Pinski from comment #24)
> (In reply to LIU Hao from comment #22)
> > Yes, GCC should be told to shut up about dereferencing artificial address
> > values.
> 
> NO.
> Take:
> ```
> static inline int f(int *a)
> {
>   return a[10];
> }
> 
> int g()
> {
>   return f(0);
> }
> ```
> The warning is there for the above case really (and similar ones with struct
> offsets). Where you originally have a null pointer and have an offset from
> there; by the time the warning happens, the IR does not know if it was
> originally from an offset of a null pointer or if the value was written in.
> The paramater is there to "tune" the heurstic to figure out if it is null
> pointer deference or if it is some real (HW) address. Maybe
> -fno-delete-null-pointer-checks should imply --param=min-pagesize=0, though
> some folks want the warning indepdent of trying to delete null pointer
> checks.

It is worth noting, I think, that although on a target like the AVR (and most
embedded systems without an MMU) the address 0 is a real part of memory, and
can really be read and/or written, any code that tries to dereference a 0
pointer is almost always wrong.  I don't want gcc to consider 0 as an
acceptable address on these targets - I want it to warn me if it sees a null
pointer dereference.  If I really want to target address 0, as I might
occasionally do, I'll use a pointer to volatile - /then/ I'd like gcc to
believe me without question.

I don't know if every embedded developer feels the same way.  (Georg-Johann
could chime in with his opinion.)

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #23 from David Brown  ---
(In reply to LIU Hao from comment #22)
> Yes, GCC should be told to shut up about dereferencing artificial address
> values.

One possibility is to have the warnings disabled whenever you are using a
volatile pointer.  After all, "volatile" is an indication that the compiler
doesn't know everything, and it has to trust the programmer.

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #21 from David Brown  ---
(In reply to Andrew Pinski from comment #1)
> --param=min-pagesize= should be set to 0 for avr as zero is a valid address.

Is there any convenient description of "min-pagesize" ?  The user manual is
unhelpful: "Minimum page size for warning purposes."  Your comment here
suggests it is connected to whether zero is a valid address or not, which does
not sound at all related to the issue here.  (But you are correct that for the
AVR, and many embedded systems, accessing memory at address zero is valid.)

Testing (on godbolt) shows that setting "--param=min-pagesize=0" does stop this
warning triggering.  But this is an issue that affects all systems for which
the user might want to convert a fixed value to an address - i.e., all embedded
systems, and perhaps device drivers or other low-level code on bigger systems. 
Is setting "min-pagesize" to 0 actually a fix, or is it a workaround, and will
it have other effects on static warning checks and/or code generation?

A workaround is, of course, helpful - especially since this can be added to
peoples CFLAGS list.

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-25 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

--- Comment #20 from David Brown  ---
(In reply to Georg-Johann Lay from comment #19)
> Created attachment 54912 [details]
> pr105532.diff: Proposed patch for the AVR backend
> 
> Here is a proposed, untested patch.
> 
> gcc/
>   PR target/105532
>   * config/avr/avr.cc (opts.h): Include it.
>   (avr_option_override): Set --param_min_pagesize=0.
> 
> gcc/testsuite/
>   PR target/105532
>   * gcc.target/avr/torture/pr105532.c: New test.

This is not an AVR backend issue - it is much wider than that.  It is perhaps
reasonable to test a patch just on the AVR, but this needs to be fixed in the
core of gcc.

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2023-04-24 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #18 from David Brown  ---
This issue does not appear to be related to any particular backend.

Changing the initial code to remove the volatile :

void m(void) {
 uint8_t * ptr2 = ( uint8_t*)(0x0030);
*ptr2 = 0xd8;
}

gives a warning in gcc 11 as well - "warning: writing 1 byte into a region of
size 0".  (gcc 12 and 13 give the "array subscript [0] is outside array bounds"
warning.)


It is standard practice in embedded development to cast an integer value (often
a compile-time constant, but not always) to a pointer and use that to access
memory-mapped registers or other data at fixed addresses.  Typically the
pointers are pointer-to-volatile.  Sometimes the data type in question is a
"uintN_t" type, sometimes it is a struct covering a range of registers.  There
are other ways to specify fixed addresses, such as using "extern" data that is
defined in a linker file or an assembly file - but I have not seen other
methods used much in recent decades.

The compiler knows nothing about the size of the region pointed to here.  It is
no more appropriate for it to guess the size is 0 than to guess it is 42.  It
should either assume the size is one single object of the named type, or an
array of unknown size of the named type.  (I'd prefer the first, but the second
is also a reasonable choice.)

As an embedded developer, I don't want to have to turn off useful warnings to
avoid false positives on common and essential constructs.

[Bug c/82922] Request: add -Wstrict-prototypes to -Wextra as K style is obsolescent

2022-11-08 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82922

--- Comment #9 from David Brown  ---
Could -Wstrict-prototypes be added to -Wall, or even considered enabling by
default?  The next C standard will make "void foo()" mean the same as "void
foo(void)", like in C++, which makes the scope for confusion high.

[Bug ipa/101279] Function attributes often block inlining

2022-06-28 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279

--- Comment #7 from David Brown  ---
(In reply to rguent...@suse.de from comment #6)
> > Am 28.06.2022 um 14:53 schrieb david at westcontrol dot com 
> > :
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279
> > 
> > --- Comment #5 from David Brown  ---
> > (In reply to Richard Biener from comment #4)
> >> (In reply to frankhb1989 from comment #3)
> >>> There is a more specific instance here: can_inline_edge_by_limits_p in
> >>> ipa-inline.cc treats flags and "optimize" attributes differently.
> >> 
> >> A bit up there's a blacklist we maintain where inlining is not OK because 
> >> it
> >> results in semantic differences.
> >> 
> >> Generally we it's hard to second-guess the users intention when looking
> >> at an inline edge with different optimization settings of caller and 
> >> callee.
> >> For C++ comdats there might be even multiple variants with different
> >> optimization level (but we only keep one, special-casing this a bit).
> > 
> > I appreciate the "err on the side of caution" attitude.  Perhaps there 
> > could be
> > an extra "I know what I'm doing" attribute that lets you override the
> > blacklisting in a particular case.  This would only really make sense in 
> > cases
> > where the attribute can be attached to the expressions and statements within
> > the function (I think "-fwrapv" would be in this category).  In cases where
> > this is not possible, an error or warning message would be in order as the
> > compiler can't do what the programmer is asking.
> 
> Can you provide a specific example that you would allow this way?
> 
> 

I'd go back to my original example :


  __attribute__((optimize("-fwrapv")))
  static inline int wrapped_add(int a, int b) {
  return a + b;
  }

  int foo(int x, int y, int z) {
  return wrapped_add(wrapped_add(x, y), z);
  }

If you want to disable inlining of "wrapped_add" due to the change in the
semantics of integer arithmetic, that's fair enough.  But if I /really/ want
inlining, and write something like :

  __attribute__((optimize("-fwrapv"), always_inline))
  static inline int wrapped_add(int a, int b) {
  return a + b;
  }

then I want the code treated as :

  return (__wrapping_int) a + (__wrapping_int) b;

or

  return __wrapping_add_int(a, b);

If the way gcc marks and handles "-fwrapv" arithmetic does not support
something like that, which would allow inline mixing with other code, then that
would result in a compiler warning or error message.


It might be best to have a new attribute name here rather than using
"always_inline" - I have not thought through the consequences.

It might also be that if the compiler can be changed to support inlining of a
given optimisation attribute, then the attribute in question can be whitelisted
for inlining for everyone.  I suppose what I am saying is that if the compiler
can't be sure that inlining is safe, then it could be cautious by default while
letting the programmer override that caution explicitly.

[Bug ipa/101279] Function attributes often block inlining

2022-06-28 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279

--- Comment #5 from David Brown  ---
(In reply to Richard Biener from comment #4)
> (In reply to frankhb1989 from comment #3)
> > There is a more specific instance here: can_inline_edge_by_limits_p in
> > ipa-inline.cc treats flags and "optimize" attributes differently.
> 
> A bit up there's a blacklist we maintain where inlining is not OK because it
> results in semantic differences.
> 
> Generally we it's hard to second-guess the users intention when looking
> at an inline edge with different optimization settings of caller and callee.
> For C++ comdats there might be even multiple variants with different
> optimization level (but we only keep one, special-casing this a bit).

I appreciate the "err on the side of caution" attitude.  Perhaps there could be
an extra "I know what I'm doing" attribute that lets you override the
blacklisting in a particular case.  This would only really make sense in cases
where the attribute can be attached to the expressions and statements within
the function (I think "-fwrapv" would be in this category).  In cases where
this is not possible, an error or warning message would be in order as the
compiler can't do what the programmer is asking.

[Bug c/105343] New: Inefficient initialisation in some kinds of structs

2022-04-22 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105343

Bug ID: 105343
   Summary: Inefficient initialisation in some kinds of structs
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

struct S { int a[1000]; };
struct X { struct S s; int b[2];};

extern int foobar(struct X * p);

int foo(struct S *s)
{
struct X x = { *s };
return foobar();
}


When the size of the array "a" is small enough that the compiler does the
initialisation inline, the code is fine.  With a bigger array it uses memset
and memcpy, either as calls to external functions or inline loops depending on
details of the version of gcc and the target.  (This too is appropriate.)

However, it does that by turning the code into the equivalent of :

memset(, 0, sizeof(struct X));
memcpy(, s, sizeof(struct S));

It /should/ be doing :

memset(, 0, sizeof(struct X.b));
memcpy(, s, sizeof(struct S));

In other words, it is first zeroing out the entire X structure, then copying
from *s into the structure.  Only the extra part of X, the array "b", needs to
be zero'ed.

[Bug c/103660] New: Sub-optimal code with relational operators

2021-12-11 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103660

Bug ID: 103660
   Summary: Sub-optimal code with relational operators
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

I recently looked at some of gcc's "if-conversions" and other optimisations of
expressions involving relational operators - something people might use when
trying to write branchless code.  I know that it is often best to write in the
clearest way, including branches, and let the compiler handle the optimisation.
 But people do try to do this kind of thing by hand.

I tested 6 examples of ways to write a simple "min" function:


int min1(int a, int b) {
if (a < b) return a; else return b;
}

int min2(int a, int b) {
return (a < b) ? a : b;
}

int min3(int a, int b) {
return (a < b) * a | (a >= b) * b;
}

int min4(int a, int b) {
return (a < b) * a + (a >= b) * b;
}

int min5(int a, int b) {
const int c = a < b;
return c * a + (1 - c) * b;
}

int min6(int a, int b) {
const bool c = a < b;
return c * a + !c * b;
}


gcc happily optimises the first two versions.  For the next two, it uses
conditional moves for each half of the expression, then combines them with "or"
or "add".  For version 5, it generates two multiply instructions, and version 6
is even worse in trunk (gcc 12).  This last one is a regression - gcc 11
generates the same code for version 6 as for version 4 (not optimal, but not as
bad).

For comparison, clang 5+ generates optimal code for all versions.  I have tried
a number of different targets (godbolt is wonderful for this stuff), with
similar results.

[Bug c/102427] -Woverflow only works in constant expressions

2021-09-21 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102427

--- Comment #2 from David Brown  ---
Runtime detection is good - compile-time detection is much, much better when it
is possible.  (IMHO, of course.)

[Bug c/102427] New: -Woverflow only works in constant expressions

2021-09-21 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102427

Bug ID: 102427
   Summary: -Woverflow only works in constant expressions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The "-Woverflow" warning only works with constant expressions.  Should it not
also work in cases where the compiler knows the value at compile time, even if
it is not a constant expression?

int foo(void) {
const int a = 534567891;
int b = a * 6;
return b;
} 

(With gcc -O2 -Wall -Wextra)

When "a" is declared "const", the compiler warns about the overflow.  If the
"const" is removed, the compiler still does the same calculations at compile
time, but does not give a warning even though it knows there was a signed
integer overflow.

Could "-Woverflow" be changed to warn here, when there is clearly undefined
behaviour?

[Bug c/101279] New: Function attributes often block inlining

2021-07-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101279

Bug ID: 101279
   Summary: Function attributes often block inlining
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

When using many function attributes, the attributes only apply when the
function is not inlined.  And in some cases, they actively disable inlining
which would normally be expected.

For example, a programmer may have occasional need for two's complement
wrapping semantics for signed integers.  They don't want to use "-fwrapv"
globally, because that would lose some optimisation opportunities for the rest
of the code.  So they define functions like "wrapped_add" :

  __attribute__((optimize("-fwrapv")))
  static inline int wrapped_add(int a, int b) {
  return a + b;
  }

  int foo(int x, int y, int z) {
  return wrapped_add(wrapped_add(x, y), z);
  }

When used in a function like "foo" which does not have "-fwrapv" active,
function calls are made rather than the expected inlining.  The result is
significantly poorer code than the programmer would predict.

If "foo" also has "-fwrapv" from a compiler flag, attribute or pragma, the
generated code is inlined nicely.


The same issues apply to a range of function attributes.  (Obviously some, such
as "section", make no sense to inline to functions that don't share the same
attribute.)  For some attributes, such as "access", inlining disables the
attribute rather than the attribute disabling inlining.

Steadily more code is inlined in modern programming - helped by LTO.  Often the
code works the same whether code is inlined or not, with only minor differences
in speed.  But in template-heavy C++ code, inlining is essential as you have
large numbers of functions that generate very little (if any) code - failure to
inline as expected can have severe efficiency effects.  Conversely, if inlining
disables checking attributes like "access" or "nonnull" the result is a false
sense of security in the programmer.


I think it is important for the manual to be updated to reflect these
limitations as a first step.

Then of course it would be best to remove the limitations!

Ref. mailing list conversation
 and Martin Sebor's
blog article
.

(Thanks to Martin for his article and discussion that made me aware of these
points.)

[Bug c/100890] New: The "div" standard library functions could be optimised as builtins

2021-06-03 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100890

Bug ID: 100890
   Summary: The "div" standard library functions could be
optimised as builtins
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The C standard library has a family of functions for calculating the division
and remainder as a pair - "div", "ldiv", "lldiv", "imaxdiv".  These could be
optimised by the compiler using inline code, in the same manner as many other
library functions such as the "abs" family (listed under "Other builtins" in
the manual).

For example, "div" can be implemented as:

div_t div(int a, int b) {
div_t d;
d.quot = a / b;
d.rem = a % b;
return d;
}

Inlining this in place of a call to "div" results in code that is smaller and
faster on most targets, as well as providing far more optimisation
opportunities (constant propagation, re-arranging with other code, etc.).

[Bug c/100702] Strict overflow warning regression in gcc 8 onwards

2021-05-20 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100702

--- Comment #2 from David Brown  ---
Runtime diagnostics can be very useful - but they are a different kind of
warning.  In particular, they only show what errors have occurred during your
testing - they don't show what errors /might/ occur.

There is a general rule that the earlier you find your problems, the cheaper,
faster and easier they are to handle.  Compile-time checks are better than
run-time checks when all else is equal.  Clearly compile-time checks have more
risk of false-positives - that's why it's important that most are optional or
can be disabled by pragmas or particular code constructs.  But when they /can/
be used, they are preferable.

In this particular example, there is no doubt of the undefined behaviour and
the infinite loop.  The compiler knows about them.  It would not be a false
positive to issue a warning in such cases.


Another limitation of runtime diagnostics is their use in embedded systems. 
gcc is heavily used in microcontrollers, where you often do not have much in
the way of a console output, no file system, etc.  Runtime diagnostic
opportunities are far more limited in such cases.


I fully appreciate that compile-time warnings are difficult, especially
avoiding false positives, and if you say that a warning here would be too niche
or too difficult to justify the effort, I am happy to accept that.  But
run-time diagnostics are not a replacement - they are an additional and
complementary tool.  Please do not consider sanitizers as a substitute for
static analysis and compile-time error checking and warnings.

[Bug c/100702] New: Strict overflow warning regression in gcc 8 onwards

2021-05-20 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100702

Bug ID: 100702
   Summary: Strict overflow warning regression in gcc 8 onwards
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Incorrect code like this example relies on two's complement wrapping to do what
the programmer wanted:

void foo(int *a)
{
  int i;
  for (i=0; i

[Bug target/98884] Implement empty struct optimisations on ARM

2021-02-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #6 from David Brown  ---
(In reply to Jakub Jelinek from comment #3)
> If GCC and Clang are ABI incompatible on this, then one of the two compilers
> is buggy.  So, it is needed to look at the EABI and find out which case it
> is.

I've had a look at the ARM C++ ABI, to the best of my abilities:



Section 4.1 has this to say:

GC++ABI §2.27POD Data Types

The GC++ABI defines the way in which empty class types are laid out.  For the
purposes of parameter passing in [AAPCS], a parameter whose type is an empty
class shall be treated as if its type were an aggregate with a single member of
type unsigned byte.

Note: Of course, the single member has undefined content.



(This references )


If my reading is correct, then gcc is correct and clang is wrong here - empty
classes are treated as containing a single unsigned byte, and then expanded to
a 32-bit type before passing.  (There is still no need to put a zero in these
parameters, as the value is unspecified.)

It may be that the x86 gcc port is wrong here, but I haven't looked at the
details of x86 calling conventions.


I hope someone can check this out, and a perhaps file a bug report for clang so
that they can correct it.  (Alternatively, file a bug report with ARM so that
they can change the ABI!)


However, in this particular case, if clang is wrong then I don't want to be
right.  I can see no benefit, and significant cost, in passing zeros for these
empty tag structs.  I'd be quite happy with an explicitly non-conforming switch
to enable such optimisations (just like "-fshort-enums" or other switches that
mess with caller and callee registers).  Or I'd be even happier to find that
clang is wrong and gcc ARM gets optimised without a flag :-)

[Bug target/98884] Implement empty struct optimisations on ARM

2021-02-01 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #5 from David Brown  ---
(In reply to Jakub Jelinek from comment #4)
> Note, for ABI compatibility or incompatibility it might be better to check
> what happens when some argument is passed after the empty structs.  Because
> at least in some ABIs one could get away with just pretending the stack
> slots (or registers) are there even when they aren't actually allocated on
> the stack, but one would need to have guarantees the callee e.g. will never
> modify those stack slots (in most ABIs the call argument stack slot are
> owned by the callee).

Good point.  I tried with:

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5, int y);

and

needs_tags(12345, t1, t2, t3, t4, t5, 200);

gcc (trunk - gcc 11) on x86 puts 12345 in edi and 200 in esi, just as if the
empty tags didn't exist.

So does clang for the ARM (putting them in r0 and r1 respectively).

gcc 9 for the ARM generates code as though t1 .. t5 were type "int" and value
0.

[Bug target/98884] Implement empty struct optimisations on ARM

2021-01-29 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #2 from David Brown  ---
Yes, ABI issues were my initial thought too.  If so, then optimising away the
assignments while leaving the stack manipulation (and possibly register
allocations) in place would still be a significant improvement.

However, I note that clang has no problem with generating ideal code here for
the ARM - it is not bothered by the ABI.  There could be several reasons for
that.  Perhaps the clang folk got the ABI wrong and the optimisation is not
valid for the ARM EABI.  Maybe the EABI used on the "arm (none)" target doesn't
specify these details, meaning the optimisation is valid there even if it is
not valid on "arm (linux)" targets.  I don't know the details of the ABIs well
enough to answer.

If it is an ABI issue, then I'd be quite happy with an ARM-specific flag to
enable an variation on the ABI that lets the compiler skip empty types
entirely.  When compiling for the Cortex-M devices, you rarely link much to
pre-compiled code (other than the C library) and it's usually fine to break
standards a bit to get more optimal code (like using "-ffast-math").

It would of course be best to have a general solution that works for all ARM
users (and ideally other targets too) without needing a flag.

[Bug c++/98884] New: Implement empty struct optimisations on ARM

2021-01-29 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

Bug ID: 98884
   Summary: Implement empty struct optimisations on ARM
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Empty "tag" structs (or classes) are useful for strong typing, function
options, and so on.  The rules of C++ require these to have a non-zero size (so
that addresses of different instances are valid and distinct), but they contain
no significant data.  Ideally, therefore, the compiler will not generate code
that sets values or copies values when passing around such types. 
Unfortunately, that is not quite the case.

Consider these two examples, with foo1 creating a tag type, and foo2 passing it
on:

struct Tag {
friend Tag make_tag();
private:
Tag() {}
};

Tag make_tag() { 
return Tag{}; 
};

void needs_tag(Tag);

void foo1(void) {
Tag t = make_tag();
needs_tag(t);
}


struct Tag1 {};
struct Tag2 {};
struct Tag3 {};
struct Tag4 {};
struct Tag5 {};

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5);

void foo2(Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5)
{
needs_tags(12345, t1, t2, t3, t4, t5);
}


(Here is a godbolt link for convenience: )

On x86, since gcc 8, this has been quite efficient (this is all with -O2):

make_tag():
xor eax, eax
ret
foo1():
jmp needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
mov edi, 12345
jmp needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

The contents of the tag instances are basically ignored.  The exception is on
"make_tag", where the return is given the value 0 unnecessarily.

But on ARM it is a different matter.  This is for the Cortex-M4:


make_tag():
mov r0, #0
bx  lr
foo1():
mov r0, #0
b   needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
push{lr}
sub sp, sp, #12
mov r2, #0
mov r3, r2
strbr2, [sp, #4]
strbr2, [sp]
mov r1, r2
movwr0, #12345
bl  needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
add sp, sp, #12
ldr pc, [sp], #4

The needless register and stack allocations, initialisations and copying mean
that this technique has a significant overhead for something that should really
"disappear in the compilation".

The x86 port manages this well.  Is it possible to get such optimisations into
the ARM port too?


Oh, and for comparison, clang with the same options (-std=c++17 -Wall -Wextra
-O2 -mcpu=cortex-m4) gives:

make_tag():
bx  lr
foo1():
b   needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
movwr0, #12345
b   needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

[Bug c/98313] New: Allow __attribute__((cleanup)) in types

2020-12-16 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98313

Bug ID: 98313
   Summary: Allow __attribute__((cleanup)) in types
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

The gcc __attribute__((cleanup)) feature can be useful for making the
equivalent of a C++ destructor in C code, as a way of ensuring resources are
freed even if you exit a function early with return.  Typical use-cases are
malloc'ed memory and locks.

In many situations, attaching the cleanup attribute to a type would give neater
code with fewer chances of error.  For example:

void indirect_free(void ** p)  {
free(*p);
*p = 0;
}

extern void do_something(int * p);

void foo(void) {
void * p = malloc(16);
do_something(p);
free(p);
}

void foo2(void) {
__attribute__((cleanup(indirect_free)))
void * p = malloc(16);
do_something(p);
}

Here, the cleanup attribute in "foo2" means you don't have to remember to free
p after calling do_something() - and if the function has multiple allocations,
this results in neater code with lower risk of bugs than a "traditional"
goto-based error handling.  But you still have to remember to put the cleanup
attribute on each relevant pointer.

If instead you had:

typedef __attribute__((cleanup(indirect_free))) void* auto_voidp;

auto_voidp auto_malloc(size_t size) {
return malloc(size);
}


then your code could be:

void foo3(void) {
void * p = auto_malloc(16);
do_something(p);
}


It would not remove all possibilities of error - copying an auto_voidp here
could easily lead to double-free errors.  (But this is C, not C++ - we need to
leave /some/ fun for the people debugging the code...)


Basically, all I'd like here is that if you attach a cleanup attribute to a
type (struct, typedef, etc.), then the attribute is automatically applied to
any definitions of variables of that type.